# Summary

This document contains [DuckDB's official documentation and guides](https://duckdb.org/) in a single-file, easy-to-search form.
If you find any issues, please report them [as a GitHub issue](https://github.com/duckdb/duckdb-web/issues).
Contributions are very welcome in the form of [pull requests](https://github.com/duckdb/duckdb-web/pulls).
If you are considering submitting a contribution to the documentation, please consult our [contributor guide](https://github.com/duckdb/duckdb-web/blob/main/CONTRIBUTING.md).

Code repositories:

* DuckDB source code: [github.com/duckdb/duckdb](https://github.com/duckdb/duckdb)
* DuckDB documentation source code: [github.com/duckdb/duckdb-web](https://github.com/duckdb/duckdb-web)

# Connect {#connect}

## Connect {#docs:current:connect:overview}

#### Connect or Create a Database {#docs:current:connect:overview::connect-or-create-a-database}

To use DuckDB, you must first create a connection to a database. The exact syntax varies between the [client APIs](#docs:current:clients:overview), but it typically involves passing an argument to configure persistence.

#### Persistence {#docs:current:connect:overview::persistence}

DuckDB can operate in both persistent mode, where the data is saved to disk, and in in-memory mode, where the entire dataset is stored in the main memory.

> **Tip.** Both persistent and in-memory databases use spilling to disk to facilitate larger-than-memory workloads (i.e., out-of-core processing).

##### Persistent Database {#docs:current:connect:overview::persistent-database}

To create or open a persistent database, set the path of the database file, e.g., `my_database.duckdb`, when creating the connection.
This path can point to an existing database or to a file that does not yet exist; DuckDB will open or create a database at that location as needed.
The file may have an arbitrary extension, but `.db` and `.duckdb` are two common choices, and `.ddb` is also sometimes used.

Starting with v0.10, DuckDB's storage format is [backwards-compatible](#docs:current:internals:storage::backward-compatibility), i.e., DuckDB is able to read database files produced by an older version of DuckDB.
For example, DuckDB v0.10 can read and operate on files created by the previous DuckDB version, v0.9.
For more details on DuckDB's storage format, see the [storage page](#docs:current:internals:storage).

##### In-Memory Database {#docs:current:connect:overview::in-memory-database}

DuckDB can operate in in-memory mode. In most clients, this can be activated by passing the special value `:memory:` as the database file or by omitting the database file argument. In in-memory mode, no data is persisted to disk; therefore, all data is lost when the process finishes.

## Concurrency {#docs:current:connect:concurrency}

#### Handling Concurrency {#docs:current:connect:concurrency::handling-concurrency}

DuckDB has two configurable options for concurrency:

1. **Read-write mode:** one process can both read and write to the database.
2. **Read-only mode:** multiple processes can read from the database, but no processes can write ([`access_mode = 'READ_ONLY'`](#docs:current:configuration:overview::configuration-reference)).

When using read-write mode, DuckDB supports multiple writer threads using a combination of [MVCC (Multi-Version Concurrency Control)](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) and optimistic concurrency control (see [Concurrency within a Single Process](#docs:current:connect:concurrency::concurrency-within-a-single-process)), but all within that single writer process. The reason for this concurrency model is to allow for the caching of data in RAM for faster analytical queries, rather than going back and forth to disk during each query. It also allows the caching of function pointers, the database catalog, and other items so that subsequent queries on the same connection are faster.

> DuckDB is optimized for bulk operations, so executing many small transactions is not a primary design goal.

#### Concurrency within a Single Process {#docs:current:connect:concurrency::concurrency-within-a-single-process}

DuckDB supports concurrency within a single process according to the following rules. As long as there are no write conflicts, multiple concurrent writes will succeed. Appends will never conflict, even on the same table. Multiple threads can also simultaneously update separate tables or separate subsets of the same table. Optimistic concurrency control comes into play when two threads attempt to edit (update or delete) the same row at the same time. In that situation, the second thread to attempt the edit will fail with a conflict error.

#### Writing to DuckDB from Multiple Processes {#docs:current:connect:concurrency::writing-to-duckdb-from-multiple-processes}

Writing to DuckDB's native database format from multiple processes is not currently supported (see [Handling Concurrency](#docs:current:connect:concurrency::handling-concurrency)).

If you would like to have read-write access to the same database, consider storing it in the [DuckLake format with PostgreSQL as the catalog database](https://ducklake.select/). By coordinating through a central PostgreSQL database, you can achieve concurrent read-writes on the same database.

#### Optimistic Concurrency Control {#docs:current:connect:concurrency::optimistic-concurrency-control}

DuckDB uses [optimistic concurrency control](https://en.wikipedia.org/wiki/Optimistic_concurrency_control), an approach generally considered to be the best fit for read-intensive analytical database systems as it speeds up read query processing. As a result, transactions that modify the same rows at the same time will cause a transaction conflict error:

```console
Transaction conflict: cannot update a table that has been altered!
```

> **Tip.** A common workaround when a transaction conflict is encountered is to rerun the transaction.
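
The rerun workaround can be wrapped in a small helper. The following is a minimal Python sketch using only the standard library. `TransactionConflict` and `run_with_retry` are hypothetical names introduced for illustration and are not part of any DuckDB client API; in a real client, you would catch whatever exception your client raises for transaction conflicts.

```python
import time


class TransactionConflict(Exception):
    """Hypothetical stand-in for a client's transaction conflict error."""


def run_with_retry(transaction, max_attempts=3, backoff_seconds=0.0):
    """Run `transaction` (a zero-argument callable), rerunning it on conflict.

    The conflict is re-raised if it still occurs on the last attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except TransactionConflict:
            if attempt == max_attempts:
                raise
            # Wait a little longer before each new attempt.
            time.sleep(backoff_seconds * attempt)


# Usage: a fake transaction that conflicts twice, then succeeds.
attempts = []


def flaky_transaction():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransactionConflict("Transaction conflict")
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, len(attempts))
```

The callable should contain the whole transaction (begin, statements, commit), so that each rerun starts from a fresh snapshot rather than repeating only the failed statement.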

#### Troubleshooting {#docs:current:connect:concurrency::troubleshooting}

**File locks.**
DuckDB handles concurrent database access requests using file locks.
Exercise extra caution when accessing a DuckDB database file in a shared directory (e.g., from different operating systems using different file systems or on network attached storage).

# Data Import and Export {#data}

## Importing Data {#docs:current:data:overview}

The first step to using a database system is to insert data into that system.
DuckDB can directly connect to [many popular data sources](#docs:current:data:data_sources) and offers several data ingestion methods that allow you to easily and efficiently fill up the database.
On this page, we provide an overview of these methods so you can select which one is best suited for your use case.

#### `INSERT` Statements {#docs:current:data:overview::insert-statements}

`INSERT` statements are the standard way of loading data into a database system. They are suitable for quick prototyping, but should be avoided for bulk loading as they have significant per-row overhead.

```sql
INSERT INTO people VALUES (1, 'Mark');
```

For a more detailed description, see the [page on the `INSERT` statement](#docs:current:data:insert).

#### File Loading: Relative Paths {#docs:current:data:overview::file-loading-relative-paths}

Use the configuration option [`file_search_path`](#docs:current:configuration:overview::local-configuration-options) to set the “root directories” against which relative paths are resolved.
If `file_search_path` is not set, the working directory is used as the basis for relative paths.

#### File Formats {#docs:current:data:overview::file-formats}

##### CSV Loading {#docs:current:data:overview::csv-loading}

Data can be efficiently loaded from CSV files using several methods. The simplest is to use the CSV file's name:

```sql
SELECT * FROM 'test.csv';
```

Alternatively, use the [`read_csv` function](#docs:current:data:csv:overview) to pass along options:

```sql
SELECT * FROM read_csv('test.csv', header = false);
```

Or use the [`COPY` statement](#docs:current:sql:statements:copy::copy--from):

```sql
COPY tbl FROM 'test.csv' (HEADER false);
```

It is also possible to read data directly from **compressed CSV files** (e.g., compressed with [gzip](https://www.gzip.org/)):

```sql
SELECT * FROM 'test.csv.gz';
```

DuckDB can create a table from the loaded data using the [`CREATE TABLE ... AS SELECT` statement](#docs:current:sql:statements:create_table::create-table--as-select-ctas):

```sql
CREATE TABLE test AS
    SELECT * FROM 'test.csv';
```

For more details, see the [page on CSV loading](#docs:current:data:csv:overview).

##### Parquet Loading {#docs:current:data:overview::parquet-loading}

Parquet files can be efficiently loaded and queried using their filename:

```sql
SELECT * FROM 'test.parquet';
```

Alternatively, use the [`read_parquet` function](#docs:current:data:parquet:overview):

```sql
SELECT * FROM read_parquet('test.parquet');
```

Or use the [`COPY` statement](#docs:current:sql:statements:copy::copy--from):

```sql
COPY tbl FROM 'test.parquet';
```

For more details, see the [page on Parquet loading](#docs:current:data:parquet:overview).

##### JSON Loading {#docs:current:data:overview::json-loading}

JSON files can be efficiently loaded and queried using their filename:

```sql
SELECT * FROM 'test.json';
```

Alternatively, use the [`read_json_auto` function](#docs:current:data:json:overview):

```sql
SELECT * FROM read_json_auto('test.json');
```

Or use the [`COPY` statement](#docs:current:sql:statements:copy::copy--from):

```sql
COPY tbl FROM 'test.json';
```

For more details, see the [page on JSON loading](#docs:current:data:json:overview).

##### Returning the Filename {#docs:current:data:overview::returning-the-filename}

Since DuckDB v1.3.0, the CSV, JSON and Parquet readers support the `filename` virtual column:

```sql
COPY (FROM (VALUES (42), (43)) t(x)) TO 'test.parquet';
SELECT *, filename FROM 'test.parquet';
```

#### Appender {#docs:current:data:overview::appender}

In several APIs (C, C++, Go, Java and Rust), the [Appender](#docs:current:data:appender) can be used as an alternative for bulk data loading.
This class can be used to efficiently add rows to the database system without using SQL statements.

## Data Sources {#docs:current:data:data_sources}

DuckDB supports several data sources, including file formats, network protocols, and database systems:

* [AWS S3 buckets and storage with S3-compatible API](#docs:current:core_extensions:httpfs:s3api)
* [Azure Blob Storage](#docs:current:core_extensions:azure)
* [Blob files](#docs:current:guides:file_formats:read_file::read_blob)
* [Cloudflare R2](#docs:current:guides:network_cloud_storage:cloudflare_r2_import)
* [CSV](#docs:current:data:csv:overview)
* [Delta Lake](#docs:current:core_extensions:delta)
* [Excel](#docs:current:core_extensions:excel)
* [httpfs](#docs:current:core_extensions:httpfs:https)
* [Iceberg](#docs:current:core_extensions:iceberg:overview)
* [JSON](#docs:current:data:json:overview)
* [MySQL](#docs:current:core_extensions:mysql)
* [Parquet](#docs:current:data:parquet:overview)
* [PostgreSQL](#docs:current:core_extensions:postgres)
* [SQLite](#docs:current:core_extensions:sqlite)
* [Text files](#docs:current:guides:file_formats:read_file::read_text)

## CSV Files {#data:csv}

### CSV Import {#docs:current:data:csv:overview}

#### Examples {#docs:current:data:csv:overview::examples}

The following examples use the [`flights.csv`](https://duckdb.org/data/flights.csv) file.

Read a CSV file from disk, auto-infer options:

```sql
SELECT * FROM 'flights.csv';
```

Use the `read_csv` function with custom options:

```sql
SELECT *
FROM read_csv('flights.csv',
    delim = '|',
    header = true,
    columns = {
        'FlightDate': 'DATE',
        'UniqueCarrier': 'VARCHAR',
        'OriginCityName': 'VARCHAR',
        'DestCityName': 'VARCHAR'
    });
```

Read a CSV from stdin, auto-infer options:

```batch
cat flights.csv | duckdb -c "SELECT * FROM read_csv('/dev/stdin')"
```

Read a CSV file into a table:

```sql
CREATE TABLE ontime (
    FlightDate DATE,
    UniqueCarrier VARCHAR,
    OriginCityName VARCHAR,
    DestCityName VARCHAR
);
COPY ontime FROM 'flights.csv';
```

Alternatively, create a table without specifying the schema manually using a [`CREATE TABLE ... AS SELECT` statement](#docs:current:sql:statements:create_table::create-table--as-select-ctas):

```sql
CREATE TABLE ontime AS
    SELECT * FROM 'flights.csv';
```

We can use the [`FROM`-first syntax](#docs:current:sql:query_syntax:from::from-first-syntax) to omit `SELECT *`.

```sql
CREATE TABLE ontime AS
    FROM 'flights.csv';
```

#### CSV Loading {#docs:current:data:csv:overview::csv-loading}

CSV loading, i.e., importing CSV files to the database, is a very common, and yet surprisingly tricky, task. While CSVs seem simple on the surface, there are a lot of inconsistencies found within CSV files that can make loading them a challenge. CSV files come in many different varieties, are often corrupt, and do not have a schema. The CSV reader needs to cope with all of these different situations.

The DuckDB CSV reader can automatically infer which configuration flags to use by analyzing the CSV file using the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer). This will work correctly in most situations, and should be the first option attempted. In rare situations where the CSV reader cannot figure out the correct configuration, it is possible to manually configure the CSV reader to correctly parse the CSV file. See the [auto detection page](#docs:current:data:csv:auto_detection) for more information.

#### Parameters {#docs:current:data:csv:overview::parameters}

Below are parameters that can be passed to the [`read_csv` function](#docs:current:data:csv:overview::csv-functions). Where meaningfully applicable, these parameters can also be passed to the [`COPY` statement](#docs:current:sql:statements:copy::copy-to).

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `all_varchar` | Skip type detection and assume all columns are of type `VARCHAR`. This option is only supported by the `read_csv` function. | `BOOL` | `false` |
| `allow_quoted_nulls` | Allow the conversion of quoted values to `NULL` values | `BOOL` | `true` |
| `auto_detect` | [Auto detect CSV parameters](#docs:current:data:csv:auto_detection). | `BOOL` | `true` |
| `auto_type_candidates` | Types that the sniffer uses when detecting column types. The `VARCHAR` type is always included as a fallback option. See [example](#docs:current:data:csv:overview::auto_type_candidates-details). | `TYPE[]` | [default types](#docs:current:data:csv:overview::auto_type_candidates-details) |
| `buffer_size` | Size of the buffers used to read files, in bytes. Must be large enough to hold four lines and can significantly impact performance. | `BIGINT` | `16 * max_line_size` |
| `columns` | Column names and types, as a struct (e.g., `{'col1': 'INTEGER', 'col2': 'VARCHAR'}`). Using this option disables auto detection of the schema. | `STRUCT` | (empty) |
| `comment` | Character used to initiate comments. Lines starting with a comment character (optionally preceded by space characters) are completely ignored; other lines containing a comment character are parsed only up to that point. | `VARCHAR` | (empty) |
| `compression` | Method used to compress CSV files. By default this is detected automatically from the file extension (e.g., `t.csv.gz` will use gzip, `t.csv` will use `none`). Options are `none`, `gzip`, `zstd`. | `VARCHAR` | `auto` |
| `dateformat` | [Date format](#docs:current:sql:functions:dateformat) used when parsing and writing dates. | `VARCHAR` | (empty) |
| `date_format` | Alias for `dateformat`; only available in the `COPY` statement. | `VARCHAR` | (empty) |
| `decimal_separator` | Decimal separator for numbers. | `VARCHAR` | `.` |
| `delim` | Delimiter character used to separate columns within each line, e.g., `,` `;` `\t`. The delimiter character can be up to 4 bytes, e.g., 🦆. Alias for `sep`. | `VARCHAR` | `,` |
| `delimiter` | Alias for `delim`; only available in the `COPY` statement. | `VARCHAR` | `,` |
| `escape` | String used to escape the `quote` character within quoted values. | `VARCHAR` | `"` |
| `encoding` | Encoding used by the CSV file. Options are `utf-8`, `utf-16`, `latin-1`. Not available in the `COPY` statement (which always uses `utf-8`). | `VARCHAR` | `utf-8` |
| `filename` | Add path of the containing file to each row, as a string column named `filename`. Relative or absolute paths are returned depending on the path or glob pattern provided to `read_csv`, not just filenames. Since DuckDB v1.3.0, the `filename` column is added automatically as a virtual column and this option is only kept for compatibility reasons. | `BOOL` | `false` |
| `force_not_null` | Do not match values in the specified columns against the `NULL` string. In the default case where the `NULL` string is empty, this means that empty values are read as zero-length strings instead of `NULL`s. | `VARCHAR[]` | `[]` |
| `header` | First line of each file contains the column names. | `BOOL` | `false` |
| `hive_partitioning` | Interpret the path as a [Hive partitioned path](#docs:current:data:partitioning:hive_partitioning). | `BOOL` | (auto-detected) |
| `ignore_errors` | Ignore any parsing errors encountered. | `BOOL` | `false` |
| `max_line_size` or `maximum_line_size` | Maximum line size, in bytes. Not available in the `COPY` statement. | `BIGINT` | 2000000 |
| `names` or `column_names` | Column names, as a list. See [example](#docs:current:data:csv:tips::provide-names-if-the-file-does-not-contain-a-header). | `VARCHAR[]` | (empty) |
| `new_line` | New line character(s). Options are `'\r'`, `'\n'`, or `'\r\n'`. The CSV parser only distinguishes between single-character and double-character line delimiters. Therefore, it does not differentiate between `'\r'` and `'\n'`. | `VARCHAR` | (empty) |
| `normalize_names` | Normalize column names. This removes any non-alphanumeric characters from them. Column names that are reserved SQL keywords are prefixed with an underscore character (`_`). | `BOOL` | `false` |
| `null_padding` | Pad the remaining columns on the right with `NULL` values when a line lacks columns. | `BOOL` | `false` |
| `nullstr` or `null` | Strings that represent a `NULL` value. | `VARCHAR` or `VARCHAR[]` | (empty) |
| `parallel` | Use the parallel CSV reader. | `BOOL` | `true` |
| `quote` | String used to quote values. | `VARCHAR` | `"` |
| `rejects_scan` | Name of the [temporary table where information on faulty scans is stored](#docs:current:data:csv:reading_faulty_csv_files::reject-scans). | `VARCHAR` | `reject_scans` |
| `rejects_table` | Name of the [temporary table where information on faulty lines is stored](#docs:current:data:csv:reading_faulty_csv_files::reject-errors). | `VARCHAR` | `reject_errors` |
| `rejects_limit` | Upper limit on the number of faulty lines per file that are recorded in the rejects table. Setting this to `0` means that no limit is applied. | `BIGINT` | `0` |
| `sample_size` | Number of sample lines for [auto detection of parameters](#docs:current:data:csv:auto_detection). | `BIGINT` | 20480 |
| `sep` | Delimiter character used to separate columns within each line, e.g., `,` `;` `\t`. The delimiter character can be up to 4 bytes, e.g., 🦆. Alias for `delim`. | `VARCHAR` | `,` |
| `skip` | Number of lines to skip at the start of each file. | `BIGINT` | 0 |
| `store_rejects` | Skip any lines with errors and store them in the rejects table. | `BOOL` | `false` |
| `strict_mode` | Enforces the strictness level of the CSV Reader. When set to `true`, the parser will throw an error upon encountering any issues. When set to `false`, the parser will attempt to read structurally incorrect files. It is important to note that reading structurally incorrect files can cause ambiguity; therefore, this option should be used with caution. | `BOOL` | `true` |
| `thousands` | Character used to identify thousands separators in numeric values. It must be a single character and different from the `decimal_separator` option. | `VARCHAR` | (empty) |
| `timestampformat` | [Timestamp format](#docs:current:sql:functions:dateformat) used when parsing and writing timestamps. | `VARCHAR` | (empty) |
| `timestamp_format` | Alias for `timestampformat`; only available in the `COPY` statement. | `VARCHAR` | (empty) |
| `types` or `dtypes` or `column_types` | Column types, as either a list (by position) or a struct (by name). See [example](#docs:current:data:csv:tips::override-the-types-of-specific-columns). | `VARCHAR[]` or `STRUCT` | (empty) |
| `union_by_name` | Align columns from different files [by column name](#docs:current:data:multiple_files:combining_schemas::union-by-name) instead of position. Using this option increases memory consumption. | `BOOL` | `false` |

> **Tip.** DuckDB's CSV reader supports `UTF-8` (default), `UTF-16` and `Latin-1` encodings.
> For other encodings, you can either use [the `encodings` extension](#docs:current:core_extensions:encodings)
> or convert them, e.g., using the [`iconv` command-line tool](https://linux.die.net/man/1/iconv):
>
> ```batch
> iconv -f ISO-8859-2 -t UTF-8 input.csv > input-utf-8.csv
> ```

##### `auto_type_candidates` Details {#docs:current:data:csv:overview::auto_type_candidates-details}

The `auto_type_candidates` option lets you specify the data types that should be considered by the CSV reader for [column data type detection](#docs:current:data:csv:auto_detection::type-detection).
Usage example:

```sql
SELECT * FROM read_csv('csv_file.csv', auto_type_candidates = ['BIGINT', 'DATE']);
```

The default value for the `auto_type_candidates` option is `['NULL', 'BOOLEAN', 'BIGINT', 'DOUBLE', 'TIME', 'DATE', 'TIMESTAMP', 'VARCHAR']`.

#### CSV Functions {#docs:current:data:csv:overview::csv-functions}

The `read_csv` function automatically attempts to figure out the correct configuration of the CSV reader using the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer). It also automatically deduces types of columns. If the CSV file has a header, it will use the names found in that header to name the columns. Otherwise, the columns will be named `column0, column1, column2, ...`. An example with the [`flights.csv`](https://duckdb.org/data/flights.csv) file:

```sql
SELECT * FROM read_csv('flights.csv');
```

| FlightDate | UniqueCarrier | OriginCityName |  DestCityName   |
|------------|---------------|----------------|-----------------|
| 1988-01-01 | AA            | New York, NY   | Los Angeles, CA |
| 1988-01-02 | AA            | New York, NY   | Los Angeles, CA |
| 1988-01-03 | AA            | New York, NY   | Los Angeles, CA |

The path can either be a relative path (relative to the current working directory) or an absolute path.

We can use `read_csv` to create a persistent table as well:

```sql
CREATE TABLE ontime AS
    SELECT * FROM read_csv('flights.csv');
DESCRIBE ontime;
```

|  column_name   | column_type | null | key  | default | extra |
|----------------|-------------|------|------|---------|-------|
| FlightDate     | DATE        | YES  | NULL | NULL    | NULL  |
| UniqueCarrier  | VARCHAR     | YES  | NULL | NULL    | NULL  |
| OriginCityName | VARCHAR     | YES  | NULL | NULL    | NULL  |
| DestCityName   | VARCHAR     | YES  | NULL | NULL    | NULL  |

The number of lines sampled for auto detection can be adjusted with the `sample_size` parameter:

```sql
SELECT * FROM read_csv('flights.csv', sample_size = 20_000);
```

If we set `delim` / `sep`, `quote`, `escape`, or `header` explicitly, we can bypass the automatic detection of this particular parameter:

```sql
SELECT * FROM read_csv('flights.csv', header = true);
```

Multiple files can be read at once by providing a glob or a list of files. Refer to the [multiple files section](#docs:current:data:multiple_files:overview) for more information.

#### Writing Using the `COPY` Statement {#docs:current:data:csv:overview::writing-using-the-copy-statement}

The [`COPY` statement](#docs:current:sql:statements:copy::copy--from) can be used to load data from a CSV file into a table. This statement has the same syntax as the one used in PostgreSQL. To load the data using the `COPY` statement, we must first create a table with the correct schema (which matches the order of the columns in the CSV file and uses types that fit the values in the CSV file). `COPY` detects the CSV's configuration options automatically.

```sql
CREATE TABLE ontime (
    flightdate DATE,
    uniquecarrier VARCHAR,
    origincityname VARCHAR,
    destcityname VARCHAR
);
COPY ontime FROM 'flights.csv';
SELECT * FROM ontime;
```

| flightdate | uniquecarrier | origincityname |  destcityname   |
|------------|---------------|----------------|-----------------|
| 1988-01-01 | AA            | New York, NY   | Los Angeles, CA |
| 1988-01-02 | AA            | New York, NY   | Los Angeles, CA |
| 1988-01-03 | AA            | New York, NY   | Los Angeles, CA |

If we want to manually specify the CSV format, we can do so using the configuration options of `COPY`.

```sql
CREATE TABLE ontime (flightdate DATE, uniquecarrier VARCHAR, origincityname VARCHAR, destcityname VARCHAR);
COPY ontime FROM 'flights.csv' (DELIMITER '|', HEADER);
SELECT * FROM ontime;
```

#### Reading Faulty CSV Files {#docs:current:data:csv:overview::reading-faulty-csv-files}

DuckDB supports reading erroneous CSV files. For details, see the [Reading Faulty CSV Files page](#docs:current:data:csv:reading_faulty_csv_files).

#### Order Preservation {#docs:current:data:csv:overview::order-preservation}

The CSV reader respects the `preserve_insertion_order` [configuration option](#docs:current:configuration:overview) to [preserve insertion order](#docs:current:sql:dialect:order_preservation).
When `true` (the default), the order of the rows in the result set returned by the CSV reader is the same as the order of the corresponding lines read from the file(s).
When `false`, there is no guarantee that the order is preserved.

#### Writing CSV Files {#docs:current:data:csv:overview::writing-csv-files}

DuckDB can write CSV files using the [`COPY ... TO` statement](#docs:current:sql:statements:copy::copy--to).

### CSV Auto Detection {#docs:current:data:csv:auto_detection}

When using `read_csv`, the system tries to automatically infer how to read the CSV file using the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer).
This step is necessary because CSV files are not self-describing and come in many different dialects. The auto-detection works roughly as follows:

* Detect the dialect of the CSV file (delimiter, quoting rule, escape).
* Detect the types of each of the columns.
* Detect whether or not the file has a header row.

By default, the system will try to auto-detect all options. However, options can be individually overridden by the user. This can be useful in case the system makes a mistake. For example, if the delimiter is chosen incorrectly, we can override it by calling `read_csv` with an explicit delimiter (e.g., `read_csv('file.csv', delim = '|')`).

#### Sample Size {#docs:current:data:csv:auto_detection::sample-size}

The type detection works by operating on a sample of the file.
The size of the sample can be modified by setting the `sample_size` parameter.
The default sample size is 20,480 rows.
Setting the `sample_size` parameter to `-1` means the entire file is read for sampling:

```sql
SELECT * FROM read_csv('my_csv_file.csv', sample_size = -1);
```

The way sampling is performed depends on the type of file. If we are reading from a regular file on disk, we will jump into the file and try to sample from different locations in the file.
If we are reading from a file in which we cannot jump – such as a `.gz` compressed CSV file or `stdin` – samples are taken only from the beginning of the file.
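
The difference between the two cases can be illustrated with a small Python sketch. This is only an illustration of the idea described above, not DuckDB's sampling code: the function name `sample_lines`, the evenly spaced offsets, and the chunk sizes are all assumptions made for the example.

```python
import io


def sample_lines(stream, chunks=3, lines_per_chunk=2):
    """Return sample lines (as bytes) from a binary stream."""
    samples = []
    if not stream.seekable():
        # Cannot jump (e.g., a pipe or a decompression stream):
        # take the whole sample from the beginning.
        for _ in range(chunks * lines_per_chunk):
            line = stream.readline()
            if not line:
                break
            samples.append(line)
        return samples

    size = stream.seek(0, io.SEEK_END)
    for i in range(chunks):
        offset = size * i // chunks
        stream.seek(offset)
        if offset > 0:
            # We most likely landed mid-line: skip ahead to the next line start.
            stream.readline()
        for _ in range(lines_per_chunk):
            line = stream.readline()
            if not line:
                break
            samples.append(line)
    return samples


class NonSeekable(io.BytesIO):
    """In-memory stream that reports itself as non-seekable, like stdin."""

    def seekable(self):
        return False


data = b"".join(b"row%d\n" % i for i in range(12))
from_file = sample_lines(io.BytesIO(data))
from_pipe = sample_lines(NonSeekable(data))
```

With a seekable stream, the sample contains lines from the start, middle, and end of the data; with the non-seekable stream, it contains only the first six lines.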

#### `sniff_csv` Function {#docs:current:data:csv:auto_detection::sniff_csv-function}

It is possible to run the CSV sniffer as a separate step using the `sniff_csv(filename)` function, which returns the detected CSV properties as a table with a single row.
The `sniff_csv` function accepts an optional `sample_size` parameter to configure the number of rows sampled.

```sql
FROM sniff_csv('my_file.csv');
FROM sniff_csv('my_file.csv', sample_size = 1000);
```

| Column name        | Description                                   | Example                                                           |
|--------------------|-----------------------------------------------|-------------------------------------------------------------------|
| `Delimiter`        | Delimiter                                     | `,`                                                               |
| `Quote`            | Quote character                               | `"`                                                               |
| `Escape`           | Escape                                        | `\`                                                               |
| `NewLineDelimiter` | New-line delimiter                            | `\r\n`                                                            |
| `Comment`          | Comment character                             | `#`                                                               |
| `SkipRows`         | Number of rows skipped                        | 1                                                                 |
| `HasHeader`        | Whether the CSV has a header                  | `true`                                                            |
| `Columns`          | Column types encoded as a `LIST` of `STRUCT`s | `({'name': 'VARCHAR', 'age': 'BIGINT'})`                          |
| `DateFormat`       | Date format                                   | `%d/%m/%Y`                                                        |
| `TimestampFormat`  | Timestamp Format                              | `%Y-%m-%dT%H:%M:%S.%f`                                            |
| `UserArguments`    | Arguments used to invoke `sniff_csv`          | `sample_size = 1000`                                              |
| `Prompt`           | Prompt ready to be used to read the CSV       | `FROM read_csv('my_file.csv', auto_detect=false, delim=',', ...)` |

##### Prompt {#docs:current:data:csv:auto_detection::prompt}

The `Prompt` column contains a SQL command with the configurations detected by the sniffer.

```sql
-- use line mode in CLI to get the full command
.mode line
SELECT Prompt FROM sniff_csv('my_file.csv');
```

```text
Prompt = FROM read_csv('my_file.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0, header=true, columns={...});
```

#### Detection Steps {#docs:current:data:csv:auto_detection::detection-steps}

##### Dialect Detection {#docs:current:data:csv:auto_detection::dialect-detection}

Dialect detection works by attempting to parse the samples using the set of considered values. The detected dialect is the dialect that has (1) a consistent number of columns for each row, and (2) the highest number of columns for each row.

The following dialects are considered for automatic dialect detection.



| Parameters | Considered values     |
|------------|-----------------------|
| `delim`    | `,` `|` `;` `\t`      |
| `quote`    | `"` `'` (empty)       |
| `escape`   | `"` `'` `\` (empty)   |



Consider the example file [`flights.csv`](https://duckdb.org/data/flights.csv):

```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA
```

In this file, the dialect detection works as follows:

* If we split by `|`, every row is split into `4` columns.
* If we split by `,`, rows 2-4 are split into `3` columns, while the first row is split into `1` column.
* If we split by `;`, every row is split into `1` column.
* If we split by `\t`, every row is split into `1` column.

In this example, the system selects `|` as the delimiter. All rows are split into the same number of columns, and there is more than one column per row, meaning the delimiter was actually found in the CSV file.
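
The two selection criteria (a consistent column count and the highest column count) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the sniffer's implementation: it only considers delimiters, splits naively without handling quotes or escapes, and the name `detect_delimiter` is made up for the example.

```python
FLIGHTS_SAMPLE = """\
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA
"""

CANDIDATE_DELIMITERS = [",", "|", ";", "\t"]


def detect_delimiter(sample, candidates=CANDIDATE_DELIMITERS):
    """Pick the delimiter that gives a consistent and maximal column count."""
    rows = sample.splitlines()
    best_delim, best_columns = None, 0
    for delim in candidates:
        # Naive split: quoting and escaping are ignored in this sketch.
        counts = {len(row.split(delim)) for row in rows}
        if len(counts) != 1:
            continue  # rows disagree on the number of columns
        columns = counts.pop()
        if columns > best_columns:
            best_delim, best_columns = delim, columns
    return best_delim, best_columns


delimiter, column_count = detect_delimiter(FLIGHTS_SAMPLE)
print(delimiter, column_count)  # prints: | 4
```

In this sketch, ties are broken in favor of the candidate listed first; how the actual sniffer ranks equally good dialects is not covered here.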

##### Type Detection {#docs:current:data:csv:auto_detection::type-detection}

After detecting the dialect, the system will attempt to figure out the types of each of the columns. Note that this step is only performed if we are calling `read_csv`. In the case of the `COPY` statement, the types of the table that we are copying into will be used instead.

The type detection works by attempting to convert the values in each column to the candidate types. If the conversion is unsuccessful, the candidate type is removed from the set of candidate types for that column. After all samples have been handled – the remaining candidate type with the highest priority is chosen. The default set of candidate types is given below, in order of priority:



|   Types     |
|-------------|
| NULL        |
| BOOLEAN     |
| TIME        |
| DATE        |
| TIMESTAMP   |
| TIMESTAMPTZ |
| BIGINT      |
| DOUBLE      |
| VARCHAR     |

Everything can be cast to `VARCHAR`; therefore, this type has the lowest priority, meaning that all columns are converted to `VARCHAR` as a fallback if they cannot be cast to anything else.
In [`flights.csv`](https://duckdb.org/data/flights.csv) the `FlightDate` column will be cast to a `DATE`, while the other columns will be cast to `VARCHAR`.
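
The elimination process can be sketched in Python as follows. This is an illustrative model only: it covers a subset of the candidate types, and the parsing rules (Python's `int`, `float` and `date.fromisoformat`) are stand-ins that are more permissive than DuckDB's casts. The names `CANDIDATES` and `detect_type` are hypothetical.

```python
from datetime import date

def _parses(parser, value):
    try:
        parser(value)
        return True
    except ValueError:
        return False

def _is_bigint(value):
    try:
        return -2**63 <= int(value) < 2**63
    except ValueError:
        return False

# Candidate types in priority order (a subset of the default list above).
CANDIDATES = [
    ("BOOLEAN", lambda v: v.lower() in ("true", "false")),
    ("DATE", lambda v: _parses(date.fromisoformat, v)),
    ("BIGINT", _is_bigint),
    ("DOUBLE", lambda v: _parses(float, v)),
    ("VARCHAR", lambda v: True),  # fallback: everything fits
]

def detect_type(values):
    """Drop every candidate that fails on some sample value, then return
    the highest-priority candidate that is still standing."""
    remaining = list(CANDIDATES)
    for value in values:
        remaining = [(name, ok) for name, ok in remaining if ok(value)]
    return remaining[0][0]

print(detect_type(["1988-01-01", "1988-01-02"]))  # DATE
print(detect_type(["1", "2", "3.5"]))             # DOUBLE
print(detect_type(["AA", "AA"]))                  # VARCHAR
```

Because `VARCHAR` accepts every value, the list of remaining candidates is never empty.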

The set of candidate types that should be considered by the CSV reader can be specified explicitly using the [`auto_type_candidates`](#docs:current:data:csv:overview::auto_type_candidates-details) option. `VARCHAR` as the fallback type will always be considered as a candidate type whether you specify it or not.

Here are all additional candidate types that may be specified using the `auto_type_candidates` option, in order of priority:



|   Types   |
|-----------|
| TINYINT   |
| SMALLINT  |
| INTEGER   |
| DECIMAL   |
| FLOAT     |

Even though the set of data types that can be automatically detected may appear quite limited, the CSV reader can be configured to read arbitrarily complex types by using the `types`-option described in the next section.

Type detection can be entirely disabled by using the `all_varchar` option. If this is set, all columns will remain as `VARCHAR` (as they originally occur in the CSV file).

Note that using quote characters vs. no quote characters (e.g., `"42"` and `42`) does not make a difference for type detection.
Quoted fields will not be converted to `VARCHAR`, instead, the sniffer will try to find the type candidate with the highest priority.

###### Overriding Type Detection {#docs:current:data:csv:auto_detection::overriding-type-detection}

The detected types can be individually overridden using the `types` option. This option accepts one of two forms:

* A list of type definitions (e.g., `types = ['INTEGER', 'VARCHAR', 'DATE']`). This overrides the types of the columns in order of occurrence in the CSV file.
* Alternatively, `types` takes a `name` → `type` map which overrides the types of individual columns (e.g., `types = {'quarter': 'INTEGER'}`).

The set of column types that may be specified using the `types` option is not as limited as the types available for the `auto_type_candidates` option: any valid type definition is acceptable to the `types`-option. (To get a valid type definition, use the [`typeof()`](#docs:current:sql:functions:utility::typeofexpression) function, or use the `column_type` column of the [`DESCRIBE`](#docs:current:guides:meta:describe) result.)

The `sniff_csv()` function's `Columns` field returns a struct with column names and types that can be used as a basis for overriding types.
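
To make the difference between the two forms concrete, here is a small Python model of how an override could be applied on top of the detected types. The column names, the detected types and the function `override_types` are hypothetical; in particular, leaving trailing columns untouched when the list is shorter than the schema is an assumption of this sketch, not documented behavior.

```python
def override_types(detected, types):
    """detected: dict mapping column name -> detected type, in file order.
    types: either a list (positional) or a dict (by column name)."""
    result = dict(detected)
    if isinstance(types, dict):
        # Map form: only the named columns change.
        for name, new_type in types.items():
            if name not in result:
                raise KeyError(f"unknown column: {name}")
            result[name] = new_type
    else:
        # List form: types are applied in order of occurrence.
        for name, new_type in zip(list(result), types):
            result[name] = new_type
    return result

# Hypothetical sniffing result.
detected = {"year": "BIGINT", "quarter": "BIGINT", "name": "VARCHAR"}

print(override_types(detected, {"quarter": "INTEGER"}))
# {'year': 'BIGINT', 'quarter': 'INTEGER', 'name': 'VARCHAR'}
print(override_types(detected, ["INTEGER", "VARCHAR", "DATE"]))
# {'year': 'INTEGER', 'quarter': 'VARCHAR', 'name': 'DATE'}
```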

#### Header Detection {#docs:current:data:csv:auto_detection::header-detection}

Header detection works by checking if the candidate header row deviates from the other rows in the file in terms of types. For example, in [`flights.csv`](https://duckdb.org/data/flights.csv), we can see that the header row consists of only `VARCHAR` columns – whereas the values contain a `DATE` value for the `FlightDate` column. As such – the system defines the first row as the header row and extracts the column names from the header row.

In files that do not have a header row, the column names are generated as `column0`, `column1`, etc.

Note that headers cannot be detected correctly if all columns are of type `VARCHAR` – as in this case the system cannot distinguish the header row from the other rows in the file. In this case, the system assumes the file has a header. This can be overridden by setting the `header` option to `false`.
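
The heuristic described in this section can be modeled roughly as follows. This is a sketch under simplifying assumptions, not the sniffer's real logic: only three non-`VARCHAR` types are handled, Python parsers stand in for DuckDB casts, and the name `looks_like_header` is hypothetical.

```python
from datetime import date

# Stand-in parsers for a few column types (assumption of this sketch).
PARSERS = {"DATE": date.fromisoformat, "BIGINT": int, "DOUBLE": float}

def fits(value, col_type):
    if col_type == "VARCHAR":
        return True
    try:
        PARSERS[col_type](value)
        return True
    except ValueError:
        return False

def looks_like_header(first_row, data_types):
    """first_row: values of the candidate header row.
    data_types: types detected from the remaining rows."""
    if all(t == "VARCHAR" for t in data_types):
        # The row cannot be told apart from the data: assume a header.
        return True
    # A header deviates from the data if some value does not fit its column.
    return any(not fits(v, t) for v, t in zip(first_row, data_types))

types = ["DATE", "VARCHAR", "VARCHAR", "VARCHAR"]
header = ["FlightDate", "UniqueCarrier", "OriginCityName", "DestCityName"]
data = ["1988-01-01", "AA", "New York, NY", "Los Angeles, CA"]

print(looks_like_header(header, types))  # True
print(looks_like_header(data, types))    # False
```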

##### Dates and Timestamps {#docs:current:data:csv:auto_detection::dates-and-timestamps}

DuckDB supports the [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601) by default for timestamps, dates and times. Unfortunately, not all dates and times are formatted using this standard. For that reason, the CSV reader also supports the `dateformat` and `timestampformat` options. With these options, the user can provide a [format string](#docs:current:sql:functions:dateformat) that specifies how the date or timestamp should be read.

As part of the auto-detection, the system tries to figure out if dates and times are stored in a different representation. This is not always possible – as there are ambiguities in the representation. For example, the date `01-02-2000` can be parsed as either January 2nd or February 1st. Often these ambiguities can be resolved. For example, if we later encounter the date `21-02-2000` then we know that the format must have been `DD-MM-YYYY`. `MM-DD-YYYY` is no longer possible as there is no 21st month.

If the ambiguities cannot be resolved by looking at the data, the system has a list of preferences for which date format to use. If the system chooses incorrectly, the user can specify the `dateformat` and `timestampformat` options manually.
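
The disambiguation idea can be sketched with Python's `strptime`: every sample value eliminates the formats that cannot parse it, and the first remaining entry of a preference list wins. This is an illustration rather than the sniffer's algorithm; the list below contains only the non-ISO date formats from this section, and `candidate_formats` is a hypothetical name.

```python
from datetime import datetime

# Preference order, highest first.
PREFERRED_FORMATS = [
    "%y-%m-%d", "%Y-%m-%d", "%d-%m-%y", "%d-%m-%Y", "%m-%d-%y", "%m-%d-%Y",
]

def parses(value, fmt):
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def candidate_formats(values):
    """Return the formats, in preference order, that parse every value."""
    remaining = list(PREFERRED_FORMATS)
    for value in values:
        remaining = [fmt for fmt in remaining if parses(value, fmt)]
    return remaining

print(candidate_formats(["01-02-2000"]))
# ['%d-%m-%Y', '%m-%d-%Y']  (ambiguous: the first entry would be preferred)
print(candidate_formats(["01-02-2000", "21-02-2000"]))
# ['%d-%m-%Y']              (there is no 21st month)
```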

The system considers the following formats for dates (`dateformat`). Higher entries are chosen over lower entries in case of ambiguities (i.e., ISO 8601 is preferred over `MM-DD-YYYY`).



| dateformat |
|------------|
| ISO 8601   |
| %y-%m-%d   |
| %Y-%m-%d   |
| %d-%m-%y   |
| %d-%m-%Y   |
| %m-%d-%y   |
| %m-%d-%Y   |

The system considers the following formats for timestamps (`timestampformat`). Higher entries are chosen over lower entries in case of ambiguities.



|   timestampformat    |
|----------------------|
| ISO 8601             |
| %y-%m-%d %H:%M:%S    |
| %Y-%m-%d %H:%M:%S    |
| %d-%m-%y %H:%M:%S    |
| %d-%m-%Y %H:%M:%S    |
| %m-%d-%y %I:%M:%S %p |
| %m-%d-%Y %I:%M:%S %p |
| %Y-%m-%d %H:%M:%S.%f |

### Reading Faulty CSV Files {#docs:current:data:csv:reading_faulty_csv_files}

CSV files can come in all shapes and forms, with some presenting many errors that make the process of cleanly reading them inherently difficult. To help users read these files, DuckDB supports detailed error messages, the ability to skip faulty lines and the possibility of storing faulty lines in a temporary table to assist users with a data cleaning step.

#### Structural Errors {#docs:current:data:csv:reading_faulty_csv_files::structural-errors}

DuckDB supports the detection and skipping of several different structural errors. In this section, we will go over each error with an example.
For the examples, consider the following table:

```sql
CREATE TABLE people (name VARCHAR, birth_date DATE);
```

DuckDB detects the following error types:

* `CAST`: Casting errors occur when a column in the CSV file cannot be cast to the expected schema value. For example, the line `Pedro,The 90s` would cause an error since the string `The 90s` cannot be cast to a date.
* `MISSING COLUMNS`: This error occurs if a line in the CSV file has fewer columns than expected. In our example, we expect two columns; therefore, a row with just one value, e.g., `Pedro`, would cause this error.
* `TOO MANY COLUMNS`: This error occurs if a line in the CSV has more columns than expected. In our example, any line with more than two columns would cause this error, e.g., `Pedro,01-01-1992,pdet`.
* `UNQUOTED VALUE`: A quoted value must end with its closing quote character; if further characters follow the closing quote within the same field, or the value is never closed, it will cause an error. For example, assuming our scanner uses `quote='"'`, the line `"pedro"holanda, 01-01-1992` would present an unquoted value error.
* `LINE SIZE OVER MAXIMUM`: DuckDB has a parameter that sets the maximum line size a CSV file can have, which by default is set to 2,097,152 bytes. Assuming our scanner is set to `max_line_size = 25`, the line `Pedro Holanda, 01-01-1992` would produce an error, as it exceeds 25 bytes.
* `INVALID ENCODING`: DuckDB supports the UTF-8, UTF-16 and Latin-1 encodings. Lines containing characters that are invalid in these encodings will produce an error. For example, the line `pedro\xff\xff, 01-01-1992` would be problematic.
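
As a rough illustration of how a line maps to one of these error types, the following Python sketch checks a line against the `people` schema. It is a simplified model, not DuckDB's scanner: quoting and encoding errors are omitted, the order of the checks is an assumption, and it expects ISO 8601 dates (unlike the `01-01-1992` examples above). The function `classify_line` is hypothetical.

```python
from datetime import date

EXPECTED_COLUMNS = 2  # name VARCHAR, birth_date DATE

def classify_line(line, max_line_size=2_097_152):
    """Return the error type for a CSV line, or None if the line is valid."""
    if len(line.encode("utf-8")) > max_line_size:
        return "LINE SIZE OVER MAXIMUM"
    fields = line.split(",")
    if len(fields) < EXPECTED_COLUMNS:
        return "MISSING COLUMNS"
    if len(fields) > EXPECTED_COLUMNS:
        return "TOO MANY COLUMNS"
    try:
        date.fromisoformat(fields[1].strip())  # cast birth_date
    except ValueError:
        return "CAST"
    return None

print(classify_line("Pedro,The 90s"))                       # CAST
print(classify_line("Pedro"))                               # MISSING COLUMNS
print(classify_line("Pedro,01-01-1992,pdet"))               # TOO MANY COLUMNS
print(classify_line("Pedro,1992-01-01", max_line_size=10))  # LINE SIZE OVER MAXIMUM
print(classify_line("Pedro,1992-01-01"))                    # None
```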

##### Anatomy of a CSV Error {#docs:current:data:csv:reading_faulty_csv_files::anatomy-of-a-csv-error}

By default, when performing a CSV read, if any structural errors are encountered, the scanner will immediately stop the scanning process and throw the error to the user.
These errors are designed to provide as much information as possible to allow users to evaluate them directly in their CSV file.

This is an example of a full error message:

```console
Conversion Error:
CSV Error on Line: 5648
Original Line: Pedro,The 90s
Error when converting column "birth_date". date field value out of range: "The 90s", expected format is (DD-MM-YYYY)

Column date is being converted as type DATE
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g., types={'birth_date': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g., sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.

  file= people.csv
  delimiter = , (Auto-Detected)
  quote = " (Auto-Detected)
  escape = " (Auto-Detected)
  new_line = \r\n (Auto-Detected)
  header = true (Auto-Detected)
  skip_rows = 0 (Auto-Detected)
  date_format = (DD-MM-YYYY) (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding=0
  sample_size=20480
  ignore_errors=false
  all_varchar=0
```

The first block provides us with information regarding where the error occurred, including the line number, the original CSV line and which field was problematic:

```console
Conversion Error:
CSV Error on Line: 5648
Original Line: Pedro,The 90s
Error when converting column "birth_date". date field value out of range: "The 90s", expected format is (DD-MM-YYYY)
```

The second block provides us with potential solutions:

```console
Column date is being converted as type DATE
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g., types={'birth_date': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g., sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.
```

Since the type of this field was auto-detected, it suggests defining the field as a `VARCHAR` or fully utilizing the dataset for type detection.

Finally, the last block presents some of the options used in the scanner that can cause errors, indicating whether they were auto-detected or manually set by the user.

#### Using the `ignore_errors` Option {#docs:current:data:csv:reading_faulty_csv_files::using-the-ignore_errors-option}

There are cases where CSV files may have multiple structural errors, and users simply wish to skip these and read the correct data. Reading erroneous CSV files is possible by utilizing the `ignore_errors` option. With this option set, rows containing data that would otherwise cause the CSV parser to generate an error will be ignored. In our example, we will demonstrate a CAST error, but note that any of the errors described in our Structural Error section would cause the faulty line to be skipped.

For example, consider the following CSV file, [`faulty.csv`](https://duckdb.org/data/faulty.csv):

```csv
Pedro,31
Oogie Boogie, three
```

If you read the CSV file, specifying that the first column is a `VARCHAR` and the second column is an `INTEGER`, loading the file would fail, as the string `three` cannot be converted to an `INTEGER`.

For example, the following query will throw a casting error.

```sql
FROM read_csv('faulty.csv', columns = {'name': 'VARCHAR', 'age': 'INTEGER'});
```

However, with `ignore_errors` set, the second row of the file is skipped, outputting only the complete first row. For example:

```sql
FROM read_csv(
    'faulty.csv',
    columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
    ignore_errors = true
);
```

Outputs:

| name  | age |
|-------|-----|
| Pedro | 31  |
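
The behavior can be modeled in a few lines of Python using the standard `csv` module. This is only an analogy for what `ignore_errors` does; the function `read_people` and its error message are hypothetical and do not reproduce DuckDB's output.

```python
import csv
import io

# Contents of faulty.csv from the example above.
FAULTY_CSV = "Pedro,31\nOogie Boogie, three\n"

def read_people(text, ignore_errors=False):
    """Parse (name VARCHAR, age INTEGER) rows; skip or raise on cast errors."""
    rows = []
    for line_number, record in enumerate(csv.reader(io.StringIO(text)), start=1):
        name, age = record
        try:
            rows.append((name, int(age)))
        except ValueError:
            if not ignore_errors:
                raise ValueError(f"cast error on line {line_number}") from None
            # With ignore_errors, the faulty row is silently dropped.
    return rows

print(read_people(FAULTY_CSV, ignore_errors=True))  # [('Pedro', 31)]
```

Calling `read_people(FAULTY_CSV)` without `ignore_errors` raises a `ValueError` at line 2, mirroring the failing query above.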

One should note that the CSV Parser is affected by the projection pushdown optimization. Hence, if we were to select only the name column, both rows would be considered valid, as the casting error on the age would never occur. For example:

```sql
SELECT name
FROM read_csv('faulty.csv', columns = {'name': 'VARCHAR', 'age': 'INTEGER'});
```

Outputs:

|     name     |
|--------------|
|     Pedro    |
| Oogie Boogie |
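
The effect of projection pushdown on error reporting can be modeled as a scanner that only converts the columns a query asks for. The sketch below is an analogy in Python, not a description of DuckDB internals; `scan` and `COLUMNS` are hypothetical names.

```python
import csv
import io

FAULTY_CSV = "Pedro,31\nOogie Boogie, three\n"
COLUMNS = {"name": str, "age": int}  # column name -> converter

def scan(text, projection):
    """Yield tuples containing only the projected columns."""
    names = list(COLUMNS)
    for record in csv.reader(io.StringIO(text)):
        # Columns outside the projection are never converted, so a value
        # that cannot be cast there goes unnoticed.
        yield tuple(COLUMNS[c](record[names.index(c)]) for c in projection)

print(list(scan(FAULTY_CSV, ["name"])))
# [('Pedro',), ('Oogie Boogie',)]
```

Requesting both columns, `list(scan(FAULTY_CSV, ["name", "age"]))`, raises a `ValueError` on the second row because ` three` is not an integer.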

#### Retrieving Faulty CSV Lines {#docs:current:data:csv:reading_faulty_csv_files::retrieving-faulty-csv-lines}

Being able to read faulty CSV files is important, but for many data cleaning operations, it is also necessary to know exactly which lines are corrupted and what errors the parser discovered on them. For scenarios like these, it is possible to use DuckDB's CSV Rejects Table feature.
By default, this feature creates two temporary tables.

1. `reject_scans`: Stores information regarding the parameters of the CSV Scanner.
2. `reject_errors`: Stores information regarding each faulty CSV line and the CSV Scanner in which it occurred.

Note that any of the errors described in our Structural Error section will be stored in the rejects tables. Also, if a line has multiple errors, multiple entries will be stored for the same line, one for each error.

##### Reject Scans {#docs:current:data:csv:reading_faulty_csv_files::reject-scans}

The CSV Reject Scans Table returns the following information:

| Column name | Description | Type |
|:--|:-----|:-|
| `scan_id` | The internal ID used in DuckDB to represent that scanner | `UBIGINT` |
| `file_id` | A scanner might happen over multiple files, so the file_id represents a unique file in a scanner | `UBIGINT` |
| `file_path` | The file path | `VARCHAR` |
| `delimiter` | The delimiter used e.g., ; | `VARCHAR` |
| `quote` | The quote used e.g., " | `VARCHAR` |
| `escape` | The escape used e.g., " | `VARCHAR` |
| `newline_delimiter` | The newline delimiter used e.g., \r\n | `VARCHAR` |
| `skip_rows` | If any rows were skipped from the top of the file | `UINTEGER` |
| `has_header` | If the file has a header | `BOOLEAN` |
| `columns` | The schema of the file (i.e., all column names and types) | `VARCHAR` |
| `date_format` | The format used for date types | `VARCHAR` |
| `timestamp_format` | The format used for timestamp types| `VARCHAR` |
| `user_arguments` | Any extra scanner parameters manually set by the user | `VARCHAR` |

##### Reject Errors {#docs:current:data:csv:reading_faulty_csv_files::reject-errors}

The CSV Reject Errors Table returns the following information:

| Column name | Description | Type |
|:--|:-----|:-|
| `scan_id` | The internal ID used in DuckDB to represent that scanner, used to join with reject scans tables | `UBIGINT` |
| `file_id` | The file_id represents a unique file in a scanner, used to join with reject scans tables | `UBIGINT` |
| `line` | Line number, from the CSV File, where the error occurred. | `UBIGINT` |
| `line_byte_position` | Byte Position of the start of the line, where the error occurred. | `UBIGINT` |
| `byte_position` | Byte Position where the error occurred. | `UBIGINT` |
| `column_idx` | If the error happens in a specific column, the index of the column. | `UBIGINT` |
| `column_name` | If the error happens in a specific column, the name of the column. | `VARCHAR` |
| `error_type` | The type of the error that happened. | `ENUM` |
| `csv_line` | The original CSV line. | `VARCHAR` |
| `error_message` | The error message produced by DuckDB. | `VARCHAR` |

#### Parameters {#docs:current:data:csv:reading_faulty_csv_files::parameters}

The parameters listed below are used in the `read_csv` function to configure the CSV Rejects Table.

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `store_rejects` | If set to true, any errors in the file will be skipped and stored in the default rejects temporary tables.| `BOOLEAN` | False |
| `rejects_scan` | Name of a temporary table where the scan information of the faulty CSV file is stored. | `VARCHAR` | reject_scans |
| `rejects_table` | Name of a temporary table where the information about the faulty lines of a CSV file is stored. | `VARCHAR` | reject_errors |
| `rejects_limit` | Upper limit on the number of faulty records from a CSV file that will be recorded in the rejects table. 0 is used when no limit should be applied. | `BIGINT` | 0 |

To store the information of the faulty CSV lines in a rejects table, the user must simply set the `store_rejects` option to true. For example:

```sql
FROM read_csv(
    'faulty.csv',
    columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
    store_rejects = true
);
```

You can then query both the `reject_scans` and `reject_errors` tables, to retrieve information about the rejected tuples. For example:

```sql
FROM reject_scans;
```

Outputs:



| scan_id | file_id |             file_path             | delimiter | quote | escape | newline_delimiter | skip_rows | has_header |               columns                | date_format | timestamp_format |   user_arguments   |
|---------|---------|-----------------------------------|-----------|-------|--------|-------------------|-----------|-----------:|--------------------------------------|-------------|------------------|--------------------|
| 5       | 0       | faulty.csv | ,         | "     | "      | \n                | 0         | false      | {'name': 'VARCHAR','age': 'INTEGER'} |             |                  | store_rejects=true |

```sql
FROM reject_errors;
```

Outputs:



| scan_id | file_id | line | line_byte_position | byte_position | column_idx | column_name | error_type |      csv_line       |                                   error_message                                    |
|---------|---------|------|--------------------|---------------|------------|-------------|------------|---------------------|------------------------------------------------------------------------------------|
| 5       | 0       | 2    | 10                 | 23            | 2          | age         | CAST       | Oogie Boogie, three | Error when converting column "age". Could not convert string " three" to 'INTEGER' |
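
To show where values such as `line_byte_position = 10` and `byte_position = 23` come from, the Python sketch below recomputes them for `faulty.csv`. It assumes the file consists of exactly the two lines shown earlier, terminated by `\n`, and that positions are 1-based; the latter is inferred from the sample output rather than stated by the documentation. The function `collect_rejects` is hypothetical and only handles `CAST` errors on the `age` column.

```python
FAULTY_CSV = b"Pedro,31\nOogie Boogie, three\n"

def collect_rejects(data, rejects_limit=0):
    """Return one dict per faulty line; 0 means no limit on recorded rows."""
    reject_errors = []
    line_start = 0  # 0-based byte offset of the current line
    for line_number, raw in enumerate(data.splitlines(keepends=True), start=1):
        text = raw.rstrip(b"\r\n").decode("utf-8")
        name, age = text.split(",")  # sketch assumes exactly two columns
        try:
            int(age)
        except ValueError:
            if rejects_limit == 0 or len(reject_errors) < rejects_limit:
                reject_errors.append({
                    "line": line_number,
                    "line_byte_position": line_start + 1,
                    # Skip the name field and the delimiter to reach `age`.
                    "byte_position": line_start + len(name.encode("utf-8")) + 1 + 1,
                    "column_idx": 2,
                    "column_name": "age",
                    "error_type": "CAST",
                    "csv_line": text,
                })
        line_start += len(raw)
    return reject_errors

for row in collect_rejects(FAULTY_CSV):
    print(row["line"], row["line_byte_position"], row["byte_position"])
# 2 10 23
```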

### CSV Import Tips {#docs:current:data:csv:tips}

Below is a collection of tips to help when attempting to import complex CSV files. In the examples, we use the [`flights.csv`](https://duckdb.org/data/flights.csv) file.

#### Override the Header Flag if the Header Is Not Correctly Detected {#docs:current:data:csv:tips::override-the-header-flag-if-the-header-is-not-correctly-detected}

If a file contains only string columns, the `header` auto-detection might fail. Provide the `header` option to override this behavior.

```sql
SELECT * FROM read_csv('flights.csv', header = true);
```

#### Provide Names if the File Does Not Contain a Header {#docs:current:data:csv:tips::provide-names-if-the-file-does-not-contain-a-header}

If the file does not contain a header, names will be auto-generated by default. You can provide your own names with the `names` option.

```sql
SELECT * FROM read_csv('flights.csv', names = ['DateOfFlight', 'CarrierName']);
```

#### Override the Types of Specific Columns {#docs:current:data:csv:tips::override-the-types-of-specific-columns}

The `types` flag can be used to override types of only certain columns by providing a struct of `name` → `type` mappings.

```sql
SELECT * FROM read_csv('flights.csv', types = {'FlightDate': 'DATE'});
```

#### Use `COPY` When Loading Data into a Table {#docs:current:data:csv:tips::use-copy-when-loading-data-into-a-table}

The [`COPY` statement](#docs:current:sql:statements:copy) copies data directly into a table. The CSV reader uses the schema of the table instead of auto-detecting types from the file. This speeds up the auto-detection, and prevents mistakes from being made during auto-detection.

```sql
COPY tbl FROM 'test.csv';
```

#### Use `union_by_name` When Loading Files with Different Schemas {#docs:current:data:csv:tips::use-union_by_name-when-loading-files-with-different-schemas}

The `union_by_name` option can be used to unify the schema of files that have different or missing columns. For files that do not have certain columns, `NULL` values are filled in.

```sql
SELECT * FROM read_csv('flights*.csv', union_by_name = true);
```
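
Conceptually, unifying by name means collecting the union of all column names and padding the missing ones. The Python sketch below models this with dictionaries standing in for rows; the two inputs are made-up examples and `union_by_name` is a hypothetical helper, not DuckDB code.

```python
def union_by_name(files):
    """files: list of files, each a list of rows (dict: column -> value).
    Returns the unified column list and the rows padded with None (NULL)."""
    columns = []
    for rows in files:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)  # keep first-seen order
    unified = [
        tuple(row.get(name) for name in columns)
        for rows in files
        for row in rows
    ]
    return columns, unified

# Hypothetical contents of two files matched by 'flights*.csv'.
flights1 = [{"FlightDate": "1988-01-01", "UniqueCarrier": "AA"}]
flights2 = [{"FlightDate": "1988-01-02", "OriginCityName": "New York, NY"}]

columns, rows = union_by_name([flights1, flights2])
print(columns)  # ['FlightDate', 'UniqueCarrier', 'OriginCityName']
print(rows)
# [('1988-01-01', 'AA', None), ('1988-01-02', None, 'New York, NY')]
```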

To load data into _an existing table_ where the table has more columns than the CSV file, you can use the [`INSERT INTO ... BY NAME` clause](#docs:current:sql:statements:insert::insert-into--by-name):

```sql
INSERT INTO tbl BY NAME
    SELECT * FROM read_csv('input.csv');
```

#### Sample Size {#docs:current:data:csv:tips::sample-size}

If the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer) is not detecting the correct type, try increasing the sample size.
The option `sample_size = -1` forces the sniffer to read the entire file:

```sql
SELECT * FROM read_csv('my_csv_file.csv', sample_size = -1);
```

## JSON Files {#data:json}

### JSON Overview {#docs:current:data:json:overview}

DuckDB supports SQL functions that are useful for reading values from existing JSON and creating new JSON data.
JSON is supported with the `json` extension which is shipped with most DuckDB distributions and is auto-loaded on first use.
If you would like to install or load it manually, please consult the [“Installing and Loading” page](#docs:current:data:json:installing_and_loading).

#### About JSON {#docs:current:data:json:overview::about-json}

JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).
While it is not a very efficient format for tabular data, it is very commonly used, especially as a data interchange format.

#### JSONPath and JSON Pointer Syntax {#docs:current:data:json:overview::jsonpath-and-json-pointer-syntax}

DuckDB implements multiple interfaces for JSON extraction: [JSONPath](https://goessner.net/articles/JsonPath/) and [JSON Pointer](https://datatracker.ietf.org/doc/html/rfc6901). Both of them work with the arrow operator (`->`) and the `json_extract` function call.

Note that DuckDB only supports lookups in JSONPath, i.e., extracting fields with `.<key>` or array elements with `[<index>]`.
Arrays can be indexed from the back and both approaches support the wildcard `*`.
DuckDB does _not_ support the full JSONPath syntax because SQL is readily available for any further transformations.
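
To illustrate how the two syntaxes address the same value, here is a minimal Python sketch of both lookups. It is not DuckDB's implementation: it supports only `.<key>` and `[<index>]` steps with non-negative indexes (no wildcards, no indexing from the back, no quoted keys) and returns Python values rather than `JSON`. The function names are hypothetical.

```python
import json
import re

def extract_jsonpath(doc, path):
    """Evaluate a lookup-only JSONPath such as '$.species[0]'."""
    if not path.startswith("$"):
        raise ValueError("JSONPath must start with '$'")
    current = doc
    for key, index in re.findall(r"\.(\w+)|\[(\d+)\]", path[1:]):
        current = current[key] if key else current[int(index)]
    return current

def extract_pointer(doc, pointer):
    """Evaluate a JSON Pointer (RFC 6901) such as '/species/0'."""
    current = doc
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 escapes
        if isinstance(current, list):
            current = current[int(token)]  # array elements are 0-based
        else:
            current = current[token]
    return current

doc = json.loads('{"family": "anatidae", "species": ["duck", "goose", "swan", null]}')
print(extract_jsonpath(doc, "$.family"))      # anatidae
print(extract_jsonpath(doc, "$.species[0]"))  # duck
print(extract_pointer(doc, "/species/0"))     # duck
```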

> It's best to pick either the JSONPath or the JSON Pointer syntax and use it in your entire application.



#### Indexing {#docs:current:data:json:overview::indexing}

> **Warning.** Following [PostgreSQL's conventions](#docs:current:sql:dialect:postgresql_compatibility), DuckDB uses 1-based indexing for its [`ARRAY`](#docs:current:sql:data_types:array) and [`LIST`](#docs:current:sql:data_types:list) data types but [0-based indexing for the JSON data type](https://www.postgresql.org/docs/17/functions-json.html#FUNCTIONS-JSON-PROCESSING).

#### Examples {#docs:current:data:json:overview::examples}

##### Loading JSON {#docs:current:data:json:overview::loading-json}

Read a JSON file from disk, auto-infer options:

```sql
SELECT * FROM 'todos.json';
```

Use the `read_json` function with custom options:

```sql
SELECT *
FROM read_json('todos.json',
               format = 'array',
               columns = {userId: 'UBIGINT',
                          id: 'UBIGINT',
                          title: 'VARCHAR',
                          completed: 'BOOLEAN'});
```

Read a JSON file from stdin, auto-infer options:

```batch
cat data/json/todos.json | duckdb -c "SELECT * FROM read_json('/dev/stdin')"
```

Read a JSON file into a table:

```sql
CREATE TABLE todos (userId UBIGINT, id UBIGINT, title VARCHAR, completed BOOLEAN);
COPY todos FROM 'todos.json' (AUTO_DETECT true);
```

Alternatively, create a table without specifying the schema manually with a [`CREATE TABLE ... AS SELECT` clause](#docs:current:sql:statements:create_table::create-table--as-select-ctas):

```sql
CREATE TABLE todos AS
    SELECT * FROM 'todos.json';
```

Since DuckDB v1.3.0, the JSON reader returns the `filename` virtual column:

```sql
SELECT filename, *
FROM 'todos-*.json';
```

##### Writing JSON {#docs:current:data:json:overview::writing-json}

Write the result of a query to a JSON file:

```sql
COPY (SELECT * FROM todos) TO 'todos.json';
```

##### JSON Data Type {#docs:current:data:json:overview::json-data-type}

Create a table with a column for storing JSON data and insert data into it:

```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');
```

##### Retrieving JSON Data {#docs:current:data:json:overview::retrieving-json-data}

Retrieve the family key's value:

```sql
SELECT j.family FROM example;
```

```text
"anatidae"
```

Extract the family key's value with a [JSONPath](https://goessner.net/articles/JsonPath/) expression as `JSON`:

```sql
SELECT j->'$.family' FROM example;
```

```text
"anatidae"
```

Extract the family key's value with a [JSONPath](https://goessner.net/articles/JsonPath/) expression as a `VARCHAR`:

```sql
SELECT j->>'$.family' FROM example;
```

```text
anatidae
```

##### Using Quotes for Special Characters {#docs:current:data:json:overview::using-quotes-for-special-characters}

JSON object keys that contain the special `[` and `.` characters can be used by surrounding them with double quotes (`"`):

```sql
SELECT '{"d[u]._\"ck":42}'->'$."d[u]._\"ck"' AS v;
```

```text
42
```

### Creating JSON {#docs:current:data:json:creating_json}

#### JSON Creation Functions {#docs:current:data:json:creating_json::json-creation-functions}

The following functions are used to create JSON.

| Function | Description |
|:--|:----|
| `to_json(any)` | Create `JSON` from a value of `any` type. Our `LIST` is converted to a JSON array, and our `STRUCT` and `MAP` are converted to a JSON object. |
| `json_quote(any)` | Alias for `to_json`. |
| `array_to_json(list)` | Alias for `to_json` that only accepts `LIST`. |
| `row_to_json(struct)` | Alias for `to_json` that only accepts `STRUCT`. |
| `json_array(any, ...)` | Create a JSON array from the values in the argument list. |
| `json_object(key, value, ...)` | Create a JSON object from `key`, `value` pairs in the argument list. Requires an even number of arguments. |
| `json_merge_patch(json, json)` | Merge two JSON documents together. |

Examples:

```sql
SELECT to_json('duck');
```

```text
"duck"
```

```sql
SELECT to_json([1, 2, 3]);
```

```text
[1,2,3]
```

```sql
SELECT to_json({duck : 42});
```

```text
{"duck":42}
```

```sql
SELECT to_json(MAP(['duck'], [42]));
```

```text
{"duck":42}
```

```sql
SELECT json_array('duck', 42, 'goose', 123);
```

```text
["duck",42,"goose",123]
```

```sql
SELECT json_object('duck', 42, 'goose', 123);
```

```text
{"duck":42,"goose":123}
```

```sql
SELECT json_merge_patch('{"duck": 42}', '{"goose": 123}');
```

```text
{"goose":123,"duck":42}
```
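
The table above only says that `json_merge_patch` merges two documents. Assuming it follows the JSON Merge Patch semantics of RFC 7396 (an assumption of this sketch, not something stated here), the merge rule can be written in Python as follows; `merge_patch` is a hypothetical name.

```python
def merge_patch(target, patch):
    """Apply `patch` to `target` following RFC 7396 (JSON Merge Patch)."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target entirely
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

print(merge_patch({"duck": 42}, {"goose": 123}))
# {'duck': 42, 'goose': 123}
print(merge_patch({"duck": 42, "goose": 123}, {"goose": None}))
# {'duck': 42}
```

The key order differs from the DuckDB output above; the order of keys in a JSON object carries no meaning.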

### Loading JSON {#docs:current:data:json:loading_json}

The DuckDB JSON reader can automatically infer which configuration flags to use by analyzing the JSON file. This will work correctly in most situations, and should be the first option attempted. In rare situations where the JSON reader cannot figure out the correct configuration, it is possible to manually configure the JSON reader to correctly parse the JSON file.

#### The `read_json` Function {#docs:current:data:json:loading_json::the-read_json-function}

The `read_json` function is the simplest method of loading JSON files: it automatically attempts to figure out the correct configuration of the JSON reader. It also automatically deduces types of columns.
In the following example, we use the [`todos.json`](https://duckdb.org/data/json/todos.json) file:

```sql
SELECT *
FROM read_json('todos.json')
LIMIT 5;
```

| userId | id |                              title                              | completed |
|-------:|---:|-----------------------------------------------------------------|-----------|
| 1      | 1  | delectus aut autem                                              | false     |
| 1      | 2  | quis ut nam facilis et officia qui                              | false     |
| 1      | 3  | fugiat veniam minus                                             | false     |
| 1      | 4  | et porro tempora                                                | true      |
| 1      | 5  | laboriosam mollitia et enim quasi adipisci quia provident illum | false     |

We can use `read_json` to create a persistent table as well:

```sql
CREATE TABLE todos AS
    SELECT *
    FROM read_json('todos.json');
DESCRIBE todos;
```



| column_name | column_type | null | key  | default | extra |
|-------------|-------------|------|------|---------|-------|
| userId      | UBIGINT     | YES  | NULL | NULL    | NULL  |
| id          | UBIGINT     | YES  | NULL | NULL    | NULL  |
| title       | VARCHAR     | YES  | NULL | NULL    | NULL  |
| completed   | BOOLEAN     | YES  | NULL | NULL    | NULL  |

If we specify types for a subset of columns, `read_json` excludes columns that we don't specify:

```sql
SELECT *
FROM read_json(
        'todos.json',
        columns = {userId: 'UBIGINT', completed: 'BOOLEAN'}
    )
LIMIT 5;
```

Note that only the `userId` and `completed` columns are shown:

| userId | completed |
|-------:|----------:|
| 1      | false     |
| 1      | false     |
| 1      | false     |
| 1      | true      |
| 1      | false     |

Multiple files can be read at once by providing a glob or a list of files. Refer to the [multiple files section](#docs:current:data:multiple_files:overview) for more information.

#### Functions for Reading JSON Objects {#docs:current:data:json:loading_json::functions-for-reading-json-objects}

The following table functions are used to read JSON:

| Function | Description |
|:---|:---|
| `read_json_objects(filename)`      | Read a JSON object from `filename`, where `filename` can also be a list of files or a glob pattern. |
| `read_ndjson_objects(filename)`    | Alias for `read_json_objects` with the parameter `format` set to `newline_delimited`. |
| `read_json_objects_auto(filename)` | Alias for `read_json_objects` with the parameter `format` set to `auto`. |

##### Parameters {#docs:current:data:json:loading_json::parameters}

These functions have the following parameters:

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `compression` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `t.json.gz` will use gzip, `t.json` will use none). Options are `none`, `gzip`, `zstd` and `auto_detect`. | `VARCHAR` | `auto_detect` |
| `filename` | Whether or not an extra `filename` column should be included in the result. Since DuckDB v1.3.0, the `filename` column is added automatically as a virtual column and this option is only kept for compatibility reasons. | `BOOL` | `false` |
| `format` | Can be one of `auto`, `unstructured`, `newline_delimited` and `array`. | `VARCHAR` | `array` |
| `hive_partitioning` | Whether or not to interpret the path as a [Hive partitioned path](#docs:current:data:partitioning:hive_partitioning). | `BOOL` | (auto-detected) |
| `ignore_errors` | Whether to ignore parse errors (only possible when `format` is `newline_delimited`). | `BOOL` | `false` |
| `maximum_sample_files` | The maximum number of JSON files sampled for auto-detection. | `BIGINT` | `32` |
| `maximum_object_size` | The maximum size of a JSON object (in bytes). | `UINTEGER` | `16777216` |

The `format` parameter specifies how to read the JSON from a file.
With `unstructured`, the top-level JSON is read, e.g., for `birds.json`:

```json
{
  "duck": 42
}
{
  "goose": [1, 2, 3]
}
```

```sql
FROM read_json_objects('birds.json', format = 'unstructured');
```

will result in two objects being read:

```text
┌──────────────────────────────┐
│             json             │
│             json             │
├──────────────────────────────┤
│ {\n    "duck": 42\n}         │
│ {\n    "goose": [1, 2, 3]\n} │
└──────────────────────────────┘
```

With `newline_delimited`, [NDJSON](https://github.com/ndjson/ndjson-spec) is read, where each JSON value is separated by a newline (`\n`), e.g., for `birds-nd.json`:

```json
{"duck": 42}
{"goose": [1, 2, 3]}
```

```sql
FROM read_json_objects('birds-nd.json', format = 'newline_delimited');
```

will also result in two objects being read:

```text
┌──────────────────────┐
│         json         │
│         json         │
├──────────────────────┤
│ {"duck": 42}         │
│ {"goose": [1, 2, 3]} │
└──────────────────────┘
```

With `array`, each array element is read, e.g., for `birds-array.json`:

```json
[
    {
        "duck": 42
    },
    {
        "goose": [1, 2, 3]
    }
]
```

```sql
FROM read_json_objects('birds-array.json', format = 'array');
```

will again result in two objects being read:

```text
┌──────────────────────────────────────┐
│                 json                 │
│                 json                 │
├──────────────────────────────────────┤
│ {\n        "duck": 42\n    }         │
│ {\n        "goose": [1, 2, 3]\n    } │
└──────────────────────────────────────┘
```
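
To make the three `format` values concrete, here is a minimal Python sketch of how a reader could split the text of a file into top-level JSON values in each mode. It is an illustration only and not DuckDB's implementation: the function name `split_json_values` is hypothetical, and unlike `read_json_objects`, which returns the raw JSON text, the sketch returns parsed Python values.

```python
import json


def split_json_values(text, fmt):
    """Return the top-level JSON values in `text` for the given format.

    Hypothetical helper for illustration; not part of DuckDB.
    """
    if fmt == "newline_delimited":
        # One JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "array":
        # A single top-level array; each element is one value.
        values = json.loads(text)
        if not isinstance(values, list):
            raise ValueError("expected a top-level JSON array")
        return values
    if fmt == "unstructured":
        # Concatenated values that may span several lines each.
        decoder = json.JSONDecoder()
        values, pos = [], 0
        while True:
            # raw_decode does not skip leading whitespace, so do it here.
            while pos < len(text) and text[pos].isspace():
                pos += 1
            if pos >= len(text):
                break
            value, pos = decoder.raw_decode(text, pos)
            values.append(value)
        return values
    raise ValueError(f"unsupported format: {fmt}")
```

Applied to the contents of `birds.json`, `birds-nd.json` and `birds-array.json` shown above, all three modes yield the same two values, one per bird.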

#### Functions for Reading JSON as a Table {#docs:current:data:json:loading_json::functions-for-reading-json-as-a-table}

DuckDB also supports reading JSON as a table, using the following functions:

| Function | Description     |
|:---------|:----------------|
| `read_json(filename)`        | Read JSON from `filename`, where `filename` can also be a list of files, or a glob pattern. |
| `read_json_auto(filename)`   | Alias for `read_json`.                                                                      |
| `read_ndjson(filename)`      | Alias for `read_json` with parameter `format` set to `newline_delimited`.                 |
| `read_ndjson_auto(filename)` | Alias for `read_json` with parameter `format` set to `newline_delimited`.                 |

##### Parameters {#docs:current:data:json:loading_json::parameters}

Besides the `maximum_object_size`, `format`, `ignore_errors` and `compression`, these functions have additional parameters:

| Name | Description | Type | Default |
|:--|:------|:-|:-|
| `auto_detect` | Whether to auto-detect the names of the keys and data types of the values automatically | `BOOL` | `true` |
| `columns` | A struct that specifies the key names and value types contained within the JSON file (e.g., `{key1: 'INTEGER', key2: 'VARCHAR'}`). If `auto_detect` is enabled these will be inferred | `STRUCT` | `(empty)` |
| `dateformat` | Specifies the date format to use when parsing dates. See [Date Format](#docs:current:sql:functions:dateformat) | `VARCHAR` | `iso` |
| `maximum_depth` | Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types | `BIGINT` | `-1` |
| `records` | Can be one of `auto`, `true`, `false` | `VARCHAR` | `auto` |
| `sample_size` | Option to define number of sample objects for automatic JSON type detection. Set to -1 to scan the entire input file | `UBIGINT` | `20480` |
| `timestampformat` | Specifies the date format to use when parsing timestamps. See [Date Format](#docs:current:sql:functions:dateformat). When set to `iso` (the default), ISO 8601 timestamps with timezone offsets (e.g., `2024-01-01T12:00:00+05:00`) and fractional seconds (e.g., `2024-01-01T12:00:00.123Z`) are automatically inferred as `TIMESTAMP`. | `VARCHAR` | `iso`|
| `union_by_name` | Whether the schemas of multiple JSON files should be [unified](#docs:current:data:multiple_files:combining_schemas) | `BOOL` | `false` |
| `map_inference_threshold` | Controls the threshold for number of columns whose schema will be auto-detected; if JSON schema auto-detection would infer a `STRUCT` type for a field that has _more_ than this threshold number of subfields, it infers a `MAP` type instead. Set to `-1` to disable `MAP` inference. | `BIGINT` | `200` |
| `field_appearance_threshold` | The JSON reader divides the number of appearances of each JSON field by the auto-detection sample size. If the average over the fields of an object is less than this threshold, it will default to using a `MAP` type with value type of merged field types. | `DOUBLE` | `0.1` |

Note that DuckDB can convert JSON arrays directly to its internal `LIST` type, and missing keys become `NULL`:

```sql
SELECT *
FROM read_json(
    ['birds1.json', 'birds2.json'],
    columns = {duck: 'INTEGER', goose: 'INTEGER[]', swan: 'DOUBLE'}
);
```



| duck |   goose   | swan |
|-----:|-----------|-----:|
| 42   | [1, 2, 3] | NULL |
| 43   | [4, 5, 6] | 3.3  |

DuckDB can automatically detect the types like so:

```sql
SELECT goose, duck FROM read_json('*.json.gz');
SELECT goose, duck FROM '*.json.gz'; -- equivalent
```

DuckDB can read (and auto-detect) a variety of formats, specified with the `format` parameter.
For example, take a JSON file that contains an `array`:

```json
[
  {
    "duck": 42,
    "goose": 4.2
  },
  {
    "duck": 43,
    "goose": 4.3
  }
]
```

It can be queried in exactly the same way as a JSON file that contains `unstructured` JSON, e.g.:

```json
{
    "duck": 42,
    "goose": 4.2
}
{
    "duck": 43,
    "goose": 4.3
}
```

Both can be read as the table:

```sql
SELECT *
FROM read_json('birds.json');
```



| duck | goose |
|-----:|------:|
|   42 |   4.2 |
|   43 |   4.3 |

If your JSON file does not contain “records”, i.e., it contains any type of JSON other than objects, DuckDB can still read it.
This is specified with the `records` parameter.
The `records` parameter specifies whether the JSON contains records that should be unpacked into individual columns.
DuckDB also attempts to auto-detect this.
For example, take the following file, `birds-records.json`:

```json
{"duck": 42, "goose": [1, 2, 3]}
{"duck": 43, "goose": [4, 5, 6]}
```

```sql
SELECT *
FROM read_json('birds-records.json');
```

The query results in two columns:



| duck | goose   |
|-----:|:--------|
|   42 | [1,2,3] |
|   43 | [4,5,6] |

You can read the same file with `records` set to `false`, to get a single column, which is a `STRUCT` containing the data:



| json |
|:-----|
| {'duck': 42, 'goose': [1,2,3]} |
| {'duck': 43, 'goose': [4,5,6]} |

For additional examples reading more complex data, please see the [“Shredding Deeply Nested JSON, One Vector at a Time” blog post](https://duckdb.org/2023/03/03/json).
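
The effect of the `records` parameter described above can be mimicked with a short Python sketch. It is a simplified illustration under stated assumptions, not DuckDB's reader: the helper name `read_rows` is hypothetical, the input is assumed to be newline-delimited, and no type detection is performed.

```python
import json


def read_rows(ndjson_text, records=True):
    """Return (column_names, rows) for newline-delimited JSON text.

    Hypothetical helper for illustration; not part of DuckDB.
    """
    values = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    if not records:
        # records = false: keep each value whole in a single `json` column.
        return ["json"], [[value] for value in values]
    # records = true: unpack each object into one column per key.
    columns = []
    for value in values:
        if not isinstance(value, dict):
            raise ValueError("records = true requires JSON objects")
        for key in value:
            if key not in columns:
                columns.append(key)
    # Keys missing from an object become None (NULL in SQL terms).
    return columns, [[value.get(name) for name in columns] for value in values]
```

With the contents of `birds-records.json`, `records=True` produces the columns `duck` and `goose`, while `records=False` produces a single `json` column holding each object.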

#### Loading with the `COPY` Statement Using `FORMAT json` {#docs:current:data:json:loading_json::loading-with-the-copy-statement-using-format-json}

When the `json` extension is installed, `FORMAT json` is supported for `COPY FROM`, `IMPORT DATABASE`, as well as `COPY TO` and `EXPORT DATABASE`. See the [`COPY` statement](#docs:current:sql:statements:copy) and the [`IMPORT` / `EXPORT` clauses](#docs:current:sql:statements:export).

By default, `COPY` expects newline-delimited JSON. If you prefer copying data to/from a JSON array, you can specify `ARRAY true`, e.g.,

```sql
COPY (SELECT * FROM range(5) r(i))
TO 'numbers.json' (ARRAY true);
```

will create the following file:

```json
[
	{"i":0},
	{"i":1},
	{"i":2},
	{"i":3},
	{"i":4}
]
```

This can be read back to DuckDB as follows:

```sql
CREATE TABLE numbers (i BIGINT);
COPY numbers FROM 'numbers.json' (ARRAY true);
```

The format can be detected automatically like so:

```sql
CREATE TABLE numbers (i BIGINT);
COPY numbers FROM 'numbers.json' (AUTO_DETECT true);
```

We can also create a table from the auto-detected schema:

```sql
CREATE TABLE numbers AS
    FROM 'numbers.json';
```

##### Parameters {#docs:current:data:json:loading_json::parameters}

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `auto_detect` | Whether to auto-detect the names of the keys and data types of the values automatically | `BOOL` | `false` |
| `columns` | A struct that specifies the key names and value types contained within the JSON file (e.g., `{key1: 'INTEGER', key2: 'VARCHAR'}`). If `auto_detect` is enabled these will be inferred | `STRUCT` | `(empty)` |
| `compression` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `t.json.gz` will use gzip, `t.json` will use none). Options are `uncompressed`, `gzip`, `zstd` and `auto_detect`. | `VARCHAR` | `auto_detect` |
| `convert_strings_to_integers` | Whether strings representing integer values should be converted to a numerical type. | `BOOL` | `false` |
| `dateformat` | Specifies the date format to use when parsing dates. See [Date Format](#docs:current:sql:functions:dateformat) | `VARCHAR` | `iso` |
| `filename` | Whether or not an extra `filename` column should be included in the result. | `BOOL` | `false` |
| `format` | Can be one of `auto`, `unstructured`, `newline_delimited`, `array` | `VARCHAR` | `newline_delimited` |
| `hive_partitioning` | Whether or not to interpret the path as a [Hive partitioned path](#docs:current:data:partitioning:hive_partitioning). | `BOOL` | `false` |
| `ignore_errors` | Whether to ignore parse errors (only possible when `format` is `newline_delimited`) | `BOOL` | `false` |
| `maximum_depth` | Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types | `BIGINT` | `-1` |
| `maximum_object_size` | The maximum size of a JSON object (in bytes) | `UINTEGER` | `16777216` |
| `records` | Can be one of `auto`, `true`, `false` | `VARCHAR` | `records` |
| `sample_size` | Option to define number of sample objects for automatic JSON type detection. Set to -1 to scan the entire input file | `UBIGINT` | `20480` |
| `timestampformat` | Specifies the date format to use when parsing timestamps. See [Date Format](#docs:current:sql:functions:dateformat) | `VARCHAR` | `iso`|
| `union_by_name` | Whether the schemas of multiple JSON files should be [unified](#docs:current:data:multiple_files:combining_schemas). | `BOOL` | `false` |

### Writing JSON {#docs:current:data:json:writing_json}

The contents of tables or the result of queries can be written directly to a JSON file using the `COPY` statement.
For example:

```sql
CREATE TABLE cities AS
    FROM (VALUES ('Amsterdam', 1), ('London', 2)) cities(name, id);
COPY cities TO 'cities.json';
```

This will result in `cities.json` with the following content:

```json
{"name":"Amsterdam","id":1}
{"name":"London","id":2}
```

See the [`COPY` statement](#docs:current:sql:statements:copy::copy-to) for more information.

### JSON Type {#docs:current:data:json:json_type}

DuckDB supports `json` via the `JSON` logical type. For example:

```sql
SELECT '[1, null, {"key": "value"}]'::JSON;
```

```text
[1, null, {"key": "value"}]
```

Logically, the `JSON` type is similar to a `VARCHAR`, but with the restriction that it must be valid JSON.
Physically, the data is stored as a `VARCHAR`.

For example, you can't parse invalid JSON:

```sql
SELECT 'unquoted'::JSON;
```

```console
Conversion Error: Malformed JSON at byte 0 of input: unexpected character.  Input: "unquoted"
```

Instead, what you probably want here is `SELECT '"quoted"'::JSON`.

Since the data is stored physically as a `VARCHAR`, whitespace is significant:

```sql
SELECT '{ "a": 5 }'::JSON = '{"a":5}'::JSON;
```

```text
false
```

Please note that whitespace is preserved in round trips:

```sql
SELECT '{  "a":5 }'::JSON::VARCHAR;
```

```text
{  "a":5 }
```

The order of keys in objects is significant:

```sql
SELECT '{"a":1,"b":2}'::JSON = '{"b":2,"a":1}'::JSON;
```

```text
false
```

Duplicate keys are allowed in JSON objects:

```sql
SELECT '{"a":1,"a":2}'::JSON;
```

```text
{"a":1,"a":2}
```
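
The comparisons above behave like string comparisons because the value is stored as text. The following Python sketch contrasts comparing JSON as text with comparing it after parsing. It only illustrates the idea; the `minify` helper is hypothetical and Python's `json` module is not what DuckDB uses internally.

```python
import json


def minify(text):
    """Parse `text` and re-serialize it without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


a, b = '{ "a": 5 }', '{"a":5}'

# Compared as text, the two values differ because of whitespace.
text_equal = a == b

# Compared after parsing (or after minifying), they are the same.
parsed_equal = json.loads(a) == json.loads(b)
minified_equal = minify(a) == minify(b)

# Minifying keeps key order, so reordered keys still differ as text.
reordered_equal = minify('{"a":1,"b":2}') == minify('{"b":2,"a":1}')

# Caveat: Python's parser keeps only the last duplicate key, whereas the
# JSON type shown above keeps the text, including both "a" entries.
deduplicated = minify('{"a":1,"a":2}')
```

Here `text_equal` and `reordered_equal` are `False`, while `parsed_equal` and `minified_equal` are `True`.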

We allow any of DuckDB's types to be cast to JSON, and JSON to be cast back to any of DuckDB's types. For example, to cast `JSON` to DuckDB's `STRUCT` type, run:

```sql
SELECT '{"duck": 42}'::JSON::STRUCT(duck INTEGER);
```

```text
{'duck': 42}
```

And back:

```sql
SELECT {duck: 42}::JSON;
```

```text
{"duck":42}
```

This works for our nested types as shown in the example, but also for non-nested types:

```sql
SELECT '2023-05-12'::DATE::JSON;
```

```text
"2023-05-12"
```

The only exception to this behavior is the cast from `VARCHAR` to `JSON`, which does not alter the data, but instead parses and validates the contents of the `VARCHAR` as JSON.

### JSON Processing Functions {#docs:current:data:json:json_functions}

#### JSON Extraction Functions {#docs:current:data:json:json_functions::json-extraction-functions}

There are two extraction functions, which have their respective operators. The operators can only be used if the string is stored as the `JSON` logical type.
These functions support the same two location notations as [JSON Scalar functions](#docs:current:data:json:json_functions::json-scalar-functions).

| Function                          | Alias                    | Operator | Description                                                                                                                       |
| :-------------------------------- | :----------------------- | :------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `json_exists(json, path)`         |                          |          | Returns `true` if the supplied path exists in the `json`, and `false` otherwise.                                                  |
| `json_extract(json, path)`        | `json_extract_path`      | `->`     | Extracts `JSON` from `json` at the given `path`. If `path` is a `LIST`, the result will be a `LIST` of `JSON`.                    |
| `json_extract_string(json, path)` | `json_extract_path_text` | `->>`    | Extracts `VARCHAR` from `json` at the given `path`. If `path` is a `LIST`, the result will be a `LIST` of `VARCHAR`.              |
| `json_value(json, path)`          |                          |          | Extracts `JSON` from `json` at the given `path`. If the `json` at the supplied path is not a scalar value, it will return `NULL`. |

Note that the arrow operator `->`, which is used for JSON extracts, has a low precedence as it is also used in [lambda functions](#docs:current:sql:functions:lambda). Therefore, you need to surround the `->` operator with parentheses when expressing operations such as equality comparisons (`=`).
For example:

```sql
SELECT ((JSON '{"field": 42}')->'field') = 42;
```

> **Warning.** DuckDB's JSON data type uses [0-based indexing](#docs:current:data:json:overview::indexing).

Examples:

```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');
```

```sql
SELECT json_extract(j, '$.family') FROM example;
```

```text
"anatidae"
```

```sql
SELECT j->'$.family' FROM example;
```

```text
"anatidae"
```

```sql
SELECT j->'$.species[0]' FROM example;
```

```text
"duck"
```

```sql
SELECT j->'$.species[*]' FROM example;
```

```text
["duck", "goose", "swan", null]
```

```sql
SELECT j->>'$.species[*]' FROM example;
```

```text
[duck, goose, swan, null]
```

```sql
SELECT j->'$.species'->0 FROM example;
```

```text
"duck"
```

```sql
SELECT j->'species'->['/0', '/1'] FROM example;
```

```text
['"duck"', '"goose"']
```

```sql
SELECT json_extract_string(j, '$.family') FROM example;
```

```text
anatidae
```

```sql
SELECT j->>'$.family' FROM example;
```

```text
anatidae
```

```sql
SELECT j->>'$.species[0]' FROM example;
```

```text
duck
```

```sql
SELECT j->'species'->>0 FROM example;
```

```text
duck
```

```sql
SELECT j->'species'->>['/0', '/1'] FROM example;
```

```text
[duck, goose]
```

Note that DuckDB's JSON data type uses [0-based indexing](#docs:current:data:json:overview::indexing).

If multiple values need to be extracted from the same JSON, it is more efficient to extract a list of paths.

The following query causes the JSON to be parsed twice, resulting in a slower query that uses more memory:

```sql
SELECT
    json_extract(j, 'family') AS family,
    json_extract(j, 'species') AS species
FROM example;
```



| family     | species                      |
| ---------- | ---------------------------- |
| "anatidae" | ["duck","goose","swan",null] |

The following produces the same result but is faster and more memory-efficient:

```sql
WITH extracted AS (
    SELECT json_extract(j, ['family', 'species']) AS extracted_list
    FROM example
)
SELECT
    extracted_list[1] AS family,
    extracted_list[2] AS species
FROM extracted;
```
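
The cost difference between the two queries comes down to how often the JSON text is parsed. The Python sketch below counts parse calls for both approaches. It is a toy model with hypothetical helper names (`parse`, `extract_one`, `extract_many`), not a description of DuckDB's internals.

```python
import json

parse_calls = 0


def parse(text):
    """Parse JSON text and count how often parsing happens."""
    global parse_calls
    parse_calls += 1
    return json.loads(text)


def extract_one(text, key):
    # One parse per extracted key.
    return parse(text).get(key)


def extract_many(text, keys):
    # One parse shared by all keys; results come back as a list.
    doc = parse(text)
    return [doc.get(key) for key in keys]


row = '{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }'

separate = [extract_one(row, "family"), extract_one(row, "species")]
calls_separate = parse_calls

parse_calls = 0
combined = extract_many(row, ["family", "species"])
calls_combined = parse_calls
```

Both approaches return the same values, but the first parses the row twice and the second only once. Note that the SQL query above indexes the resulting `LIST` starting at 1, whereas the Python list is indexed from 0.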

#### JSON Scalar Functions {#docs:current:data:json:json_functions::json-scalar-functions}

The following scalar JSON functions can be used to gain information about the stored JSON values.
With the exception of `json_valid(json)`, all JSON functions produce an error when invalid JSON is supplied.

We support two kinds of notations to describe locations within JSON: [JSON Pointer](https://datatracker.ietf.org/doc/html/rfc6901) and JSONPath.

| Function                                    | Description                                                                                                                                                                                                                                                                        |
| :------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `json_array_length(json[, path])`           | Return the number of elements in the JSON array `json`, or `0` if it is not a JSON array. If `path` is specified, return the number of elements in the JSON array at the given `path`. If `path` is a `LIST`, the result will be `LIST` of array lengths.                          |
| `json_contains(json_haystack, json_needle)` | Returns `true` if `json_needle` is contained in `json_haystack`. Both parameters are of JSON type, but `json_needle` can also be a numeric value or a string, however the string must be wrapped in double quotes.                                                                 |
| `json_keys(json[, path])`                   | Returns the keys of `json` as a `LIST` of `VARCHAR`, if `json` is a JSON object. If `path` is specified, return the keys of the JSON object at the given `path`. If `path` is a `LIST`, the result will be `LIST` of `LIST` of `VARCHAR`.                                          |
| `json_structure(json)`                      | Return the structure of `json`. Defaults to `JSON` if the structure is inconsistent (e.g., incompatible types in an array).                                                                                                                                                        |
| `json_type(json[, path])`                   | Return the type of the supplied `json`, which is one of `ARRAY`, `BIGINT`, `BOOLEAN`, `DOUBLE`, `OBJECT`, `UBIGINT`, `VARCHAR` and `NULL`. If `path` is specified, return the type of the element at the given `path`. If `path` is a `LIST`, the result will be `LIST` of types. |
| `json_valid(json)`                          | Return whether `json` is valid JSON.                                                                                                                                                                                                                                               |
| `json(json)`                                | Parse and minify `json`.                                                                                                                                                                                                                                                           |

The JSON Pointer syntax separates each field with a `/`.
For example, to extract the first element of the array with key `duck`, you can do:

```sql
SELECT json_extract('{"duck": [1, 2, 3]}', '/duck/0');
```

```text
1
```

The JSONPath syntax always starts with `$`, separates fields with a `.`, and accesses array elements with `[i]`. Using the same example, we can do the following:

```sql
SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[0]');
```

```text
1
```

Note that DuckDB's JSON data type uses [0-based indexing](#docs:current:data:json:overview::indexing).

JSONPath is more expressive, and can also access from the back of lists:

```sql
SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[#-1]');
```

```text
3
```

JSONPath also allows escaping syntax tokens, using double quotes:

```sql
SELECT json_extract('{"duck.goose": [1, 2, 3]}', '$."duck.goose"[1]');
```

```text
2
```
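
To show how the two notations map onto the same lookups, here is a small Python sketch that resolves both against a parsed document. It covers only the forms used in the examples above (`/key/index`, `$.key`, `$."quoted key"`, `[i]` and `[#-i]`). The function names `resolve_pointer` and `resolve_path` are hypothetical, and the sketch is not DuckDB's path parser.

```python
import json
import re


def resolve_pointer(doc, pointer):
    """Resolve a JSON Pointer (RFC 6901) such as '/duck/0'."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must start with '/'")
    current = doc
    for token in pointer[1:].split("/"):
        # RFC 6901 escapes: '~1' stands for '/', '~0' stands for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


# .key | ."quoted key" | [i] | [#-i]
_STEP = re.compile(r'\.(?:"([^"]*)"|([^.\[\]"]+))|\[(#-)?(\d+)\]')


def resolve_path(doc, path):
    """Resolve a small JSONPath subset such as '$.duck[#-1]'."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath must start with '$'")
    current, pos = doc, 1
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos}")
        quoted, bare, from_end, index = match.groups()
        if index is not None:
            i = int(index)
            if from_end and i == 0:
                raise IndexError("[#-0] is past the end of the array")
            # [#-i] counts from the back of the array, [i] from the front.
            current = current[-i] if from_end else current[i]
        else:
            current = current[quoted if quoted is not None else bare]
        pos = match.end()
    return current
```

For the document `{"duck": [1, 2, 3]}`, both `resolve_pointer(doc, '/duck/0')` and `resolve_path(doc, '$.duck[0]')` return the first element, using 0-based indexing as in the SQL examples.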

Examples using the [anatidae biological family](https://en.wikipedia.org/wiki/Anatidae):

```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');
```

```sql
SELECT json(j) FROM example;
```

```text
{"family":"anatidae","species":["duck","goose","swan",null]}
```

```sql
SELECT j.family FROM example;
```

```text
"anatidae"
```

```sql
SELECT j.species[0] FROM example;
```

```text
"duck"
```

```sql
SELECT json_valid(j) FROM example;
```

```text
true
```

```sql
SELECT json_valid('{');
```

```text
false
```

```sql
SELECT json_array_length('["duck", "goose", "swan", null]');
```

```text
4
```

```sql
SELECT json_array_length(j, 'species') FROM example;
```

```text
4
```

```sql
SELECT json_array_length(j, '/species') FROM example;
```

```text
4
```

```sql
SELECT json_array_length(j, '$.species') FROM example;
```

```text
4
```

```sql
SELECT json_array_length(j, ['$.species']) FROM example;
```

```text
[4]
```

```sql
SELECT json_type(j) FROM example;
```

```text
OBJECT
```

```sql
SELECT json_keys(j) FROM example;
```

```text
[family, species]
```

```sql
SELECT json_structure(j) FROM example;
```

```text
{"family":"VARCHAR","species":["VARCHAR"]}
```

```sql
SELECT json_structure('["duck", {"family": "anatidae"}]');
```

```text
["JSON"]
```

```sql
SELECT json_contains('{"key": "value"}', '"value"');
```

```text
true
```

```sql
SELECT json_contains('{"key": 1}', '1');
```

```text
true
```

```sql
SELECT json_contains('{"top_key": {"key": "value"}}', '{"key": "value"}');
```

```text
true
```

#### JSON Aggregate Functions {#docs:current:data:json:json_functions::json-aggregate-functions}

There are three JSON aggregate functions.

| Function                        | Description                                                            |
| :------------------------------ | :--------------------------------------------------------------------- |
| `json_group_array(any)`         | Return a JSON array with all values of `any` in the aggregation.       |
| `json_group_object(key, value)` | Return a JSON object with all `key`, `value` pairs in the aggregation. |
| `json_group_structure(json)`    | Return the combined `json_structure` of all `json` in the aggregation. |

Examples:

```sql
CREATE TABLE example1 (k VARCHAR, v INTEGER);
INSERT INTO example1 VALUES ('duck', 42), ('goose', 7);
```

```sql
SELECT json_group_array(v) FROM example1;
```

```text
[42, 7]
```

```sql
SELECT json_group_object(k, v) FROM example1;
```

```text
{"duck":42,"goose":7}
```

```sql
CREATE TABLE example2 (j JSON);
INSERT INTO example2 VALUES
    ('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
    ('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
```

```sql
SELECT json_group_structure(j) FROM example2;
```

```text
{"family":"VARCHAR","species":["VARCHAR"],"coolness":"DOUBLE","hair":"BOOLEAN"}
```
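
The idea behind `json_structure` and `json_group_structure` can be sketched in Python as "describe each value, then merge the descriptions". This is a simplified model built only from the outputs shown in this section. The names `structure`, `merge` and `group_structure` are hypothetical, the choice of `UBIGINT` for non-negative integers is an assumption, and numeric widening (e.g., combining integers with doubles) is deliberately not modeled: any mismatch falls back to `JSON`.

```python
import json


def structure(value):
    """Describe the shape of a parsed JSON value (simplified)."""
    if isinstance(value, dict):
        return {key: structure(child) for key, child in value.items()}
    if isinstance(value, list):
        merged = "NULL"
        for item in value:
            merged = merge(merged, structure(item))
        return [merged]
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int
        return "BOOLEAN"
    if isinstance(value, int):
        return "UBIGINT" if value >= 0 else "BIGINT"  # assumption
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"


def merge(a, b):
    """Combine two structures; inconsistent shapes become "JSON"."""
    if a == "NULL":
        return b
    if b == "NULL":
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, child in b.items():
            merged[key] = merge(merged[key], child) if key in merged else child
        return merged
    if isinstance(a, list) and isinstance(b, list):
        return [merge(a[0], b[0])]
    return a if a == b else "JSON"


def group_structure(texts):
    """Merge the structures of several JSON texts, like an aggregate."""
    merged = "NULL"
    for text in texts:
        merged = merge(merged, structure(json.loads(text)))
    return merged
```

Applied to the two rows of `example2`, `group_structure` yields the union of the keys with one type per key, matching the result shown above.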

#### Transforming JSON to Nested Types {#docs:current:data:json:json_functions::transforming-json-to-nested-types}

In many cases, it is inefficient to extract values from JSON one-by-one.
Instead, we can “extract” all values at once, transforming JSON to the nested types `LIST` and `STRUCT`.

| Function                                 | Description                                                            |
| :--------------------------------------- | :--------------------------------------------------------------------- |
| `json_transform(json, structure)`        | Transform `json` according to the specified `structure`.               |
| `from_json(json, structure)`             | Alias for `json_transform`.                                            |
| `json_transform_strict(json, structure)` | Same as `json_transform`, but throws an error when type casting fails. |
| `from_json_strict(json, structure)`      | Alias for `json_transform_strict`.                                     |

The `structure` argument is JSON of the same form as returned by `json_structure`.
The `structure` argument can be modified to transform the JSON into the desired structure and types.
It is possible to extract fewer key/value pairs than are present in the JSON, and it is also possible to extract more: missing keys become `NULL`.

Examples:

```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
    ('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
```

```sql
SELECT json_transform(j, '{"family": "VARCHAR", "coolness": "DOUBLE"}') FROM example;
```

```text
{'family': anatidae, 'coolness': 42.420000}
{'family': canidae, 'coolness': NULL}
```

```sql
SELECT json_transform(j, '{"family": "TINYINT", "coolness": "DECIMAL(4, 2)"}') FROM example;
```

```text
{'family': NULL, 'coolness': 42.42}
{'family': NULL, 'coolness': NULL}
```

```sql
SELECT json_transform_strict(j, '{"family": "TINYINT", "coolness": "DOUBLE"}') FROM example;
```

```console
Invalid Input Error: Failed to cast value: "anatidae"
```
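
The difference between the lenient and the strict variant can be illustrated with a small Python sketch. It is a toy model, not DuckDB's casting logic: the `transform` function and the `CASTS` table are hypothetical, only three target types are supported, and Python's conversion rules stand in for DuckDB's.

```python
import json

# Hypothetical cast table; DuckDB supports many more types and rules.
CASTS = {
    "VARCHAR": lambda v: v if isinstance(v, str) else json.dumps(v),
    "BIGINT": int,
    "DOUBLE": float,
}


def transform(value, structure, strict=False):
    """Shape a parsed JSON value according to `structure` (simplified)."""

    def fail(message):
        # Lenient mode yields None (NULL); strict mode raises.
        if strict:
            raise ValueError(message)
        return None

    if value is None:
        return None
    if isinstance(structure, dict):
        if not isinstance(value, dict):
            return fail(f"expected an object, got {value!r}")
        # Only requested keys are kept; keys missing from the value become None.
        return {key: transform(value.get(key), sub, strict)
                for key, sub in structure.items()}
    if isinstance(structure, list):
        if not isinstance(value, list):
            return fail(f"expected an array, got {value!r}")
        return [transform(item, structure[0], strict) for item in value]
    try:
        return CASTS[structure](value)
    except (TypeError, ValueError):
        return fail(f"Failed to cast value: {value!r}")
```

As in the SQL examples, asking for an integer `family` gives `None` in lenient mode and raises an error with `strict=True`.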

#### JSON Table Functions {#docs:current:data:json:json_functions::json-table-functions}

DuckDB implements two JSON table functions that take a JSON value and produce a table from it.

| Function                  | Description                                                                                  |
| :------------------------ | :------------------------------------------------------------------------------------------- |
| `json_each(json[, path])` | Traverse `json` and return one row for each element in the top-level array or object.        |
| `json_tree(json[, path])` | Traverse `json` in depth-first fashion and return one row for each element in the structure. |

If the element is not an array or object, the element itself is returned.
If the optional `path` argument is supplied, traversal starts from the element at the given path instead of the root element.

The resulting table has the following columns:

| Field     | Type               | Description                                 |
| :-------- | :----------------- | :------------------------------------------ |
| `key`     | `VARCHAR`          | Key of element relative to its parent       |
| `value`   | `JSON`             | Value of element                            |
| `type`    | `VARCHAR`          | `json_type` (function) of this element      |
| `atom`    | `JSON`             | `json_value` (function) of this element     |
| `id`      | `UBIGINT`          | Element identifier, numbered by parse order |
| `parent`  | `UBIGINT`          | `id` of parent element                      |
| `fullkey` | `VARCHAR`          | JSON path to element                        |
| `path`    | `VARCHAR`          | JSON path to parent element                 |
| `json`    | `JSON` (Virtual)   | The `json` parameter                        |
| `root`    | `TEXT` (Virtual)   | The `path` parameter                        |
| `rowid`   | `BIGINT` (Virtual) | The row identifier                          |

These functions are analogous to [SQLite's functions with the same name](https://www.sqlite.org/json1.html#jeach).
Note that, because the `json_each` and `json_tree` functions refer to previous subqueries in the same `FROM` clause, they are [*lateral joins*](#docs:current:sql:query_syntax:from::lateral-joins).

Examples:

```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
    ('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
```

```sql
SELECT je.*, je.rowid
FROM example AS e, json_each(e.j) AS je;
```

| key      | value                  | type    | atom       |  id | parent | fullkey    | path | rowid |
| -------- | ---------------------- | ------- | ---------- | --: | ------ | ---------- | ---- | ----: |
| family   | "anatidae"             | VARCHAR | "anatidae" |   2 | NULL   | $.family   | $    |     0 |
| species  | ["duck","goose"]       | ARRAY   | NULL       |   4 | NULL   | $.species  | $    |     1 |
| coolness | 42.42                  | DOUBLE  | 42.42      |   8 | NULL   | $.coolness | $    |     2 |
| family   | "canidae"              | VARCHAR | "canidae"  |   2 | NULL   | $.family   | $    |     0 |
| species  | ["labrador","bulldog"] | ARRAY   | NULL       |   4 | NULL   | $.species  | $    |     1 |
| hair     | true                   | BOOLEAN | true       |   8 | NULL   | $.hair     | $    |     2 |

```sql
SELECT je.*, je.rowid
FROM example AS e, json_each(e.j, '$.species') AS je;
```

| key | value      | type    | atom       |  id | parent | fullkey      | path      | rowid |
| --- | ---------- | ------- | ---------- | --: | ------ | ------------ | --------- | ----: |
| 0   | "duck"     | VARCHAR | "duck"     |   5 | NULL   | $.species[0] | $.species |     0 |
| 1   | "goose"    | VARCHAR | "goose"    |   6 | NULL   | $.species[1] | $.species |     1 |
| 0   | "labrador" | VARCHAR | "labrador" |   5 | NULL   | $.species[0] | $.species |     0 |
| 1   | "bulldog"  | VARCHAR | "bulldog"  |   6 | NULL   | $.species[1] | $.species |     1 |

```sql
SELECT je.key, je.value, je.type, je.id, je.parent, je.fullkey, je.rowid
FROM example AS e, json_tree(e.j) AS je;
```

| key      | value                                                             | type    |  id | parent | fullkey      | rowid |
| -------- | ----------------------------------------------------------------- | ------- | --: | ------ | ------------ | ----: |
| NULL     | {"family":"anatidae","species":["duck","goose"],"coolness":42.42} | OBJECT  |   0 | NULL   | $            |     0 |
| family   | "anatidae"                                                        | VARCHAR |   2 | 0      | $.family     |     1 |
| species  | ["duck","goose"]                                                  | ARRAY   |   4 | 0      | $.species    |     2 |
| 0        | "duck"                                                            | VARCHAR |   5 | 4      | $.species[0] |     3 |
| 1        | "goose"                                                           | VARCHAR |   6 | 4      | $.species[1] |     4 |
| coolness | 42.42                                                             | DOUBLE  |   8 | 0      | $.coolness   |     5 |
| NULL     | {"family":"canidae","species":["labrador","bulldog"],"hair":true} | OBJECT  |   0 | NULL   | $            |     0 |
| family   | "canidae"                                                         | VARCHAR |   2 | 0      | $.family     |     1 |
| species  | ["labrador","bulldog"]                                            | ARRAY   |   4 | 0      | $.species    |     2 |
| 0        | "labrador"                                                        | VARCHAR |   5 | 4      | $.species[0] |     3 |
| 1        | "bulldog"                                                         | VARCHAR |   6 | 4      | $.species[1] |     4 |
| hair     | true                                                              | BOOLEAN |   8 | 0      | $.hair       |     5 |

### JSON Format Settings {#docs:current:data:json:format_settings}

The JSON extension can attempt to determine the format of a JSON file when setting `format` to `auto`.
Here are some example JSON files and the corresponding `format` settings that should be used.

In each of the below cases, the `format` setting was not needed, as DuckDB was able to infer it correctly, but it is included for illustrative purposes.
A query of this shape would work in each case:

```sql
SELECT *
FROM filename.json;
```
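
The inference described above can also be requested explicitly. A minimal sketch, assuming the example file `records.json` used in the following sections is available in the working directory:

```sql
-- Ask DuckDB to detect the JSON format instead of naming it.
SELECT *
FROM read_json('records.json', format = 'auto');
```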

#### Format: `newline_delimited` {#docs:current:data:json:format_settings::format-newline_delimited}

With `format = 'newline_delimited'`, newline-delimited JSON can be parsed.
Each line is a separate JSON value.

We use the example file [`records.json`](https://duckdb.org/data/records.json) with the following content:

```json
{"key1":"value1", "key2": "value1"}
{"key1":"value2", "key2": "value2"}
{"key1":"value3", "key2": "value3"}
```

```sql
SELECT *
FROM read_json('records.json', format = 'newline_delimited');
```



|  key1  |  key2  |
|--------|--------|
| value1 | value1 |
| value2 | value2 |
| value3 | value3 |

#### Format: `array` {#docs:current:data:json:format_settings::format-array}

If the JSON file contains a JSON array of objects (pretty-printed or not), `format = 'array'` may be used.
To demonstrate its use, we use the example file [`records-in-array.json`](https://duckdb.org/data/records-in-array.json):

```json
[
    {"key1":"value1", "key2": "value1"},
    {"key1":"value2", "key2": "value2"},
    {"key1":"value3", "key2": "value3"}
]
```

```sql
SELECT *
FROM read_json('records-in-array.json', format = 'array');
```



|  key1  |  key2  |
|--------|--------|
| value1 | value1 |
| value2 | value2 |
| value3 | value3 |

#### Format: `unstructured` {#docs:current:data:json:format_settings::format-unstructured}

If the JSON file contains JSON that is not newline-delimited or an array, `unstructured` may be used.
To demonstrate its use, we use the example file [`unstructured.json`](https://duckdb.org/data/unstructured.json):

```json
{
    "key1":"value1",
    "key2":"value1"
}
{
    "key1":"value2",
    "key2":"value2"
}
{
    "key1":"value3",
    "key2":"value3"
}
```

```sql
SELECT *
FROM read_json('unstructured.json', format = 'unstructured');
```



|  key1  |  key2  |
|--------|--------|
| value1 | value1 |
| value2 | value2 |
| value3 | value3 |

#### `records` Options {#docs:current:data:json:format_settings::records-options}

The JSON extension can attempt to determine whether a JSON file contains records when setting `records = auto`.
When `records = true`, the JSON extension expects JSON objects, and will unpack the fields of JSON objects into individual columns.

Continuing with the same example file, [`records.json`](https://duckdb.org/data/records.json):

```json
{"key1":"value1", "key2": "value1"}
{"key1":"value2", "key2": "value2"}
{"key1":"value3", "key2": "value3"}
```

```sql
SELECT *
FROM read_json('records.json', records = true);
```



|  key1  |  key2  |
|--------|--------|
| value1 | value1 |
| value2 | value2 |
| value3 | value3 |

When `records = false`, the JSON extension does not unpack the top-level objects and creates `STRUCT`s instead:

```sql
SELECT *
FROM read_json('records.json', records = false);
```



|               json               |
|----------------------------------|
| {'key1': value1, 'key2': value1} |
| {'key1': value2, 'key2': value2} |
| {'key1': value3, 'key2': value3} |

This is especially useful if we have non-object JSON, for example, [`arrays.json`](https://duckdb.org/data/arrays.json):

```json
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
```

```sql
SELECT *
FROM read_json('arrays.json', records = false);
```



|   json    |
|-----------|
| [1, 2, 3] |
| [4, 5, 6] |
| [7, 8, 9] |
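
With `records = false`, the fields of each object remain reachable through the `STRUCT` column. A minimal sketch, assuming the column is named `json` as in the output shown above for `records.json`:

```sql
-- Keep the objects as STRUCTs, then pick individual fields with dot notation.
SELECT json.key1 AS key1, json.key2 AS key2
FROM read_json('records.json', records = false);
```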

### Installing and Loading the JSON Extension {#docs:current:data:json:installing_and_loading}

The `json` extension is shipped by default in DuckDB builds. If it is not, it will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use. If you would like to install and load it manually, run:

```sql
INSTALL json;
LOAD json;
```

### SQL to/from JSON {#docs:current:data:json:sql_to_and_from_json}

DuckDB provides functions to serialize and deserialize `SELECT` statements between SQL and JSON, as well as to execute JSON-serialized statements.

| Function | Type | Description |
|:------|:-|:---------|
| `json_deserialize_sql(json)` | Scalar  | Deserialize one or many `json` serialized statements back to an equivalent SQL string. |
| `json_execute_serialized_sql(varchar)` | Table | Execute `json` serialized statements and return the resulting rows. Only one statement at a time is supported for now. |
| `json_serialize_sql(varchar, skip_default := boolean, skip_empty := boolean, skip_null := boolean, format := boolean)` | Scalar | Serialize a set of semicolon-separated (`;`) `SELECT` statements to an equivalent list of `json` serialized statements. |
| `PRAGMA json_execute_serialized_sql(varchar)` | Pragma | Pragma version of the `json_execute_serialized_sql` function. |

The `json_serialize_sql(varchar)` function takes four optional parameters, `skip_default`, `skip_empty`, `skip_null`, and `format`, that can be used to control the output of the serialized statements.

If you run the `json_execute_serialized_sql(varchar)` table function inside of a transaction the serialized statements will not be able to see any transaction local changes. This is because the statements are executed in a separate query context. You can use the `PRAGMA json_execute_serialized_sql(varchar)` pragma version to execute the statements in the same query context as the pragma, although with the limitation that the serialized JSON must be provided as a constant string, i.e., you cannot do `PRAGMA json_execute_serialized_sql(json_serialize_sql(...))`.
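
The visibility restriction can be illustrated with a short sketch. The table name `tx_local` is hypothetical, and the exact error message is not shown because it may vary between versions:

```sql
BEGIN TRANSACTION;

-- This table exists only inside the current, uncommitted transaction.
CREATE TABLE tx_local AS SELECT 42 AS x;

-- The table function runs in a separate query context, so, per the
-- description above, it is not expected to see tx_local and should fail.
SELECT *
FROM json_execute_serialized_sql(json_serialize_sql('SELECT x FROM tx_local'));

ROLLBACK;
```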

Note that these functions do not preserve syntactic sugar such as `FROM * SELECT ...`, so a statement round-tripped through `json_deserialize_sql(json_serialize_sql(...))` may not be identical to the original statement, but should always be semantically equivalent and produce the same output.

#### Examples {#docs:current:data:json:sql_to_and_from_json::examples}

Simple example:

```sql
SELECT json_serialize_sql('SELECT 2');
```

```text
{"error":false,"statements":[{"node":{"type":"SELECT_NODE","modifiers":[],"cte_map":{"map":[]},"select_list":[{"class":"CONSTANT","type":"VALUE_CONSTANT","alias":"","query_location":7,"value":{"type":{"id":"INTEGER","type_info":null},"is_null":false,"value":2}}],"from_table":{"type":"EMPTY","alias":"","sample":null,"query_location":18446744073709551615},"where_clause":null,"group_expressions":[],"group_sets":[],"aggregate_handling":"STANDARD_HANDLING","having":null,"sample":null,"qualify":null},"named_param_map":[]}]}
```

Example with multiple statements and skip options:

```sql
SELECT json_serialize_sql('SELECT 1 + 2; SELECT a + b FROM tbl1', skip_empty := true, skip_null := true);
```

```text
{"error":false,"statements":[{"node":{"type":"SELECT_NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","query_location":9,"function_name":"+","children":[{"class":"CONSTANT","type":"VALUE_CONSTANT","query_location":7,"value":{"type":{"id":"INTEGER"},"is_null":false,"value":1}},{"class":"CONSTANT","type":"VALUE_CONSTANT","query_location":11,"value":{"type":{"id":"INTEGER"},"is_null":false,"value":2}}],"order_bys":{"type":"ORDER_MODIFIER"},"distinct":false,"is_operator":true,"export_state":false}],"from_table":{"type":"EMPTY","query_location":18446744073709551615},"aggregate_handling":"STANDARD_HANDLING"}},{"node":{"type":"SELECT_NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","query_location":23,"function_name":"+","children":[{"class":"COLUMN_REF","type":"COLUMN_REF","query_location":21,"column_names":["a"]},{"class":"COLUMN_REF","type":"COLUMN_REF","query_location":25,"column_names":["b"]}],"order_bys":{"type":"ORDER_MODIFIER"},"distinct":false,"is_operator":true,"export_state":false}],"from_table":{"type":"BASE_TABLE","query_location":32,"table_name":"tbl1"},"aggregate_handling":"STANDARD_HANDLING"}}]}
```

Skip the default values in the AST (e.g., `"distinct":false`):

```sql
SELECT json_serialize_sql('SELECT 1 + 2; SELECT a + b FROM tbl1', skip_default := true, skip_empty := true, skip_null := true);
```

```text
{"error":false,"statements":[{"node":{"type":"SELECT_NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","query_location":9,"function_name":"+","children":[{"class":"CONSTANT","type":"VALUE_CONSTANT","query_location":7,"value":{"type":{"id":"INTEGER"},"is_null":false,"value":1}},{"class":"CONSTANT","type":"VALUE_CONSTANT","query_location":11,"value":{"type":{"id":"INTEGER"},"is_null":false,"value":2}}],"order_bys":{"type":"ORDER_MODIFIER"},"is_operator":true}],"from_table":{"type":"EMPTY"},"aggregate_handling":"STANDARD_HANDLING"}},{"node":{"type":"SELECT_NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","query_location":23,"function_name":"+","children":[{"class":"COLUMN_REF","type":"COLUMN_REF","query_location":21,"column_names":["a"]},{"class":"COLUMN_REF","type":"COLUMN_REF","query_location":25,"column_names":["b"]}],"order_bys":{"type":"ORDER_MODIFIER"},"is_operator":true}],"from_table":{"type":"BASE_TABLE","query_location":32,"table_name":"tbl1"},"aggregate_handling":"STANDARD_HANDLING"}}]}
```

Example with a syntax error:

```sql
SELECT json_serialize_sql('TOTALLY NOT VALID SQL');
```

```text
{"error":true,"error_type":"parser","error_message":"syntax error at or near \"TOTALLY\"","error_subtype":"SYNTAX_ERROR","position":"0"}
```

Example with deserialize:

```sql
SELECT json_deserialize_sql(json_serialize_sql('SELECT 1 + 2'));
```

```text
SELECT (1 + 2)
```

Example with deserialize and syntactic sugar, which is lost during the transformation:

```sql
SELECT json_deserialize_sql(json_serialize_sql('FROM x SELECT 1 + 2'));
```

```text
SELECT (1 + 2) FROM x
```

Example with execute:

```sql
SELECT * FROM json_execute_serialized_sql(json_serialize_sql('SELECT 1 + 2'));
```

```text
3
```

Example with error:

```sql
SELECT * FROM json_execute_serialized_sql(json_serialize_sql('TOTALLY NOT VALID SQL'));
```

```console
Parser Error:
Error parsing json: parser: syntax error at or near "TOTALLY"
```
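
The `format` parameter from the function table is not used in the examples above. A minimal sketch, assuming that `format := true` returns the serialized statement as indented, multi-line JSON rather than a single line:

```sql
-- Same statement as in the deserialize example, serialized with formatting enabled.
SELECT json_serialize_sql('SELECT 1 + 2', format := true);
```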

### Caveats {#docs:current:data:json:caveats}

#### Equality Comparison {#docs:current:data:json:caveats::equality-comparison}

> **Warning.** Currently, equality comparison of JSON values can differ based on the context. In some cases, it is based on raw text comparison, while in other cases, it uses logical content comparison.

The following query returns true for all fields:

```sql
SELECT
    a != b, -- Space is part of physical JSON content. Despite equal logical content, values are treated as not equal.
    c != d, -- Same.
    c[0] = d[0], -- Equality because space was removed from physical content of fields:
    a = c[0], -- Indeed, field is equal to empty list without space...
    b != c[0], -- ... but different from empty list with space.
FROM (
    SELECT
        '[]'::JSON AS a,
        '[ ]'::JSON AS b,
        '[[]]'::JSON AS c,
        '[[ ]]'::JSON AS d
    );
```



| (a != b) | (c != d) | (c[0] = d[0]) | (a = c[0]) | (b != c[0]) |
|----------|----------|---------------|------------|-------------|
| true     | true     | true          | true       | true        |
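
One possible way to sidestep the whitespace sensitivity is to normalize both sides before comparing them. The sketch below assumes that the `json` scalar function parses and minifies its argument; the result is not shown because the comparison semantics may change between versions:

```sql
-- Minify both values first so that insignificant whitespace is removed.
SELECT json('[ ]') = json('[]') AS equal_after_minify;
```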

## Multiple Files {#data:multiple_files}

### Reading Multiple Files {#docs:current:data:multiple_files:overview}

DuckDB can read multiple files of different types (CSV, Parquet, JSON files) at the same time using either the glob syntax, or by providing a list of files to read.
See the [combining schemas](#docs:current:data:multiple_files:combining_schemas) page for tips on reading files with different schemas.

#### CSV {#docs:current:data:multiple_files:overview::csv}

Read all files with a name ending in `.csv` in the folder `dir`:

```sql
SELECT *
FROM 'dir/*.csv';
```

Read all files with a name ending in `.csv`, two directories deep:

```sql
SELECT *
FROM '*/*/*.csv';
```

Read all files with a name ending in `.csv`, at any depth in the folder `dir`:

```sql
SELECT *
FROM 'dir/**/*.csv';
```

Read the CSV files `flights1.csv` and `flights2.csv`:

```sql
SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv']);
```

Read the CSV files `flights1.csv` and `flights2.csv`, unifying schemas by name and outputting a `filename` column:

```sql
SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);
```

#### Parquet {#docs:current:data:multiple_files:overview::parquet}

Read all files that match the glob pattern:

```sql
SELECT *
FROM 'test/*.parquet';
```

Read three Parquet files and treat them as a single table:

```sql
SELECT *
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
```

Read all Parquet files from two specific folders:

```sql
SELECT *
FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
```

Read all Parquet files that match the glob pattern at any depth:

```sql
SELECT *
FROM read_parquet('dir/**/*.parquet');
```

#### Multi-File Reads and Globs {#docs:current:data:multiple_files:overview::multi-file-reads-and-globs}

DuckDB can also read a series of Parquet files and treat them as if they were a single table. Note that this only works if the Parquet files have the same schema. You can specify which Parquet files you want to read using a list parameter, glob pattern matching syntax, or a combination of both.

##### List Parameter {#docs:current:data:multiple_files:overview::list-parameter}

The `read_parquet` function can accept a list of filenames as the input parameter.

Read three Parquet files and treat them as a single table:

```sql
SELECT *
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
```

##### Glob Syntax {#docs:current:data:multiple_files:overview::glob-syntax}

Any file name input to the `read_parquet` function can either be an exact filename, or use a glob syntax to read multiple files that match a pattern.

|  Wildcard  |                        Description                        |
|------------|-----------------------------------------------------------|
| `*`        | Matches any number of any characters (including none)     |
| `**`       | Matches any number of subdirectories (including none)     |
| `?`        | Matches any single character                              |
| `[abc]`    | Matches one character given in the bracket                |
| `[a-z]`    | Matches one character from the range given in the bracket |

Note that the `?` wildcard in globs is not supported for reads over S3 due to HTTP encoding issues.

The following example reads all files ending in `.parquet` in the `test` folder:

```sql
SELECT *
FROM read_parquet('test/*.parquet');
```
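
The other wildcards from the table above work the same way. A brief sketch with hypothetical file names such as `test/part1.parquet` and `test/partA.parquet`:

```sql
-- `?` matches exactly one character, e.g., part1.parquet or partA.parquet.
SELECT *
FROM read_parquet('test/part?.parquet');

-- `[0-9]` matches one character from the given range, e.g., part1.parquet.
SELECT *
FROM read_parquet('test/part[0-9].parquet');
```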

##### List of Globs {#docs:current:data:multiple_files:overview::list-of-globs}

The glob syntax and the list input parameter can be combined to scan files that meet one of multiple patterns.

Read all Parquet files from two specific folders:

```sql
SELECT *
FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
```

The same glob syntax and list parameter also work for CSV files with `read_csv`, as shown in the [CSV](#docs:current:data:multiple_files:overview::csv) examples above.

#### Filename {#docs:current:data:multiple_files:overview::filename}

The `filename` argument can be used to add an extra `filename` column to the result that indicates which row came from which file. For example:

```sql
SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);
```

| FlightDate | OriginCityName |  DestCityName   | UniqueCarrier |   filename   |
|------------|----------------|-----------------|---------------|--------------|
| 1988-01-01 | New York, NY   | Los Angeles, CA | NULL          | flights1.csv |
| 1988-01-02 | New York, NY   | Los Angeles, CA | NULL          | flights1.csv |
| 1988-01-03 | New York, NY   | Los Angeles, CA | AA            | flights2.csv |

> The `filename` argument also accepts a string (e.g., `filename = 'input_file'`). When provided, the string is used as the name of the added column. This is useful when the source data already contains a `filename` column and you want to avoid a name collision.
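
A minimal sketch of the string form, reusing the two example files from above and the column name `input_file` mentioned in the note:

```sql
-- The added column is called input_file instead of filename.
SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = 'input_file');
```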

#### Glob Function to Find Filenames {#docs:current:data:multiple_files:overview::glob-function-to-find-filenames}

The glob pattern matching syntax can also be used to search for filenames using the `glob` table function.
It accepts one parameter: the path to search (which may include glob patterns).

Search the current directory for all files:

```sql
SELECT *
FROM glob('*');
```

|     file      |
|---------------|
| test.csv      |
| test.json     |
| test.parquet  |
| test2.csv     |
| test2.parquet |
| todos.json    |
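
The pattern can be narrowed with any of the wildcards described earlier. A minimal sketch, assuming a hypothetical folder `dir` that contains CSV files at various depths:

```sql
-- List every CSV file below dir, at any depth, in a stable order.
SELECT file
FROM glob('dir/**/*.csv')
ORDER BY file;
```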

### Combining Schemas {#docs:current:data:multiple_files:combining_schemas}



#### Examples {#docs:current:data:multiple_files:combining_schemas::examples}

Read a set of CSV files combining columns by position:

```sql
SELECT * FROM read_csv('flights*.csv');
```

Read a set of CSV files combining columns by name:

```sql
SELECT * FROM read_csv('flights*.csv', union_by_name = true);
```

#### Combining Schemas {#docs:current:data:multiple_files:combining_schemas::combining-schemas}

When reading from multiple files, we have to **combine schemas** from those files. That is because each file has its own schema that can differ from the other files. DuckDB offers two ways of unifying schemas of multiple files: **by column position** and **by column name**.

By default, DuckDB reads the schema of the first file provided, and then unifies columns in subsequent files by column position. This works correctly as long as all files have the same schema. If the schema of the files differs, you might want to use the `union_by_name` option to allow DuckDB to construct the schema by reading all of the names instead.

Below is an example of how both methods work.

#### Union by Position {#docs:current:data:multiple_files:combining_schemas::union-by-position}

By default, DuckDB unifies the columns of these different files **by position**. This means that the first column in each file is combined together, as well as the second column in each file, etc. For example, consider the following two files.

[`flights1.csv`](https://duckdb.org/data/flights1.csv):

```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
```

[`flights2.csv`](https://duckdb.org/data/flights2.csv):

```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-03|AA|New York, NY|Los Angeles, CA
```

Reading the two files at the same time will produce the following result set:

| FlightDate | UniqueCarrier | OriginCityName |  DestCityName   |
|------------|---------------|----------------|-----------------|
| 1988-01-01 | AA            | New York, NY   | Los Angeles, CA |
| 1988-01-02 | AA            | New York, NY   | Los Angeles, CA |
| 1988-01-03 | AA            | New York, NY   | Los Angeles, CA |

This is equivalent to the SQL construct [`UNION ALL`](#docs:current:sql:query_syntax:setops::union-all).
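
Written out by hand for the two example files, the equivalent query could look like the following sketch:

```sql
-- Columns are matched by position, just like the default multi-file read.
SELECT * FROM read_csv('flights1.csv')
UNION ALL
SELECT * FROM read_csv('flights2.csv');
```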

#### Union by Name {#docs:current:data:multiple_files:combining_schemas::union-by-name}

If you are processing multiple files that have different schemas, perhaps because columns have been added or renamed, it might be desirable to unify the columns of different files **by name** instead. This can be done by providing the `union_by_name` option. For example, consider the following two files, where `flights4.csv` has an extra column (`UniqueCarrier`).

[`flights3.csv`](https://duckdb.org/data/flights3.csv):

```csv
FlightDate|OriginCityName|DestCityName
1988-01-01|New York, NY|Los Angeles, CA
1988-01-02|New York, NY|Los Angeles, CA
```

[`flights4.csv`](https://duckdb.org/data/flights4.csv):

```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-03|AA|New York, NY|Los Angeles, CA
```

Reading these files while unifying columns **by position** results in an error, as the two files have a different number of columns. When specifying the `union_by_name` option, the columns are correctly unified, and any missing values are set to `NULL`.

```sql
SELECT * FROM read_csv(['flights3.csv', 'flights4.csv'], union_by_name = true);
```

| FlightDate | OriginCityName |  DestCityName   | UniqueCarrier |
|------------|----------------|-----------------|---------------|
| 1988-01-01 | New York, NY   | Los Angeles, CA | NULL          |
| 1988-01-02 | New York, NY   | Los Angeles, CA | NULL          |
| 1988-01-03 | New York, NY   | Los Angeles, CA | AA            |

This is equivalent to the SQL construct [`UNION ALL BY NAME`](#docs:current:sql:query_syntax:setops::union-all-by-name).

> Using the `union_by_name` option increases memory consumption.
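
For comparison, the `UNION ALL BY NAME` form mentioned above could be written by hand for the two example files as in this sketch:

```sql
-- Columns are matched by name; columns missing from one side are filled with NULL.
SELECT * FROM read_csv('flights3.csv')
UNION ALL BY NAME
SELECT * FROM read_csv('flights4.csv');
```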

## Parquet Files {#data:parquet}

### Reading and Writing Parquet Files {#docs:current:data:parquet:overview}

#### Examples {#docs:current:data:parquet:overview::examples}

Read a single Parquet file:

```sql
SELECT * FROM 'test.parquet';
```

Figure out which columns/types are in a Parquet file:

```sql
DESCRIBE SELECT * FROM 'test.parquet';
```

Create a table from a Parquet file:

```sql
CREATE TABLE test AS
    SELECT * FROM 'test.parquet';
```

If the file does not end in `.parquet`, use the `read_parquet` function:

```sql
SELECT *
FROM read_parquet('test.parq');
```

Use a list parameter to read three Parquet files and treat them as a single table:

```sql
SELECT *
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
```

Read all files that match the glob pattern:

```sql
SELECT *
FROM 'test/*.parquet';
```

Read all files that match the glob pattern, and include the `filename` virtual column that specifies which file each row came from (this column is available by default without any configuration option since DuckDB v1.3.0):

```sql
SELECT *, filename
FROM read_parquet('test/*.parquet');
```

Use a list of globs to read all Parquet files from two specific folders:

```sql
SELECT *
FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
```

Read over HTTPS:

```sql
SELECT *
FROM read_parquet('https://some.url/some_file.parquet');
```

Query the [metadata of a Parquet file](#docs:current:data:parquet:metadata::parquet-metadata):

```sql
SELECT *
FROM parquet_metadata('test.parquet');
```

Query the [file metadata of a Parquet file](#docs:current:data:parquet:metadata::parquet-file-metadata):

```sql
SELECT *
FROM parquet_file_metadata('test.parquet');
```

Query the [key-value metadata of a Parquet file](#docs:current:data:parquet:metadata::parquet-key-value-metadata):

```sql
SELECT *
FROM parquet_kv_metadata('test.parquet');
```

Query the [schema of a Parquet file](#docs:current:data:parquet:metadata::parquet-schema):

```sql
SELECT *
FROM parquet_schema('test.parquet');
```

Write the results of a query to a Parquet file using the default compression (Snappy):

```sql
COPY
    (SELECT * FROM tbl)
    TO 'result-snappy.parquet'
    (FORMAT parquet);
```

Write the results from a query to a Parquet file with specific compression and row group size:

```sql
COPY
    (FROM generate_series(100_000))
    TO 'test.parquet'
    (FORMAT parquet, COMPRESSION zstd, ROW_GROUP_SIZE 100_000);
```

Export the table contents of the entire database as Parquet:

```sql
EXPORT DATABASE 'target_directory' (FORMAT parquet);
```

#### Parquet Files {#docs:current:data:parquet:overview::parquet-files}

Parquet files are compressed columnar files that are efficient to load and process. DuckDB provides support for both reading and writing Parquet files in an efficient manner, as well as support for pushing filters and projections into the Parquet file scans.

> Parquet datasets differ based on the number of files, the size of individual files, the compression algorithm used, row group size, etc. These have a significant effect on performance. Please consult the [Performance Guide](#docs:current:guides:performance:file_formats) for details.

#### `read_parquet` Function {#docs:current:data:parquet:overview::read_parquet-function}

| Function | Description | Example |
|:--|:--|:-----|
| `read_parquet(path_or_list_of_paths)` | Read Parquet file(s)     | `SELECT * FROM read_parquet('test.parquet');` |
| `parquet_scan(path_or_list_of_paths)` | Alias for `read_parquet` | `SELECT * FROM parquet_scan('test.parquet');` |

If your file ends in `.parquet`, the function syntax is optional. The system will automatically infer that you are reading a Parquet file:

```sql
SELECT * FROM 'test.parquet';
```

Multiple files can be read at once by providing a glob or a list of files. Refer to the [multiple files section](#docs:current:data:multiple_files:overview) for more information.

##### Parameters {#docs:current:data:parquet:overview::parameters}

There are a number of options exposed that can be passed to the `read_parquet` function or the [`COPY` statement](#docs:current:sql:statements:copy).

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `binary_as_string` | Parquet files generated by legacy writers do not correctly set the `UTF8` flag for strings, causing string columns to be loaded as `BLOB` instead. Set this to true to load binary columns as strings. | `BOOL` | `false` |
| `encryption_config` | Configuration for [Parquet encryption](#docs:current:data:parquet:encryption). | `STRUCT` | - |
| `filename` | Whether or not an extra `filename` column should be included in the result. Since DuckDB v1.3.0, the `filename` column is added automatically as a virtual column and this option is only kept for compatibility reasons. | `BOOL` | `false` |
| `file_row_number` | Whether or not to include the `file_row_number` column. | `BOOL` | `false` |
| `hive_partitioning` | Whether or not to interpret the path as a [Hive partitioned path](#docs:current:data:partitioning:hive_partitioning). | `BOOL` | (auto-detected) |
| `union_by_name` | Whether the columns of multiple schemas should be [unified by name](#docs:current:data:multiple_files:combining_schemas), rather than by position. | `BOOL` | `false` |
| `schema` | Allows you to read a Parquet file as if it has the supplied schema. Field IDs are required. | `MAP` | `NULL` |
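
Parameters are passed as named arguments after the path. A minimal sketch combining two options from the table; the `test/*.parquet` path is only a placeholder:

```sql
-- Load legacy binary columns as strings and add a per-file row number column.
SELECT *
FROM read_parquet('test/*.parquet', binary_as_string = true, file_row_number = true);
```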

#### Using the `schema` Parameter {#docs:current:data:parquet:overview::using-the-schema-parameter}

The `schema` parameter allows you to read the Parquet file using a specific schema. This is useful for renaming, adding, deleting, reordering, or casting columns when reading Parquet files.

To use the `schema` parameter, field IDs are required. To make them available when creating the Parquet file using DuckDB, use:

```sql
COPY (SELECT 42::INTEGER AS i) TO 'integers.parquet' (FIELD_IDS {i: 0});
```

Read the file with a supplied schema that renames `i`, casts it to `BIGINT`, and adds a new column with a default value:

```sql
SELECT *
FROM read_parquet('integers.parquet', schema = MAP {
                    0: {name: 'renamed_i', type: 'BIGINT', default_value: NULL},
                    1: {name: 'new_column', type: 'UTINYINT', default_value: 43}
                  });
```

```text
┌───────────┬────────────┐
│ renamed_i │ new_column │
│   int64   │   uint8    │
├───────────┼────────────┤
│        42 │         43 │
└───────────┴────────────┘
```

> The `schema` parameter cannot be combined with `union_by_name = true`.

#### Partial Reading {#docs:current:data:parquet:overview::partial-reading}

DuckDB supports projection pushdown into the Parquet file itself. That is to say, when querying a Parquet file, only the columns required for the query are read. This allows you to read only the part of the Parquet file that you are interested in. This will be done automatically by DuckDB.

DuckDB also supports filter pushdown into the Parquet reader. When you apply a filter to a column that is scanned from a Parquet file, the filter will be pushed down into the scan, and can even be used to skip parts of the file using the built-in zonemaps. Note that this will depend on whether or not your Parquet file contains zonemaps.

Filter and projection pushdown provide significant performance benefits. See [our blog post “Querying Parquet with Precision Using DuckDB”](https://duckdb.org/2021/06/25/querying-parquet) for more information.
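
To check whether projections and filters reach the scan, the query plan can be inspected with `EXPLAIN`. A minimal sketch; the column names `id` and `amount` are hypothetical and not part of any file used above:

```sql
-- Only id and amount need to be read; the filter on amount can be
-- evaluated inside the Parquet scan.
EXPLAIN
SELECT id
FROM read_parquet('test.parquet')
WHERE amount > 100;
```

In the plan output, the Parquet scan operator typically lists the projected columns and any pushed-down filters.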

#### Inserts and Views {#docs:current:data:parquet:overview::inserts-and-views}

You can also insert the data into a table or create a table from the Parquet file directly. This will load the data from the Parquet file and insert it into the database:

Insert the data from the Parquet file in the table:

```sql
INSERT INTO people
    SELECT * FROM read_parquet('test.parquet');
```

Create a table directly from a Parquet file:

```sql
CREATE TABLE people AS
    SELECT * FROM read_parquet('test.parquet');
```

If you wish to keep the data stored inside the Parquet file, but want to query the Parquet file directly, you can create a view over the `read_parquet` function. You can then query the Parquet file as if it were a built-in table:

Create a view over the Parquet file:

```sql
CREATE VIEW people AS
    SELECT * FROM read_parquet('test.parquet');
```

Query the Parquet file:

```sql
SELECT * FROM people;
```

#### Writing to Parquet Files {#docs:current:data:parquet:overview::writing-to-parquet-files}

DuckDB also has support for writing to Parquet files using the `COPY` statement syntax. See the [`COPY` Statement page](#docs:current:sql:statements:copy) for details, including all possible parameters for the `COPY` statement.

Write a query to a Snappy-compressed Parquet file:

```sql
COPY
    (SELECT * FROM tbl)
    TO 'result-snappy.parquet'
    (FORMAT parquet);
```

Write `tbl` to a zstd-compressed Parquet file:

```sql
COPY tbl
    TO 'result-zstd.parquet'
    (FORMAT parquet, COMPRESSION zstd);
```

Write `tbl` to a zstd-compressed Parquet file with the lowest compression level yielding the fastest compression:

```sql
COPY tbl
    TO 'result-zstd.parquet'
    (FORMAT parquet, COMPRESSION zstd, COMPRESSION_LEVEL 1);
```

Write to Parquet file with [key-value metadata](#docs:current:data:parquet:metadata::parquet-key-value-metadata):

```sql
COPY (
    SELECT
        42 AS number,
        true AS is_even
) TO 'kv_metadata.parquet' (
    FORMAT parquet,
    KV_METADATA {
        number: 'Answer to life, universe, and everything',
        is_even: 'not ''odd''' -- single quotes in values must be escaped
    }
);
```

Write to a Parquet v2 file:

```sql
COPY tbl
    TO 'result-v2.parquet'
    (FORMAT parquet, PARQUET_VERSION 'V2');
```

Write a CSV file to an uncompressed Parquet file:

```sql
COPY
    'test.csv'
    TO 'result-uncompressed.parquet'
    (FORMAT parquet, COMPRESSION uncompressed);
```

Write a query to a zstd-compressed Parquet file with a specific row group size:

```sql
COPY
    (FROM generate_series(100_000))
    TO 'row-groups-zstd.parquet'
    (FORMAT parquet, COMPRESSION zstd, ROW_GROUP_SIZE 100_000);
```

Write data to an LZ4-compressed Parquet file:

```sql
COPY
    (FROM generate_series(100_000))
    TO 'result-lz4.parquet'
    (FORMAT parquet, COMPRESSION lz4);
```

Or, equivalently:

```sql
COPY
    (FROM generate_series(100_000))
    TO 'result-lz4.parquet'
    (FORMAT parquet, COMPRESSION lz4_raw);
```

Write data to a Brotli-compressed Parquet file:

```sql
COPY
    (FROM generate_series(100_000))
    TO 'result-brotli.parquet'
    (FORMAT parquet, COMPRESSION brotli);
```

To configure the page size of a Parquet file's dictionary pages, use the `STRING_DICTIONARY_PAGE_SIZE_LIMIT` option (default: 1 MB):

```sql
COPY
    lineitem
    TO 'lineitem-with-custom-dictionary-size.parquet'
    (FORMAT parquet, STRING_DICTIONARY_PAGE_SIZE_LIMIT 100_000);
```

DuckDB's `EXPORT` command can be used to export an entire database to a series of Parquet files. See the [“`EXPORT` statement” page](#docs:current:sql:statements:export) for more details.

Export the table contents of the entire database as Parquet:

```sql
EXPORT DATABASE 'target_directory' (FORMAT parquet);
```

#### Encryption {#docs:current:data:parquet:overview::encryption}

DuckDB supports reading and writing [encrypted Parquet files](#docs:current:data:parquet:encryption).

#### Supported Features {#docs:current:data:parquet:overview::supported-features}

The list of supported Parquet features is available in the [Parquet documentation's “Implementation status” page](https://parquet.apache.org/docs/file-format/implementationstatus/).

#### Installing and Loading the Parquet Extension {#docs:current:data:parquet:overview::installing-and-loading-the-parquet-extension}

Support for Parquet files is provided by the `parquet` extension, which is bundled with almost all clients. If your client does not bundle the `parquet` extension, it must be installed separately:

```sql
INSTALL parquet;
```

### Querying Parquet Metadata {#docs:current:data:parquet:metadata}

#### Parquet Metadata {#docs:current:data:parquet:metadata::parquet-metadata}

The `parquet_metadata` function can be used to query the metadata contained within a Parquet file, which reveals various internal details of the Parquet file such as the statistics of the different columns. This can be useful for figuring out what kind of skipping is possible in Parquet files, or even to obtain a quick overview of what the different columns contain. The function supports glob patterns to query metadata across multiple files in parallel:

```sql
SELECT *
FROM parquet_metadata('test.parquet');
```

```sql
SELECT *
FROM parquet_metadata('data/*.parquet');
```
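
As an illustration of how these statistics can indicate which row groups a filter could skip, the following sketch writes a sample file and lists the value range covered by each row group. The file name `stats-example.parquet` and the column `id` are hypothetical; the metadata columns used are among those listed in the table below.

```sql
-- write a hypothetical sample file that spans several row groups
COPY
    (SELECT range AS id FROM range(100_000))
    TO 'stats-example.parquet'
    (FORMAT parquet, ROW_GROUP_SIZE 20_000);

-- show the minimum and maximum value of column `id` in each row group
SELECT row_group_id, row_group_num_rows, stats_min_value, stats_max_value
FROM parquet_metadata('stats-example.parquet')
WHERE path_in_schema = 'id'
ORDER BY row_group_id;
```

A filter such as `WHERE id = 42` only needs the row groups whose minimum and maximum values enclose 42.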

Below is a table of the columns returned by `parquet_metadata`.



| Field                      | Type            |
| -------------------------- | --------------- |
| file_name                  | VARCHAR         |
| row_group_id               | BIGINT          |
| row_group_num_rows         | BIGINT          |
| row_group_num_columns      | BIGINT          |
| row_group_bytes            | BIGINT          |
| column_id                  | BIGINT          |
| file_offset                | BIGINT          |
| num_values                 | BIGINT          |
| path_in_schema             | VARCHAR         |
| type                       | VARCHAR         |
| stats_min                  | VARCHAR         |
| stats_max                  | VARCHAR         |
| stats_null_count           | BIGINT          |
| stats_distinct_count       | BIGINT          |
| stats_min_value            | VARCHAR         |
| stats_max_value            | VARCHAR         |
| compression                | VARCHAR         |
| encodings                  | VARCHAR         |
| index_page_offset          | BIGINT          |
| dictionary_page_offset     | BIGINT          |
| data_page_offset           | BIGINT          |
| total_compressed_size      | BIGINT          |
| total_uncompressed_size    | BIGINT          |
| key_value_metadata         | MAP(BLOB, BLOB) |
| bloom_filter_offset        | BIGINT          |
| bloom_filter_length        | BIGINT          |
| min_is_exact               | BOOLEAN         |
| max_is_exact               | BOOLEAN         |
| row_group_compressed_bytes | BIGINT          |

#### Parquet Schema {#docs:current:data:parquet:metadata::parquet-schema}

The `parquet_schema` function can be used to query the internal schema contained within a Parquet file. Note that this is the schema as it is contained within the metadata of the Parquet file. If you want to figure out the column names and types contained within a Parquet file it is easier to use `DESCRIBE`.

Fetch the column names and column types:

```sql
DESCRIBE SELECT * FROM 'test.parquet';
```

Fetch the internal schema of a Parquet file:

```sql
SELECT *
FROM parquet_schema('test.parquet');
```

Below is a table of the columns returned by `parquet_schema`.



| Field           | Type    |
| --------------- | ------- |
| file_name       | VARCHAR |
| name            | VARCHAR |
| type            | VARCHAR |
| type_length     | VARCHAR |
| repetition_type | VARCHAR |
| num_children    | BIGINT  |
| converted_type  | VARCHAR |
| scale           | BIGINT  |
| precision       | BIGINT  |
| field_id        | BIGINT  |
| logical_type    | VARCHAR |

#### Parquet File Metadata {#docs:current:data:parquet:metadata::parquet-file-metadata}

The `parquet_file_metadata` function can be used to query file-level metadata such as the format version and the encryption algorithm used:

```sql
SELECT *
FROM parquet_file_metadata('test.parquet');
```

Below is a table of the columns returned by `parquet_file_metadata`.



| Field                       | Type      |
| --------------------------- | --------- |
| file_name                   | VARCHAR   |
| created_by                  | VARCHAR   |
| num_rows                    | BIGINT    |
| num_row_groups              | BIGINT    |
| format_version              | BIGINT    |
| encryption_algorithm        | VARCHAR   |
| footer_signing_key_metadata | VARCHAR   |
| file_size_bytes             | UBIGINT   |
| footer_size                 | UBIGINT   |
| column_orders               | VARCHAR[] |

#### Parquet Key-Value Metadata {#docs:current:data:parquet:metadata::parquet-key-value-metadata}

The `parquet_kv_metadata` function can be used to query custom metadata defined as key-value pairs:

```sql
SELECT *
FROM parquet_kv_metadata('test.parquet');
```
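
Keys and values are returned as `BLOB`s. A minimal sketch of converting them to text with the `decode` function, assuming that the stored metadata is valid UTF-8 (the sample file is written first so that the example is self-contained):

```sql
-- write a small file with one custom key-value pair
COPY (
    SELECT 42 AS number
) TO 'kv_metadata.parquet' (
    FORMAT parquet,
    KV_METADATA {
        number: 'Answer to life, universe, and everything'
    }
);

-- decode the BLOB keys and values into VARCHAR
SELECT decode(key) AS key, decode(value) AS value
FROM parquet_kv_metadata('kv_metadata.parquet');
```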

Below is a table of the columns returned by `parquet_kv_metadata`.



| Field     | Type    |
| --------- | ------- |
| file_name | VARCHAR |
| key       | BLOB    |
| value     | BLOB    |

#### Full Metadata {#docs:current:data:parquet:metadata::full-metadata}

The `parquet_full_metadata` function returns all metadata for a Parquet file in a single row, combining the results of `parquet_file_metadata`, `parquet_metadata`, `parquet_schema`, and `parquet_kv_metadata` as nested struct arrays:

```sql
SELECT *
FROM parquet_full_metadata('test.parquet');
```



| Field                 | Type                    |
| --------------------- | ----------------------- |
| parquet_file_metadata | STRUCT(...)[]           |
| parquet_metadata      | STRUCT(...)[]           |
| parquet_schema        | STRUCT(...)[]           |
| parquet_kv_metadata   | STRUCT(...)[]           |

Each struct array contains the same columns as the corresponding standalone function.

#### Bloom Filters {#docs:current:data:parquet:metadata::bloom-filters}

DuckDB [supports Bloom filters](https://duckdb.org/2025/03/07/parquet-bloom-filters-in-duckdb) for pruning the row groups that need to be read to answer highly selective queries.
Currently, Bloom filters are supported for the following types:

* Integer types: `TINYINT`, `UTINYINT`, `SMALLINT`, `USMALLINT`, `INTEGER`, `UINTEGER`, `BIGINT`, `UBIGINT`
* Floating point types: `FLOAT`, `DOUBLE`
* `VARCHAR`
* `BLOB`

The `parquet_bloom_probe(filename, column_name, value)` function shows which row groups can be excluded when filtering for a given value of a given column using the Bloom filter.
For example:

```sql
FROM parquet_bloom_probe('my_file.parquet', 'my_col', 500);
```

| file_name       | row_group_id | bloom_filter_excludes |
| --------------- | -----------: | --------------------: |
| my_file.parquet |            0 |                  true |
| ...             |          ... |                   ... |
| my_file.parquet |            9 |                 false |

### Parquet Encryption {#docs:current:data:parquet:encryption}

Starting with version 0.10.0, DuckDB supports reading and writing encrypted Parquet files.
DuckDB broadly follows the [Parquet Modular Encryption specification](https://github.com/apache/parquet-format/blob/master/Encryption.md) with some [limitations](#docs:current:data:parquet:encryption::limitations).

#### Reading and Writing Encrypted Files {#docs:current:data:parquet:encryption::reading-and-writing-encrypted-files}

Using the `PRAGMA add_parquet_key` function, named encryption keys of 128, 192, or 256 bits can be added to a session. These keys are stored in-memory:

```sql
PRAGMA add_parquet_key('key128', '0123456789112345');
PRAGMA add_parquet_key('key192', '012345678911234501234567');
PRAGMA add_parquet_key('key256', '01234567891123450123456789112345');
PRAGMA add_parquet_key('key256base64', 'MDEyMzQ1Njc4OTExMjM0NTAxMjM0NTY3ODkxMTIzNDU=');
```

##### Writing Encrypted Parquet Files {#docs:current:data:parquet:encryption::writing-encrypted-parquet-files}

After specifying the key (e.g., `key256`), files can be encrypted as follows:

```sql
COPY tbl TO 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});
```

##### Reading Encrypted Parquet Files {#docs:current:data:parquet:encryption::reading-encrypted-parquet-files}

A Parquet file encrypted with a specific key (e.g., `key256`) can then be read as follows:

```sql
COPY tbl FROM 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});
```

Or:

```sql
SELECT *
FROM read_parquet('tbl.parquet', encryption_config = {footer_key: 'key256'});
```
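
Putting the steps above together, a minimal round trip could look as follows. The table `enc_demo` and the file `enc_demo.parquet` are hypothetical names used only for this sketch; the key is the sample 256-bit key shown earlier and should not be used for real data.

```sql
-- register a named 256-bit key for the current session
PRAGMA add_parquet_key('key256', '01234567891123450123456789112345');

-- hypothetical sample table
CREATE OR REPLACE TABLE enc_demo AS
    SELECT range AS id FROM range(10);

-- write the table to an encrypted Parquet file
COPY enc_demo TO 'enc_demo.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});

-- read the encrypted file back using the same key
SELECT count(*) AS num_rows
FROM read_parquet('enc_demo.parquet', encryption_config = {footer_key: 'key256'});
```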

#### Interoperability {#docs:current:data:parquet:encryption::interoperability}

DuckDB can read uniformly encrypted Parquet files written by the Arrow C++ API (e.g., via PyArrow), as long as the same encryption key is used for both the footer and all columns.

#### Limitations {#docs:current:data:parquet:encryption::limitations}

DuckDB's Parquet encryption currently has the following limitations.

DuckDB encrypts the footer and all columns using the `footer_key`. The Parquet specification allows encryption of individual columns with different keys, e.g.:

```sql
COPY tbl TO 'tbl.parquet'
    (ENCRYPTION_CONFIG {
        footer_key: 'key256',
        column_keys: {key256: ['col0', 'col1']}
    });
```

However, this is currently unsupported and causes the following error:

```console
Not implemented Error: Parquet encryption_config column_keys not yet implemented
```

#### Performance Implications {#docs:current:data:parquet:encryption::performance-implications}

Note that encryption has some performance implications.
Without encryption, reading/writing the `lineitem` table from [`TPC-H`](#docs:current:core_extensions:tpch) at SF1, which is 6M rows and 15 columns, from/to a Parquet file takes 0.26 and 0.99 seconds, respectively.
With encryption, this takes 0.64 and 2.21 seconds, i.e., reading is approximately 2.5× slower and writing approximately 2.2× slower than without encryption.

### Parquet Tips {#docs:current:data:parquet:tips}

Below is a collection of tips to help when dealing with Parquet files.

#### Tips for Reading Parquet Files {#docs:current:data:parquet:tips::tips-for-reading-parquet-files}

##### Use `union_by_name` When Loading Files with Different Schemas {#docs:current:data:parquet:tips::use-union_by_name-when-loading-files-with-different-schemas}

The `union_by_name` option can be used to unify the schema of files that have different or missing columns. For files that do not have certain columns, `NULL` values are filled in:

```sql
SELECT *
FROM read_parquet('flights*.parquet', union_by_name = true);
```
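
A self-contained sketch of this behavior, using two hypothetical files where only the second one has a `destination` column:

```sql
-- two sample files with different schemas
COPY (SELECT 1 AS id, 'AMS' AS origin)
    TO 'flights1.parquet' (FORMAT parquet);
COPY (SELECT 2 AS id, 'AMS' AS origin, 'JFK' AS destination)
    TO 'flights2.parquet' (FORMAT parquet);

-- columns are matched by name; the row from flights1.parquet
-- gets NULL for the missing destination column
SELECT *
FROM read_parquet('flights*.parquet', union_by_name = true)
ORDER BY id;
```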

#### Tips for Writing Parquet Files {#docs:current:data:parquet:tips::tips-for-writing-parquet-files}

Using a [glob pattern](#docs:current:data:multiple_files:overview::glob-syntax) upon read or a [Hive partitioning](#docs:current:data:partitioning:hive_partitioning) structure are good ways to transparently handle multiple files.

##### Enabling `PER_THREAD_OUTPUT` {#docs:current:data:parquet:tips::enabling-per_thread_output}

If the final number of Parquet files is not important, writing one file per thread can significantly improve performance:

```sql
COPY
    (FROM generate_series(10_000_000))
    TO 'test.parquet'
    (FORMAT parquet, PER_THREAD_OUTPUT);
```
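
With `PER_THREAD_OUTPUT`, the target path is expected to become a directory that holds one file per thread. The sketch below assumes this layout and reads the files back with a glob pattern; the directory name `per-thread-output` is hypothetical.

```sql
-- write one file per thread into the target directory
COPY
    (FROM generate_series(10_000_000))
    TO 'per-thread-output'
    (FORMAT parquet, PER_THREAD_OUTPUT);

-- read all per-thread files back as a single table
SELECT count(*) AS num_rows
FROM read_parquet('per-thread-output/*.parquet');
```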

##### Selecting a `ROW_GROUP_SIZE` {#docs:current:data:parquet:tips::selecting-a-row_group_size}

The `ROW_GROUP_SIZE` parameter specifies the minimum number of rows in a Parquet row group, with a minimum value equal to DuckDB's vector size, 2,048, and a default of 122,880.
A Parquet row group is a partition of rows, consisting of a column chunk for each column in the dataset.

Compression algorithms are only applied per row group, so the larger the row group size, the more opportunities to compress the data.
On the other hand, larger row group sizes mean that each thread keeps more data in memory before flushing when streaming results.
Another argument for smaller row group sizes is that DuckDB can read Parquet row groups in parallel even within the same file and uses predicate pushdown to only scan the row groups whose metadata ranges match the `WHERE` clause of the query. However, there is some overhead associated with reading the metadata in each group.

A good rule of thumb is to ensure that the number of row groups per file is at least as large as the number of CPU threads used to query that file.
More row groups beyond the thread count would improve the speed of highly selective queries, but slow down queries that must scan the whole file like aggregations.

To write the result of a query to a Parquet file with a different row group size, run:

```sql
COPY
    (FROM generate_series(100_000))
    TO 'row-groups.parquet'
    (FORMAT parquet, ROW_GROUP_SIZE 100_000);
```
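
To compare a file against the rule of thumb above, the number of row groups can be placed next to the configured thread count. This is a minimal sketch that uses the `num_row_groups` column of `parquet_file_metadata` and the `threads` setting:

```sql
-- write the sample file with a custom row group size
COPY
    (FROM generate_series(100_000))
    TO 'row-groups.parquet'
    (FORMAT parquet, ROW_GROUP_SIZE 100_000);

-- compare the number of row groups with the number of threads
SELECT
    (SELECT num_row_groups
     FROM parquet_file_metadata('row-groups.parquet')) AS num_row_groups,
    current_setting('threads') AS threads;
```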

##### The `ROW_GROUPS_PER_FILE` Option {#docs:current:data:parquet:tips::the-row_groups_per_file-option}

The `ROW_GROUPS_PER_FILE` parameter starts a new Parquet file once the current one reaches the specified number of row groups.

```sql
COPY
    (FROM generate_series(100_000))
    TO 'output-directory'
    (FORMAT parquet, ROW_GROUP_SIZE 20_000, ROW_GROUPS_PER_FILE 2);
```
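
The resulting layout can be checked by counting the row groups in each output file with `parquet_metadata`. In this sketch the directory name `row-groups-per-file` is hypothetical, and the counts may deviate from the configured value when multiple threads are active:

```sql
-- write to a separate, hypothetical output directory
COPY
    (FROM generate_series(100_000))
    TO 'row-groups-per-file'
    (FORMAT parquet, ROW_GROUP_SIZE 20_000, ROW_GROUPS_PER_FILE 2);

-- count the row groups in each file that was written
SELECT file_name, count(DISTINCT row_group_id) AS num_row_groups
FROM parquet_metadata('row-groups-per-file/*.parquet')
GROUP BY file_name
ORDER BY file_name;
```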

> If multiple threads are active, the number of row groups in a file may slightly exceed the specified number of row groups to limit the amount of locking – similarly to the behavior of [`FILE_SIZE_BYTES`](#docs:current:sql:statements:copy::copy--to-options).
> However, if `PER_THREAD_OUTPUT` is set, only one thread writes to each file, and it becomes accurate again.

See the [Performance Guide on “File Formats”](#docs:current:guides:performance:file_formats::parquet-file-sizes) for more tips.

## Partitioning {#data:partitioning}

### Hive Partitioning {#docs:current:data:partitioning:hive_partitioning}

#### Examples {#docs:current:data:partitioning:hive_partitioning::examples}

Read data from a Hive partitioned dataset:

```sql
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true);
```

Write a table to a Hive partitioned dataset:

```sql
COPY orders
TO 'orders' (FORMAT parquet, PARTITION_BY (year, month));
```

Note that the `PARTITION_BY` option cannot use expressions. You can produce columns on the fly using the following syntax:

```sql
COPY (SELECT *, year(timestamp) AS year, month(timestamp) AS month FROM services)
TO 'test' (PARTITION_BY (year, month));
```

When reading, the partition columns are read from the directory structure and
can be included or excluded depending on the `hive_partitioning` parameter.

```sql
FROM read_parquet('test/*/*/*.parquet', hive_partitioning = false); -- will not include year, month columns
FROM read_parquet('test/*/*/*.parquet', hive_partitioning = true);  -- will include year, month partition columns
```

#### Hive Partitioning {#docs:current:data:partitioning:hive_partitioning::hive-partitioning}

Hive partitioning is a [partitioning strategy](https://en.wikipedia.org/wiki/Partition_(database)) that is used to split a table into multiple files based on **partition keys**. The files are organized into folders. Within each folder, the **partition key** has a value that is determined by the name of the folder.

Below is an example of a Hive partitioned file hierarchy. The files are partitioned on two keys (`year` and `month`).

```text
orders
├── year=2021
│    ├── month=1
│    │   ├── file1.parquet
│    │   └── file2.parquet
│    └── month=2
│        └── file3.parquet
└── year=2022
     ├── month=11
     │   ├── file4.parquet
     │   └── file5.parquet
     └── month=12
         └── file6.parquet
```

Files stored in this hierarchy can be read using the `hive_partitioning` flag.

```sql
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true);
```

When we specify the `hive_partitioning` flag, the values of the columns will be read from the directories.

##### Filter Pushdown {#docs:current:data:partitioning:hive_partitioning::filter-pushdown}

Filters on the partition keys are automatically pushed down into the files. This way the system skips reading files that are not necessary to answer a query. For example, consider the following query on the above dataset:

```sql
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true)
WHERE year = 2022
  AND month = 11;
```

When executing this query, only the following files will be read:

```text
orders
└── year=2022
     └── month=11
         ├── file4.parquet
         └── file5.parquet
```
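
One way to observe the pushdown is to inspect the query plan with `EXPLAIN`. The sketch below writes a small hypothetical dataset to a separate directory (`orders_pushdown`) so that it can be run on its own:

```sql
-- hypothetical sample data, written as a Hive partitioned dataset
COPY (
    SELECT 1 AS id, 2021 AS year, 1 AS month
    UNION ALL
    SELECT 2 AS id, 2022 AS year, 11 AS month
    UNION ALL
    SELECT 3 AS id, 2022 AS year, 12 AS month
) TO 'orders_pushdown' (FORMAT parquet, PARTITION_BY (year, month));

-- print the plan of a query that filters on the partition keys
EXPLAIN
SELECT *
FROM read_parquet('orders_pushdown/*/*/*.parquet', hive_partitioning = true)
WHERE year = 2022
  AND month = 11;
```

The scan operator in the printed plan should show the filters on `year` and `month`. The exact layout of the plan depends on the DuckDB version.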

##### Auto-detection {#docs:current:data:partitioning:hive_partitioning::auto-detection}

By default, the system tries to infer whether the provided files are in a Hive partitioned hierarchy. If so, the `hive_partitioning` flag is enabled automatically. The auto-detection looks at the names of the folders and searches for a `'key' = 'value'` pattern. This behavior can be overridden by using the `hive_partitioning` configuration option:

```sql
SET hive_partitioning = false;
```

##### Hive Types {#docs:current:data:partitioning:hive_partitioning::hive-types}

`hive_types` is a way to specify the logical types of the hive partitions in a struct:

```sql
SELECT *
FROM read_parquet(
    'dir/**/*.parquet',
    hive_partitioning = true,
    hive_types = {'release': DATE, 'orders': BIGINT}
);
```

`hive_types` will be auto-detected for the following types: `DATE`, `TIMESTAMP` and `BIGINT`. To switch off the auto-detection, the flag `hive_types_autocast = 0` can be set.
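
The effect of the auto-detection can be inspected with `DESCRIBE`. The following is a minimal sketch using a hypothetical dataset partitioned by a `release_date` column. With auto-detection, the partition column is expected to be reported as `DATE`, and with `hive_types_autocast = 0` as `VARCHAR`:

```sql
-- hypothetical sample data partitioned by a DATE column
COPY (
    SELECT 1 AS id, DATE '2024-01-01' AS release_date
    UNION ALL
    SELECT 2 AS id, DATE '2024-02-01' AS release_date
) TO 'releases' (FORMAT parquet, PARTITION_BY (release_date));

-- partition column type with auto-detection (the default)
DESCRIBE SELECT *
FROM read_parquet('releases/*/*.parquet', hive_partitioning = true);

-- partition column type with auto-detection switched off
DESCRIBE SELECT *
FROM read_parquet('releases/*/*.parquet', hive_partitioning = true, hive_types_autocast = 0);
```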

##### Writing Partitioned Files {#docs:current:data:partitioning:hive_partitioning::writing-partitioned-files}

See the [Partitioned Writes](#docs:current:data:partitioning:partitioned_writes) section.

### Partitioned Writes {#docs:current:data:partitioning:partitioned_writes}

#### Examples {#docs:current:data:partitioning:partitioned_writes::examples}

Write a table to a Hive partitioned dataset of Parquet files:

```sql
COPY orders TO 'orders'
(FORMAT parquet, PARTITION_BY (year, month));
```

Write a table to a Hive partitioned dataset of CSV files, allowing overwrites:

```sql
COPY orders TO 'orders'
(FORMAT csv, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE);
```

Write a table to a Hive partitioned dataset of GZIP-compressed CSV files, setting explicit data files' extension:

```sql
COPY orders TO 'orders'
(FORMAT csv, PARTITION_BY (year, month), COMPRESSION gzip, FILE_EXTENSION 'csv.gz');
```

#### Partitioned Writes {#docs:current:data:partitioning:partitioned_writes::partitioned-writes}

When the `PARTITION_BY` clause is specified for the [`COPY` statement](#docs:current:sql:statements:copy), the files are written in a [Hive partitioned](#docs:current:data:partitioning:hive_partitioning) folder hierarchy. The target is the name of the root directory (in the example above: `orders`). The files are written in-order in the file hierarchy. Currently, one file is written per thread to each directory.

```text
orders
├── year=2021
│    ├── month=1
│    │   ├── data_1.parquet
│    │   └── data_2.parquet
│    └── month=2
│        └── data_1.parquet
└── year=2022
     ├── month=11
     │   ├── data_1.parquet
     │   └── data_2.parquet
     └── month=12
         └── data_1.parquet
```

The values of the partitions are automatically extracted from the data. Note that it can be very expensive to write a large number of partitions, as many files will be created. The ideal partition count depends on how large your dataset is.

To limit the maximum number of files the system can keep open before flushing to disk when writing using `PARTITION_BY`, use the `partitioned_write_max_open_files` configuration option (default: 100):

```sql
SET partitioned_write_max_open_files = 10;
```

> **Best practice.** Writing data into many small partitions is expensive. It is generally recommended to have at least `100 MB` of data per partition.

##### Filename Pattern {#docs:current:data:partitioning:partitioned_writes::filename-pattern}

By default, files will be named `data_0.parquet` or `data_0.csv`. With the flag `FILENAME_PATTERN` a pattern with `{i}` or `{uuid}` can be defined to create specific filenames:

* `{i}` will be replaced by an index.
* `{uuid}` will be replaced by a 128-bit UUID.

Write a table to a Hive partitioned dataset of .parquet files, with an index in the filename:

```sql
COPY orders TO 'orders'
(FORMAT parquet, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN 'orders_{i}');
```

Write a table to a Hive partitioned dataset of .parquet files, with unique filenames:

```sql
COPY orders TO 'orders'
(FORMAT parquet, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN 'file_{uuid}');
```

##### Overwriting {#docs:current:data:partitioning:partitioned_writes::overwriting}

By default the partitioned write will not allow overwriting existing directories.
On a local file system, the `OVERWRITE` and `OVERWRITE_OR_IGNORE` options remove the existing directories.
On remote file systems, overwriting is not supported.

##### Appending {#docs:current:data:partitioning:partitioned_writes::appending}

To append to an existing Hive partitioned directory structure, use the `APPEND` option:

```sql
COPY orders TO 'orders'
(FORMAT parquet, PARTITION_BY (year, month), APPEND);
```

Using the `APPEND` option results in a behavior similar to the `OVERWRITE_OR_IGNORE, FILENAME_PATTERN '{uuid}'` options,
but DuckDB performs an extra check for whether the file already exists and then regenerates the UUID in the rare event that it does (to avoid clashes).

##### Handling Slashes in Columns {#docs:current:data:partitioning:partitioned_writes::handling-slashes-in-columns}

To handle slashes in column names, use percent-encoding implemented by the [`url_encode` function](#docs:current:sql:functions:text::url_encodestring).
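
As a minimal illustration, the sketch below encodes a hypothetical string that contains a slash and decodes it again with the companion `url_decode` function:

```sql
-- 'north/east' is a hypothetical sample value
SELECT
    url_encode('north/east') AS encoded,  -- the slash becomes %2F
    url_decode('north%2Feast') AS decoded;
```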

## Appender {#docs:current:data:appender}

The Appender can be used to load bulk data into a DuckDB database. It is currently available in the [C, C++, Go, Java and Rust APIs](#docs:current:data:appender::appender-support-in-other-clients). The Appender is tied to a connection, and will use the transaction context of that connection when appending. An Appender always appends to a single table in the database file.

In the [C++ API](#docs:current:clients:cpp), the Appender works as follows:

```cpp
DuckDB db;
Connection con(db);
// create the table
con.Query("CREATE TABLE people (id INTEGER, name VARCHAR)");
// initialize the appender
Appender appender(con, "people");
```

The `AppendRow` function is the easiest way of appending data. It uses recursive templates to allow you to put all the values of a single row within one function call, as follows:

```cpp
appender.AppendRow(1, "Mark");
```

Rows can also be individually constructed using the `BeginRow`, `EndRow` and `Append` methods. This is done internally by `AppendRow`, and hence has the same performance characteristics.

```cpp
appender.BeginRow();
appender.Append<int32_t>(2);
appender.Append<string>("Hannes");
appender.EndRow();
```

Any values added to the Appender are cached prior to being inserted into the database system
for performance reasons. That means that, while appending, the rows might not be immediately visible in the system. The cache is automatically flushed when the Appender goes out of scope or when `appender.Close()` is called. The cache can also be manually flushed using the `appender.Flush()` method. After either `Flush` or `Close` is called, all the data has been written to the database system.

#### Date, Time and Timestamps {#docs:current:data:appender::date-time-and-timestamps}

While numbers and strings are rather self-explanatory, dates, times and timestamps require some explanation. They can be directly appended using the methods provided by `duckdb::Date`, `duckdb::Time` or `duckdb::Timestamp`. They can also be appended using the internal `duckdb::Value` type; however, this adds some overhead and should be avoided if possible.

Below is a short example:

```cpp
con.Query("CREATE TABLE dates (d DATE, t TIME, ts TIMESTAMP)");
Appender appender(con, "dates");

// construct the values using the Date/Time/Timestamp types
// (this is the most efficient approach)
appender.AppendRow(
    Date::FromDate(1992, 1, 1),
    Time::FromTime(1, 1, 1, 0),
    Timestamp::FromDatetime(Date::FromDate(1992, 1, 1), Time::FromTime(1, 1, 1, 0))
);
// construct duckdb::Value objects
appender.AppendRow(
    Value::DATE(1992, 1, 1),
    Value::TIME(1, 1, 1, 0),
    Value::TIMESTAMP(1992, 1, 1, 1, 1, 1, 0)
);
```

#### Commit Frequency {#docs:current:data:appender::commit-frequency}

By default, the appender performs commits every 204,800 rows.
You can change this by explicitly using [transactions](#docs:current:sql:statements:transactions) and surrounding your batches of `AppendRow` calls with `BEGIN TRANSACTION` and `COMMIT` statements.

#### Handling Constraint Violations {#docs:current:data:appender::handling-constraint-violations}

If the Appender encounters a `PRIMARY KEY` conflict or a `UNIQUE` constraint violation, it fails and returns the following error:

```console
Constraint Error:
PRIMARY KEY or UNIQUE constraint violated: duplicate key "..."
```

In this case, the entire append operation fails and no rows are inserted.

#### Appender Support in Other Clients {#docs:current:data:appender::appender-support-in-other-clients}

The Appender is also available in the following client APIs:

* [C](#docs:current:clients:c:appender)
* [Go](#docs:current:clients:go::appender)
* [Java (JDBC)](#docs:current:clients:java::appender)
* [Julia](#docs:current:clients:tertiary_clients:julia::appender-api)
* [Rust](#docs:current:clients:rust::appender)
* [Node.js](#docs:current:clients:node_neo:overview::append-to-table)

## INSERT Statements {#docs:current:data:insert}

`INSERT` statements are the standard way of loading data into a relational database. When using `INSERT` statements, the values are supplied row-by-row. While simple, there is significant overhead involved in parsing and processing individual `INSERT` statements. This makes lots of individual row-by-row insertions very inefficient for bulk insertion.

> **Best practice.** As a rule-of-thumb, avoid using lots of individual row-by-row `INSERT` statements when inserting more than a few rows (i.e., avoid using `INSERT` statements as part of a loop). When bulk inserting data, try to maximize the amount of data that is inserted per statement.

If you must use `INSERT` statements to load data in a loop, avoid executing the statements in auto-commit mode. After every commit, the database is required to sync the changes made to disk to ensure no data is lost. In auto-commit mode every single statement will be wrapped in a separate transaction, meaning `fsync` will be called for every statement. This is typically unnecessary when bulk loading and will significantly slow down your program.
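
A loop that issues its statements inside a single transaction could look as follows. This is a sketch with a hypothetical `events` table; the individual `INSERT` statements stand in for those generated by the loop.

```sql
-- hypothetical target table
CREATE TABLE events (id INTEGER, name VARCHAR);

-- a single transaction around all row-by-row inserts,
-- so the changes are committed only once at the end
BEGIN TRANSACTION;
INSERT INTO events VALUES (1, 'first');
INSERT INTO events VALUES (2, 'second');
INSERT INTO events VALUES (3, 'third');
COMMIT;
```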

> **Tip.** If you absolutely must use `INSERT` statements in a loop to load data, wrap them in calls to `BEGIN TRANSACTION` and `COMMIT`.

#### Syntax {#docs:current:data:insert::syntax}

An example of using `INSERT INTO` to load data in a table is as follows:

```sql
CREATE TABLE people (id INTEGER, name VARCHAR);
INSERT INTO people VALUES (1, 'Mark'), (2, 'Hannes');
```

For a more detailed description together with a syntax diagram, see the [page on the `INSERT` statement](#docs:current:sql:statements:insert).

# Lakehouse Formats {#docs:current:lakehouse_formats}

Lakehouse formats, often referred to as open table formats, are specifications for storing data in object storage while maintaining some guarantees such as ACID transactions or keeping snapshot history. Over time, multiple lakehouse formats have emerged, each one with its own unique approach to managing its metadata (a.k.a. catalog). On this page, we go over the support that DuckDB offers for some of these formats, as well as some workarounds that let you use DuckDB and get close to full interoperability with these formats.

#### DuckDB Lakehouse Support Matrix {#docs:current:lakehouse_formats::duckdb-lakehouse-support-matrix}

DuckDB supports Iceberg, Delta, Lance and DuckLake as first-class citizens. The following matrix represents what DuckDB natively supports out of the box through core extensions.

|                              | DuckLake                                                              | Iceberg                                                                 | Delta                                                      | Lance                                                      |
| ---------------------------- | :-------------------------------------------------------------------- | :---------------------------------------------------------------------- | :--------------------------------------------------------- | :--------------------------------------------------------- |
| Extension                    | [`ducklake`](https://ducklake.select/docs/stable/duckdb/introduction) | [`iceberg`](#docs:lts:core_extensions:iceberg:overview) | [`delta`](#docs:lts:core_extensions:delta) | [`lance`](#docs:lts:core_extensions:lance) |
| Read                         | ✅                                                                    | ✅                                                                      | ✅                                                         | ✅                                                         |
| Write                        | ✅                                                                    | ✅                                                                      | ✅                                                         | ✅                                                         |
| Deletes                      | ✅                                                                    | ✅                                                                      | ❌                                                         | ✅                                                         |
| Updates                      | ✅                                                                    | ✅                                                                      | ❌                                                         | ✅                                                         |
| Upserting                    | ✅                                                                    | ❌                                                                      | ❌                                                         | ✅                                                         |
| Create table                 | ✅                                                                    | ✅                                                                      | ❌                                                         | ✅                                                         |
| Create table with partitions | ✅                                                                    | ❌                                                                      | ❌                                                         | ❌                                                         |
| Attaching to a catalog       | ✅                                                                    | ✅                                                                      | ✅ \*                                                      | ✅                                                         |
| Rename table                 | ✅                                                                    | ❌                                                                      | ❌                                                         | ❌                                                         |
| Rename columns               | ✅                                                                    | ❌                                                                      | ❌                                                         | ✅                                                         |
| Add/drop columns             | ✅                                                                    | ❌                                                                      | ❌                                                         | ✅                                                         |
| Alter column type            | ✅                                                                    | ❌                                                                      | ❌                                                         | ✅                                                         |
| Compaction and maintenance   | ✅                                                                    | ❌                                                                      | ❌                                                         | ✅                                                         |
| Encryption                   | ✅                                                                    | ❌                                                                      | ❌                                                         | ❌                                                         |
| Manage table properties      | ✅                                                                    | ❌                                                                      | ❌                                                         | ❌                                                         |
| Time travel                  | ✅                                                                    | ✅                                                                      | ✅                                                         | ❌                                                         |
| Query table changes          | ✅                                                                    | ❌                                                                      | ❌                                                         | ❌                                                         |

\* Through the [`unity_catalog`](https://github.com/duckdb/unity_catalog) extension.

DuckDB aims to build native extensions with minimal dependencies. The `iceberg` extension, for example, has no dependencies on third-party Iceberg libraries, which means that all data and metadata operations are implemented natively in the DuckDB extension. The `delta` extension uses the [`delta-kernel-rs` project](https://github.com/delta-io/delta-kernel-rs), which is meant to be a lightweight platform for engines to build Delta integrations that are as close to native as possible.

> **Why do native implementations matter?** Native implementations allow DuckDB to do more performance optimizations such as complex filter pushdowns (with file-level and row-group level pruning) and improve memory management.
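
To make file-level pruning concrete, the following Python sketch shows how per-file min/max statistics can rule out files before any data is read. It is a toy model under stated assumptions, not DuckDB's implementation: the `FileStats` class, the `prune_files` helper, the file names and the statistics are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for a single numeric column."""
    path: str
    min_value: int
    max_value: int


def prune_files(files, lower_bound):
    """Keep only files that may contain rows with value > lower_bound.

    A file whose maximum is <= lower_bound cannot match the filter,
    so it can be skipped without being opened.
    """
    return [f for f in files if f.max_value > lower_bound]


files = [
    FileStats("part-0.parquet", min_value=1, max_value=50),
    FileStats("part-1.parquet", min_value=40, max_value=120),
    FileStats("part-2.parquet", min_value=200, max_value=300),
]

# Filter: value > 100. part-0 is skipped because its maximum is 50.
remaining = prune_files(files, lower_bound=100)
print([f.path for f in remaining])
```

Row-group level pruning applies the same idea to the statistics kept for each row group inside a file.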

# Client APIs {#clients}

## Client Overview {#docs:current:clients:overview}

DuckDB is an in-process database system and offers client APIs (“drivers”) for several languages.

| Client API                                                                      | Maintainer                                      | Support tier | Latest version                                                                                                                                                                                                                                                                                                                                                                                                                         |
| ------------------------------------------------------------------------------- | ----------------------------------------------- | ------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [C](#docs:current:clients:c:overview)                              | Core team                                       | Primary      | {% if site.current_duckdb_version != "" %}[1.5.2](https://duckdb.org/install/index.html?version=current&environment=c){% else %}[{{ site.lts_duckdb_version }}](https://duckdb.org/install/index.html?version=stable&environment=c){% endif %}                                                                                                                                                                             |
| [Command Line Interface (CLI)](#docs:current:clients:cli:overview) | Core team                                       | Primary      | {% if site.current_duckdb_version != "" %}[1.5.2](https://duckdb.org/install/index.html?version=current&environment=cli){% else %}[{{ site.lts_duckdb_version }}](https://duckdb.org/install/index.html?version=stable&environment=cli){% endif %}                                                                                                                                                                         |
| [Java (JDBC)](#docs:current:clients:java)                          | Core team                                       | Primary      | {% if site.current_duckdb_java_short_version != "" %}[{{ site.current_duckdb_java_short_version }}](https://duckdb.org/install/index.html?version=current&environment=java){% else %}[{{ site.lts_duckdb_java_short_version }}](https://duckdb.org/install/index.html?version=stable&environment=java){% endif %}                                                                                                                                      |
| [Go](#docs:current:clients:go)                                     | Core team                                       | Primary      | {% if site.current_duckdb_go_version != "" %}[{{ site.current_duckdb_go_version }}](https://duckdb.org/install/index.html?version=current&environment=go){% else %}[{{ site.lts_duckdb_go_version }}](https://duckdb.org/install/index.html?version=stable&environment=go){% endif %}                                                                                                                                                                  |
| [Node.js (node-neo)](#docs:current:clients:node_neo:overview)      | [Jeff Raymakers](https://github.com/jraymakers) | Primary      | {% if site.current_duckdb_node_neo_version != "" %}[{{ site.current_duckdb_node_neo_version }}](https://duckdb.org/install/index.html?version=current&environment=nodejs){% else %}[{{ site.lts_duckdb_node_neo_version }}](https://duckdb.org/install/index.html?version=stable&environment=nodejs){% endif %}                                                                                                                                        |
| [ODBC](#docs:current:clients:odbc:overview)                        | Core team                                       | Primary      | {% if site.current_duckdb_odbc_short_version != "" %}[{{ site.current_duckdb_odbc_short_version }}](https://duckdb.org/install/index.html?version=current&environment=odbc){% else %}[{{ site.lts_duckdb_odbc_short_version }}](https://duckdb.org/install/index.html?version=stable&environment=odbc){% endif %}                                                                                                                                      |
| [Python](#docs:current:clients:python:overview)                    | Core team                                       | Primary      | {% if site.current_duckdb_version != "" %}[1.5.2](https://duckdb.org/install/index.html?version=current&environment=python){% else %}[{{ site.lts_duckdb_version }}](https://duckdb.org/install/index.html?version=stable&environment=python){% endif %}                                                                                                                                                                   |
| [R](#docs:current:clients:r)                                       | [Kirill Müller](https://github.com/krlmlr)      | Primary      | {% if site.current_duckdb_r_version != "" %}[{{ site.current_duckdb_r_version }}](https://duckdb.org/install/index.html?version=current&environment=r){% else %}[{{ site.lts_duckdb_r_version }}](https://duckdb.org/install/index.html?version=stable&environment=r){% endif %}                                                                                                                                                                       |
| [Rust](#docs:current:clients:rust)                                 | Core team                                       | Primary      | {% if site.current_duckdb_rust_version != "" %}[{{ site.current_duckdb_rust_version }}](https://duckdb.org/install/index.html?version=current&environment=rust){% else %}[{{ site.lts_duckdb_rust_version }}](https://duckdb.org/install/index.html?version=stable&environment=rust){% endif %}                                                                                                                                                        |
| [WebAssembly (Wasm)](#docs:current:clients:wasm:overview)          | Core team                                       | Primary      | {% if site.current_duckdb_wasm_version != "" %}[{{ site.current_duckdb_wasm_version }}](https://github.com/duckdb/duckdb-wasm#readme){% else %}[{{ site.lts_duckdb_wasm_version }}](https://github.com/duckdb/duckdb-wasm#readme){% endif %}                                                                                                                                                                                           |
| [ADBC (Arrow)](#docs:current:clients:adbc)                         | Core team                                       | Secondary    | {% if site.current_duckdb_version != "" %}[1.5.2](#docs:current:clients:adbc){% else %}[{{ site.lts_duckdb_version }}](#docs:lts:clients:adbc){% endif %}                                                                                                                                                                                                                        |
| [C# (.NET)](https://duckdb.net/)                                                | [Giorgi](https://github.com/Giorgi)             | Secondary    | {% if site.current_duckdb_csharp_version != "" %}[{{ site.current_duckdb_csharp_version }}](https://www.nuget.org/packages?q=Tags%3A%22DuckDB%22+Author%3A%22Giorgi%22&includeComputedFrameworks=true&prerel=true&sortby=relevance){% else %}[{{ site.lts_duckdb_csharp_version }}](https://www.nuget.org/packages?q=Tags%3A%22DuckDB%22+Author%3A%22Giorgi%22&includeComputedFrameworks=true&prerel=true&sortby=relevance){% endif %} |
| [C++](#docs:current:clients:cpp)                                   | Core team                                       | Secondary    | {% if site.current_duckdb_version != "" %}[1.5.2](https://duckdb.org/install/index.html?version=current&environment=c){% else %}[{{ site.lts_duckdb_version }}](https://duckdb.org/install/index.html?version=stable&environment=c){% endif %}                                                                                                                                                                             |

The table above lists the DuckDB clients with the primary and secondary [support tiers](#::support-tiers).
For a list of tertiary clients, see the [“Tertiary Clients” page](#docs:current:clients:tertiary_clients:overview).

#### Support Tiers {#docs:current:clients:overview::support-tiers}

There are three tiers of support for clients.
Primary clients are the first to receive new features and are covered by [community support](https://duckdblabs.com/community_support_policy).
Secondary clients receive new features but are not covered by community support.
Finally, there are no feature or support guarantees for tertiary clients.

> The DuckDB clients listed above are open-source and we welcome community contributions to these libraries.
> All primary and secondary clients are available under the MIT license.
> For tertiary clients, please consult the repository for the license.

#### Compatibility {#docs:current:clients:overview::compatibility}

All DuckDB clients support the same DuckDB SQL syntax and use the same on-disk [database format](#docs:current:internals:storage).
[DuckDB extensions](#docs:current:extensions:overview) are also portable between clients with some exceptions (see [Wasm extensions](#docs:current:clients:wasm:extensions::list-of-officially-available-extensions)).

## ADBC Client {#docs:current:clients:adbc}

> **Installation.** To use the DuckDB ADBC client, download the [`libduckdb` archive](https://duckdb.org/install/index.html?environment=c) for your platform and follow the [instructions below](#::installing-the-library).
>
> The latest stable version of the DuckDB ADBC client is 1.5.2.

[Arrow Database Connectivity (ADBC)](https://arrow.apache.org/adbc/), similar to ODBC and JDBC, is a C-style API that enables code portability between different database systems. This allows developers to build applications that communicate with database systems without writing code specific to each system. The main difference between ADBC and ODBC/JDBC is that ADBC uses [Arrow](https://arrow.apache.org/) to transfer data between the database system and the application. DuckDB has an ADBC driver, which takes advantage of the [zero-copy integration between DuckDB and Arrow](https://duckdb.org/2021/12/03/duck-arrow) to efficiently transfer data.

Please refer to the [ADBC documentation page](https://arrow.apache.org/adbc/current/) for a more extensive discussion on ADBC and a detailed API explanation.

#### Implemented Functionality {#docs:current:clients:adbc::implemented-functionality}

The DuckDB-ADBC driver implements the full ADBC specification, with the exception of the `ConnectionReadPartition` and `StatementExecutePartitions` functions. Both of these functions exist to support systems that internally partition the query results, which does not apply to DuckDB.
In this section, we describe the main ADBC functions, the arguments they take, and an example call for each.

##### Database {#docs:current:clients:adbc::database}

A set of functions that operate on a database.

| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `DatabaseNew` | Allocate a new (but uninitialized) database. | `(AdbcDatabase *database, AdbcError *error)` | `AdbcDatabaseNew(&adbc_database, &adbc_error)` |
| `DatabaseSetOption` | Set a `char*` option. | `(AdbcDatabase *database, const char *key, const char *value, AdbcError *error)` | `AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error)` |
| `DatabaseInit` | Finish setting options and initialize the database. | `(AdbcDatabase *database, AdbcError *error)` | `AdbcDatabaseInit(&adbc_database, &adbc_error)` |
| `DatabaseRelease` | Destroy the database. | `(AdbcDatabase *database, AdbcError *error)` | `AdbcDatabaseRelease(&adbc_database, &adbc_error)` |

###### Database Options {#docs:current:clients:adbc::database-options}

| Option | Description |
|:-------|:------------|
| `driver` | Path to the DuckDB shared library (`libduckdb.so`, `libduckdb.dylib`, or `duckdb.dll`). |
| `entrypoint` | Entry point function name. Must be `duckdb_adbc_init`. |
| `path` | Path to a DuckDB database file. If not set, an in-memory database is created. |
| `uri` | Alternative to `path`. Accepts plain paths or `file:` URIs (e.g., `file:test.db`, `file:///absolute/path.db`). Takes precedence over `path` if both are set. |
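
The interaction between `path` and `uri` can be sketched in a few lines of Python. This is only an illustration of the rules in the table above, not the driver's code: the `resolve_database_location` helper is hypothetical, and the `:memory:` return value is simply the marker this sketch uses for an in-memory database.

```python
from urllib.parse import urlsplit


def resolve_database_location(options):
    """Return the database location implied by the `path` and `uri` options.

    Mirrors the documented rules only: `uri` wins over `path`, a `file:`
    prefix is stripped, and no location means an in-memory database.
    """
    location = options.get("uri") or options.get("path")
    if not location:
        return ":memory:"  # marker used by this sketch for "in-memory"
    parts = urlsplit(location)
    if parts.scheme == "file":
        return parts.path
    return location


print(resolve_database_location({"path": "a.db", "uri": "file:test.db"}))
print(resolve_database_location({"uri": "file:///absolute/path.db"}))
print(resolve_database_location({}))
```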

##### Connection {#docs:current:clients:adbc::connection}

A set of functions that create and destroy a connection to interact with a database.

| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `ConnectionNew` | Allocate a new (but uninitialized) connection. | `(AdbcConnection*, AdbcError*)` | `AdbcConnectionNew(&adbc_connection, &adbc_error)` |
| `ConnectionSetOption` | Set a string option on the connection. Options may be set before `ConnectionInit`. | `(AdbcConnection*, const char*, const char*, AdbcError*)` | `AdbcConnectionSetOption(&adbc_connection, ADBC_CONNECTION_OPTION_AUTOCOMMIT, ADBC_OPTION_VALUE_DISABLED, &adbc_error)` |
| `ConnectionInit` | Finish setting options and initialize the connection. | `(AdbcConnection*, AdbcDatabase*, AdbcError*)` | `AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error)` |
| `ConnectionRelease` | Destroy this connection. | `(AdbcConnection*, AdbcError*)` | `AdbcConnectionRelease(&adbc_connection, &adbc_error)` |

A set of functions that retrieve metadata about the database. In general, these functions return Arrow objects, specifically an `ArrowArrayStream`.

| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `ConnectionGetInfo` | Get metadata about the driver and database. | `(AdbcConnection*, const uint32_t*, size_t, ArrowArrayStream*, AdbcError*)` | `AdbcConnectionGetInfo(&adbc_connection, NULL, 0, &arrow_stream, &adbc_error)` |
| `ConnectionGetObjects` | Get a hierarchical view of all catalogs, database schemas, tables and columns. | `(AdbcConnection*, int, const char*, const char*, const char*, const char**, const char*, ArrowArrayStream*, AdbcError*)` | `AdbcConnectionGetObjects(&adbc_connection, ADBC_OBJECT_DEPTH_ALL, NULL, NULL, NULL, NULL, NULL, &arrow_stream, &adbc_error)` |
| `ConnectionGetTableSchema` | Get the Arrow schema of a table. | `(AdbcConnection*, const char*, const char*, const char*, ArrowSchema*, AdbcError*)` | `AdbcConnectionGetTableSchema(&adbc_connection, NULL, NULL, "table_name", &arrow_schema, &adbc_error)` |
| `ConnectionGetTableTypes` | Get a list of table types in the database. | `(AdbcConnection*, ArrowArrayStream*, AdbcError*)` | `AdbcConnectionGetTableTypes(&adbc_connection, &arrow_stream, &adbc_error)` |

The `ConnectionGetInfo` function supports the following info codes:



| Info code | Constant | Description |
|----------:|:---------|:------------|
| 0 | `ADBC_INFO_VENDOR_NAME` | Database vendor name (`duckdb`). |
| 1 | `ADBC_INFO_VENDOR_VERSION` | Database version. |
| 100 | `ADBC_INFO_DRIVER_NAME` | Driver name. |
| 101 | `ADBC_INFO_DRIVER_VERSION` | Driver version. |
| 102 | `ADBC_INFO_DRIVER_ARROW_VERSION` | Arrow library version. |
| 103 | `ADBC_INFO_DRIVER_ADBC_VERSION` | ADBC specification version supported by the driver (as an integer, e.g., `1001000` for 1.1.0). |
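
The integer returned for `ADBC_INFO_DRIVER_ADBC_VERSION` can be turned back into a readable version. The sketch below assumes the encoding is major × 1,000,000 + minor × 1,000 + patch, which is consistent with the `1001000` example above. The `decode_adbc_version` helper is hypothetical and not part of any ADBC package.

```python
def decode_adbc_version(encoded):
    """Split an integer such as 1001000 into (major, minor, patch).

    Assumes major * 1_000_000 + minor * 1_000 + patch, which is
    consistent with the example 1001000 for version 1.1.0.
    """
    major, remainder = divmod(encoded, 1_000_000)
    minor, patch = divmod(remainder, 1_000)
    return major, minor, patch


print(decode_adbc_version(1001000))  # (1, 1, 0)
```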

A set of functions with transaction semantics for the connection. By default, all connections start with auto-commit mode on, but this can be turned off via the `ConnectionSetOption` function.

| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `ConnectionCommit` | Commit any pending transactions. | `(AdbcConnection*, AdbcError*)` | `AdbcConnectionCommit(&adbc_connection, &adbc_error)` |
| `ConnectionRollback` | Rollback any pending transactions. | `(AdbcConnection*, AdbcError*)` | `AdbcConnectionRollback(&adbc_connection, &adbc_error)` |

##### Statement {#docs:current:clients:adbc::statement}

Statements hold state related to query execution. They represent both one-off queries and prepared statements. They can be reused; however, doing so will invalidate prior result sets from that statement.

The functions used to create, destroy and set options for a statement:

| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `StatementNew` | Create a new statement for a given connection. | `(AdbcConnection*, AdbcStatement*, AdbcError*)` | `AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error)` |
| `StatementRelease` | Destroy a statement. | `(AdbcStatement*, AdbcError*)` | `AdbcStatementRelease(&adbc_statement, &adbc_error)` |
| `StatementSetOption` | Set a string option on a statement. | `(AdbcStatement*, const char*, const char*, AdbcError*)` | `AdbcStatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "TABLE_NAME", &adbc_error)` |

Functions related to query execution:

| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `StatementSetSqlQuery` | Set the SQL query to execute. The query can then be executed with `StatementExecuteQuery`. | `(AdbcStatement*, const char*, AdbcError*)` | `AdbcStatementSetSqlQuery(&adbc_statement, "SELECT * FROM tbl", &adbc_error)` |
| `StatementSetSubstraitPlan` | Set a Substrait plan to execute. The query can then be executed with `StatementExecuteQuery`. | `(AdbcStatement*, const uint8_t*, size_t, AdbcError*)` | `AdbcStatementSetSubstraitPlan(&adbc_statement, substrait_plan, length, &adbc_error)` |
| `StatementExecuteQuery` | Execute a statement and get the results. | `(AdbcStatement*, ArrowArrayStream*, int64_t*, AdbcError*)` | `AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error)` |
| `StatementPrepare` | Turn this statement into a prepared statement to be executed multiple times. | `(AdbcStatement*, AdbcError*)` | `AdbcStatementPrepare(&adbc_statement, &adbc_error)` |

Functions related to binding, used for bulk insertion or in prepared statements.

| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `StatementBindStream` | Bind an Arrow stream. This can be used for bulk inserts or prepared statements. | `(AdbcStatement*, ArrowArrayStream*, AdbcError*)` | `AdbcStatementBindStream(&adbc_statement, &input_data, &adbc_error)` |

###### Ingestion Modes {#docs:current:clients:adbc::ingestion-modes}

When ingesting data via `StatementBindStream`, the ingestion mode can be set using `ADBC_INGEST_OPTION_MODE`:

| Mode | Constant | Description |
|:-----|:---------|:------------|
| Create | `ADBC_INGEST_OPTION_MODE_CREATE` | Create a new table (error if the table already exists). This is the default. |
| Append | `ADBC_INGEST_OPTION_MODE_APPEND` | Append to an existing table (error if the table does not exist). |
| Replace | `ADBC_INGEST_OPTION_MODE_REPLACE` | Drop the existing table and create a new one. |
| Create or Append | `ADBC_INGEST_OPTION_MODE_CREATE_APPEND` | Create a new table if it does not exist, otherwise append to it. |

`StatementExecuteQuery` returns the number of rows affected through its `rows_affected` output parameter.
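
The four ingestion modes differ only in how they treat an existing table. The following Python sketch models that behavior with a plain dictionary standing in for the database. It is a toy model of the semantics described above, not driver code: the `ingest` helper is hypothetical, and its mode strings are placeholders rather than the values of the ADBC constants.

```python
def ingest(tables, name, rows, mode="create"):
    """Apply `rows` to the in-memory `tables` dict and return the row count.

    `tables` maps table names to lists of rows. The mode strings are
    placeholders for this sketch, not the values of the ADBC constants.
    """
    exists = name in tables
    if mode == "create":
        if exists:
            raise ValueError(f"table {name!r} already exists")
        tables[name] = list(rows)
    elif mode == "append":
        if not exists:
            raise ValueError(f"table {name!r} does not exist")
        tables[name].extend(rows)
    elif mode == "replace":
        tables[name] = list(rows)  # drop any existing table, then create
    elif mode == "create_append":
        tables.setdefault(name, []).extend(rows)
    else:
        raise ValueError(f"unknown ingestion mode: {mode!r}")
    return len(rows)


tables = {}
ingest(tables, "t", [1, 2])                     # create (the default)
ingest(tables, "t", [3], mode="append")         # t now holds 3 rows
ingest(tables, "t", [9], mode="replace")        # t is recreated with 1 row
ingest(tables, "u", [7], mode="create_append")  # u did not exist, so it is created
print(tables)
```

The return value plays the role of the `rows_affected` output parameter in this model.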

###### Ingestion Options {#docs:current:clients:adbc::ingestion-options}

| Option | Description |
|:-------|:------------|
| `adbc.ingest.target_table` | The target table name for ingestion. |
| `adbc.ingest.target_catalog` | The target catalog (attached database) for ingestion. When set without `adbc.ingest.target_db_schema`, the schema defaults to `main`. |
| `adbc.ingest.target_db_schema` | The target schema for ingestion. |
| `adbc.ingest.temporary` | Set to `enabled` to create a temporary table. Incompatible with `target_catalog` and `target_db_schema`. |
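
How these options combine can be sketched as a small resolution function. The code below encodes only the rules stated in the table (the `main` default and the incompatibility of temporary tables with a target catalog or schema). The `resolve_ingest_target` helper is hypothetical and is not part of the driver or of any ADBC package.

```python
def resolve_ingest_target(options):
    """Return (catalog, schema, table, temporary) for a set of ingest options."""
    table = options["adbc.ingest.target_table"]
    catalog = options.get("adbc.ingest.target_catalog")
    schema = options.get("adbc.ingest.target_db_schema")
    temporary = options.get("adbc.ingest.temporary") == "enabled"

    if temporary and (catalog is not None or schema is not None):
        raise ValueError(
            "adbc.ingest.temporary cannot be combined with a target catalog or schema"
        )
    if catalog is not None and schema is None:
        schema = "main"  # documented default when only the catalog is set
    return catalog, schema, table, temporary


print(resolve_ingest_target({
    "adbc.ingest.target_table": "events",
    "adbc.ingest.target_catalog": "other_db",
}))
```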

#### Setting Up the DuckDB ADBC Driver {#docs:current:clients:adbc::setting-up-the-duckdb-adbc-driver}

Before using DuckDB as an ADBC driver, you must install the `libduckdb` shared library on your system and make it available to your application. This library contains the core DuckDB engine that the ADBC driver interfaces with.

##### Downloading libduckdb {#docs:current:clients:adbc::downloading-libduckdb}

Download the appropriate `libduckdb` library for your platform from the [DuckDB releases page](https://github.com/duckdb/duckdb/releases):

- **Linux**: `libduckdb-linux-amd64.zip` (contains `libduckdb.so`)
- **macOS**: `libduckdb-osx-universal.zip` (contains `libduckdb.dylib`)
- **Windows**: `libduckdb-windows-amd64.zip` (contains `duckdb.dll`)

Extract the archive to obtain the shared library file.
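
A script that picks the right archive could look like the Python sketch below. It covers only the three archives listed above, so other architectures are not handled, and the `libduckdb_files` helper is hypothetical.

```python
import platform

# Archive and library names as listed above, keyed by platform.system().
LIBDUCKDB_FILES = {
    "Linux": ("libduckdb-linux-amd64.zip", "libduckdb.so"),
    "Darwin": ("libduckdb-osx-universal.zip", "libduckdb.dylib"),
    "Windows": ("libduckdb-windows-amd64.zip", "duckdb.dll"),
}


def libduckdb_files(system=None):
    """Return (archive_name, library_name) for the given or current platform."""
    system = system or platform.system()
    try:
        return LIBDUCKDB_FILES[system]
    except KeyError:
        raise ValueError(f"no archive listed for platform {system!r}") from None


# Pass no argument to use the platform the script is running on.
archive, library = libduckdb_files("Linux")
print(f"Download {archive} and extract {library}")
```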

##### Installing the Library {#docs:current:clients:adbc::installing-the-library}

###### Linux {#docs:current:clients:adbc::linux}

1. Extract the `libduckdb.so` file from the downloaded archive
2. Make sure your code can use the library. You can:

    - Either copy it to a system library directory (requires root access):

      ```bash
      sudo cp libduckdb.so /usr/local/lib/
      sudo ldconfig
      ```

    - Or place it in a custom directory and add that directory to your `LD_LIBRARY_PATH`:

      ```bash
      mkdir -p ~/lib
      cp libduckdb.so ~/lib/
      export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
      ```

###### macOS {#docs:current:clients:adbc::macos}

1. Extract the `libduckdb.dylib` file from the downloaded archive
2. Make sure your code can use the library. You can:

    - Either copy it to a system library directory:

      ```bash
      sudo cp libduckdb.dylib /usr/local/lib/
      ```

    - Or place it in a custom directory and add that directory to your `DYLD_LIBRARY_PATH`:

      ```bash
      mkdir -p ~/lib
      cp libduckdb.dylib ~/lib/
      export DYLD_LIBRARY_PATH=~/lib:$DYLD_LIBRARY_PATH
      ```

###### Windows {#docs:current:clients:adbc::windows}

1. Extract the `duckdb.dll` file from the downloaded archive
2. Place it in one of the following locations:
   - The same directory as your application executable
   - A directory listed in your `PATH` environment variable
   - The Windows system directory (e.g., `C:\Windows\System32`)


##### Understanding Library Paths {#docs:current:clients:adbc::understanding-library-paths}

The `LD_LIBRARY_PATH` (Linux) and `DYLD_LIBRARY_PATH` (macOS) are environment variables that tell the system where to look for shared libraries at runtime. When your application tries to load `libduckdb`, the system searches these paths to locate the library file.
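
The lookup can be approximated in Python to check where a library would be found. This sketch models only the environment variable: the dynamic linker also searches other locations, which are ignored here, and the `find_in_library_path` helper is hypothetical.

```python
import os
from pathlib import Path


def find_in_library_path(library_name, variable="LD_LIBRARY_PATH", environ=None):
    """Return the first match for `library_name` in the directories of `variable`."""
    environ = os.environ if environ is None else environ
    for directory in environ.get(variable, "").split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = Path(directory) / library_name
        if candidate.is_file():
            return candidate
    return None


# On macOS, use "libduckdb.dylib" and variable="DYLD_LIBRARY_PATH" instead.
print(find_in_library_path("libduckdb.so"))
```

The function returns `None` when no directory in the variable contains the library.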

##### Verifying Installation {#docs:current:clients:adbc::verifying-installation}

You can verify that the library is properly installed and accessible:

**Linux/macOS:**
```bash
ldd path/to/your/application  # Linux
otool -L path/to/your/application  # macOS
```
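
As a complementary check, the library can be loaded directly to confirm that it exports the ADBC entry point. The Python sketch below uses the standard `ctypes` module. The `exports_adbc_entrypoint` helper is hypothetical, and the path in the example call is a placeholder.

```python
import ctypes


def exports_adbc_entrypoint(library_path, entrypoint="duckdb_adbc_init"):
    """Return True if the library loads and exports the ADBC entry point."""
    try:
        library = ctypes.CDLL(library_path)
    except OSError:
        return False  # the library is missing or could not be loaded
    # Attribute lookup on a CDLL resolves the symbol and fails if it is absent.
    return hasattr(library, entrypoint)


# "path/to/libduckdb.so" is a placeholder; substitute the installed location.
print(exports_adbc_entrypoint("path/to/libduckdb.so"))
```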

#### Examples {#docs:current:clients:adbc::examples}

Regardless of the programming language, two database options are required to use ADBC with DuckDB. The first is `driver`, which takes the path to the DuckDB library (see [Setting Up the DuckDB ADBC Driver](#::setting-up-the-duckdb-adbc-driver) above for installation instructions). The second is `entrypoint`, the function exported by the DuckDB-ADBC driver that initializes all the ADBC functions. With these two options configured, we can optionally set the `path` option to the location on disk where the DuckDB database is stored. If `path` is not set, an in-memory database is created. After configuring the options, we can initialize the database. The examples below show how to do so in several language environments.

##### C++ {#docs:current:clients:adbc::c}

We begin our C++ example by declaring the essential variables for querying data through ADBC. These are the error, database, connection and statement handles, plus an Arrow stream to transfer data between DuckDB and the application.

```cpp
AdbcError adbc_error;
AdbcDatabase adbc_database;
AdbcConnection adbc_connection;
AdbcStatement adbc_statement;
ArrowArrayStream arrow_stream;
```

We can then initialize our database variable. Before initializing the database, we need to set the `driver` and `entrypoint` options as mentioned above. Then we set the `path` option and initialize the database. The `driver` option should point to your installed `libduckdb` library – see [Setting Up the DuckDB ADBC Driver](#::setting-up-the-duckdb-adbc-driver) for installation instructions.

```cpp
AdbcDatabaseNew(&adbc_database, &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "driver", "path/to/libduckdb.dylib", &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "entrypoint", "duckdb_adbc_init", &adbc_error);
// By default, we start an in-memory database, but you can optionally define a path to store it on disk.
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error);
AdbcDatabaseInit(&adbc_database, &adbc_error);
```

After initializing the database, we must create and initialize a connection to it.

```cpp
AdbcConnectionNew(&adbc_connection, &adbc_error);
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error);
```

We can now initialize our statement and run queries through our connection. After `AdbcStatementExecuteQuery` returns, `arrow_stream` is populated with the result.

```cpp
AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error);
AdbcStatementSetSqlQuery(&adbc_statement, "SELECT 42", &adbc_error);
int64_t rows_affected;
AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error);
arrow_stream.release(&arrow_stream);
```

Besides running queries, we can also ingest data via Arrow streams. For this, we set an option with the name of the table we want to insert into, bind the stream and then execute the query.

```cpp
// `arrow_stream` must hold the data to ingest, i.e., a stream that has not been released yet.
AdbcStatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "AnswerToEverything", &adbc_error);
AdbcStatementBindStream(&adbc_statement, &arrow_stream, &adbc_error);
AdbcStatementExecuteQuery(&adbc_statement, nullptr, nullptr, &adbc_error);
```

##### Python {#docs:current:clients:adbc::python}

The first step is to install the ADBC driver manager with `pip`. You also need to install `pyarrow` to directly access Apache Arrow formatted result sets (such as using `fetch_arrow_table`).

```bash
pip install adbc_driver_manager pyarrow
```

> For details on the `adbc_driver_manager` package, see the [`adbc_driver_manager` package documentation](https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html).

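As with C++, the driver needs the location of the libduckdb shared object and the entrypoint function. The example below connects through the `adbc_driver_duckdb.dbapi` module and passes the database path directly to `connect`. When connecting through `adbc_driver_manager.dbapi.connect` instead, the `path` argument for DuckDB is passed in through the `db_kwargs` dictionary.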

```python
import adbc_driver_duckdb.dbapi

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.execute("SELECT 42")
    # fetch a pyarrow table
    tbl = cur.fetch_arrow_table()
    print(tbl)
```

Alongside `fetch_arrow_table`, other DB-API methods are also implemented on the cursor, such as `fetchone` and `fetchall`. Data can also be ingested from Arrow data. The cursor's `adbc_ingest` method takes the target table name and the data, binds the data to the statement and executes the ingestion.

```python
import adbc_driver_duckdb.dbapi
import pyarrow

data = pyarrow.record_batch(
    [[1, 2, 3, 4], ["a", "b", "c", "d"]],
    names=["ints", "strs"],
)

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.adbc_ingest("AnswerToEverything", data)
```

##### Go {#docs:current:clients:adbc::go}

Make sure to install the `libduckdb` library first – see [Setting Up the DuckDB ADBC Driver](#::setting-up-the-duckdb-adbc-driver) for detailed installation instructions.

The following example uses an in-memory DuckDB database to modify in-memory Arrow RecordBatches via SQL queries:

{% raw %}
```go
package main

import (
    "bytes"
    "context"
    "fmt"
    "io"

    "github.com/apache/arrow-adbc/go/adbc"
    "github.com/apache/arrow-adbc/go/adbc/drivermgr"
    "github.com/apache/arrow-go/v18/arrow"
    "github.com/apache/arrow-go/v18/arrow/array"
    "github.com/apache/arrow-go/v18/arrow/ipc"
    "github.com/apache/arrow-go/v18/arrow/memory"
)

func _makeSampleArrowRecord() arrow.Record {
    b := array.NewFloat64Builder(memory.DefaultAllocator)
    b.AppendValues([]float64{1, 2, 3}, nil)
    col := b.NewArray()

    defer col.Release()
    defer b.Release()

    schema := arrow.NewSchema([]arrow.Field{{Name: "column1", Type: arrow.PrimitiveTypes.Float64}}, nil)
    return array.NewRecord(schema, []arrow.Array{col}, int64(col.Len()))
}

type DuckDBSQLRunner struct {
    ctx  context.Context
    conn adbc.Connection
    db   adbc.Database
}

func NewDuckDBSQLRunner(ctx context.Context) (*DuckDBSQLRunner, error) {
    var drv drivermgr.Driver
    db, err := drv.NewDatabase(map[string]string{
        "driver":     "duckdb",
        "entrypoint": "duckdb_adbc_init",
        "path":       ":memory:",
    })
    if err != nil {
        return nil, fmt.Errorf("failed to create new in-memory DuckDB database: %w", err)
    }
    conn, err := db.Open(ctx)
    if err != nil {
        return nil, fmt.Errorf("failed to open connection to new in-memory DuckDB database: %w", err)
    }
    return &DuckDBSQLRunner{ctx: ctx, conn: conn, db: db}, nil
}

func serializeRecord(record arrow.Record) (io.Reader, error) {
    buf := new(bytes.Buffer)
    wr := ipc.NewWriter(buf, ipc.WithSchema(record.Schema()))
    if err := wr.Write(record); err != nil {
        return nil, fmt.Errorf("failed to write record: %w", err)
    }
    if err := wr.Close(); err != nil {
        return nil, fmt.Errorf("failed to close writer: %w", err)
    }
    return buf, nil
}

func (r *DuckDBSQLRunner) importRecord(sr io.Reader) error {
    rdr, err := ipc.NewReader(sr)
    if err != nil {
        return fmt.Errorf("failed to create IPC reader: %w", err)
    }
    defer rdr.Release()

    _, err = adbc.IngestStream(r.ctx, r.conn, rdr, "temp_table", adbc.OptionValueIngestModeCreate, adbc.IngestStreamOptions{})

    return err
}

func (r *DuckDBSQLRunner) runSQL(sql string) ([]arrow.Record, error) {
    stmt, err := r.conn.NewStatement()
    if err != nil {
        return nil, fmt.Errorf("failed to create new statement: %w", err)
    }
    defer stmt.Close()

    if err := stmt.SetSqlQuery(sql); err != nil {
        return nil, fmt.Errorf("failed to set SQL query: %w", err)
    }
    out, n, err := stmt.ExecuteQuery(r.ctx)
    if err != nil {
        return nil, fmt.Errorf("failed to execute query: %w", err)
    }
    defer out.Release()

    result := make([]arrow.Record, 0, n)
    for out.Next() {
        rec := out.Record()
        rec.Retain() // .Next() will release the record, so we need to retain it
        result = append(result, rec)
    }
    if out.Err() != nil {
        return nil, out.Err()
    }
    return result, nil
}

func (r *DuckDBSQLRunner) RunSQLOnRecord(record arrow.Record, sql string) ([]arrow.Record, error) {
    serializedRecord, err := serializeRecord(record)
    if err != nil {
        return nil, fmt.Errorf("failed to serialize record: %w", err)
    }
    if err := r.importRecord(serializedRecord); err != nil {
        return nil, fmt.Errorf("failed to import record: %w", err)
    }
    result, err := r.runSQL(sql)
    if err != nil {
        return nil, fmt.Errorf("failed to run SQL: %w", err)
    }

    if _, err := r.runSQL("DROP TABLE temp_table"); err != nil {
        return nil, fmt.Errorf("failed to drop temp table after running query: %w", err)
    }
    return result, nil
}

func (r *DuckDBSQLRunner) Close() {
    r.conn.Close()
    r.db.Close()
}

func main() {
    rec := _makeSampleArrowRecord()
    fmt.Println(rec)

    runner, err := NewDuckDBSQLRunner(context.Background())
    if err != nil {
        panic(err)
    }
    defer runner.Close()

    resultRecords, err := runner.RunSQLOnRecord(rec, "SELECT column1+1 FROM temp_table")
    if err != nil {
        panic(err)
    }

    for _, resultRecord := range resultRecords {
        fmt.Println(resultRecord)
        resultRecord.Release()
    }
}
```
{% endraw %}

Running it produces the following output:

```text
record:
  schema:
  fields: 1
    - column1: type=float64
  rows: 3
  col[0][column1]: [1 2 3]

record:
  schema:
  fields: 1
    - (column1 + 1): type=float64, nullable
  rows: 3
  col[0][(column1 + 1)]: [2 3 4]
```

## C {#clients:c}

### Overview {#docs:current:clients:c:overview}

> **Installation.** To use the DuckDB C API, download the [`libduckdb` archive](https://duckdb.org/install/index.html?environment=c) for your platform.
>
> The latest stable version of the DuckDB C API is 1.5.2.

DuckDB implements a custom C API loosely modeled on the SQLite C API. The API is contained in the `duckdb.h` header. Continue to [Startup & Shutdown](#docs:current:clients:c:connect) to get started, or check out the [Full API overview](#docs:current:clients:c:api).

We also provide a SQLite API wrapper which means that if your application is programmed against the SQLite C API, you can re-link to DuckDB and it should continue working. See the [`shell_helpers.cpp`](https://github.com/duckdb/duckdb/tree/main/tools/shell/shell_helpers.cpp) file in our source repository for more information.

#### Installation {#docs:current:clients:c:overview::installation}

The DuckDB C API can be installed as part of the `libduckdb` packages. Please see the [installation page](https://duckdb.org/install) for details.

### Startup & Shutdown {#docs:current:clients:c:connect}



To use DuckDB, you must first initialize a `duckdb_database` handle using `duckdb_open()`. `duckdb_open()` takes as its parameter the path of the database file to read from and write to. The special value `NULL` (`nullptr`) can be used to create an **in-memory database**. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process).

With the `duckdb_database` handle, you can create one or more `duckdb_connection` handles using `duckdb_connect()`. While individual connections are thread-safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection to allow for the best parallel performance.

All `duckdb_connection`s have to be explicitly disconnected with `duckdb_disconnect()` and the `duckdb_database` has to be explicitly closed with `duckdb_close()` to avoid leaking memory and file handles.

#### Example {#docs:current:clients:c:connect::example}

```c
duckdb_database db;
duckdb_connection con;

if (duckdb_open(NULL, &db) == DuckDBError) {
    // handle error
}
if (duckdb_connect(db, &con) == DuckDBError) {
    // handle error
}

// run queries...

// cleanup
duckdb_disconnect(&con);
duckdb_close(&db);
```

#### API Reference Overview {#docs:current:clients:c:connect::api-reference-overview}



```c
duckdb_instance_cache duckdb_create_instance_cache();
duckdb_state duckdb_get_or_create_from_cache(duckdb_instance_cache instance_cache, const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_destroy_instance_cache(duckdb_instance_cache *instance_cache);
duckdb_state duckdb_open(const char *path, duckdb_database *out_database);
duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
void duckdb_connection_get_client_context(duckdb_connection connection, duckdb_client_context *out_context);
void duckdb_connection_get_arrow_options(duckdb_connection connection, duckdb_arrow_options *out_arrow_options);
idx_t duckdb_client_context_get_connection_id(duckdb_client_context context);
void duckdb_destroy_client_context(duckdb_client_context *context);
void duckdb_destroy_arrow_options(duckdb_arrow_options *arrow_options);
const char *duckdb_library_version();
duckdb_value duckdb_get_table_names(duckdb_connection connection, const char *query, bool qualified);
```


###### `duckdb_create_instance_cache` {#docs:current:clients:c:connect::duckdb_create_instance_cache}

Creates a new database instance cache.
The instance cache is necessary if a client/program (re)opens multiple databases to the same file within the same
process. Must be destroyed with `duckdb_destroy_instance_cache`.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
duckdb_instance_cache duckdb_create_instance_cache();
```

####### Return Value {#docs:current:clients:c:connect::return-value}

The database instance cache.

<br>

###### `duckdb_get_or_create_from_cache` {#docs:current:clients:c:connect::duckdb_get_or_create_from_cache}

Creates a new database instance in the instance cache, or retrieves an existing database instance.
Must be closed with `duckdb_close`.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
duckdb_state duckdb_get_or_create_from_cache(
  duckdb_instance_cache instance_cache,
  const char *path,
  duckdb_database *out_database,
  duckdb_config config,
  char **out_error
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `instance_cache`: The instance cache in which to create the database, or from which to take the database.
* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open or retrieve an in-memory database.
* `out_database`: The resulting cached database.
* `config`: (Optional) configuration used to create the database.
* `out_error`: If set and the function returns `DuckDBError`, this contains the error message.
Note that the error message must be freed using `duckdb_free`.

####### Return Value {#docs:current:clients:c:connect::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_instance_cache` {#docs:current:clients:c:connect::duckdb_destroy_instance_cache}

Destroys an existing database instance cache and de-allocates its memory.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
void duckdb_destroy_instance_cache(
  duckdb_instance_cache *instance_cache
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `instance_cache`: The instance cache to destroy.

<br>

###### `duckdb_open` {#docs:current:clients:c:connect::duckdb_open}

Creates a new database or opens an existing database file stored at the given path.
If no path is given a new in-memory database is created instead.
The database must be closed with `duckdb_close`.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
duckdb_state duckdb_open(
  const char *path,
  duckdb_database *out_database
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open an in-memory database.
* `out_database`: The result database object.

####### Return Value {#docs:current:clients:c:connect::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_open_ext` {#docs:current:clients:c:connect::duckdb_open_ext}

Extended version of `duckdb_open`. Creates a new database or opens an existing database file stored at the given path.
The database must be closed with `duckdb_close`.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
duckdb_state duckdb_open_ext(
  const char *path,
  duckdb_database *out_database,
  duckdb_config config,
  char **out_error
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open an in-memory database.
* `out_database`: The result database object.
* `config`: (Optional) configuration used to start up the database.
* `out_error`: If set and the function returns `DuckDBError`, this contains the error message.
Note that the error message must be freed using `duckdb_free`.

####### Return Value {#docs:current:clients:c:connect::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_close` {#docs:current:clients:c:connect::duckdb_close}

Closes the specified database and de-allocates all memory allocated for that database.
This should be called after you are done with any database allocated through `duckdb_open` or `duckdb_open_ext`.
Note that failing to call `duckdb_close` (e.g., in case of a program crash) will not cause data corruption.
Still, it is recommended to always correctly close a database object after you are done with it.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
void duckdb_close(
  duckdb_database *database
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `database`: The database object to shut down.

<br>

###### `duckdb_connect` {#docs:current:clients:c:connect::duckdb_connect}

Opens a connection to a database. Connections are required to query the database and store transactional state
associated with the connection.
The instantiated connection should be closed using `duckdb_disconnect`.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
duckdb_state duckdb_connect(
  duckdb_database database,
  duckdb_connection *out_connection
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `database`: The database object to connect to.
* `out_connection`: The result connection object.

####### Return Value {#docs:current:clients:c:connect::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_interrupt` {#docs:current:clients:c:connect::duckdb_interrupt}

Interrupts the query that is currently running on the connection.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
void duckdb_interrupt(
  duckdb_connection connection
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `connection`: The connection to interrupt.

<br>

###### `duckdb_query_progress` {#docs:current:clients:c:connect::duckdb_query_progress}

Gets the progress of the query that is currently running on the connection.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
duckdb_query_progress_type duckdb_query_progress(
  duckdb_connection connection
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `connection`: The working connection.

####### Return Value {#docs:current:clients:c:connect::return-value}

The progress of the query as a percentage, or `-1` if no progress is available.

<br>

###### `duckdb_disconnect` {#docs:current:clients:c:connect::duckdb_disconnect}

Closes the specified connection and de-allocates all memory allocated for that connection.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
void duckdb_disconnect(
  duckdb_connection *connection
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `connection`: The connection to close.

<br>

###### `duckdb_connection_get_client_context` {#docs:current:clients:c:connect::duckdb_connection_get_client_context}

Retrieves the client context of the connection.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
void duckdb_connection_get_client_context(
  duckdb_connection connection,
  duckdb_client_context *out_context
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `connection`: The connection.
* `out_context`: The client context of the connection. Must be destroyed with `duckdb_destroy_client_context`.

<br>

###### `duckdb_connection_get_arrow_options` {#docs:current:clients:c:connect::duckdb_connection_get_arrow_options}

Retrieves the arrow options of the connection.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
void duckdb_connection_get_arrow_options(
  duckdb_connection connection,
  duckdb_arrow_options *out_arrow_options
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

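* `connection`: The connection.
* `out_arrow_options`: The arrow options of the connection. Must be destroyed with `duckdb_destroy_arrow_options`.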

<br>

###### `duckdb_client_context_get_connection_id` {#docs:current:clients:c:connect::duckdb_client_context_get_connection_id}

Returns the connection id of the client context.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
idx_t duckdb_client_context_get_connection_id(
  duckdb_client_context context
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `context`: The client context.

####### Return Value {#docs:current:clients:c:connect::return-value}

The connection id of the client context.

<br>

###### `duckdb_destroy_client_context` {#docs:current:clients:c:connect::duckdb_destroy_client_context}

Destroys the client context and deallocates its memory.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
void duckdb_destroy_client_context(
  duckdb_client_context *context
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `context`: The client context to destroy.

<br>

###### `duckdb_destroy_arrow_options` {#docs:current:clients:c:connect::duckdb_destroy_arrow_options}

Destroys the arrow options and deallocates its memory.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
void duckdb_destroy_arrow_options(
  duckdb_arrow_options *arrow_options
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `arrow_options`: The arrow options to destroy.

<br>

###### `duckdb_library_version` {#docs:current:clients:c:connect::duckdb_library_version}

Returns the version of the linked DuckDB, with a version postfix for dev versions.

Usually used for developing C extensions that must return this for a compatibility check.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
const char *duckdb_library_version();
```

<br>

###### `duckdb_get_table_names` {#docs:current:clients:c:connect::duckdb_get_table_names}

Get the list of (fully qualified) table names of the query.

####### Syntax {#docs:current:clients:c:connect::syntax}

```c
duckdb_value duckdb_get_table_names(
  duckdb_connection connection,
  const char *query,
  bool qualified
);
```


####### Parameters {#docs:current:clients:c:connect::parameters}

* `connection`: The connection for which to get the table names.
* `query`: The query for which to get the table names.
* `qualified`: If set to `true`, returns fully qualified table names (`catalog.schema.table`); otherwise, returns only the (unescaped) table names.

####### Return Value {#docs:current:clients:c:connect::return-value}

A `duckdb_value` of type `VARCHAR[]` containing the (fully qualified) table names of the query. Must be destroyed
with `duckdb_destroy_value`.

<br>

### Configuration {#docs:current:clients:c:config}



Configuration options can be provided to change different settings of the database system. Note that many of these
settings can be changed later on using [`PRAGMA` statements](#docs:current:configuration:pragmas) as well. The configuration object
should be created, filled with values and passed to `duckdb_open_ext`.

#### Example {#docs:current:clients:c:config::example}

```c
duckdb_database db;
duckdb_config config;

// create the configuration object
if (duckdb_create_config(&config) == DuckDBError) {
    // handle error
}
// set some configuration options
duckdb_set_config(config, "access_mode", "READ_WRITE"); // or READ_ONLY
duckdb_set_config(config, "threads", "8");
duckdb_set_config(config, "max_memory", "8GB");
duckdb_set_config(config, "default_order", "DESC");

// open the database using the configuration
if (duckdb_open_ext(NULL, &db, config, NULL) == DuckDBError) {
    // handle error
}
// cleanup the configuration object
duckdb_destroy_config(&config);

// run queries...

// cleanup
duckdb_close(&db);
```

#### API Reference Overview {#docs:current:clients:c:config::api-reference-overview}



```c
duckdb_state duckdb_create_config(duckdb_config *out_config);
size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);
```


###### `duckdb_create_config` {#docs:current:clients:c:config::duckdb_create_config}

Initializes an empty configuration object that can be used to provide start-up options for the DuckDB instance
through `duckdb_open_ext`.
The `duckdb_config` must be destroyed using `duckdb_destroy_config`.

This will always succeed unless there is a malloc failure.

Note that `duckdb_destroy_config` should always be called on the resulting config, even if the function returns
`DuckDBError`.

####### Syntax {#docs:current:clients:c:config::syntax}

```c
duckdb_state duckdb_create_config(
  duckdb_config *out_config
);
```


####### Parameters {#docs:current:clients:c:config::parameters}

* `out_config`: The result configuration object.

####### Return Value {#docs:current:clients:c:config::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_config_count` {#docs:current:clients:c:config::duckdb_config_count}

Returns the total number of configuration options available for use with `duckdb_get_config_flag`.

This should not be called in a loop as it internally loops over all the options.

####### Syntax {#docs:current:clients:c:config::syntax}

```c
size_t duckdb_config_count();
```

####### Return Value {#docs:current:clients:c:config::return-value}

The number of config options available.

<br>

###### `duckdb_get_config_flag` {#docs:current:clients:c:config::duckdb_get_config_flag}

Obtains a human-readable name and description of a specific configuration option. This can be used to e.g.
display configuration options. This will succeed unless `index` is out of range (i.e., `>= duckdb_config_count`).

The result name or description MUST NOT be freed.

####### Syntax {#docs:current:clients:c:config::syntax}

```c
duckdb_state duckdb_get_config_flag(
  size_t index,
  const char **out_name,
  const char **out_description
);
```


####### Parameters {#docs:current:clients:c:config::parameters}

* `index`: The index of the configuration option (from 0 to `duckdb_config_count() - 1`).
* `out_name`: A name of the configuration flag.
* `out_description`: A description of the configuration flag.

####### Return Value {#docs:current:clients:c:config::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_set_config` {#docs:current:clients:c:config::duckdb_set_config}

Sets the specified option for the specified configuration. The configuration option is indicated by name.
To obtain a list of config options, see `duckdb_get_config_flag`.

In the source code, configuration options are defined in `config.cpp`.

This can fail if either the name is invalid, or if the value provided for the option is invalid.

####### Syntax {#docs:current:clients:c:config::syntax}

```c
duckdb_state duckdb_set_config(
  duckdb_config config,
  const char *name,
  const char *option
);
```


####### Parameters {#docs:current:clients:c:config::parameters}

* `config`: The configuration object to set the option on.
* `name`: The name of the configuration flag to set.
* `option`: The value to set the configuration flag to.

####### Return Value {#docs:current:clients:c:config::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_config` {#docs:current:clients:c:config::duckdb_destroy_config}

Destroys the specified configuration object and de-allocates all memory allocated for the object.

####### Syntax {#docs:current:clients:c:config::syntax}

```c
void duckdb_destroy_config(
  duckdb_config *config
);
```


####### Parameters {#docs:current:clients:c:config::parameters}

* `config`: The configuration object to destroy.

<br>

### Query {#docs:current:clients:c:query}



The `duckdb_query` function allows SQL queries to be run in DuckDB from C. It takes three parameters: a connection, a (null-terminated) SQL query string and a `duckdb_result` result pointer. The result pointer may be `NULL` if the application is not interested in the result set or if the query produces no result. After the result is consumed, the `duckdb_destroy_result` function should be used to clean up the result.

Elements can be extracted from the `duckdb_result` object using a variety of functions. `duckdb_column_count` can be used to extract the number of columns. `duckdb_column_name` and `duckdb_column_type` can be used to extract the names and types of individual columns.

#### Example {#docs:current:clients:c:query::example}

```c
duckdb_state state;
duckdb_result result;

// create a table
state = duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
if (state == DuckDBError) {
    // handle error
}
// insert three rows into the table
state = duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
if (state == DuckDBError) {
    // handle error
}
// query rows again
state = duckdb_query(con, "SELECT * FROM integers", &result);
if (state == DuckDBError) {
    // handle error
}
// handle the result
// ...

// destroy the result after we are done with it
duckdb_destroy_result(&result);
```

#### Value Extraction {#docs:current:clients:c:query::value-extraction}

Values can be extracted using either the `duckdb_fetch_chunk` function, or using the `duckdb_value` convenience functions. The `duckdb_fetch_chunk` function directly hands you data chunks in DuckDB's native array format and can therefore be very fast. The `duckdb_value` functions perform bounds- and type-checking, and will automatically cast values to the desired type. This makes them more convenient and easier to use, at the expense of being slower.

See the [Types](#docs:current:clients:c:types) page for more information.

> For optimal performance, use `duckdb_fetch_chunk` to extract data from the query result.
> The `duckdb_value` functions perform internal type-checking, bounds-checking and casting which makes them slower.

##### `duckdb_fetch_chunk` {#docs:current:clients:c:query::duckdb_fetch_chunk}

Below is an end-to-end example that prints the above result to CSV format using the `duckdb_fetch_chunk` function.
Note that the function is NOT generic: we do need to know exactly what the types of the result columns are.

```c
duckdb_database db;
duckdb_connection con;
duckdb_open(NULL, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
duckdb_query(con, "SELECT * FROM integers;", &res);

// iterate until result is exhausted
while (true) {
    duckdb_data_chunk result = duckdb_fetch_chunk(res);
    if (!result) {
        // result is exhausted
        break;
    }
    // get the number of rows from the data chunk
    idx_t row_count = duckdb_data_chunk_get_size(result);
    // get the first column
    duckdb_vector col1 = duckdb_data_chunk_get_vector(result, 0);
    int32_t *col1_data = (int32_t *) duckdb_vector_get_data(col1);
    uint64_t *col1_validity = duckdb_vector_get_validity(col1);

    // get the second column
    duckdb_vector col2 = duckdb_data_chunk_get_vector(result, 1);
    int32_t *col2_data = (int32_t *) duckdb_vector_get_data(col2);
    uint64_t *col2_validity = duckdb_vector_get_validity(col2);

    // iterate over the rows
    for (idx_t row = 0; row < row_count; row++) {
        if (duckdb_validity_row_is_valid(col1_validity, row)) {
            printf("%d", col1_data[row]);
        } else {
            printf("NULL");
        }
        printf(",");
        if (duckdb_validity_row_is_valid(col2_validity, row)) {
            printf("%d", col2_data[row]);
        } else {
            printf("NULL");
        }
        printf("\n");
    }
    duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```

This prints the following result:

```csv
3,4
5,6
7,NULL
```

##### `duckdb_value` {#docs:current:clients:c:query::duckdb_value}

> **Deprecated.** The `duckdb_value` functions are deprecated and are scheduled for removal in a future release.

Below is an example that prints the above result to CSV format using the `duckdb_value_varchar` function.
Note that the function is generic: we do not need to know about the types of the individual result columns.

```c
// print the above result to CSV format using `duckdb_value_varchar`
idx_t row_count = duckdb_row_count(&result);
idx_t column_count = duckdb_column_count(&result);
for (idx_t row = 0; row < row_count; row++) {
    for (idx_t col = 0; col < column_count; col++) {
        if (col > 0) printf(",");
        // duckdb_value_varchar returns NULL for NULL values
        char *str_val = duckdb_value_varchar(&result, col, row);
        printf("%s", str_val ? str_val : "NULL");
        duckdb_free(str_val);
    }
    printf("\n");
}
```

#### API Reference Overview {#docs:current:clients:c:query::api-reference-overview}



```c
duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);
void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
duckdb_arrow_options duckdb_result_get_arrow_options(duckdb_result *result);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);
duckdb_error_type duckdb_result_error_type(duckdb_result *result);
```


###### `duckdb_query` {#docs:current:clients:c:query::duckdb_query}

Executes a SQL query within a connection and stores the full (materialized) result in the `out_result` pointer.
If the query fails to execute, `DuckDBError` is returned and the error message can be retrieved by calling
`duckdb_result_error`.

Note that after running `duckdb_query`, `duckdb_destroy_result` must be called on the result object even if the
query fails, otherwise the error stored within the result will not be freed correctly.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
duckdb_state duckdb_query(
  duckdb_connection connection,
  const char *query,
  duckdb_result *out_result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `connection`: The connection to perform the query in.
* `query`: The SQL query to run.
* `out_result`: The query result.

####### Return Value {#docs:current:clients:c:query::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_result` {#docs:current:clients:c:query::duckdb_destroy_result}

Closes the result and de-allocates all memory allocated for that result.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
void duckdb_destroy_result(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result to destroy.

<br>

###### `duckdb_column_name` {#docs:current:clients:c:query::duckdb_column_name}

Returns the column name of the specified column. The result should not need to be freed; the column names will
automatically be destroyed when the result is destroyed.

Returns `NULL` if the column is out of range.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
const char *duckdb_column_name(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch the column name from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:query::return-value}

The column name of the specified column.

<br>

###### `duckdb_column_type` {#docs:current:clients:c:query::duckdb_column_type}

Returns the column type of the specified column.

Returns `DUCKDB_TYPE_INVALID` if the column is out of range.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
duckdb_type duckdb_column_type(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch the column type from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:query::return-value}

The column type of the specified column.

<br>

###### `duckdb_result_statement_type` {#docs:current:clients:c:query::duckdb_result_statement_type}

Returns the statement type of the statement that was executed.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
duckdb_statement_type duckdb_result_statement_type(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch the statement type from.

####### Return Value {#docs:current:clients:c:query::return-value}

A `duckdb_statement_type` value or `DUCKDB_STATEMENT_TYPE_INVALID`.

<br>

###### `duckdb_column_logical_type` {#docs:current:clients:c:query::duckdb_column_logical_type}

Returns the logical column type of the specified column.

The return type of this call should be destroyed with `duckdb_destroy_logical_type`.

Returns `NULL` if the column is out of range.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
duckdb_logical_type duckdb_column_logical_type(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch the column type from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:query::return-value}

The logical column type of the specified column.

<br>

###### `duckdb_result_get_arrow_options` {#docs:current:clients:c:query::duckdb_result_get_arrow_options}

Returns the arrow options associated with the given result. These options are definitions of how the arrow arrays/schema
should be produced.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
duckdb_arrow_options duckdb_result_get_arrow_options(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch arrow options from.

####### Return Value {#docs:current:clients:c:query::return-value}

The arrow options associated with the given result. This must be destroyed with
`duckdb_destroy_arrow_options`.

<br>

###### `duckdb_column_count` {#docs:current:clients:c:query::duckdb_column_count}

Returns the number of columns present in the result object.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
idx_t duckdb_column_count(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:query::return-value}

The number of columns present in the result object.

<br>

###### `duckdb_row_count` {#docs:current:clients:c:query::duckdb_row_count}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of rows present in the result object.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
idx_t duckdb_row_count(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:query::return-value}

The number of rows present in the result object.

<br>

###### `duckdb_rows_changed` {#docs:current:clients:c:query::duckdb_rows_changed}

Returns the number of rows changed by the query stored in the result. This is relevant only for `INSERT`/`UPDATE`/`DELETE`
queries. For other queries, the number of rows changed is 0.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
idx_t duckdb_rows_changed(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:query::return-value}

The number of rows changed.

<br>

###### `duckdb_column_data` {#docs:current:clients:c:query::duckdb_column_data}

> **Deprecated.** This method has been deprecated. Prefer using `duckdb_result_get_chunk` instead.

Returns the data of a specific column of a result in columnar format.

The function returns a dense array which contains the result data. The exact type stored in the array depends on the
corresponding `duckdb_type` (as provided by `duckdb_column_type`). For the exact type by which the data should be
accessed, see the comments in [the types section](#types) or the `DUCKDB_TYPE` enum.

For example, for a column of type `DUCKDB_TYPE_INTEGER`, rows can be accessed in the following manner:
```c
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
printf("Data for row %llu: %d\n", (unsigned long long) row, data[row]);
```

####### Syntax {#docs:current:clients:c:query::syntax}

```c
void *duckdb_column_data(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch the column data from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:query::return-value}

The column data of the specified column.

<br>

###### `duckdb_nullmask_data` {#docs:current:clients:c:query::duckdb_nullmask_data}

> **Deprecated.** This method has been deprecated. Prefer using `duckdb_result_get_chunk` instead.

Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row
whether or not the corresponding row is `NULL`. If a row is `NULL`, the values present in the array provided
by `duckdb_column_data` are undefined.

```c
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
bool *nullmask = duckdb_nullmask_data(&result, 0);
if (nullmask[row]) {
    printf("Data for row %llu: NULL\n", (unsigned long long) row);
} else {
    printf("Data for row %llu: %d\n", (unsigned long long) row, data[row]);
}
```

####### Syntax {#docs:current:clients:c:query::syntax}

```c
bool *duckdb_nullmask_data(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch the nullmask from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:query::return-value}

The nullmask of the specified column.

<br>

###### `duckdb_result_error` {#docs:current:clients:c:query::duckdb_result_error}

Returns the error message contained within the result. The error is only set if `duckdb_query` returns `DuckDBError`.

The result of this function must not be freed. It will be cleaned up when `duckdb_destroy_result` is called.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
const char *duckdb_result_error(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch the error from.

####### Return Value {#docs:current:clients:c:query::return-value}

The error of the result.

<br>

###### `duckdb_result_error_type` {#docs:current:clients:c:query::duckdb_result_error_type}

Returns the result error type contained within the result. The error is only set if `duckdb_query` returns
`DuckDBError`.

####### Syntax {#docs:current:clients:c:query::syntax}

```c
duckdb_error_type duckdb_result_error_type(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:query::parameters}

* `result`: The result object to fetch the error from.

####### Return Value {#docs:current:clients:c:query::return-value}

The error type of the result.

<br>

### Data Chunks {#docs:current:clients:c:data_chunk}



Data chunks represent a horizontal slice of a table. They hold a number of [vectors](#docs:current:clients:c:vector), each of which can hold up to `VECTOR_SIZE` rows. The vector size can be obtained through the `duckdb_vector_size` function and is configurable, but is usually set to `2048`.
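
Because each chunk holds at most one vector's worth of rows, a result with many rows spans several chunks. The following is a minimal sketch of that arithmetic. The helper `min_chunk_count` is hypothetical and not part of the DuckDB C API, and it gives only a lower bound, under the assumption that DuckDB may also return chunks that are not full.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not part of the DuckDB C API): the minimum number of
// data chunks needed to hold row_count rows when each chunk holds at most
// vector_size rows. This is a ceiling division.
uint64_t min_chunk_count(uint64_t row_count, uint64_t vector_size) {
	assert(vector_size > 0);
	return row_count / vector_size + (row_count % vector_size != 0 ? 1 : 0);
}
```

For example, with a vector size of `2048`, a result of 5000 rows needs at least three chunks (2048 + 2048 + 904 rows).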

Data chunks and vectors are what DuckDB uses natively to store and represent data. For this reason, the data chunk interface is the most efficient way of interfacing with DuckDB. Be aware, however, that correctly interfacing with DuckDB using the data chunk API does require knowledge of DuckDB's internal vector format.

Data chunks can be used in two manners:

* **Reading Data**: Data chunks can be obtained from query results using the `duckdb_fetch_chunk` method, or as input to a user-defined function. In this case, the [vector methods](#docs:current:clients:c:vector) can be used to read individual values.
* **Writing Data**: Data chunks can be created using `duckdb_create_data_chunk`. The data chunk can then be filled with values and used in `duckdb_append_data_chunk` to write data to the database.

The primary manner of interfacing with data chunks is by obtaining the internal vectors of the data chunk using the `duckdb_data_chunk_get_vector` method. Afterwards, the [vector methods](#docs:current:clients:c:vector) can be used to read from or write to the individual vectors.

#### API Reference Overview {#docs:current:clients:c:data_chunk::api-reference-overview}



```c
duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);
void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);
```


###### `duckdb_create_data_chunk` {#docs:current:clients:c:data_chunk::duckdb_create_data_chunk}

Creates an empty data chunk with the specified column types.
The result must be destroyed with `duckdb_destroy_data_chunk`.

####### Syntax {#docs:current:clients:c:data_chunk::syntax}

```c
duckdb_data_chunk duckdb_create_data_chunk(
  duckdb_logical_type *types,
  idx_t column_count
);
```


####### Parameters {#docs:current:clients:c:data_chunk::parameters}

* `types`: An array of column types. Column types cannot contain ANY and INVALID types.
* `column_count`: The number of columns.

####### Return Value {#docs:current:clients:c:data_chunk::return-value}

The data chunk.

<br>

###### `duckdb_destroy_data_chunk` {#docs:current:clients:c:data_chunk::duckdb_destroy_data_chunk}

Destroys the data chunk and de-allocates all memory allocated for that chunk.

####### Syntax {#docs:current:clients:c:data_chunk::syntax}

```c
void duckdb_destroy_data_chunk(
  duckdb_data_chunk *chunk
);
```


####### Parameters {#docs:current:clients:c:data_chunk::parameters}

* `chunk`: The data chunk to destroy.

<br>

###### `duckdb_data_chunk_reset` {#docs:current:clients:c:data_chunk::duckdb_data_chunk_reset}

Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.
After calling this method, you must call `duckdb_vector_get_validity` and `duckdb_vector_get_data` to obtain the current
data and validity pointers.

####### Syntax {#docs:current:clients:c:data_chunk::syntax}

```c
void duckdb_data_chunk_reset(
  duckdb_data_chunk chunk
);
```


####### Parameters {#docs:current:clients:c:data_chunk::parameters}

* `chunk`: The data chunk to reset.

<br>

###### `duckdb_data_chunk_get_column_count` {#docs:current:clients:c:data_chunk::duckdb_data_chunk_get_column_count}

Retrieves the number of columns in a data chunk.

####### Syntax {#docs:current:clients:c:data_chunk::syntax}

```c
idx_t duckdb_data_chunk_get_column_count(
  duckdb_data_chunk chunk
);
```


####### Parameters {#docs:current:clients:c:data_chunk::parameters}

* `chunk`: The data chunk to get the data from

####### Return Value {#docs:current:clients:c:data_chunk::return-value}

The number of columns in the data chunk

<br>

###### `duckdb_data_chunk_get_vector` {#docs:current:clients:c:data_chunk::duckdb_data_chunk_get_vector}

Retrieves the vector at the specified column index in the data chunk.

The pointer to the vector is valid for as long as the chunk is alive.
It does NOT need to be destroyed.

####### Syntax {#docs:current:clients:c:data_chunk::syntax}

```c
duckdb_vector duckdb_data_chunk_get_vector(
  duckdb_data_chunk chunk,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:data_chunk::parameters}

* `chunk`: The data chunk to get the data from.
* `col_idx`: The column index.

####### Return Value {#docs:current:clients:c:data_chunk::return-value}

The vector

<br>

###### `duckdb_data_chunk_get_size` {#docs:current:clients:c:data_chunk::duckdb_data_chunk_get_size}

Retrieves the current number of tuples in a data chunk.

####### Syntax {#docs:current:clients:c:data_chunk::syntax}

```c
idx_t duckdb_data_chunk_get_size(
  duckdb_data_chunk chunk
);
```


####### Parameters {#docs:current:clients:c:data_chunk::parameters}

* `chunk`: The data chunk to get the data from

####### Return Value {#docs:current:clients:c:data_chunk::return-value}

The number of tuples in the data chunk

<br>

###### `duckdb_data_chunk_set_size` {#docs:current:clients:c:data_chunk::duckdb_data_chunk_set_size}

Sets the current number of tuples in a data chunk.

####### Syntax {#docs:current:clients:c:data_chunk::syntax}

```c
void duckdb_data_chunk_set_size(
  duckdb_data_chunk chunk,
  idx_t size
);
```


####### Parameters {#docs:current:clients:c:data_chunk::parameters}

* `chunk`: The data chunk to set the size in
* `size`: The number of tuples in the data chunk

<br>

### Vectors {#docs:current:clients:c:vector}

Vectors represent a horizontal slice of a column. They hold a number of values of a specific type, similar to an array. Vectors are the core data representation used in DuckDB. Vectors are typically stored within [data chunks](#docs:current:clients:c:data_chunk).

The vector and data chunk interfaces are the most efficient way of interacting with DuckDB, allowing for the highest performance. However, the interfaces are also difficult to use and care must be taken when using them.

#### Vector Format {#docs:current:clients:c:vector::vector-format}

Vectors are arrays of a specific data type. The logical type of a vector can be obtained using `duckdb_vector_get_column_type`. The type id of the logical type can then be obtained using `duckdb_get_type_id`.

Vectors themselves do not have sizes. Instead, the parent data chunk has a size (that can be obtained through `duckdb_data_chunk_get_size`). All vectors that belong to a data chunk have the same size.

##### Primitive Types {#docs:current:clients:c:vector::primitive-types}

For primitive types, the underlying array can be obtained using the `duckdb_vector_get_data` method. The array can then be accessed using the correct native type. Below is a table that contains a mapping of the `duckdb_type` to the native type of the array.



|       duckdb_type        |    NativeType    |
|--------------------------|------------------|
| DUCKDB_TYPE_BOOLEAN      | bool             |
| DUCKDB_TYPE_TINYINT      | int8_t           |
| DUCKDB_TYPE_SMALLINT     | int16_t          |
| DUCKDB_TYPE_INTEGER      | int32_t          |
| DUCKDB_TYPE_BIGINT       | int64_t          |
| DUCKDB_TYPE_UTINYINT     | uint8_t          |
| DUCKDB_TYPE_USMALLINT    | uint16_t         |
| DUCKDB_TYPE_UINTEGER     | uint32_t         |
| DUCKDB_TYPE_UBIGINT      | uint64_t         |
| DUCKDB_TYPE_FLOAT        | float            |
| DUCKDB_TYPE_DOUBLE       | double           |
| DUCKDB_TYPE_TIMESTAMP    | duckdb_timestamp |
| DUCKDB_TYPE_DATE         | duckdb_date      |
| DUCKDB_TYPE_TIME         | duckdb_time      |
| DUCKDB_TYPE_INTERVAL     | duckdb_interval  |
| DUCKDB_TYPE_HUGEINT      | duckdb_hugeint   |
| DUCKDB_TYPE_UHUGEINT     | duckdb_uhugeint  |
| DUCKDB_TYPE_VARCHAR      | duckdb_string_t  |
| DUCKDB_TYPE_BLOB         | duckdb_string_t  |
| DUCKDB_TYPE_TIMESTAMP_S  | duckdb_timestamp |
| DUCKDB_TYPE_TIMESTAMP_MS | duckdb_timestamp |
| DUCKDB_TYPE_TIMESTAMP_NS | duckdb_timestamp |
| DUCKDB_TYPE_UUID         | duckdb_hugeint   |
| DUCKDB_TYPE_TIME_TZ      | duckdb_time_tz   |
| DUCKDB_TYPE_TIMESTAMP_TZ | duckdb_timestamp |

##### `NULL` Values {#docs:current:clients:c:vector::null-values}

Any value in a vector can be `NULL`. When a value is `NULL`, the value contained within the primary array at that index is undefined (and may be uninitialized). The validity mask is a bitmask consisting of `uint64_t` elements. For every `64` values in the vector, one `uint64_t` element exists (rounded up). The validity mask has its bit set to 1 if the value is valid, or set to 0 if the value is invalid (i.e., `NULL`).

The bits of the bitmask can be read directly, or the slower helper method `duckdb_validity_row_is_valid` can be used to check whether or not a value is `NULL`.

The `duckdb_vector_get_validity` function returns a pointer to the validity mask. Note that if all values in a vector are valid, this function **might** return `NULL`, in which case the validity mask does not need to be checked.
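
The bit layout described above can be sketched with plain `uint64_t` arrays. The helpers below are hypothetical stand-ins written for illustration; they are not the DuckDB functions themselves, and they assume that bit `row % 64` of entry `row / 64` belongs to `row`, with a missing mask meaning that every row is valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Number of uint64_t entries needed for a mask covering row_count rows
// (one entry per 64 rows, rounded up).
size_t example_validity_entry_count(size_t row_count) {
	return (row_count + 63) / 64;
}

// Reads the validity bit of a row. A NULL mask pointer is treated as
// "all rows are valid".
bool example_row_is_valid(const uint64_t *validity, size_t row) {
	if (validity == NULL) {
		return true;
	}
	return ((validity[row / 64] >> (row % 64)) & 1) != 0;
}

// Marks a row as NULL by clearing its bit. The mask must exist.
void example_set_row_invalid(uint64_t *validity, size_t row) {
	assert(validity != NULL);
	validity[row / 64] &= ~((uint64_t) 1 << (row % 64));
}
```

Note the cast to `uint64_t` before shifting: shifting a plain `int` by 32 or more bits is undefined behavior in C.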

##### Strings {#docs:current:clients:c:vector::strings}

String values are stored as a `duckdb_string_t`. This is a special struct that stores the string inline (if it is short, i.e., `<= 12 bytes`) or a pointer to the string data if it is longer than `12` bytes.

```c
typedef struct {
	union {
		struct {
			uint32_t length;
			char prefix[4];
			char *ptr;
		} pointer;
		struct {
			uint32_t length;
			char inlined[12];
		} inlined;
	} value;
} duckdb_string_t;
```

The length can be accessed directly, and the `duckdb_string_is_inlined` function can be used to check whether a string is inlined.
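
As an illustration of the layout, the sketch below reads a string from either variant. It uses a local copy of the struct named `example_string_t` so that it compiles without `duckdb.h`; the helper names are hypothetical, and the `<= 12` check is an assumption based on the description above rather than a call to `duckdb_string_is_inlined`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Local copy of the layout shown above, renamed to avoid clashing with duckdb.h.
typedef struct {
	union {
		struct {
			uint32_t length;
			char prefix[4];
			char *ptr;
		} pointer;
		struct {
			uint32_t length;
			char inlined[12];
		} inlined;
	} value;
} example_string_t;

// Returns a pointer to the string bytes (not NUL-terminated) and stores the length.
// The length field sits at the same offset in both variants.
const char *example_string_data(const example_string_t *str, uint32_t *out_length) {
	*out_length = str->value.inlined.length;
	if (*out_length <= 12) {
		return str->value.inlined.inlined;
	}
	return str->value.pointer.ptr;
}

// Copies the string into out as a NUL-terminated C string, truncating if needed.
// Returns the number of bytes copied, excluding the terminator.
size_t example_string_copy(const example_string_t *str, char *out, size_t out_size) {
	assert(out != NULL && out_size > 0);
	uint32_t length;
	const char *data = example_string_data(str, &length);
	size_t n = length < out_size - 1 ? length : out_size - 1;
	memcpy(out, data, n);
	out[n] = '\0';
	return n;
}
```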

##### Decimals {#docs:current:clients:c:vector::decimals}

Decimals are stored as integer values internally. The exact native type depends on the `width` of the decimal type, as shown in the following table:



| Width |   NativeType   |
|-------|----------------|
| <= 4  | int16_t        |
| <= 9  | int32_t        |
| <= 18 | int64_t        |
| <= 38 | duckdb_hugeint |

The `duckdb_decimal_internal_type` function can be used to obtain the internal type of the decimal.

Decimals are stored as integer values multiplied by `10^scale`. The scale of a decimal can be obtained using `duckdb_decimal_scale`. For example, a decimal value of `10.5` with type `DECIMAL(8, 3)` is stored internally as an `int32_t` value of `10500`. In order to obtain the correct decimal value, the value should be divided by the appropriate power-of-ten.
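
The conversion can be sketched as follows. Both helpers are hypothetical and only illustrate the width table and the scaling rule above; converting to `double` is an assumption made for brevity and loses precision for large values.

```c
#include <assert.h>
#include <stdint.h>

// Size in bytes of the native type that stores a decimal of the given width,
// following the table above. Returns 0 for widths outside 1..38.
int example_decimal_storage_bytes(uint8_t width) {
	if (width == 0 || width > 38) {
		return 0;
	}
	if (width <= 4) {
		return 2; // int16_t
	}
	if (width <= 9) {
		return 4; // int32_t
	}
	if (width <= 18) {
		return 8; // int64_t
	}
	return 16; // duckdb_hugeint
}

// Converts a stored integer to a double by dividing by 10^scale.
double example_decimal_to_double(int64_t stored, uint8_t scale) {
	assert(scale <= 38);
	double divisor = 1.0;
	for (uint8_t i = 0; i < scale; i++) {
		divisor *= 10.0;
	}
	return (double) stored / divisor;
}
```

With the `DECIMAL(8, 3)` example above, the stored value `10500` and scale `3` give `10.5`.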

##### Enums {#docs:current:clients:c:vector::enums}

Enums are stored as unsigned integer values internally. The exact native type depends on the size of the enum dictionary, as shown in the following table:



| Dictionary size | NativeType |
|-----------------|------------|
| <= 255          | uint8_t    |
| <= 65535        | uint16_t   |
| <= 4294967295   | uint32_t   |

The `duckdb_enum_internal_type` function can be used to obtain the internal type of the enum.

In order to obtain the actual string value of the enum, the `duckdb_enum_dictionary_value` function must be used to obtain the enum value that corresponds to the given dictionary entry. Note that the enum dictionary is the same for the entire column – and so only needs to be constructed once.
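
A minimal sketch of this lookup, assuming the dictionary has already been collected once into a plain array of C strings (in real code the entries would come from `duckdb_enum_dictionary_value`). The helper names and the `uint8_t` index width are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Size in bytes of the unsigned type that stores enum values for a dictionary
// of the given size, following the table above. Returns 0 if it does not fit.
int example_enum_storage_bytes(uint64_t dictionary_size) {
	if (dictionary_size <= UINT8_MAX) {
		return 1; // uint8_t
	}
	if (dictionary_size <= UINT16_MAX) {
		return 2; // uint16_t
	}
	if (dictionary_size <= UINT32_MAX) {
		return 4; // uint32_t
	}
	return 0;
}

// Resolves the enum string for a row of a uint8_t-backed enum vector.
// The dictionary is built once and reused for every row of the column.
const char *example_enum_lookup_u8(const uint8_t *indexes, size_t row, const char *const *dictionary) {
	assert(dictionary != NULL);
	return dictionary[indexes[row]];
}
```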

##### Structs {#docs:current:clients:c:vector::structs}

Structs are nested types that contain any number of child types. Think of them like a `struct` in C. The way to access struct data using vectors is to access the child vectors recursively using the `duckdb_struct_vector_get_child` method.

The struct vector itself does not have any data (i.e., you should not use `duckdb_vector_get_data` method on the struct). **However**, the struct vector itself **does** have a validity mask. The reason for this is that the child elements of a struct can be `NULL`, but the struct **itself** can also be `NULL`.

##### Lists {#docs:current:clients:c:vector::lists}

Lists are nested types that contain a single child type, repeated `x` times per row. Think of them like a variable-length array in C. The way to access list data using vectors is to access the child vector using the `duckdb_list_vector_get_child` method.

The `duckdb_vector_get_data` function must be used to get the offsets and lengths of the lists, stored as `duckdb_list_entry` values, which can then be applied to the child vector.

```c
typedef struct {
	uint64_t offset;
	uint64_t length;
} duckdb_list_entry;
```

Note that both the list entries themselves **and** any children stored in the lists can be `NULL`. This must again be checked using the validity mask.
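
The interplay of offsets, lengths and the two validity masks can be sketched on plain arrays. The struct `example_list_entry` mirrors `duckdb_list_entry`, and the helper names are hypothetical; the sketch assumes an `int64_t` child vector and treats a missing mask as "everything is valid".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Local stand-in with the same layout as duckdb_list_entry shown above.
typedef struct {
	uint64_t offset;
	uint64_t length;
} example_list_entry;

// A NULL mask pointer is treated as "every value is valid".
static bool example_is_valid(const uint64_t *validity, uint64_t idx) {
	return validity == NULL || ((validity[idx / 64] >> (idx % 64)) & 1) != 0;
}

// Sums the non-NULL children of the list stored at the given row.
// Returns false if the list itself is NULL, true otherwise.
bool example_sum_list(const example_list_entry *entries, const uint64_t *list_validity,
                      const int64_t *child_data, const uint64_t *child_validity,
                      uint64_t row, int64_t *out_sum) {
	assert(out_sum != NULL);
	if (!example_is_valid(list_validity, row)) {
		return false; // the list itself is NULL
	}
	int64_t sum = 0;
	example_list_entry entry = entries[row];
	// the entry selects a contiguous range of the child vector
	for (uint64_t i = entry.offset; i < entry.offset + entry.length; i++) {
		if (example_is_valid(child_validity, i)) {
			sum += child_data[i];
		}
	}
	*out_sum = sum;
	return true;
}
```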

##### Arrays {#docs:current:clients:c:vector::arrays}

Arrays are nested types that contain a single child type, repeated exactly `array_size` times per row. Think of them like a fixed-size array in C. Arrays work exactly the same as lists, **except** that the length and offset of each entry are fixed. The fixed array size can be obtained by using `duckdb_array_type_array_size`. The data for entry `n` then resides at `offset = n * array_size` and always has `length = array_size`.

Note that much like lists, arrays can still be `NULL`, which must be checked using the validity mask.
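
The fixed-offset rule can be sketched on a plain child array. The helper `example_array_entry` is hypothetical and assumes an `int32_t` child vector; validity checks are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns a pointer to the first child value of array entry n.
// The entry spans array_size consecutive values starting at n * array_size.
const int32_t *example_array_entry(const int32_t *child_data, uint64_t array_size, uint64_t n) {
	assert(child_data != NULL);
	return child_data + n * array_size;
}
```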

#### Examples {#docs:current:clients:c:vector::examples}

Below are several full end-to-end examples of how to interact with vectors.

##### Example: Reading an int64 Vector with `NULL` Values {#docs:current:clients:c:vector::example-reading-an-int64-vector-with-null-values}

```c
duckdb_database db;
duckdb_connection con;
duckdb_open(NULL, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%2=0 THEN NULL ELSE i END res_col FROM range(10) t(i)", &res);

// iterate until result is exhausted
while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		// result is exhausted
		break;
	}
	// get the number of rows from the data chunk
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the first column
	duckdb_vector res_col = duckdb_data_chunk_get_vector(result, 0);
	// get the native array and the validity mask of the vector
	int64_t *vector_data = (int64_t *) duckdb_vector_get_data(res_col);
	uint64_t *vector_validity = duckdb_vector_get_validity(res_col);
	// iterate over the rows
	for (idx_t row = 0; row < row_count; row++) {
		if (duckdb_validity_row_is_valid(vector_validity, row)) {
			printf("%lld\n", (long long) vector_data[row]);
		} else {
			printf("NULL\n");
		}
	}
	duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```

##### Example: Reading a String Vector {#docs:current:clients:c:vector::example-reading-a-string-vector}

```c
duckdb_database db;
duckdb_connection con;
duckdb_open(NULL, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%2=0 THEN CONCAT('short_', i) ELSE CONCAT('longstringprefix', i) END FROM range(10) t(i)", &res);

// iterate until result is exhausted
while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		// result is exhausted
		break;
	}
	// get the number of rows from the data chunk
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the first column
	duckdb_vector res_col = duckdb_data_chunk_get_vector(result, 0);
	// get the native array and the validity mask of the vector
	duckdb_string_t *vector_data = (duckdb_string_t *) duckdb_vector_get_data(res_col);
	uint64_t *vector_validity = duckdb_vector_get_validity(res_col);
	// iterate over the rows
	for (idx_t row = 0; row < row_count; row++) {
		if (duckdb_validity_row_is_valid(vector_validity, row)) {
			duckdb_string_t str = vector_data[row];
			if (duckdb_string_is_inlined(str)) {
				// use inlined string
				printf("%.*s\n", (int) str.value.inlined.length, str.value.inlined.inlined);
			} else {
				// follow string pointer
				printf("%.*s\n", (int) str.value.pointer.length, str.value.pointer.ptr);
			}
		} else {
			printf("NULL\n");
		}
	}
	duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```

##### Example: Reading a Struct Vector {#docs:current:clients:c:vector::example-reading-a-struct-vector}

```c
duckdb_database db;
duckdb_connection con;
duckdb_open(NULL, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%5=0 THEN NULL ELSE {'col1': i, 'col2': CASE WHEN i%2=0 THEN NULL ELSE 100 + i * 42 END} END FROM range(10) t(i)", &res);

// iterate until result is exhausted
while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		// result is exhausted
		break;
	}
	// get the number of rows from the data chunk
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the struct column
	duckdb_vector struct_col = duckdb_data_chunk_get_vector(result, 0);
	uint64_t *struct_validity = duckdb_vector_get_validity(struct_col);
	// get the child columns of the struct
	duckdb_vector col1_vector = duckdb_struct_vector_get_child(struct_col, 0);
	int64_t *col1_data = (int64_t *) duckdb_vector_get_data(col1_vector);
	uint64_t *col1_validity = duckdb_vector_get_validity(col1_vector);

	duckdb_vector col2_vector = duckdb_struct_vector_get_child(struct_col, 1);
	int64_t *col2_data = (int64_t *) duckdb_vector_get_data(col2_vector);
	uint64_t *col2_validity = duckdb_vector_get_validity(col2_vector);

	// iterate over the rows
	for (idx_t row = 0; row < row_count; row++) {
		if (!duckdb_validity_row_is_valid(struct_validity, row)) {
			// entire struct is NULL
			printf("NULL\n");
			continue;
		}
		// read col1
		printf("{'col1': ");
		if (!duckdb_validity_row_is_valid(col1_validity, row)) {
			// col1 is NULL
			printf("NULL");
		} else {
			printf("%lld", (long long) col1_data[row]);
		}
		printf(", 'col2': ");
		if (!duckdb_validity_row_is_valid(col2_validity, row)) {
			// col2 is NULL
			printf("NULL");
		} else {
			printf("%lld", (long long) col2_data[row]);
		}
		printf("}\n");
	}
	duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```

##### Example: Reading a List Vector {#docs:current:clients:c:vector::example-reading-a-list-vector}

```c
duckdb_database db;
duckdb_connection con;
duckdb_open(NULL, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i % 5 = 0 THEN NULL WHEN i % 2 = 0 THEN [i, i + 1] ELSE [i * 42, NULL, i * 84] END FROM range(10) t(i)", &res);

// iterate until result is exhausted
while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		// result is exhausted
		break;
	}
	// get the number of rows from the data chunk
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the list column
	duckdb_vector list_col = duckdb_data_chunk_get_vector(result, 0);
	duckdb_list_entry *list_data = (duckdb_list_entry *) duckdb_vector_get_data(list_col);
	uint64_t *list_validity = duckdb_vector_get_validity(list_col);
	// get the child column of the list
	duckdb_vector list_child = duckdb_list_vector_get_child(list_col);
	int64_t *child_data = (int64_t *) duckdb_vector_get_data(list_child);
	uint64_t *child_validity = duckdb_vector_get_validity(list_child);

	// iterate over the rows
	for (idx_t row = 0; row < row_count; row++) {
		if (!duckdb_validity_row_is_valid(list_validity, row)) {
			// entire list is NULL
			printf("NULL\n");
			continue;
		}
		// read the list offsets for this row
		duckdb_list_entry list = list_data[row];
		printf("[");
		for (idx_t child_idx = list.offset; child_idx < list.offset + list.length; child_idx++) {
			if (child_idx > list.offset) {
				printf(", ");
			}
			if (!duckdb_validity_row_is_valid(child_validity, child_idx)) {
				// child element is NULL
				printf("NULL");
			} else {
				printf("%lld", (long long) child_data[child_idx]);
			}
		}
		printf("]\n");
	}
	duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```

#### API Reference Overview {#docs:current:clients:c:vector::api-reference-overview}



```c
duckdb_vector duckdb_create_vector(duckdb_logical_type type, idx_t capacity);
void duckdb_destroy_vector(duckdb_vector *vector);
duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);
void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
duckdb_vector duckdb_array_vector_get_child(duckdb_vector vector);
void duckdb_slice_vector(duckdb_vector vector, duckdb_selection_vector sel, idx_t len);
void duckdb_vector_copy_sel(duckdb_vector src, duckdb_vector dst, duckdb_selection_vector sel, idx_t src_count, idx_t src_offset, idx_t dst_offset);
void duckdb_vector_reference_value(duckdb_vector vector, duckdb_value value);
void duckdb_vector_reference_vector(duckdb_vector to_vector, duckdb_vector from_vector);
```


##### Validity Mask Functions {#docs:current:clients:c:vector::validity-mask-functions}

```c
bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);
```


###### `duckdb_create_vector` {#docs:current:clients:c:vector::duckdb_create_vector}

Creates a flat vector. Must be destroyed with `duckdb_destroy_vector`.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
duckdb_vector duckdb_create_vector(
  duckdb_logical_type type,
  idx_t capacity
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `type`: The logical type of the vector.
* `capacity`: The capacity of the vector.

####### Return Value {#docs:current:clients:c:vector::return-value}

The vector.

<br>

###### `duckdb_destroy_vector` {#docs:current:clients:c:vector::duckdb_destroy_vector}

Destroys the vector and de-allocates its memory.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_destroy_vector(
  duckdb_vector *vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: A pointer to the vector.

<br>

###### `duckdb_vector_get_column_type` {#docs:current:clients:c:vector::duckdb_vector_get_column_type}

Retrieves the column type of the specified vector.

The result must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
duckdb_logical_type duckdb_vector_get_column_type(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector to get the type from

####### Return Value {#docs:current:clients:c:vector::return-value}

The type of the vector

<br>

###### `duckdb_vector_get_data` {#docs:current:clients:c:vector::duckdb_vector_get_data}

Retrieves the data pointer of the vector.

The data pointer can be used to read or write values from the vector.
How to read or write values depends on the type of the vector.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void *duckdb_vector_get_data(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector to get the data from

####### Return Value {#docs:current:clients:c:vector::return-value}

The data pointer

<br>

###### `duckdb_vector_get_validity` {#docs:current:clients:c:vector::duckdb_vector_get_validity}

Retrieves the validity mask pointer of the specified vector.

If all values are valid, this function might return `NULL`.

The validity mask is a bitset that signifies null-ness within the data chunk.
It is a series of uint64_t values, where each uint64_t value contains validity for 64 tuples.
The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).

Validity of a specific value can be obtained like this:

    idx_t entry_idx = row_idx / 64;
    idx_t idx_in_entry = row_idx % 64;
    bool is_valid = validity_mask[entry_idx] & ((uint64_t)1 << idx_in_entry);

Alternatively, the (slower) `duckdb_validity_row_is_valid` function can be used.
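The bit arithmetic above can be wrapped in a small helper. The following is a minimal, standalone sketch that operates on a plain `uint64_t` array rather than on a real DuckDB vector. The helper names `sketch_row_is_valid` and `sketch_count_nulls` are hypothetical and not part of the C API, and treating a `NULL` mask as "all rows valid" is an assumption based on the note above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the DuckDB C API).
 * Returns true if `row` is valid (not NULL) according to `validity`.
 * A NULL mask pointer is interpreted as "every row is valid". */
bool sketch_row_is_valid(const uint64_t *validity, uint64_t row) {
    if (validity == NULL) {
        return true;
    }
    uint64_t entry_idx = row / 64;    /* which uint64_t holds the bit */
    uint64_t idx_in_entry = row % 64; /* which bit inside that entry */
    /* Shift a 64-bit one: shifting a plain int by its width or more
     * is undefined behavior. */
    return (validity[entry_idx] & ((uint64_t)1 << idx_in_entry)) != 0;
}

/* Hypothetical helper: counts the NULL rows among the first `count` rows. */
uint64_t sketch_count_nulls(const uint64_t *validity, uint64_t count) {
    uint64_t nulls = 0;
    for (uint64_t row = 0; row < count; row++) {
        if (!sketch_row_is_valid(validity, row)) {
            nulls++;
        }
    }
    return nulls;
}
```

In real code, the mask pointer would come from `duckdb_vector_get_validity`, and the row count from the data chunk that owns the vector.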

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
uint64_t *duckdb_vector_get_validity(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector to get the data from

####### Return Value {#docs:current:clients:c:vector::return-value}

The pointer to the validity mask, or NULL if no validity mask is present

<br>

###### `duckdb_vector_ensure_validity_writable` {#docs:current:clients:c:vector::duckdb_vector_ensure_validity_writable}

Ensures the validity mask is writable by allocating it.

After this function is called, `duckdb_vector_get_validity` will ALWAYS return non-NULL.
This allows NULL values to be written to the vector, regardless of whether a validity mask was present before.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_vector_ensure_validity_writable(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector to alter

<br>

###### `duckdb_vector_assign_string_element` {#docs:current:clients:c:vector::duckdb_vector_assign_string_element}

Assigns a string element in the vector at the specified location.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_vector_assign_string_element(
  duckdb_vector vector,
  idx_t index,
  const char *str
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector to alter
* `index`: The row position in the vector to assign the string to
* `str`: The null-terminated string

<br>

###### `duckdb_vector_assign_string_element_len` {#docs:current:clients:c:vector::duckdb_vector_assign_string_element_len}

Assigns a string element in the vector at the specified location. You may also use this function to assign BLOBs.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_vector_assign_string_element_len(
  duckdb_vector vector,
  idx_t index,
  const char *str,
  idx_t str_len
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector to alter
* `index`: The row position in the vector to assign the string to
* `str`: The string
* `str_len`: The length of the string (in bytes)

<br>

###### `duckdb_list_vector_get_child` {#docs:current:clients:c:vector::duckdb_list_vector_get_child}

Retrieves the child vector of a list vector.

The resulting vector is valid as long as the parent vector is valid.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
duckdb_vector duckdb_list_vector_get_child(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector

####### Return Value {#docs:current:clients:c:vector::return-value}

The child vector

<br>

###### `duckdb_list_vector_get_size` {#docs:current:clients:c:vector::duckdb_list_vector_get_size}

Returns the size of the child vector of the list.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
idx_t duckdb_list_vector_get_size(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector

####### Return Value {#docs:current:clients:c:vector::return-value}

The size of the child list

<br>

###### `duckdb_list_vector_set_size` {#docs:current:clients:c:vector::duckdb_list_vector_set_size}

Sets the total size of the underlying child-vector of a list vector.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
duckdb_state duckdb_list_vector_set_size(
  duckdb_vector vector,
  idx_t size
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The list vector.
* `size`: The size of the child list.

####### Return Value {#docs:current:clients:c:vector::return-value}

The duckdb state. Returns DuckDBError if the vector is nullptr.

<br>

###### `duckdb_list_vector_reserve` {#docs:current:clients:c:vector::duckdb_list_vector_reserve}

Sets the total capacity of the underlying child-vector of a list.

After calling this method, you must call `duckdb_vector_get_validity` and `duckdb_vector_get_data` to obtain the current data and validity pointers.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
duckdb_state duckdb_list_vector_reserve(
  duckdb_vector vector,
  idx_t required_capacity
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The list vector.
* `required_capacity`: The total capacity to reserve.

####### Return Value {#docs:current:clients:c:vector::return-value}

The duckdb state. Returns DuckDBError if the vector is nullptr.

<br>

###### `duckdb_struct_vector_get_child` {#docs:current:clients:c:vector::duckdb_struct_vector_get_child}

Retrieves the child vector of a struct vector.
The resulting vector is valid as long as the parent vector is valid.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
duckdb_vector duckdb_struct_vector_get_child(
  duckdb_vector vector,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector
* `index`: The child index

####### Return Value {#docs:current:clients:c:vector::return-value}

The child vector

<br>

###### `duckdb_array_vector_get_child` {#docs:current:clients:c:vector::duckdb_array_vector_get_child}

Retrieves the child vector of an array vector.
The resulting vector is valid as long as the parent vector is valid.
The resulting vector has the size of the parent vector multiplied by the array size.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
duckdb_vector duckdb_array_vector_get_child(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector

####### Return Value {#docs:current:clients:c:vector::return-value}

The child vector
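Because the child vector of an array vector has the size of the parent vector multiplied by the array size, the position of an element can be computed from the row index and the array size. The sketch below illustrates that arithmetic on plain integers. It assumes that row `i` occupies child positions `i * array_size` through `(i + 1) * array_size - 1`; the helper names `sketch_array_child_index` and `sketch_array_child_size` are hypothetical and not part of the C API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the DuckDB C API).
 * Maps (row, element) of a fixed-size array column to an index in the
 * child vector, assuming the elements of each row are stored contiguously. */
uint64_t sketch_array_child_index(uint64_t row, uint64_t element, uint64_t array_size) {
    assert(element < array_size); /* the element must lie inside the array */
    return row * array_size + element;
}

/* Hypothetical helper: total number of child entries for `row_count` rows. */
uint64_t sketch_array_child_size(uint64_t row_count, uint64_t array_size) {
    return row_count * array_size;
}
```

With an array size of 3, for instance, the second element of the third row (`row = 2`, `element = 1`) would sit at child index 7 under this assumption.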

<br>

###### `duckdb_slice_vector` {#docs:current:clients:c:vector::duckdb_slice_vector}

Slice a vector with a selection vector.
The length of the selection vector must be less than or equal to the length of the vector.
Turns the vector into a dictionary vector.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_slice_vector(
  duckdb_vector vector,
  duckdb_selection_vector sel,
  idx_t len
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The vector to slice.
* `sel`: The selection vector.
* `len`: The length of the selection vector.

<br>

###### `duckdb_vector_copy_sel` {#docs:current:clients:c:vector::duckdb_vector_copy_sel}

Copy the src vector to the dst with a selection vector that identifies which indices to copy.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_vector_copy_sel(
  duckdb_vector src,
  duckdb_vector dst,
  duckdb_selection_vector sel,
  idx_t src_count,
  idx_t src_offset,
  idx_t dst_offset
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `src`: The vector to copy from.
* `dst`: The vector to copy to.
* `sel`: The selection vector. The length of the selection vector should not be more than the length of the `src` vector.
* `src_count`: The number of entries from the selection vector to copy. Think of this as the effective length of the selection vector starting from index 0.
* `src_offset`: The offset in the selection vector to copy from (important: actual number of items copied = `src_count - src_offset`).
* `dst_offset`: The offset in the dst vector to start copying to.
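The interplay of `src_count`, `src_offset`, and `dst_offset` can be illustrated on plain arrays. The following sketch mirrors the parameter description above for `int32_t` data. It is not DuckDB's implementation: the function name `sketch_copy_sel` is hypothetical, and representing the selection vector as an array of `uint32_t` indices is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration (not part of the DuckDB C API).
 * Copies src[sel[i]] into dst for i in [src_offset, src_count),
 * writing the results to dst starting at dst_offset. */
void sketch_copy_sel(const int32_t *src, int32_t *dst, const uint32_t *sel,
                     uint64_t src_count, uint64_t src_offset, uint64_t dst_offset) {
    assert(src_offset <= src_count);
    /* Entries sel[src_offset] .. sel[src_count - 1] are copied, so the
     * number of copied items is src_count - src_offset. */
    for (uint64_t i = src_offset; i < src_count; i++) {
        dst[dst_offset + (i - src_offset)] = src[sel[i]];
    }
}
```

For example, with `sel = {4, 2, 0, 1}`, `src_count = 3`, `src_offset = 1`, and `dst_offset = 2`, the sketch copies `src[2]` and `src[0]` into `dst[2]` and `dst[3]`, that is, two items.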

<br>

###### `duckdb_vector_reference_value` {#docs:current:clients:c:vector::duckdb_vector_reference_value}

Copies the value from `value` to `vector`.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_vector_reference_value(
  duckdb_vector vector,
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `vector`: The receiving vector.
* `value`: The value to copy into the vector.

<br>

###### `duckdb_vector_reference_vector` {#docs:current:clients:c:vector::duckdb_vector_reference_vector}

Changes `to_vector` to reference `from_vector`. Afterwards, the vectors share ownership of the data.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_vector_reference_vector(
  duckdb_vector to_vector,
  duckdb_vector from_vector
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `to_vector`: The receiving vector.
* `from_vector`: The vector to reference.

<br>

###### `duckdb_validity_row_is_valid` {#docs:current:clients:c:vector::duckdb_validity_row_is_valid}

Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
bool duckdb_validity_row_is_valid(
  uint64_t *validity,
  idx_t row
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `validity`: The validity mask, as obtained through `duckdb_vector_get_validity`
* `row`: The row index

####### Return Value {#docs:current:clients:c:vector::return-value}

true if the row is valid, false otherwise

<br>

###### `duckdb_validity_set_row_validity` {#docs:current:clients:c:vector::duckdb_validity_set_row_validity}

In a validity mask, sets a specific row to either valid or invalid.

Note that `duckdb_vector_ensure_validity_writable` should be called before calling `duckdb_vector_get_validity`,
to ensure that there is a validity mask to write to.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_validity_set_row_validity(
  uint64_t *validity,
  idx_t row,
  bool valid
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `validity`: The validity mask, as obtained through `duckdb_vector_get_validity`.
* `row`: The row index
* `valid`: Whether or not to set the row to valid, or invalid
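Given the bitset layout of the validity mask (1 for valid, 0 for NULL), setting a row's validity amounts to setting or clearing a single bit. The following standalone sketch shows this on a plain `uint64_t` array. It is an illustration rather than DuckDB's implementation, and the name `sketch_set_row_validity` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration (not part of the DuckDB C API).
 * Marks `row` as valid (bit = 1) or invalid (bit = 0) in `validity`. */
void sketch_set_row_validity(uint64_t *validity, uint64_t row, bool valid) {
    assert(validity != NULL); /* a writable mask must already exist */
    uint64_t bit = (uint64_t)1 << (row % 64);
    if (valid) {
        validity[row / 64] |= bit;  /* set the bit: row is not NULL */
    } else {
        validity[row / 64] &= ~bit; /* clear the bit: row is NULL */
    }
}
```

The `assert` stands in for the requirement that a mask exists before it is written to, which is what `duckdb_vector_ensure_validity_writable` guarantees for a real vector.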

<br>

###### `duckdb_validity_set_row_invalid` {#docs:current:clients:c:vector::duckdb_validity_set_row_invalid}

In a validity mask, sets a specific row to invalid.

Equivalent to `duckdb_validity_set_row_validity` with valid set to false.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_validity_set_row_invalid(
  uint64_t *validity,
  idx_t row
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `validity`: The validity mask
* `row`: The row index

<br>

###### `duckdb_validity_set_row_valid` {#docs:current:clients:c:vector::duckdb_validity_set_row_valid}

In a validity mask, sets a specific row to valid.

Equivalent to `duckdb_validity_set_row_validity` with valid set to true.

####### Syntax {#docs:current:clients:c:vector::syntax}

```c
void duckdb_validity_set_row_valid(
  uint64_t *validity,
  idx_t row
);
```


####### Parameters {#docs:current:clients:c:vector::parameters}

* `validity`: The validity mask
* `row`: The row index

<br>

### Values {#docs:current:clients:c:value}



The value class represents a single value of any type.

#### API Reference Overview {#docs:current:clients:c:value::api-reference-overview}



```c
void duckdb_destroy_value(duckdb_value *value);
duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_bool(bool input);
duckdb_value duckdb_create_int8(int8_t input);
duckdb_value duckdb_create_uint8(uint8_t input);
duckdb_value duckdb_create_int16(int16_t input);
duckdb_value duckdb_create_uint16(uint16_t input);
duckdb_value duckdb_create_int32(int32_t input);
duckdb_value duckdb_create_uint32(uint32_t input);
duckdb_value duckdb_create_uint64(uint64_t input);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_hugeint(duckdb_hugeint input);
duckdb_value duckdb_create_uhugeint(duckdb_uhugeint input);
duckdb_value duckdb_create_bignum(duckdb_bignum input);
duckdb_value duckdb_create_decimal(duckdb_decimal input);
duckdb_value duckdb_create_float(float input);
duckdb_value duckdb_create_double(double input);
duckdb_value duckdb_create_date(duckdb_date input);
duckdb_value duckdb_create_time(duckdb_time input);
duckdb_value duckdb_create_time_ns(duckdb_time_ns input);
duckdb_value duckdb_create_time_tz_value(duckdb_time_tz value);
duckdb_value duckdb_create_timestamp(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_tz(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_s(duckdb_timestamp_s input);
duckdb_value duckdb_create_timestamp_ms(duckdb_timestamp_ms input);
duckdb_value duckdb_create_timestamp_ns(duckdb_timestamp_ns input);
duckdb_value duckdb_create_interval(duckdb_interval input);
duckdb_value duckdb_create_blob(const uint8_t *data, idx_t length);
duckdb_value duckdb_create_bit(duckdb_bit input);
duckdb_value duckdb_create_uuid(duckdb_uhugeint input);
bool duckdb_get_bool(duckdb_value val);
int8_t duckdb_get_int8(duckdb_value val);
uint8_t duckdb_get_uint8(duckdb_value val);
int16_t duckdb_get_int16(duckdb_value val);
uint16_t duckdb_get_uint16(duckdb_value val);
int32_t duckdb_get_int32(duckdb_value val);
uint32_t duckdb_get_uint32(duckdb_value val);
int64_t duckdb_get_int64(duckdb_value val);
uint64_t duckdb_get_uint64(duckdb_value val);
duckdb_hugeint duckdb_get_hugeint(duckdb_value val);
duckdb_uhugeint duckdb_get_uhugeint(duckdb_value val);
duckdb_bignum duckdb_get_bignum(duckdb_value val);
duckdb_decimal duckdb_get_decimal(duckdb_value val);
float duckdb_get_float(duckdb_value val);
double duckdb_get_double(duckdb_value val);
duckdb_date duckdb_get_date(duckdb_value val);
duckdb_time duckdb_get_time(duckdb_value val);
duckdb_time_ns duckdb_get_time_ns(duckdb_value val);
duckdb_time_tz duckdb_get_time_tz(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp_tz(duckdb_value val);
duckdb_timestamp_s duckdb_get_timestamp_s(duckdb_value val);
duckdb_timestamp_ms duckdb_get_timestamp_ms(duckdb_value val);
duckdb_timestamp_ns duckdb_get_timestamp_ns(duckdb_value val);
duckdb_interval duckdb_get_interval(duckdb_value val);
duckdb_logical_type duckdb_get_value_type(duckdb_value val);
duckdb_blob duckdb_get_blob(duckdb_value val);
duckdb_bit duckdb_get_bit(duckdb_value val);
duckdb_uhugeint duckdb_get_uuid(duckdb_value val);
char *duckdb_get_varchar(duckdb_value value);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_map_value(duckdb_logical_type map_type, duckdb_value *keys, duckdb_value *values, idx_t entry_count);
duckdb_value duckdb_create_union_value(duckdb_logical_type union_type, idx_t tag_index, duckdb_value value);
idx_t duckdb_get_map_size(duckdb_value value);
duckdb_value duckdb_get_map_key(duckdb_value value, idx_t index);
duckdb_value duckdb_get_map_value(duckdb_value value, idx_t index);
bool duckdb_is_null_value(duckdb_value value);
duckdb_value duckdb_create_null_value();
idx_t duckdb_get_list_size(duckdb_value value);
duckdb_value duckdb_get_list_child(duckdb_value value, idx_t index);
duckdb_value duckdb_create_enum_value(duckdb_logical_type type, uint64_t value);
uint64_t duckdb_get_enum_value(duckdb_value value);
duckdb_value duckdb_get_struct_child(duckdb_value value, idx_t index);
char *duckdb_value_to_string(duckdb_value value);
```


###### `duckdb_destroy_value` {#docs:current:clients:c:value::duckdb_destroy_value}

Destroys the value and de-allocates all memory allocated for that type.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
void duckdb_destroy_value(
  duckdb_value *value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The value to destroy.

<br>

###### `duckdb_create_varchar` {#docs:current:clients:c:value::duckdb_create_varchar}

Creates a value from a null-terminated string

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_varchar(
  const char *text
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `text`: The null-terminated string

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_varchar_length` {#docs:current:clients:c:value::duckdb_create_varchar_length}

Creates a value from a string

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_varchar_length(
  const char *text,
  idx_t length
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `text`: The text
* `length`: The length of the text

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_bool` {#docs:current:clients:c:value::duckdb_create_bool}

Creates a value from a boolean

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_bool(
  bool input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The boolean value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_int8` {#docs:current:clients:c:value::duckdb_create_int8}

Creates a value from an int8_t (a tinyint)

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_int8(
  int8_t input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The tinyint value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uint8` {#docs:current:clients:c:value::duckdb_create_uint8}

Creates a value from a uint8_t (a utinyint)

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_uint8(
  uint8_t input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The utinyint value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_int16` {#docs:current:clients:c:value::duckdb_create_int16}

Creates a value from an int16_t (a smallint)

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_int16(
  int16_t input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The smallint value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uint16` {#docs:current:clients:c:value::duckdb_create_uint16}

Creates a value from a uint16_t (a usmallint)

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_uint16(
  uint16_t input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The usmallint value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_int32` {#docs:current:clients:c:value::duckdb_create_int32}

Creates a value from an int32_t (an integer)

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_int32(
  int32_t input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The integer value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uint32` {#docs:current:clients:c:value::duckdb_create_uint32}

Creates a value from a uint32_t (a uinteger)

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_uint32(
  uint32_t input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The uinteger value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uint64` {#docs:current:clients:c:value::duckdb_create_uint64}

Creates a value from a uint64_t (a ubigint)

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_uint64(
  uint64_t input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The ubigint value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_int64` {#docs:current:clients:c:value::duckdb_create_int64}

Creates a value from an int64_t (a bigint)

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_int64(
  int64_t val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: The bigint value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_hugeint` {#docs:current:clients:c:value::duckdb_create_hugeint}

Creates a value from a hugeint

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_hugeint(
  duckdb_hugeint input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The hugeint value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uhugeint` {#docs:current:clients:c:value::duckdb_create_uhugeint}

Creates a value from a uhugeint

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_uhugeint(
  duckdb_uhugeint input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The uhugeint value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_bignum` {#docs:current:clients:c:value::duckdb_create_bignum}

Creates a BIGNUM value from a duckdb_bignum

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_bignum(
  duckdb_bignum input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_bignum value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_decimal` {#docs:current:clients:c:value::duckdb_create_decimal}

Creates a DECIMAL value from a duckdb_decimal

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_decimal(
  duckdb_decimal input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_decimal value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_float` {#docs:current:clients:c:value::duckdb_create_float}

Creates a value from a float

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_float(
  float input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The float value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_double` {#docs:current:clients:c:value::duckdb_create_double}

Creates a value from a double

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_double(
  double input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The double value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_date` {#docs:current:clients:c:value::duckdb_create_date}

Creates a value from a date

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_date(
  duckdb_date input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The date value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_time` {#docs:current:clients:c:value::duckdb_create_time}

Creates a value from a time

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_time(
  duckdb_time input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The time value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_time_ns` {#docs:current:clients:c:value::duckdb_create_time_ns}

Creates a value from a time_ns

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_time_ns(
  duckdb_time_ns input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The time value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_time_tz_value` {#docs:current:clients:c:value::duckdb_create_time_tz_value}

Creates a value from a time_tz.
Not to be confused with `duckdb_create_time_tz`, which creates a `duckdb_time_tz`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_time_tz_value(
  duckdb_time_tz value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The time_tz value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp` {#docs:current:clients:c:value::duckdb_create_timestamp}

Creates a TIMESTAMP value from a duckdb_timestamp

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_timestamp(
  duckdb_timestamp input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_timestamp value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp_tz` {#docs:current:clients:c:value::duckdb_create_timestamp_tz}

Creates a TIMESTAMP_TZ value from a duckdb_timestamp

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_timestamp_tz(
  duckdb_timestamp input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_timestamp value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp_s` {#docs:current:clients:c:value::duckdb_create_timestamp_s}

Creates a TIMESTAMP_S value from a duckdb_timestamp_s

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_timestamp_s(
  duckdb_timestamp_s input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_timestamp_s value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp_ms` {#docs:current:clients:c:value::duckdb_create_timestamp_ms}

Creates a TIMESTAMP_MS value from a duckdb_timestamp_ms

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_timestamp_ms(
  duckdb_timestamp_ms input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_timestamp_ms value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp_ns` {#docs:current:clients:c:value::duckdb_create_timestamp_ns}

Creates a TIMESTAMP_NS value from a duckdb_timestamp_ns

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_timestamp_ns(
  duckdb_timestamp_ns input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_timestamp_ns value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_interval` {#docs:current:clients:c:value::duckdb_create_interval}

Creates a value from an interval

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_interval(
  duckdb_interval input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The interval value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_blob` {#docs:current:clients:c:value::duckdb_create_blob}

Creates a value from a blob

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_blob(
  const uint8_t *data,
  idx_t length
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `data`: The blob data
* `length`: The length of the blob data

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_bit` {#docs:current:clients:c:value::duckdb_create_bit}

Creates a BIT value from a duckdb_bit

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_bit(
  duckdb_bit input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_bit value

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uuid` {#docs:current:clients:c:value::duckdb_create_uuid}

Creates a UUID value from a uhugeint

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_uuid(
  duckdb_uhugeint input
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `input`: The duckdb_uhugeint containing the UUID

####### Return Value {#docs:current:clients:c:value::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_get_bool` {#docs:current:clients:c:value::duckdb_get_bool}

Returns the boolean value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
bool duckdb_get_bool(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a boolean

####### Return Value {#docs:current:clients:c:value::return-value}

A boolean, or false if the value cannot be converted

<br>

###### `duckdb_get_int8` {#docs:current:clients:c:value::duckdb_get_int8}

Returns the int8_t value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
int8_t duckdb_get_int8(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a tinyint

####### Return Value {#docs:current:clients:c:value::return-value}

An `int8_t`, or `MinValue<int8>` if the value cannot be converted.

<br>

###### `duckdb_get_uint8` {#docs:current:clients:c:value::duckdb_get_uint8}

Returns the uint8_t value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
uint8_t duckdb_get_uint8(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a utinyint

####### Return Value {#docs:current:clients:c:value::return-value}

A `uint8_t`, or `MinValue<uint8>` if the value cannot be converted.

<br>

###### `duckdb_get_int16` {#docs:current:clients:c:value::duckdb_get_int16}

Returns the int16_t value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
int16_t duckdb_get_int16(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a smallint

####### Return Value {#docs:current:clients:c:value::return-value}

An `int16_t`, or `MinValue<int16>` if the value cannot be converted.

<br>

###### `duckdb_get_uint16` {#docs:current:clients:c:value::duckdb_get_uint16}

Returns the uint16_t value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
uint16_t duckdb_get_uint16(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a usmallint

####### Return Value {#docs:current:clients:c:value::return-value}

A `uint16_t`, or `MinValue<uint16>` if the value cannot be converted.

<br>

###### `duckdb_get_int32` {#docs:current:clients:c:value::duckdb_get_int32}

Returns the int32_t value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
int32_t duckdb_get_int32(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing an integer

####### Return Value {#docs:current:clients:c:value::return-value}

An `int32_t`, or `MinValue<int32>` if the value cannot be converted.

<br>

###### `duckdb_get_uint32` {#docs:current:clients:c:value::duckdb_get_uint32}

Returns the uint32_t value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
uint32_t duckdb_get_uint32(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a uinteger

####### Return Value {#docs:current:clients:c:value::return-value}

A `uint32_t`, or `MinValue<uint32>` if the value cannot be converted.

<br>

###### `duckdb_get_int64` {#docs:current:clients:c:value::duckdb_get_int64}

Returns the int64_t value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
int64_t duckdb_get_int64(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a bigint

####### Return Value {#docs:current:clients:c:value::return-value}

An `int64_t`, or `MinValue<int64>` if the value cannot be converted.

<br>

###### `duckdb_get_uint64` {#docs:current:clients:c:value::duckdb_get_uint64}

Returns the uint64_t value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
uint64_t duckdb_get_uint64(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a ubigint

####### Return Value {#docs:current:clients:c:value::return-value}

A `uint64_t`, or `MinValue<uint64>` if the value cannot be converted.

<br>

###### `duckdb_get_hugeint` {#docs:current:clients:c:value::duckdb_get_hugeint}

Returns the hugeint value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_hugeint duckdb_get_hugeint(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a hugeint

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_hugeint`, or `MinValue<hugeint>` if the value cannot be converted.

<br>

###### `duckdb_get_uhugeint` {#docs:current:clients:c:value::duckdb_get_uhugeint}

Returns the uhugeint value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_uhugeint duckdb_get_uhugeint(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a uhugeint

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_uhugeint`, or `MinValue<uhugeint>` if the value cannot be converted.

<br>

###### `duckdb_get_bignum` {#docs:current:clients:c:value::duckdb_get_bignum}

Returns the duckdb_bignum value of the given value.
The `data` field must be destroyed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_bignum duckdb_get_bignum(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a BIGNUM

####### Return Value {#docs:current:clients:c:value::return-value}

A duckdb_bignum. The `data` field must be destroyed with `duckdb_free`.

<br>

###### `duckdb_get_decimal` {#docs:current:clients:c:value::duckdb_get_decimal}

Returns the duckdb_decimal value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_decimal duckdb_get_decimal(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a DECIMAL

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_decimal`, or `MinValue<decimal>` if the value cannot be converted.

<br>

###### `duckdb_get_float` {#docs:current:clients:c:value::duckdb_get_float}

Returns the float value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
float duckdb_get_float(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a float

####### Return Value {#docs:current:clients:c:value::return-value}

A float, or NAN if the value cannot be converted

<br>

###### `duckdb_get_double` {#docs:current:clients:c:value::duckdb_get_double}

Returns the double value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
double duckdb_get_double(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a double

####### Return Value {#docs:current:clients:c:value::return-value}

A double, or NAN if the value cannot be converted

<br>

###### `duckdb_get_date` {#docs:current:clients:c:value::duckdb_get_date}

Returns the date value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_date duckdb_get_date(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a date

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_date`, or `MinValue<date>` if the value cannot be converted.

<br>

###### `duckdb_get_time` {#docs:current:clients:c:value::duckdb_get_time}

Returns the time value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_time duckdb_get_time(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a time

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_time`, or `MinValue<time>` if the value cannot be converted.

<br>

###### `duckdb_get_time_ns` {#docs:current:clients:c:value::duckdb_get_time_ns}

Returns the time_ns value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_time_ns duckdb_get_time_ns(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a time_ns

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_time_ns`, or `MinValue<time_ns>` if the value cannot be converted.

<br>

###### `duckdb_get_time_tz` {#docs:current:clients:c:value::duckdb_get_time_tz}

Returns the time_tz value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_time_tz duckdb_get_time_tz(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a time_tz

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_time_tz`, or `MinValue<time_tz>` if the value cannot be converted.

<br>

###### `duckdb_get_timestamp` {#docs:current:clients:c:value::duckdb_get_timestamp}

Returns the TIMESTAMP value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_timestamp duckdb_get_timestamp(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a TIMESTAMP

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_timestamp`, or `MinValue<timestamp>` if the value cannot be converted.

<br>

###### `duckdb_get_timestamp_tz` {#docs:current:clients:c:value::duckdb_get_timestamp_tz}

Returns the TIMESTAMP_TZ value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_timestamp duckdb_get_timestamp_tz(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a TIMESTAMP_TZ

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_timestamp`, or `MinValue<timestamp_tz>` if the value cannot be converted.

<br>

###### `duckdb_get_timestamp_s` {#docs:current:clients:c:value::duckdb_get_timestamp_s}

Returns the duckdb_timestamp_s value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_timestamp_s duckdb_get_timestamp_s(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a TIMESTAMP_S

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_timestamp_s`, or `MinValue<timestamp_s>` if the value cannot be converted.

<br>

###### `duckdb_get_timestamp_ms` {#docs:current:clients:c:value::duckdb_get_timestamp_ms}

Returns the duckdb_timestamp_ms value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_timestamp_ms duckdb_get_timestamp_ms(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a TIMESTAMP_MS

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_timestamp_ms`, or `MinValue<timestamp_ms>` if the value cannot be converted.

<br>

###### `duckdb_get_timestamp_ns` {#docs:current:clients:c:value::duckdb_get_timestamp_ns}

Returns the duckdb_timestamp_ns value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_timestamp_ns duckdb_get_timestamp_ns(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a TIMESTAMP_NS

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_timestamp_ns`, or `MinValue<timestamp_ns>` if the value cannot be converted.
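
The timestamp getters above return structs that differ only in precision: `TIMESTAMP_S`, `TIMESTAMP_MS`, `TIMESTAMP`, and `TIMESTAMP_NS` hold seconds, milliseconds, microseconds, and nanoseconds since 1970-01-01, respectively. The following is a minimal sketch of normalizing such values to microseconds. The `sketch_*` types and functions are hypothetical stand-ins so the example compiles without `duckdb.h`; they are not part of the DuckDB API, and overflow handling is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the timestamp structs returned by the getters. */
typedef struct { int64_t seconds; } sketch_timestamp_s;
typedef struct { int64_t millis; } sketch_timestamp_ms;
typedef struct { int64_t micros; } sketch_timestamp;
typedef struct { int64_t nanos; } sketch_timestamp_ns;

/* Seconds to microseconds (no overflow check in this sketch). */
sketch_timestamp sketch_from_s(sketch_timestamp_s ts) {
  sketch_timestamp out = {ts.seconds * 1000000};
  return out;
}

/* Milliseconds to microseconds (no overflow check in this sketch). */
sketch_timestamp sketch_from_ms(sketch_timestamp_ms ts) {
  sketch_timestamp out = {ts.millis * 1000};
  return out;
}

/* Nanoseconds to microseconds; integer division truncates toward zero. */
sketch_timestamp sketch_from_ns(sketch_timestamp_ns ts) {
  sketch_timestamp out = {ts.nanos / 1000};
  return out;
}
```

Converting from nanoseconds loses the sub-microsecond part, which is one reason to keep a value in its original precision until a common unit is actually needed.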

<br>

###### `duckdb_get_interval` {#docs:current:clients:c:value::duckdb_get_interval}

Returns the interval value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_interval duckdb_get_interval(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing an interval

####### Return Value {#docs:current:clients:c:value::return-value}

A `duckdb_interval`, or `MinValue<interval>` if the value cannot be converted.

<br>

###### `duckdb_get_value_type` {#docs:current:clients:c:value::duckdb_get_value_type}

Returns the type of the given value. The type is valid as long as the value is not destroyed.
The type itself must not be destroyed.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_logical_type duckdb_get_value_type(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value

####### Return Value {#docs:current:clients:c:value::return-value}

A duckdb_logical_type.

<br>

###### `duckdb_get_blob` {#docs:current:clients:c:value::duckdb_get_blob}

Returns the blob value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_blob duckdb_get_blob(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a blob

####### Return Value {#docs:current:clients:c:value::return-value}

A duckdb_blob

<br>

###### `duckdb_get_bit` {#docs:current:clients:c:value::duckdb_get_bit}

Returns the duckdb_bit value of the given value.
The `data` field must be destroyed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_bit duckdb_get_bit(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a BIT

####### Return Value {#docs:current:clients:c:value::return-value}

A duckdb_bit

<br>

###### `duckdb_get_uuid` {#docs:current:clients:c:value::duckdb_get_uuid}

Returns a duckdb_uhugeint representing the UUID value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_uhugeint duckdb_get_uuid(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `val`: A duckdb_value containing a UUID

####### Return Value {#docs:current:clients:c:value::return-value}

A duckdb_uhugeint representing the UUID value

<br>

###### `duckdb_get_varchar` {#docs:current:clients:c:value::duckdb_get_varchar}

Obtains a string representation of the given value.
The result must be destroyed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
char *duckdb_get_varchar(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The value

####### Return Value {#docs:current:clients:c:value::return-value}

The string value. This must be destroyed with `duckdb_free`.

<br>

###### `duckdb_create_struct_value` {#docs:current:clients:c:value::duckdb_create_struct_value}

Creates a struct value from a type and an array of values. Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_struct_value(
  duckdb_logical_type type,
  duckdb_value *values
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `type`: The type of the struct
* `values`: The values for the struct fields

####### Return Value {#docs:current:clients:c:value::return-value}

The struct value, or nullptr, if any child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.

<br>

###### `duckdb_create_list_value` {#docs:current:clients:c:value::duckdb_create_list_value}

Creates a list value from a child (element) type and an array of values of length `value_count`.
Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_list_value(
  duckdb_logical_type type,
  duckdb_value *values,
  idx_t value_count
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `type`: The child (element) type of the list
* `values`: The values for the list
* `value_count`: The number of values in the list

####### Return Value {#docs:current:clients:c:value::return-value}

The list value, or nullptr, if the child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.

<br>

###### `duckdb_create_array_value` {#docs:current:clients:c:value::duckdb_create_array_value}

Creates an array value from a child (element) type and an array of values of length `value_count`.
Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_array_value(
  duckdb_logical_type type,
  duckdb_value *values,
  idx_t value_count
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `type`: The child (element) type of the array
* `values`: The values for the array
* `value_count`: The number of values in the array

####### Return Value {#docs:current:clients:c:value::return-value}

The array value, or nullptr, if the child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.

<br>

###### `duckdb_create_map_value` {#docs:current:clients:c:value::duckdb_create_map_value}

Creates a map value from a map type and two arrays, one for the keys and one for the values, each of length
`entry_count`. Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_map_value(
  duckdb_logical_type map_type,
  duckdb_value *keys,
  duckdb_value *values,
  idx_t entry_count
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `map_type`: The map type
* `keys`: The keys of the map
* `values`: The values of the map
* `entry_count`: The number of entries (key-value pairs) in the map

####### Return Value {#docs:current:clients:c:value::return-value}

The map value, or nullptr, if the parameters are invalid.

<br>

###### `duckdb_create_union_value` {#docs:current:clients:c:value::duckdb_create_union_value}

Creates a union value from a union type, a tag index and a value.
Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_union_value(
  duckdb_logical_type union_type,
  idx_t tag_index,
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `union_type`: The union type
* `tag_index`: The index of the tag of the union
* `value`: The value of the union for that tag

####### Return Value {#docs:current:clients:c:value::return-value}

The union value, or nullptr, if the parameters are invalid.

<br>

###### `duckdb_get_map_size` {#docs:current:clients:c:value::duckdb_get_map_size}

Returns the number of elements in a MAP value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
idx_t duckdb_get_map_size(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The MAP value.

####### Return Value {#docs:current:clients:c:value::return-value}

The number of elements in the map.

<br>

###### `duckdb_get_map_key` {#docs:current:clients:c:value::duckdb_get_map_key}

Returns the MAP key at index as a duckdb_value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_get_map_key(
  duckdb_value value,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The MAP value.
* `index`: The index of the key.

####### Return Value {#docs:current:clients:c:value::return-value}

The key as a duckdb_value.

<br>

###### `duckdb_get_map_value` {#docs:current:clients:c:value::duckdb_get_map_value}

Returns the MAP value at index as a duckdb_value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_get_map_value(
  duckdb_value value,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The MAP value.
* `index`: The index of the value.

####### Return Value {#docs:current:clients:c:value::return-value}

The value as a duckdb_value.

<br>

###### `duckdb_is_null_value` {#docs:current:clients:c:value::duckdb_is_null_value}

Returns whether the value's type is SQLNULL or not.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
bool duckdb_is_null_value(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The value to check.

####### Return Value {#docs:current:clients:c:value::return-value}

True, if the value's type is SQLNULL, otherwise false.

<br>

###### `duckdb_create_null_value` {#docs:current:clients:c:value::duckdb_create_null_value}

Creates a value of type SQLNULL.


####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_null_value();
```

####### Return Value {#docs:current:clients:c:value::return-value}

The `duckdb_value` representing SQLNULL. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_get_list_size` {#docs:current:clients:c:value::duckdb_get_list_size}

Returns the number of elements in a LIST value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
idx_t duckdb_get_list_size(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The LIST value.

####### Return Value {#docs:current:clients:c:value::return-value}

The number of elements in the list.

<br>

###### `duckdb_get_list_child` {#docs:current:clients:c:value::duckdb_get_list_child}

Returns the LIST child at index as a duckdb_value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_get_list_child(
  duckdb_value value,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The LIST value.
* `index`: The index of the child.

####### Return Value {#docs:current:clients:c:value::return-value}

The child as a duckdb_value.

<br>

###### `duckdb_create_enum_value` {#docs:current:clients:c:value::duckdb_create_enum_value}

Creates an enum value from a type and a value. Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_create_enum_value(
  duckdb_logical_type type,
  uint64_t value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `type`: The type of the enum
* `value`: The value for the enum

####### Return Value {#docs:current:clients:c:value::return-value}

The enum value, or nullptr.

<br>

###### `duckdb_get_enum_value` {#docs:current:clients:c:value::duckdb_get_enum_value}

Returns the enum value of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
uint64_t duckdb_get_enum_value(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: A duckdb_value containing an enum

####### Return Value {#docs:current:clients:c:value::return-value}

A `uint64_t`, or `MinValue<uint64>` if the value cannot be converted.

<br>

###### `duckdb_get_struct_child` {#docs:current:clients:c:value::duckdb_get_struct_child}

Returns the STRUCT child at index as a duckdb_value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
duckdb_value duckdb_get_struct_child(
  duckdb_value value,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: The STRUCT value.
* `index`: The index of the child.

####### Return Value {#docs:current:clients:c:value::return-value}

The child as a duckdb_value.

<br>

###### `duckdb_value_to_string` {#docs:current:clients:c:value::duckdb_value_to_string}

Returns the SQL string representation of the given value.

####### Syntax {#docs:current:clients:c:value::syntax}

```c
char *duckdb_value_to_string(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:value::parameters}

* `value`: A duckdb_value.

####### Return Value {#docs:current:clients:c:value::return-value}

The SQL string representation as a null-terminated string. The result must be freed with `duckdb_free`.

<br>

### Types {#docs:current:clients:c:types}

DuckDB is a strongly typed database system. As such, every column has a single type specified. This type is constant
over the entire column. That is to say, a column that is labeled as an `INTEGER` column will only contain `INTEGER`
values.

DuckDB also supports columns of composite types. For example, it is possible to define an array of integers (`INTEGER[]`). It is also possible to define types as arbitrary structs (`ROW(i INTEGER, j VARCHAR)`). For that reason, native DuckDB type objects are not mere enums, but a class that can potentially be nested.

Types in the C API are modeled using an enum (`duckdb_type`) and a complex class (`duckdb_logical_type`). For most primitive types, e.g., integers or varchars, the enum is sufficient. For more complex types, such as lists, structs or decimals, the logical type must be used.

```c
typedef enum DUCKDB_TYPE {
  DUCKDB_TYPE_INVALID = 0,
  DUCKDB_TYPE_BOOLEAN = 1,
  DUCKDB_TYPE_TINYINT = 2,
  DUCKDB_TYPE_SMALLINT = 3,
  DUCKDB_TYPE_INTEGER = 4,
  DUCKDB_TYPE_BIGINT = 5,
  DUCKDB_TYPE_UTINYINT = 6,
  DUCKDB_TYPE_USMALLINT = 7,
  DUCKDB_TYPE_UINTEGER = 8,
  DUCKDB_TYPE_UBIGINT = 9,
  DUCKDB_TYPE_FLOAT = 10,
  DUCKDB_TYPE_DOUBLE = 11,
  DUCKDB_TYPE_TIMESTAMP = 12,
  DUCKDB_TYPE_DATE = 13,
  DUCKDB_TYPE_TIME = 14,
  DUCKDB_TYPE_INTERVAL = 15,
  DUCKDB_TYPE_HUGEINT = 16,
  DUCKDB_TYPE_UHUGEINT = 32,
  DUCKDB_TYPE_VARCHAR = 17,
  DUCKDB_TYPE_BLOB = 18,
  DUCKDB_TYPE_DECIMAL = 19,
  DUCKDB_TYPE_TIMESTAMP_S = 20,
  DUCKDB_TYPE_TIMESTAMP_MS = 21,
  DUCKDB_TYPE_TIMESTAMP_NS = 22,
  DUCKDB_TYPE_ENUM = 23,
  DUCKDB_TYPE_LIST = 24,
  DUCKDB_TYPE_STRUCT = 25,
  DUCKDB_TYPE_MAP = 26,
  DUCKDB_TYPE_ARRAY = 33,
  DUCKDB_TYPE_UUID = 27,
  DUCKDB_TYPE_UNION = 28,
  DUCKDB_TYPE_BIT = 29,
  DUCKDB_TYPE_TIME_TZ = 30,
  DUCKDB_TYPE_TIMESTAMP_TZ = 31,
} duckdb_type;
```
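
The split between the enum and the logical type can be sketched as a simple classification over type ids. The `sketch_type` enum below is a hypothetical stand-in that copies a few values from the `DUCKDB_TYPE` listing above so the example compiles without `duckdb.h`; real code would use `duckdb_type` directly. The text names lists, structs and decimals as needing the logical type; treating enums, maps, arrays and unions the same way is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in mirroring a subset of the DUCKDB_TYPE values listed above. */
typedef enum {
  SKETCH_TYPE_INTEGER = 4,
  SKETCH_TYPE_VARCHAR = 17,
  SKETCH_TYPE_DECIMAL = 19,
  SKETCH_TYPE_ENUM = 23,
  SKETCH_TYPE_LIST = 24,
  SKETCH_TYPE_STRUCT = 25,
  SKETCH_TYPE_MAP = 26,
  SKETCH_TYPE_UNION = 28,
  SKETCH_TYPE_ARRAY = 33
} sketch_type;

/* Returns true when the enum id alone does not describe the type completely,
   e.g., because width/scale or child types are needed as well. */
bool sketch_needs_logical_type(sketch_type id) {
  switch (id) {
  case SKETCH_TYPE_DECIMAL: /* needs width and scale */
  case SKETCH_TYPE_ENUM:    /* needs the dictionary (assumption of this sketch) */
  case SKETCH_TYPE_LIST:    /* needs the child type */
  case SKETCH_TYPE_STRUCT:  /* needs child names and types */
  case SKETCH_TYPE_MAP:     /* needs key and value types (assumption) */
  case SKETCH_TYPE_UNION:   /* needs member names and types (assumption) */
  case SKETCH_TYPE_ARRAY:   /* needs child type and size (assumption) */
    return true;
  default:
    return false;
  }
}
```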

#### Functions {#docs:current:clients:c:types::functions}

The enum type of a column in the result can be obtained using the `duckdb_column_type` function. The logical type of a column can be obtained using the `duckdb_column_logical_type` function.

##### `duckdb_value` {#docs:current:clients:c:types::duckdb_value}

The `duckdb_value` functions will auto-cast values as required. For example, it is no problem to use
`duckdb_value_double` on a column of type `INTEGER`. The value will be auto-cast and returned as a double.
Note that in certain cases the cast may fail. For example, this can happen if we call `duckdb_value_int8` and the value does not fit within an `int8_t`. In this case, a default value will be returned (usually `0` or `nullptr`). The same default value will also be returned if the corresponding value is `NULL`.

The `duckdb_value_is_null` function can be used to check if a specific value is `NULL` or not.
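
Because a failed cast and a `NULL` both produce the same default, a returned `0` is ambiguous on its own. The sketch below imitates that behavior with hypothetical stand-ins (`sketch_cell`, `sketch_value_int8`, `sketch_value_is_null`) so that it compiles without `duckdb.h`; it illustrates the documented fallback and is not how the library implements it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one result cell: a BIGINT that may be NULL. */
typedef struct {
  bool is_null;
  int64_t value;
} sketch_cell;

/* Mimics the documented fallback: NULL or out-of-range input yields 0. */
int8_t sketch_value_int8(sketch_cell cell) {
  if (cell.is_null || cell.value < INT8_MIN || cell.value > INT8_MAX) {
    return 0;
  }
  return (int8_t)cell.value;
}

/* Stand-in for a NULL check, to be consulted before trusting a 0 result. */
bool sketch_value_is_null(sketch_cell cell) {
  return cell.is_null;
}
```

With this model, a cell holding `0`, a cell holding `1000`, and a `NULL` cell all yield `0` from `sketch_value_int8`. Only the explicit NULL check separates the third case from the first, and an out-of-range value can only be detected by reading the cell at a wider type.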

The exception to the auto-cast rule is the `duckdb_value_varchar_internal` function. This function does not auto-cast and only works for `VARCHAR` columns. The reason this function exists is that the result does not need to be freed.

> `duckdb_value_varchar` and `duckdb_value_blob` require the result to be de-allocated using `duckdb_free`.

##### `duckdb_fetch_chunk` {#docs:current:clients:c:types::duckdb_fetch_chunk}

The `duckdb_fetch_chunk` function can be used to read data chunks from a DuckDB result set, and is the most efficient way of reading data from a DuckDB result using the C API. It is also the only way of reading data of certain types from a DuckDB result. For example, the `duckdb_value` functions do not support structural reading of composite types (lists or structs) or more complex types like enums or decimals.

For more information about data chunks, see the [documentation on data chunks](#docs:current:clients:c:data_chunk).

#### API Reference Overview {#docs:current:clients:c:types::api-reference-overview}



```c
duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);
bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);
```


##### Date Time Timestamp Helpers {#docs:current:clients:c:types::date-time-timestamp-helpers}

```c
duckdb_date_struct duckdb_from_date(duckdb_date date);
duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);
bool duckdb_is_finite_timestamp_s(duckdb_timestamp_s ts);
bool duckdb_is_finite_timestamp_ms(duckdb_timestamp_ms ts);
bool duckdb_is_finite_timestamp_ns(duckdb_timestamp_ns ts);
```
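
To give a feel for what a decomposition such as `duckdb_from_date` involves, the sketch below turns a day count relative to 1970-01-01 into year, month, and day using a well-known civil-calendar algorithm. It assumes that a `duckdb_date` holds days since 1970-01-01. The `sketch_date_struct` type and `sketch_from_days` function are hypothetical stand-ins, not DuckDB's implementation, and special values such as infinite dates are ignored.

```c
#include <stdint.h>

/* Hypothetical stand-in for a decomposed date (year, month, day). */
typedef struct {
  int32_t year;
  int8_t month;
  int8_t day;
} sketch_date_struct;

/* Converts days since 1970-01-01 into a proleptic Gregorian calendar date. */
sketch_date_struct sketch_from_days(int32_t days) {
  int64_t z = (int64_t)days + 719468;               /* shift epoch to 0000-03-01 */
  int64_t era = (z >= 0 ? z : z - 146096) / 146097; /* index of the 400-year cycle */
  int64_t doe = z - era * 146097;                   /* day of era, 0..146096 */
  int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; /* 0..399 */
  int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* March-based day of year */
  int64_t mp = (5 * doy + 2) / 153;                 /* March-based month, 0..11 */
  sketch_date_struct out;
  out.day = (int8_t)(doy - (153 * mp + 2) / 5 + 1);
  out.month = (int8_t)(mp < 10 ? mp + 3 : mp - 9);
  /* January and February belong to the following civil year. */
  out.year = (int32_t)(yoe + era * 400 + (out.month <= 2 ? 1 : 0));
  return out;
}
```

In application code, the library helpers listed above remain the right tool; the sketch only makes the day-count representation concrete.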


##### Hugeint Helpers {#docs:current:clients:c:types::hugeint-helpers}

```c
double duckdb_hugeint_to_double(duckdb_hugeint val);
duckdb_hugeint duckdb_double_to_hugeint(double val);
```
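
As an illustration of the arithmetic behind a hugeint-to-double conversion, the sketch below assumes a 128-bit integer split into an unsigned lower half and a signed upper half, which is how `duckdb_hugeint` is laid out in `duckdb.h`. The `sketch_hugeint` type and `sketch_hugeint_to_double` function are hypothetical stand-ins that compile without the DuckDB headers; they are not the library's implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit signed integer split into two halves. */
typedef struct {
  uint64_t lower; /* least significant 64 bits */
  int64_t upper;  /* most significant 64 bits, carries the sign */
} sketch_hugeint;

/* value = upper * 2^64 + lower; a double keeps only about 53 significant bits. */
double sketch_hugeint_to_double(sketch_hugeint v) {
  return (double)v.upper * 18446744073709551616.0 + (double)v.lower;
}
```

Since a `double` cannot represent every 128-bit integer, a round trip through `double` is lossy for large magnitudes.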


##### Decimal Helpers {#docs:current:clients:c:types::decimal-helpers}

```c
duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);
double duckdb_decimal_to_double(duckdb_decimal val);
```
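
A decimal is an unscaled integer together with a width and a scale, and its numeric value is the integer divided by 10 to the power of the scale. The sketch below shows that relationship with a hypothetical `sketch_decimal` type. As a simplification, the unscaled value is an `int64_t` here, whereas `duckdb_decimal` stores a `duckdb_hugeint`. It is not the library's implementation of `duckdb_decimal_to_double`.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a decimal value. */
typedef struct {
  uint8_t width; /* total number of digits */
  uint8_t scale; /* number of digits after the decimal point */
  int64_t value; /* unscaled integer (simplification; see lead-in) */
} sketch_decimal;

/* Divides the unscaled integer by 10^scale to obtain the numeric value. */
double sketch_decimal_to_double(sketch_decimal d) {
  double divisor = 1.0;
  for (uint8_t i = 0; i < d.scale; i++) {
    divisor *= 10.0;
  }
  return (double)d.value / divisor;
}
```

For example, a `DECIMAL(4, 2)` with the unscaled value `1234` represents `12.34`.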


##### Logical Type Interface {#docs:current:clients:c:types::logical-type-interface}

```c
duckdb_logical_type duckdb_create_logical_type(duckdb_type type);
char *duckdb_logical_type_get_alias(duckdb_logical_type type);
void duckdb_logical_type_set_alias(duckdb_logical_type type, const char *alias);
duckdb_logical_type duckdb_create_list_type(duckdb_logical_type type);
duckdb_logical_type duckdb_create_array_type(duckdb_logical_type type, idx_t array_size);
duckdb_logical_type duckdb_create_map_type(duckdb_logical_type key_type, duckdb_logical_type value_type);
duckdb_logical_type duckdb_create_union_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_struct_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_enum_type(const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_decimal_type(uint8_t width, uint8_t scale);
duckdb_type duckdb_get_type_id(duckdb_logical_type type);
uint8_t duckdb_decimal_width(duckdb_logical_type type);
uint8_t duckdb_decimal_scale(duckdb_logical_type type);
duckdb_type duckdb_decimal_internal_type(duckdb_logical_type type);
duckdb_type duckdb_enum_internal_type(duckdb_logical_type type);
uint32_t duckdb_enum_dictionary_size(duckdb_logical_type type);
char *duckdb_enum_dictionary_value(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_list_type_child_type(duckdb_logical_type type);
duckdb_logical_type duckdb_array_type_child_type(duckdb_logical_type type);
idx_t duckdb_array_type_array_size(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_key_type(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_value_type(duckdb_logical_type type);
idx_t duckdb_struct_type_child_count(duckdb_logical_type type);
char *duckdb_struct_type_child_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_struct_type_child_type(duckdb_logical_type type, idx_t index);
idx_t duckdb_union_type_member_count(duckdb_logical_type type);
char *duckdb_union_type_member_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_union_type_member_type(duckdb_logical_type type, idx_t index);
void duckdb_destroy_logical_type(duckdb_logical_type *type);
duckdb_state duckdb_register_logical_type(duckdb_connection con, duckdb_logical_type type, duckdb_create_type_info info);
```


###### `duckdb_result_get_chunk` {#docs:current:clients:c:types::duckdb_result_get_chunk}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result is exhausted.

The result must be destroyed with `duckdb_destroy_data_chunk`.

This function supersedes all `duckdb_value` functions, as well as the `duckdb_column_data` and `duckdb_nullmask_data`
functions. It results in significantly better performance and should be preferred in newer codebases.

If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be
mixed with the legacy result functions).

Use `duckdb_result_chunk_count` to figure out how many chunks there are in the result.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_data_chunk duckdb_result_get_chunk(
  duckdb_result result,
  idx_t chunk_index
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `result`: The result object to fetch the data chunk from.
* `chunk_index`: The chunk index to fetch from.

####### Return Value {#docs:current:clients:c:types::return-value}

The resulting data chunk. Returns `NULL` if the chunk index is out of bounds.

<br>

###### `duckdb_result_is_streaming` {#docs:current:clients:c:types::duckdb_result_is_streaming}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Checks if the type of the internal result is `StreamQueryResult`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
bool duckdb_result_is_streaming(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `result`: The result object to check.

####### Return Value {#docs:current:clients:c:types::return-value}

Whether or not the result object is of the type `StreamQueryResult`.

<br>

###### `duckdb_result_chunk_count` {#docs:current:clients:c:types::duckdb_result_chunk_count}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of data chunks present in the result.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
idx_t duckdb_result_chunk_count(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `result`: The result object

####### Return Value {#docs:current:clients:c:types::return-value}

Number of data chunks present in the result.

<br>

###### `duckdb_result_return_type` {#docs:current:clients:c:types::duckdb_result_return_type}

Returns the return type of the given result, or `DUCKDB_RETURN_TYPE_INVALID` on error.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_result_type duckdb_result_return_type(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `result`: The result object

####### Return Value {#docs:current:clients:c:types::return-value}

The return type of the result.

<br>

###### `duckdb_from_date` {#docs:current:clients:c:types::duckdb_from_date}

Decompose a `duckdb_date` object into year, month and day (stored as `duckdb_date_struct`).

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_date_struct duckdb_from_date(
  duckdb_date date
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `date`: The date object, as obtained from a `DUCKDB_TYPE_DATE` column.

####### Return Value {#docs:current:clients:c:types::return-value}

The `duckdb_date_struct` with the decomposed elements.
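
To illustrate what this decomposition involves, the sketch below converts a day count relative to 1970-01-01 (the representation used by `duckdb_date`) into year, month and day in the proleptic Gregorian calendar. The names `sketch_date_struct` and `sketch_from_days` are hypothetical stand-ins so that the snippet compiles without `duckdb.h`. This is not DuckDB's implementation, and application code should simply call `duckdb_from_date`.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in mirroring the fields of duckdb_date_struct.
typedef struct {
    int32_t year;
    int8_t month;
    int8_t day;
} sketch_date_struct;

// Convert days since 1970-01-01 into a calendar date.
// Uses the well-known era-based algorithm (400-year cycles of 146097 days).
sketch_date_struct sketch_from_days(int32_t days) {
    int64_t z = (int64_t)days + 719468; // shift the epoch to 0000-03-01
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    int64_t doe = z - era * 146097; // day of era, [0, 146096]
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; // [0, 399]
    int64_t y = yoe + era * 400;
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    int64_t mp = (5 * doy + 2) / 153; // month index with March = 0
    int64_t d = doy - (153 * mp + 2) / 5 + 1;
    int64_t m = mp < 10 ? mp + 3 : mp - 9;
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31); // sanity check on derived ranges
    // January and February belong to the following civil year.
    sketch_date_struct out = { (int32_t)(m <= 2 ? y + 1 : y), (int8_t)m, (int8_t)d };
    return out;
}
```

For example, a day count of `0` yields 1970-01-01 and `-1` yields 1969-12-31.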

<br>

###### `duckdb_to_date` {#docs:current:clients:c:types::duckdb_to_date}

Re-compose a `duckdb_date` from year, month and day (`duckdb_date_struct`).

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_date duckdb_to_date(
  duckdb_date_struct date
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `date`: The year, month and day stored in a `duckdb_date_struct`.

####### Return Value {#docs:current:clients:c:types::return-value}

The `duckdb_date` element.

<br>

###### `duckdb_is_finite_date` {#docs:current:clients:c:types::duckdb_is_finite_date}

Test a `duckdb_date` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
bool duckdb_is_finite_date(
  duckdb_date date
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `date`: The date object, as obtained from a `DUCKDB_TYPE_DATE` column.

####### Return Value {#docs:current:clients:c:types::return-value}

True if the date is finite, false if it is ±infinity.

<br>

###### `duckdb_from_time` {#docs:current:clients:c:types::duckdb_from_time}

Decompose a `duckdb_time` object into hour, minute, second and microsecond (stored as `duckdb_time_struct`).

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_time_struct duckdb_from_time(
  duckdb_time time
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `time`: The time object, as obtained from a `DUCKDB_TYPE_TIME` column.

####### Return Value {#docs:current:clients:c:types::return-value}

The `duckdb_time_struct` with the decomposed elements.
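
As a minimal sketch of this decomposition, the snippet below splits a count of microseconds since 00:00:00 (the representation used by `duckdb_time`) into its components. `sketch_time_struct` and `sketch_from_micros` are hypothetical names that only mirror the shape of the real API so the example is self-contained. In an application, call `duckdb_from_time` instead.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in mirroring the fields of duckdb_time_struct.
typedef struct {
    int8_t hour;
    int8_t min;
    int8_t sec;
    int32_t micros;
} sketch_time_struct;

// Split microseconds since 00:00:00 into hour, minute, second and microsecond.
sketch_time_struct sketch_from_micros(int64_t micros) {
    assert(micros >= 0); // a time of day is never negative
    sketch_time_struct out;
    out.micros = (int32_t)(micros % 1000000); // sub-second remainder
    int64_t total_seconds = micros / 1000000;
    out.sec = (int8_t)(total_seconds % 60);
    int64_t total_minutes = total_seconds / 60;
    out.min = (int8_t)(total_minutes % 60);
    out.hour = (int8_t)(total_minutes / 60);
    return out;
}
```

With this sketch, `49530250000` microseconds decompose into 13:45:30.250000.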

<br>

###### `duckdb_create_time_tz` {#docs:current:clients:c:types::duckdb_create_time_tz}

Create a `duckdb_time_tz` object from micros and a timezone offset.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_time_tz duckdb_create_time_tz(
  int64_t micros,
  int32_t offset
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `micros`: The microsecond component of the time.
* `offset`: The timezone offset component of the time.

####### Return Value {#docs:current:clients:c:types::return-value}

The `duckdb_time_tz` element.

<br>

###### `duckdb_from_time_tz` {#docs:current:clients:c:types::duckdb_from_time_tz}

Decompose a `duckdb_time_tz` object into micros and a timezone offset.

Use `duckdb_from_time` to further decompose the micros into hour, minute, second and microsecond.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_time_tz_struct duckdb_from_time_tz(
  duckdb_time_tz micros
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `micros`: The time object, as obtained from a `DUCKDB_TYPE_TIME_TZ` column.

<br>

###### `duckdb_to_time` {#docs:current:clients:c:types::duckdb_to_time}

Re-compose a `duckdb_time` from hour, minute, second and microsecond (`duckdb_time_struct`).

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_time duckdb_to_time(
  duckdb_time_struct time
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `time`: The hour, minute, second and microsecond in a `duckdb_time_struct`.

####### Return Value {#docs:current:clients:c:types::return-value}

The `duckdb_time` element.

<br>

###### `duckdb_from_timestamp` {#docs:current:clients:c:types::duckdb_from_timestamp}

Decompose a `duckdb_timestamp` object into a `duckdb_timestamp_struct`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_timestamp_struct duckdb_from_timestamp(
  duckdb_timestamp ts
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `ts`: The timestamp object, as obtained from a `DUCKDB_TYPE_TIMESTAMP` column.

####### Return Value {#docs:current:clients:c:types::return-value}

The `duckdb_timestamp_struct` with the decomposed elements.
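
A `duckdb_timestamp` counts microseconds since 1970-01-01, and a `duckdb_timestamp_struct` holds a date part and a time part. The sketch below shows one way the first step of such a decomposition could look: splitting the microsecond count into a day number and the microseconds within that day. The type `sketch_timestamp_parts` and the function `sketch_split_timestamp` are hypothetical and not part of the DuckDB API. The point of the example is that timestamps before the epoch need floor division so that the time-of-day part stays non-negative.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper type: a day count plus the microseconds within that day.
typedef struct {
    int32_t days;          // days since 1970-01-01, may be negative
    int64_t micros_of_day; // always in [0, 86400000000)
} sketch_timestamp_parts;

sketch_timestamp_parts sketch_split_timestamp(int64_t micros) {
    const int64_t micros_per_day = 86400000000LL;
    int64_t days = micros / micros_per_day; // C division truncates toward zero
    int64_t rem = micros % micros_per_day;
    if (rem < 0) {
        // Adjust to floor division so the time-of-day part is never negative.
        rem += micros_per_day;
        days -= 1;
    }
    assert(rem >= 0 && rem < micros_per_day);
    sketch_timestamp_parts out = { (int32_t)days, rem };
    return out;
}
```

For instance, a value of `-1` (one microsecond before the epoch) falls on day `-1` at 23:59:59.999999.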

<br>

###### `duckdb_to_timestamp` {#docs:current:clients:c:types::duckdb_to_timestamp}

Re-compose a `duckdb_timestamp` from a `duckdb_timestamp_struct`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_timestamp duckdb_to_timestamp(
  duckdb_timestamp_struct ts
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `ts`: The decomposed elements in a `duckdb_timestamp_struct`.

####### Return Value {#docs:current:clients:c:types::return-value}

The `duckdb_timestamp` element.

<br>

###### `duckdb_is_finite_timestamp` {#docs:current:clients:c:types::duckdb_is_finite_timestamp}

Test a `duckdb_timestamp` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
bool duckdb_is_finite_timestamp(
  duckdb_timestamp ts
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `ts`: The `duckdb_timestamp` object, as obtained from a `DUCKDB_TYPE_TIMESTAMP` column.

####### Return Value {#docs:current:clients:c:types::return-value}

True if the timestamp is finite, false if it is ±infinity.

<br>

###### `duckdb_is_finite_timestamp_s` {#docs:current:clients:c:types::duckdb_is_finite_timestamp_s}

Test a `duckdb_timestamp_s` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
bool duckdb_is_finite_timestamp_s(
  duckdb_timestamp_s ts
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `ts`: The `duckdb_timestamp_s` object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_S` column.

####### Return Value {#docs:current:clients:c:types::return-value}

True if the timestamp is finite, false if it is ±infinity.

<br>

###### `duckdb_is_finite_timestamp_ms` {#docs:current:clients:c:types::duckdb_is_finite_timestamp_ms}

Test a `duckdb_timestamp_ms` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
bool duckdb_is_finite_timestamp_ms(
  duckdb_timestamp_ms ts
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `ts`: The `duckdb_timestamp_ms` object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_MS` column.

####### Return Value {#docs:current:clients:c:types::return-value}

True if the timestamp is finite, false if it is ±infinity.

<br>

###### `duckdb_is_finite_timestamp_ns` {#docs:current:clients:c:types::duckdb_is_finite_timestamp_ns}

Test a `duckdb_timestamp_ns` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
bool duckdb_is_finite_timestamp_ns(
  duckdb_timestamp_ns ts
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `ts`: The `duckdb_timestamp_ns` object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_NS` column.

####### Return Value {#docs:current:clients:c:types::return-value}

True if the timestamp is finite, false if it is ±infinity.

<br>

###### `duckdb_hugeint_to_double` {#docs:current:clients:c:types::duckdb_hugeint_to_double}

Converts a `duckdb_hugeint` object (as obtained from a `DUCKDB_TYPE_HUGEINT` column) into a double.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
double duckdb_hugeint_to_double(
  duckdb_hugeint val
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `val`: The hugeint value.

####### Return Value {#docs:current:clients:c:types::return-value}

The converted `double` element.
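
A `duckdb_hugeint` stores a 128-bit signed integer as a `lower` (unsigned 64-bit) and an `upper` (signed 64-bit) half, so its numeric value is `upper * 2^64 + lower`. The sketch below illustrates how such a pair could be turned into a `double`. `sketch_hugeint` and `sketch_hugeint_to_double` are hypothetical stand-ins and not DuckDB's implementation. Results for very large magnitudes are rounded, since a `double` carries only 53 bits of precision.

```c
#include <stdint.h>

// Hypothetical stand-in mirroring the layout of duckdb_hugeint:
// value = upper * 2^64 + lower (two's complement).
typedef struct {
    uint64_t lower;
    int64_t upper;
} sketch_hugeint;

double sketch_hugeint_to_double(sketch_hugeint v) {
    const double two_pow_64 = 18446744073709551616.0; // 2^64, exactly representable
    uint64_t lo = v.lower;
    uint64_t hi = (uint64_t)v.upper;
    int negative = v.upper < 0;
    if (negative) {
        // Negate the 128-bit two's-complement value to obtain its magnitude.
        // Converting the magnitude avoids losing small negative numbers to rounding.
        lo = ~lo + 1;
        hi = ~hi + (lo == 0 ? 1 : 0);
    }
    double magnitude = (double)hi * two_pow_64 + (double)lo;
    return negative ? -magnitude : magnitude;
}
```

In this sketch, the pair `{lower = UINT64_MAX, upper = -1}` represents -1 and converts to `-1.0`.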

<br>

###### `duckdb_double_to_hugeint` {#docs:current:clients:c:types::duckdb_double_to_hugeint}

Converts a double value to a `duckdb_hugeint` object.

If the conversion fails because the double value is too big, the result will be 0.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_hugeint duckdb_double_to_hugeint(
  double val
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `val`: The double value.

####### Return Value {#docs:current:clients:c:types::return-value}

The converted `duckdb_hugeint` element.

<br>

###### `duckdb_double_to_decimal` {#docs:current:clients:c:types::duckdb_double_to_decimal}

Converts a double value to a `duckdb_decimal` object.

If the conversion fails because the double value is too big, or because the width or scale is invalid, the result will be 0.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_decimal duckdb_double_to_decimal(
  double val,
  uint8_t width,
  uint8_t scale
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `val`: The double value.
* `width`: The width of the resulting decimal.
* `scale`: The scale of the resulting decimal.

####### Return Value {#docs:current:clients:c:types::return-value}

The converted `duckdb_decimal` element.

<br>

###### `duckdb_decimal_to_double` {#docs:current:clients:c:types::duckdb_decimal_to_double}

Converts a `duckdb_decimal` object (as obtained from a `DUCKDB_TYPE_DECIMAL` column) into a double.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
double duckdb_decimal_to_double(
  duckdb_decimal val
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `val`: The decimal value.

####### Return Value {#docs:current:clients:c:types::return-value}

The converted `double` element.
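
Conceptually, a decimal is an unscaled integer together with a `scale` that says how many of its digits lie after the decimal point, so the numeric value is `value / 10^scale`. The sketch below shows that relationship. `sketch_decimal` is a hypothetical, simplified stand-in: the real `duckdb_decimal` stores its unscaled value as a `duckdb_hugeint`, whereas an `int64_t` is assumed here to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical, simplified stand-in for duckdb_decimal.
typedef struct {
    uint8_t width; // total number of digits
    uint8_t scale; // number of digits after the decimal point
    int64_t value; // unscaled integer value (assumption: fits in 64 bits)
} sketch_decimal;

double sketch_decimal_to_double(sketch_decimal d) {
    assert(d.scale <= d.width); // the scale cannot exceed the total digit count
    double divisor = 1.0;
    for (uint8_t i = 0; i < d.scale; i++) {
        divisor *= 10.0; // build 10^scale
    }
    return (double)d.value / divisor;
}
```

Under these assumptions, an unscaled value of `12345` with a scale of `2` corresponds to approximately `123.45`.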

<br>

###### `duckdb_create_logical_type` {#docs:current:clients:c:types::duckdb_create_logical_type}

Creates a `duckdb_logical_type` from a primitive type.
The resulting logical type must be destroyed with `duckdb_destroy_logical_type`.

Returns an invalid logical type if `type` is `DUCKDB_TYPE_INVALID`, `DUCKDB_TYPE_DECIMAL`, `DUCKDB_TYPE_ENUM`,
`DUCKDB_TYPE_LIST`, `DUCKDB_TYPE_STRUCT`, `DUCKDB_TYPE_MAP`, `DUCKDB_TYPE_ARRAY`, or `DUCKDB_TYPE_UNION`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_create_logical_type(
  duckdb_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The primitive type to create.

####### Return Value {#docs:current:clients:c:types::return-value}

The logical type.

<br>

###### `duckdb_logical_type_get_alias` {#docs:current:clients:c:types::duckdb_logical_type_get_alias}

Returns the alias of a `duckdb_logical_type`, if set, else `nullptr`.
The result must be destroyed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
char *duckdb_logical_type_get_alias(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type

####### Return Value {#docs:current:clients:c:types::return-value}

The alias or `nullptr`

<br>

###### `duckdb_logical_type_set_alias` {#docs:current:clients:c:types::duckdb_logical_type_set_alias}

Sets the alias of a `duckdb_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
void duckdb_logical_type_set_alias(
  duckdb_logical_type type,
  const char *alias
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type
* `alias`: The alias to set

<br>

###### `duckdb_create_list_type` {#docs:current:clients:c:types::duckdb_create_list_type}

Creates a LIST type from its child type.
The return type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_create_list_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The child type of the list

####### Return Value {#docs:current:clients:c:types::return-value}

The logical type.

<br>

###### `duckdb_create_array_type` {#docs:current:clients:c:types::duckdb_create_array_type}

Creates an ARRAY type from its child type.
The return type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_create_array_type(
  duckdb_logical_type type,
  idx_t array_size
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The child type of the array.
* `array_size`: The number of elements in the array.

####### Return Value {#docs:current:clients:c:types::return-value}

The logical type.

<br>

###### `duckdb_create_map_type` {#docs:current:clients:c:types::duckdb_create_map_type}

Creates a MAP type from its key type and value type.
The return type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_create_map_type(
  duckdb_logical_type key_type,
  duckdb_logical_type value_type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `key_type`: The map's key type.
* `value_type`: The map's value type.

####### Return Value {#docs:current:clients:c:types::return-value}

The logical type.

<br>

###### `duckdb_create_union_type` {#docs:current:clients:c:types::duckdb_create_union_type}

Creates a UNION type from the passed arrays.
The return type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_create_union_type(
  duckdb_logical_type *member_types,
  const char **member_names,
  idx_t member_count
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `member_types`: The array of union member types.
* `member_names`: The union member names.
* `member_count`: The number of union members.

####### Return Value {#docs:current:clients:c:types::return-value}

The logical type.

<br>

###### `duckdb_create_struct_type` {#docs:current:clients:c:types::duckdb_create_struct_type}

Creates a STRUCT type based on the member types and names.
The resulting type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_create_struct_type(
  duckdb_logical_type *member_types,
  const char **member_names,
  idx_t member_count
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `member_types`: The array of types of the struct members.
* `member_names`: The array of names of the struct members.
* `member_count`: The number of members of the struct.

####### Return Value {#docs:current:clients:c:types::return-value}

The logical type.

<br>

###### `duckdb_create_enum_type` {#docs:current:clients:c:types::duckdb_create_enum_type}

Creates an ENUM type from the passed member name array.
The resulting type should be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_create_enum_type(
  const char **member_names,
  idx_t member_count
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `member_names`: The array of names that the enum should consist of.
* `member_count`: The number of elements that were specified in the array.

####### Return Value {#docs:current:clients:c:types::return-value}

The logical type.

<br>

###### `duckdb_create_decimal_type` {#docs:current:clients:c:types::duckdb_create_decimal_type}

Creates a DECIMAL type with the specified width and scale.
The resulting type should be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_create_decimal_type(
  uint8_t width,
  uint8_t scale
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `width`: The width of the decimal type
* `scale`: The scale of the decimal type

####### Return Value {#docs:current:clients:c:types::return-value}

The logical type.

<br>

###### `duckdb_get_type_id` {#docs:current:clients:c:types::duckdb_get_type_id}

Retrieves the enum `duckdb_type` of a `duckdb_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_type duckdb_get_type_id(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type.

####### Return Value {#docs:current:clients:c:types::return-value}

The `duckdb_type` id.

<br>

###### `duckdb_decimal_width` {#docs:current:clients:c:types::duckdb_decimal_width}

Retrieves the width of a decimal type.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
uint8_t duckdb_decimal_width(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The width of the decimal type

<br>

###### `duckdb_decimal_scale` {#docs:current:clients:c:types::duckdb_decimal_scale}

Retrieves the scale of a decimal type.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
uint8_t duckdb_decimal_scale(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The scale of the decimal type

<br>

###### `duckdb_decimal_internal_type` {#docs:current:clients:c:types::duckdb_decimal_internal_type}

Retrieves the internal storage type of a decimal type.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_type duckdb_decimal_internal_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The internal type of the decimal type

<br>

###### `duckdb_enum_internal_type` {#docs:current:clients:c:types::duckdb_enum_internal_type}

Retrieves the internal storage type of an enum type.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_type duckdb_enum_internal_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The internal type of the enum type

<br>

###### `duckdb_enum_dictionary_size` {#docs:current:clients:c:types::duckdb_enum_dictionary_size}

Retrieves the dictionary size of the enum type.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
uint32_t duckdb_enum_dictionary_size(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The dictionary size of the enum type

<br>

###### `duckdb_enum_dictionary_value` {#docs:current:clients:c:types::duckdb_enum_dictionary_value}

Retrieves the dictionary value at the specified position from the enum.

The result must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
char *duckdb_enum_dictionary_value(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object
* `index`: The index in the dictionary

####### Return Value {#docs:current:clients:c:types::return-value}

The string value of the enum type. Must be freed with `duckdb_free`.

<br>

###### `duckdb_list_type_child_type` {#docs:current:clients:c:types::duckdb_list_type_child_type}

Retrieves the child type of the given LIST type. Also accepts MAP types.
The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_list_type_child_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type, either LIST or MAP.

####### Return Value {#docs:current:clients:c:types::return-value}

The child type of the LIST or MAP type.

<br>

###### `duckdb_array_type_child_type` {#docs:current:clients:c:types::duckdb_array_type_child_type}

Retrieves the child type of the given ARRAY type.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_array_type_child_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type. Must be ARRAY.

####### Return Value {#docs:current:clients:c:types::return-value}

The child type of the ARRAY type.

<br>

###### `duckdb_array_type_array_size` {#docs:current:clients:c:types::duckdb_array_type_array_size}

Retrieves the array size of the given array type.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
idx_t duckdb_array_type_array_size(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The fixed number of elements the values of this array type can store.

<br>

###### `duckdb_map_type_key_type` {#docs:current:clients:c:types::duckdb_map_type_key_type}

Retrieves the key type of the given map type.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_map_type_key_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The key type of the map type. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_map_type_value_type` {#docs:current:clients:c:types::duckdb_map_type_value_type}

Retrieves the value type of the given map type.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_map_type_value_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The value type of the map type. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_struct_type_child_count` {#docs:current:clients:c:types::duckdb_struct_type_child_count}

Returns the number of children of a struct type.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
idx_t duckdb_struct_type_child_count(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:types::return-value}

The number of children of a struct type.

<br>

###### `duckdb_struct_type_child_name` {#docs:current:clients:c:types::duckdb_struct_type_child_name}

Retrieves the name of the struct child.

The result must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
char *duckdb_struct_type_child_name(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object
* `index`: The child index

####### Return Value {#docs:current:clients:c:types::return-value}

The name of the struct child. Must be freed with `duckdb_free`.

<br>

###### `duckdb_struct_type_child_type` {#docs:current:clients:c:types::duckdb_struct_type_child_type}

Retrieves the child type of the given struct type at the specified index.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_struct_type_child_type(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object
* `index`: The child index

####### Return Value {#docs:current:clients:c:types::return-value}

The child type of the struct type. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_union_type_member_count` {#docs:current:clients:c:types::duckdb_union_type_member_count}

Returns the number of members that the union type has.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
idx_t duckdb_union_type_member_count(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type (union) object

####### Return Value {#docs:current:clients:c:types::return-value}

The number of members of a union type.

<br>

###### `duckdb_union_type_member_name` {#docs:current:clients:c:types::duckdb_union_type_member_name}

Retrieves the name of the union member.

The result must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
char *duckdb_union_type_member_name(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object
* `index`: The child index

####### Return Value {#docs:current:clients:c:types::return-value}

The name of the union member. Must be freed with `duckdb_free`.

<br>

###### `duckdb_union_type_member_type` {#docs:current:clients:c:types::duckdb_union_type_member_type}

Retrieves the child type of the given union member at the specified index.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_logical_type duckdb_union_type_member_type(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type object
* `index`: The child index

####### Return Value {#docs:current:clients:c:types::return-value}

The child type of the union member. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_destroy_logical_type` {#docs:current:clients:c:types::duckdb_destroy_logical_type}

Destroys the logical type and de-allocates all memory allocated for that type.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
void duckdb_destroy_logical_type(
  duckdb_logical_type *type
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `type`: The logical type to destroy.

<br>

###### `duckdb_register_logical_type` {#docs:current:clients:c:types::duckdb_register_logical_type}

Registers a custom type within the given connection.
The type must have an alias.

####### Syntax {#docs:current:clients:c:types::syntax}

```c
duckdb_state duckdb_register_logical_type(
  duckdb_connection con,
  duckdb_logical_type type,
  duckdb_create_type_info info
);
```


####### Parameters {#docs:current:clients:c:types::parameters}

* `con`: The connection to use
* `type`: The custom type to register

####### Return Value {#docs:current:clients:c:types::return-value}

Whether or not the registration was successful.

<br>

### Prepared Statements {#docs:current:clients:c:prepared}



A prepared statement is a parameterized query. The query is prepared with question marks (`?`) or dollar symbols (`$1`) indicating the parameters of the query. Values can then be bound to these parameters, after which the prepared statement can be executed using those parameters. A single query can be prepared once and executed many times.

Prepared statements are useful to:

* Easily supply parameters to functions while avoiding string concatenation/SQL injection attacks.
* Speed up queries that will be executed many times with different parameters.

DuckDB supports prepared statements in the C API with the `duckdb_prepare` function. The `duckdb_bind` family of functions is used to supply values for subsequent execution of the prepared statement using `duckdb_execute_prepared`. Once the prepared statement is no longer needed, it can be cleaned up using the `duckdb_destroy_prepare` function.

#### Example {#docs:current:clients:c:prepared::example}

```c
duckdb_prepared_statement stmt;
duckdb_result result;
if (duckdb_prepare(con, "INSERT INTO integers VALUES ($1, $2)", &stmt) == DuckDBError) {
    // handle error
}

duckdb_bind_int32(stmt, 1, 42); // the parameter index starts counting at 1!
duckdb_bind_int32(stmt, 2, 43);
// NULL as second parameter means no result set is requested
duckdb_execute_prepared(stmt, NULL);
duckdb_destroy_prepare(&stmt);

// we can also query result sets using prepared statements
if (duckdb_prepare(con, "SELECT * FROM integers WHERE i = ?", &stmt) == DuckDBError) {
    // handle error
}
duckdb_bind_int32(stmt, 1, 42);
duckdb_execute_prepared(stmt, &result);

// do something with result

// clean up
duckdb_destroy_result(&result);
duckdb_destroy_prepare(&stmt);
```

After calling `duckdb_prepare`, the prepared statement parameters can be inspected using `duckdb_nparams` and `duckdb_param_type`. In case the prepare fails, the error can be obtained through `duckdb_prepare_error`.

The `duckdb_bind` function used does not have to match the prepared statement parameter type exactly. Values are automatically cast to the required type. For example, calling `duckdb_bind_int8` on a parameter of type `DUCKDB_TYPE_INTEGER` works as expected.

> **Warning.** Do **not** use prepared statements to insert large amounts of data into DuckDB. Instead it is recommended to use the [Appender](#docs:current:clients:c:appender).

#### API Reference Overview {#docs:current:clients:c:prepared::api-reference-overview}



```c
duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement *out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);
const char *duckdb_parameter_name(duckdb_prepared_statement prepared_statement, idx_t index);
duckdb_type duckdb_param_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_logical_type duckdb_param_logical_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_state duckdb_clear_bindings(duckdb_prepared_statement prepared_statement);
duckdb_statement_type duckdb_prepared_statement_type(duckdb_prepared_statement statement);
idx_t duckdb_prepared_statement_column_count(duckdb_prepared_statement prepared_statement);
const char *duckdb_prepared_statement_column_name(duckdb_prepared_statement prepared_statement, idx_t col_idx);
duckdb_logical_type duckdb_prepared_statement_column_logical_type(duckdb_prepared_statement prepared_statement, idx_t col_idx);
duckdb_type duckdb_prepared_statement_column_type(duckdb_prepared_statement prepared_statement, idx_t col_idx);
```


###### `duckdb_prepare` {#docs:current:clients:c:prepared::duckdb_prepare}

Create a prepared statement object from a query.

Note that after calling `duckdb_prepare`, the prepared statement should always be destroyed using
`duckdb_destroy_prepare`, even if the prepare fails.

If the prepare fails, `duckdb_prepare_error` can be called to obtain the reason why the prepare failed.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
duckdb_state duckdb_prepare(
  duckdb_connection connection,
  const char *query,
  duckdb_prepared_statement *out_prepared_statement
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `connection`: The connection object
* `query`: The SQL query to prepare
* `out_prepared_statement`: The resulting prepared statement object

####### Return Value {#docs:current:clients:c:prepared::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_prepare` {#docs:current:clients:c:prepared::duckdb_destroy_prepare}

Closes the prepared statement and de-allocates all memory allocated for the statement.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
void duckdb_destroy_prepare(
  duckdb_prepared_statement *prepared_statement
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement to destroy.

<br>

###### `duckdb_prepare_error` {#docs:current:clients:c:prepared::duckdb_prepare_error}

Returns the error message associated with the given prepared statement.
If the prepared statement has no error message, this returns `nullptr` instead.

The error message should not be freed. It will be de-allocated when `duckdb_destroy_prepare` is called.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
const char *duckdb_prepare_error(
  duckdb_prepared_statement prepared_statement
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement to obtain the error from.

####### Return Value {#docs:current:clients:c:prepared::return-value}

The error message, or `nullptr` if there is none.

<br>

###### `duckdb_nparams` {#docs:current:clients:c:prepared::duckdb_nparams}

Returns the number of parameters that can be provided to the given prepared statement.

Returns 0 if the query was not successfully prepared.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
idx_t duckdb_nparams(
  duckdb_prepared_statement prepared_statement
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement to obtain the number of parameters for.

<br>

###### `duckdb_parameter_name` {#docs:current:clients:c:prepared::duckdb_parameter_name}

Returns the name used to identify the parameter.
The returned string should be freed using `duckdb_free`.

Returns `NULL` if the index is out of range for the provided prepared statement.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
const char *duckdb_parameter_name(
  duckdb_prepared_statement prepared_statement,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement to get the parameter name from.
* `index`: The parameter index.

<br>

###### `duckdb_param_type` {#docs:current:clients:c:prepared::duckdb_param_type}

Returns the parameter type for the parameter at the given index.

Returns `DUCKDB_TYPE_INVALID` if the parameter index is out of range or the statement was not successfully prepared.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
duckdb_type duckdb_param_type(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement.
* `param_idx`: The parameter index.

####### Return Value {#docs:current:clients:c:prepared::return-value}

The parameter type.

<br>

###### `duckdb_param_logical_type` {#docs:current:clients:c:prepared::duckdb_param_logical_type}

Returns the logical type for the parameter at the given index.

Returns `nullptr` if the parameter index is out of range or the statement was not successfully prepared.

The return type of this call should be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
duckdb_logical_type duckdb_param_logical_type(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement.
* `param_idx`: The parameter index.

####### Return Value {#docs:current:clients:c:prepared::return-value}

The logical type of the parameter.

<br>

###### `duckdb_clear_bindings` {#docs:current:clients:c:prepared::duckdb_clear_bindings}

Clears the parameters bound to the prepared statement.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
duckdb_state duckdb_clear_bindings(
  duckdb_prepared_statement prepared_statement
);
```

<br>

###### `duckdb_prepared_statement_type` {#docs:current:clients:c:prepared::duckdb_prepared_statement_type}

Returns the statement type of the statement to be executed.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
duckdb_statement_type duckdb_prepared_statement_type(
  duckdb_prepared_statement statement
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `statement`: The prepared statement.

####### Return Value {#docs:current:clients:c:prepared::return-value}

A `duckdb_statement_type` value, or `DUCKDB_STATEMENT_TYPE_INVALID`.

<br>

###### `duckdb_prepared_statement_column_count` {#docs:current:clients:c:prepared::duckdb_prepared_statement_column_count}

Returns the number of columns present in the result of the prepared statement. If any of the column types are invalid,
the result will be 1.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
idx_t duckdb_prepared_statement_column_count(
  duckdb_prepared_statement prepared_statement
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement.

####### Return Value {#docs:current:clients:c:prepared::return-value}

The number of columns present in the result of the prepared statement.

<br>

###### `duckdb_prepared_statement_column_name` {#docs:current:clients:c:prepared::duckdb_prepared_statement_column_name}

Returns the name of the specified column of the result of the prepared statement.
The returned string should be freed using `duckdb_free`.

Returns `nullptr` if the column is out of range.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
const char *duckdb_prepared_statement_column_name(
  duckdb_prepared_statement prepared_statement,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement.
* `col_idx`: The column index.

####### Return Value {#docs:current:clients:c:prepared::return-value}

The column name of the specified column.

<br>

###### `duckdb_prepared_statement_column_logical_type` {#docs:current:clients:c:prepared::duckdb_prepared_statement_column_logical_type}

Returns the logical type of the specified column of the result of the prepared statement.

Returns `DUCKDB_TYPE_INVALID` if the column is out of range.
The return type of this call should be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
duckdb_logical_type duckdb_prepared_statement_column_logical_type(
  duckdb_prepared_statement prepared_statement,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement to fetch the column type from.
* `col_idx`: The column index.

####### Return Value {#docs:current:clients:c:prepared::return-value}

The logical type of the specified column.

<br>

###### `duckdb_prepared_statement_column_type` {#docs:current:clients:c:prepared::duckdb_prepared_statement_column_type}

Returns the column type of the specified column of the result of the prepared statement.

Returns `DUCKDB_TYPE_INVALID` if the column is out of range.

####### Syntax {#docs:current:clients:c:prepared::syntax}

```c
duckdb_type duckdb_prepared_statement_column_type(
  duckdb_prepared_statement prepared_statement,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:prepared::parameters}

* `prepared_statement`: The prepared statement to fetch the column type from.
* `col_idx`: The column index.

####### Return Value {#docs:current:clients:c:prepared::return-value}

The type of the specified column.

<br>

### Appender {#docs:current:clients:c:appender}



Appenders are the most efficient way of loading data into DuckDB from within the C interface, and are recommended for
fast data loading. The appender is much faster than using prepared statements or individual `INSERT INTO` statements.

Appends are made in row-wise format. For every column, a `duckdb_append_[type]` call should be made, after which
the row should be finished by calling `duckdb_appender_end_row`. After all rows have been appended,
`duckdb_appender_destroy` should be used to finalize the appender and clean up the resulting memory.

Note that `duckdb_appender_destroy` should always be called on the resulting appender, even if the function returns
`DuckDBError`.

#### Example {#docs:current:clients:c:appender::example}

```c
duckdb_query(con, "CREATE TABLE people (id INTEGER, name VARCHAR)", NULL);

duckdb_appender appender;
if (duckdb_appender_create(con, NULL, "people", &appender) == DuckDBError) {
  // handle error
}
// append the first row (1, Mark)
duckdb_append_int32(appender, 1);
duckdb_append_varchar(appender, "Mark");
duckdb_appender_end_row(appender);

// append the second row (2, Hannes)
duckdb_append_int32(appender, 2);
duckdb_append_varchar(appender, "Hannes");
duckdb_appender_end_row(appender);

// finish appending and flush all the rows to the table
duckdb_appender_destroy(&appender);
```

#### API Reference Overview {#docs:current:clients:c:appender::api-reference-overview}



```c
duckdb_state duckdb_appender_create(duckdb_connection connection, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_ext(duckdb_connection connection, const char *catalog, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_query(duckdb_connection connection, const char *query, idx_t column_count, duckdb_logical_type *types, const char *table_name, const char **column_names, duckdb_appender *out_appender);
idx_t duckdb_appender_column_count(duckdb_appender appender);
duckdb_logical_type duckdb_appender_column_type(duckdb_appender appender, idx_t col_idx);
const char *duckdb_appender_error(duckdb_appender appender);
duckdb_error_data duckdb_appender_error_data(duckdb_appender appender);
duckdb_state duckdb_appender_flush(duckdb_appender appender);
duckdb_state duckdb_appender_close(duckdb_appender appender);
duckdb_state duckdb_appender_destroy(duckdb_appender *appender);
duckdb_state duckdb_appender_add_column(duckdb_appender appender, const char *name);
duckdb_state duckdb_appender_clear_columns(duckdb_appender appender);
duckdb_state duckdb_appender_begin_row(duckdb_appender appender);
duckdb_state duckdb_appender_end_row(duckdb_appender appender);
duckdb_state duckdb_append_default(duckdb_appender appender);
duckdb_state duckdb_append_default_to_chunk(duckdb_appender appender, duckdb_data_chunk chunk, idx_t col, idx_t row);
duckdb_state duckdb_append_bool(duckdb_appender appender, bool value);
duckdb_state duckdb_append_int8(duckdb_appender appender, int8_t value);
duckdb_state duckdb_append_int16(duckdb_appender appender, int16_t value);
duckdb_state duckdb_append_int32(duckdb_appender appender, int32_t value);
duckdb_state duckdb_append_int64(duckdb_appender appender, int64_t value);
duckdb_state duckdb_append_hugeint(duckdb_appender appender, duckdb_hugeint value);
duckdb_state duckdb_append_uint8(duckdb_appender appender, uint8_t value);
duckdb_state duckdb_append_uint16(duckdb_appender appender, uint16_t value);
duckdb_state duckdb_append_uint32(duckdb_appender appender, uint32_t value);
duckdb_state duckdb_append_uint64(duckdb_appender appender, uint64_t value);
duckdb_state duckdb_append_uhugeint(duckdb_appender appender, duckdb_uhugeint value);
duckdb_state duckdb_append_float(duckdb_appender appender, float value);
duckdb_state duckdb_append_double(duckdb_appender appender, double value);
duckdb_state duckdb_append_date(duckdb_appender appender, duckdb_date value);
duckdb_state duckdb_append_time(duckdb_appender appender, duckdb_time value);
duckdb_state duckdb_append_timestamp(duckdb_appender appender, duckdb_timestamp value);
duckdb_state duckdb_append_interval(duckdb_appender appender, duckdb_interval value);
duckdb_state duckdb_append_varchar(duckdb_appender appender, const char *val);
duckdb_state duckdb_append_varchar_length(duckdb_appender appender, const char *val, idx_t length);
duckdb_state duckdb_append_blob(duckdb_appender appender, const void *data, idx_t length);
duckdb_state duckdb_append_null(duckdb_appender appender);
duckdb_state duckdb_append_value(duckdb_appender appender, duckdb_value value);
duckdb_state duckdb_append_data_chunk(duckdb_appender appender, duckdb_data_chunk chunk);
```


###### `duckdb_appender_create` {#docs:current:clients:c:appender::duckdb_appender_create}

Creates an appender object.

Note that the object must be destroyed with `duckdb_appender_destroy`.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_create(
  duckdb_connection connection,
  const char *schema,
  const char *table,
  duckdb_appender *out_appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `connection`: The connection context to create the appender in.
* `schema`: The schema of the table to append to, or `nullptr` for the default schema.
* `table`: The table name to append to.
* `out_appender`: The resulting appender object.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_create_ext` {#docs:current:clients:c:appender::duckdb_appender_create_ext}

Creates an appender object.

Note that the object must be destroyed with `duckdb_appender_destroy`.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_create_ext(
  duckdb_connection connection,
  const char *catalog,
  const char *schema,
  const char *table,
  duckdb_appender *out_appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `connection`: The connection context to create the appender in.
* `catalog`: The catalog of the table to append to, or `nullptr` for the default catalog.
* `schema`: The schema of the table to append to, or `nullptr` for the default schema.
* `table`: The table name to append to.
* `out_appender`: The resulting appender object.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_create_query` {#docs:current:clients:c:appender::duckdb_appender_create_query}

Creates an appender object that executes the given query with any data appended to it.

Note that the object must be destroyed with `duckdb_appender_destroy`.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_create_query(
  duckdb_connection connection,
  const char *query,
  idx_t column_count,
  duckdb_logical_type *types,
  const char *table_name,
  const char **column_names,
  duckdb_appender *out_appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `connection`: The connection context to create the appender in.
* `query`: The query to execute. Can be an `INSERT`, `DELETE`, `UPDATE` or `MERGE INTO` statement.
* `column_count`: The number of columns to append.
* `types`: The types of the columns to append.
* `table_name`: (Optional) The table name used to refer to the appended data. Defaults to `appended_data`.
* `column_names`: (Optional) The list of column names. Defaults to `col1`, `col2`, ...
* `out_appender`: The resulting appender object.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_column_count` {#docs:current:clients:c:appender::duckdb_appender_column_count}

Returns the number of columns that belong to the appender.
If there is no active column list, then this equals the table's physical columns.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
idx_t duckdb_appender_column_count(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to get the column count from.

####### Return Value {#docs:current:clients:c:appender::return-value}

The number of columns in the data chunks.

<br>

###### `duckdb_appender_column_type` {#docs:current:clients:c:appender::duckdb_appender_column_type}

Returns the type of the column at the specified index. This is either a type in the active column list, or the same type
as a column in the receiving table.

Note: The resulting type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_logical_type duckdb_appender_column_type(
  duckdb_appender appender,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to get the column type from.
* `col_idx`: The index of the column to get the type of.

####### Return Value {#docs:current:clients:c:appender::return-value}

The `duckdb_logical_type` of the column.

<br>

###### `duckdb_appender_error` {#docs:current:clients:c:appender::duckdb_appender_error}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Use `duckdb_appender_error_data` instead.

Returns the error message associated with the appender.
If the appender has no error message, this returns `nullptr` instead.

The error message should not be freed. It will be de-allocated when `duckdb_appender_destroy` is called.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
const char *duckdb_appender_error(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to get the error from.

####### Return Value {#docs:current:clients:c:appender::return-value}

The error message, or `nullptr` if there is none.

<br>

###### `duckdb_appender_error_data` {#docs:current:clients:c:appender::duckdb_appender_error_data}

Returns the error data associated with the appender.
Must be destroyed with `duckdb_destroy_error_data`.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_error_data duckdb_appender_error_data(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to get the error data from.

####### Return Value {#docs:current:clients:c:appender::return-value}

The error data.

<br>

###### `duckdb_appender_flush` {#docs:current:clients:c:appender::duckdb_appender_flush}

Flushes the appender to the table, forcing the cache of the appender to be cleared. If flushing the data triggers a
constraint violation or any other error, then all data is invalidated, and this function returns `DuckDBError`.
After such an error, it is not possible to append more values. Call `duckdb_appender_error_data` to obtain the error data, followed by
`duckdb_appender_destroy` to destroy the invalidated appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_flush(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to flush.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_close` {#docs:current:clients:c:appender::duckdb_appender_close}

Closes the appender by flushing all intermediate states and closing it for further appends. If flushing the data
triggers a constraint violation or any other error, then all data is invalidated, and this function returns `DuckDBError`.
Call `duckdb_appender_error_data` to obtain the error data, followed by `duckdb_appender_destroy` to destroy the invalidated
appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_close(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to flush and close.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_destroy` {#docs:current:clients:c:appender::duckdb_appender_destroy}

Closes the appender by flushing all intermediate states to the table and destroying it. By destroying it, this function
de-allocates all memory associated with the appender. If flushing the data triggers a constraint violation,
then all data is invalidated, and this function returns `DuckDBError`. Due to the destruction of the appender, it is no
longer possible to obtain the specific error message with `duckdb_appender_error`. Therefore, call `duckdb_appender_close`
before destroying the appender if you need insights into the specific error.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_destroy(
  duckdb_appender *appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to flush, close and destroy.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_add_column` {#docs:current:clients:c:appender::duckdb_appender_add_column}

Appends a column to the active column list of the appender. Immediately flushes all previous data.

The active column list specifies all columns that are expected when flushing the data. Any non-active columns are filled
with their default values, or NULL.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_add_column(
  duckdb_appender appender,
  const char *name
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to add the column to.
* `name`: The name of the column to add.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_clear_columns` {#docs:current:clients:c:appender::duckdb_appender_clear_columns}

Removes all columns from the active column list of the appender, resetting the appender to treat all columns as active.
Immediately flushes all previous data.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_clear_columns(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to clear the columns from.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_begin_row` {#docs:current:clients:c:appender::duckdb_appender_begin_row}

A no-op function, provided for backwards compatibility. Only `duckdb_appender_end_row` is required.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_begin_row(
  duckdb_appender appender
);
```

<br>

###### `duckdb_appender_end_row` {#docs:current:clients:c:appender::duckdb_appender_end_row}

Finishes the current row of appends. After `duckdb_appender_end_row` is called, the next row can be appended.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_appender_end_row(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_append_default` {#docs:current:clients:c:appender::duckdb_append_default}

Append a `DEFAULT` value (`NULL` if no `DEFAULT` is available for the column) to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_default(
  duckdb_appender appender
);
```

<br>

###### `duckdb_append_default_to_chunk` {#docs:current:clients:c:appender::duckdb_append_default_to_chunk}

Append a `DEFAULT` value (`NULL` if no `DEFAULT` is available for the column), at the specified row and column, to the chunk created
from the specified appender. The default value of the column must be a constant value. Non-deterministic expressions
like `nextval('seq')` or `random()` are not supported.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_default_to_chunk(
  duckdb_appender appender,
  duckdb_data_chunk chunk,
  idx_t col,
  idx_t row
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to get the default value from.
* `chunk`: The data chunk to append the default value to.
* `col`: The chunk column index to append the default value to.
* `row`: The chunk row index to append the default value to.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_append_bool` {#docs:current:clients:c:appender::duckdb_append_bool}

Append a bool value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_bool(
  duckdb_appender appender,
  bool value
);
```

<br>

###### `duckdb_append_int8` {#docs:current:clients:c:appender::duckdb_append_int8}

Append an int8_t value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_int8(
  duckdb_appender appender,
  int8_t value
);
```

<br>

###### `duckdb_append_int16` {#docs:current:clients:c:appender::duckdb_append_int16}

Append an int16_t value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_int16(
  duckdb_appender appender,
  int16_t value
);
```

<br>

###### `duckdb_append_int32` {#docs:current:clients:c:appender::duckdb_append_int32}

Append an int32_t value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_int32(
  duckdb_appender appender,
  int32_t value
);
```

<br>

###### `duckdb_append_int64` {#docs:current:clients:c:appender::duckdb_append_int64}

Append an int64_t value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_int64(
  duckdb_appender appender,
  int64_t value
);
```

<br>

###### `duckdb_append_hugeint` {#docs:current:clients:c:appender::duckdb_append_hugeint}

Append a duckdb_hugeint value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_hugeint(
  duckdb_appender appender,
  duckdb_hugeint value
);
```

<br>

###### `duckdb_append_uint8` {#docs:current:clients:c:appender::duckdb_append_uint8}

Append a uint8_t value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_uint8(
  duckdb_appender appender,
  uint8_t value
);
```

<br>

###### `duckdb_append_uint16` {#docs:current:clients:c:appender::duckdb_append_uint16}

Append a uint16_t value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_uint16(
  duckdb_appender appender,
  uint16_t value
);
```

<br>

###### `duckdb_append_uint32` {#docs:current:clients:c:appender::duckdb_append_uint32}

Append a uint32_t value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_uint32(
  duckdb_appender appender,
  uint32_t value
);
```

<br>

###### `duckdb_append_uint64` {#docs:current:clients:c:appender::duckdb_append_uint64}

Append a uint64_t value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_uint64(
  duckdb_appender appender,
  uint64_t value
);
```

<br>

###### `duckdb_append_uhugeint` {#docs:current:clients:c:appender::duckdb_append_uhugeint}

Append a duckdb_uhugeint value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_uhugeint(
  duckdb_appender appender,
  duckdb_uhugeint value
);
```

<br>

###### `duckdb_append_float` {#docs:current:clients:c:appender::duckdb_append_float}

Append a float value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_float(
  duckdb_appender appender,
  float value
);
```

<br>

###### `duckdb_append_double` {#docs:current:clients:c:appender::duckdb_append_double}

Append a double value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_double(
  duckdb_appender appender,
  double value
);
```

<br>

###### `duckdb_append_date` {#docs:current:clients:c:appender::duckdb_append_date}

Append a duckdb_date value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_date(
  duckdb_appender appender,
  duckdb_date value
);
```

<br>

###### `duckdb_append_time` {#docs:current:clients:c:appender::duckdb_append_time}

Append a duckdb_time value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_time(
  duckdb_appender appender,
  duckdb_time value
);
```

<br>

###### `duckdb_append_timestamp` {#docs:current:clients:c:appender::duckdb_append_timestamp}

Append a duckdb_timestamp value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_timestamp(
  duckdb_appender appender,
  duckdb_timestamp value
);
```

<br>

###### `duckdb_append_interval` {#docs:current:clients:c:appender::duckdb_append_interval}

Append a duckdb_interval value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_interval(
  duckdb_appender appender,
  duckdb_interval value
);
```

<br>

###### `duckdb_append_varchar` {#docs:current:clients:c:appender::duckdb_append_varchar}

Append a varchar value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_varchar(
  duckdb_appender appender,
  const char *val
);
```

<br>

###### `duckdb_append_varchar_length` {#docs:current:clients:c:appender::duckdb_append_varchar_length}

Append a varchar value of the given length to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_varchar_length(
  duckdb_appender appender,
  const char *val,
  idx_t length
);
```

<br>

###### `duckdb_append_blob` {#docs:current:clients:c:appender::duckdb_append_blob}

Append a blob value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_blob(
  duckdb_appender appender,
  const void *data,
  idx_t length
);
```

<br>

###### `duckdb_append_null` {#docs:current:clients:c:appender::duckdb_append_null}

Append a NULL value to the appender (of any type).

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_null(
  duckdb_appender appender
);
```

<br>

###### `duckdb_append_value` {#docs:current:clients:c:appender::duckdb_append_value}

Append a duckdb_value to the appender.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_value(
  duckdb_appender appender,
  duckdb_value value
);
```

<br>

###### `duckdb_append_data_chunk` {#docs:current:clients:c:appender::duckdb_append_data_chunk}

Appends a pre-filled data chunk to the specified appender.
Attempts casting if the data chunk types do not match the active appender types.

####### Syntax {#docs:current:clients:c:appender::syntax}

```c
duckdb_state duckdb_append_data_chunk(
  duckdb_appender appender,
  duckdb_data_chunk chunk
);
```


####### Parameters {#docs:current:clients:c:appender::parameters}

* `appender`: The appender to append to.
* `chunk`: The data chunk to append.

####### Return Value {#docs:current:clients:c:appender::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

### Table Functions {#docs:current:clients:c:table_functions}



The table function API can be used to define a table function that can then be called from within DuckDB in the `FROM` clause of a query.

#### API Reference Overview {#docs:current:clients:c:table_functions::api-reference-overview}



```c
duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type type);
void duckdb_table_function_add_named_parameter(duckdb_table_function table_function, const char *name, duckdb_logical_type type);
void duckdb_table_function_set_extra_info(duckdb_table_function table_function, void *extra_info, duckdb_delete_callback_t destroy);
void duckdb_table_function_set_bind(duckdb_table_function table_function, duckdb_table_function_bind_t bind);
void duckdb_table_function_set_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_local_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_function(duckdb_table_function table_function, duckdb_table_function_t function);
void duckdb_table_function_supports_projection_pushdown(duckdb_table_function table_function, bool pushdown);
duckdb_state duckdb_register_table_function(duckdb_connection con, duckdb_table_function function);
```


##### Table Function Bind {#docs:current:clients:c:table_functions::table-function-bind}

```c
void *duckdb_bind_get_extra_info(duckdb_bind_info info);
void duckdb_table_function_get_client_context(duckdb_bind_info info, duckdb_client_context *out_context);
void duckdb_bind_add_result_column(duckdb_bind_info info, const char *name, duckdb_logical_type type);
idx_t duckdb_bind_get_parameter_count(duckdb_bind_info info);
duckdb_value duckdb_bind_get_parameter(duckdb_bind_info info, idx_t index);
duckdb_value duckdb_bind_get_named_parameter(duckdb_bind_info info, const char *name);
void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
void duckdb_bind_set_cardinality(duckdb_bind_info info, idx_t cardinality, bool is_exact);
void duckdb_bind_set_error(duckdb_bind_info info, const char *error);
```


##### Table Function Init {#docs:current:clients:c:table_functions::table-function-init}

```c
void *duckdb_init_get_extra_info(duckdb_init_info info);
void *duckdb_init_get_bind_data(duckdb_init_info info);
void duckdb_init_set_init_data(duckdb_init_info info, void *init_data, duckdb_delete_callback_t destroy);
idx_t duckdb_init_get_column_count(duckdb_init_info info);
idx_t duckdb_init_get_column_index(duckdb_init_info info, idx_t column_index);
void duckdb_init_set_max_threads(duckdb_init_info info, idx_t max_threads);
void duckdb_init_set_error(duckdb_init_info info, const char *error);
```


##### Table Function {#docs:current:clients:c:table_functions::table-function}

```c
void *duckdb_function_get_extra_info(duckdb_function_info info);
void *duckdb_function_get_bind_data(duckdb_function_info info);
void *duckdb_function_get_init_data(duckdb_function_info info);
void *duckdb_function_get_local_init_data(duckdb_function_info info);
void duckdb_function_set_error(duckdb_function_info info, const char *error);
```


###### `duckdb_create_table_function` {#docs:current:clients:c:table_functions::duckdb_create_table_function}

Creates a new empty table function.

The return value should be destroyed with `duckdb_destroy_table_function`.


####### Return Value {#docs:current:clients:c:table_functions::return-value}

The table function object.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
duckdb_table_function duckdb_create_table_function();
```

<br>

###### `duckdb_destroy_table_function` {#docs:current:clients:c:table_functions::duckdb_destroy_table_function}

Destroys the given table function object.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_destroy_table_function(
  duckdb_table_function *table_function
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function to destroy

<br>

###### `duckdb_table_function_set_name` {#docs:current:clients:c:table_functions::duckdb_table_function_set_name}

Sets the name of the given table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_set_name(
  duckdb_table_function table_function,
  const char *name
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function
* `name`: The name of the table function

<br>

###### `duckdb_table_function_add_parameter` {#docs:current:clients:c:table_functions::duckdb_table_function_add_parameter}

Adds a parameter to the table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_add_parameter(
  duckdb_table_function table_function,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function.
* `type`: The parameter type. Cannot contain INVALID.

<br>

###### `duckdb_table_function_add_named_parameter` {#docs:current:clients:c:table_functions::duckdb_table_function_add_named_parameter}

Adds a named parameter to the table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_add_named_parameter(
  duckdb_table_function table_function,
  const char *name,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function.
* `name`: The parameter name.
* `type`: The parameter type. Cannot contain INVALID.

<br>

###### `duckdb_table_function_set_extra_info` {#docs:current:clients:c:table_functions::duckdb_table_function_set_extra_info}

Assigns extra information to the table function that can be fetched during binding, etc.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_set_extra_info(
  duckdb_table_function table_function,
  void *extra_info,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function
* `extra_info`: The extra information
* `destroy`: The callback that will be called to destroy the extra information (if any)

<br>

###### `duckdb_table_function_set_bind` {#docs:current:clients:c:table_functions::duckdb_table_function_set_bind}

Sets the bind function of the table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_set_bind(
  duckdb_table_function table_function,
  duckdb_table_function_bind_t bind
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function
* `bind`: The bind function

<br>

###### `duckdb_table_function_set_init` {#docs:current:clients:c:table_functions::duckdb_table_function_set_init}

Sets the init function of the table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_set_init(
  duckdb_table_function table_function,
  duckdb_table_function_init_t init
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function
* `init`: The init function

<br>

###### `duckdb_table_function_set_local_init` {#docs:current:clients:c:table_functions::duckdb_table_function_set_local_init}

Sets the thread-local init function of the table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_set_local_init(
  duckdb_table_function table_function,
  duckdb_table_function_init_t init
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function
* `init`: The init function

<br>

###### `duckdb_table_function_set_function` {#docs:current:clients:c:table_functions::duckdb_table_function_set_function}

Sets the main function of the table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_set_function(
  duckdb_table_function table_function,
  duckdb_table_function_t function
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function
* `function`: The function

<br>

###### `duckdb_table_function_supports_projection_pushdown` {#docs:current:clients:c:table_functions::duckdb_table_function_supports_projection_pushdown}

Sets whether or not the given table function supports projection pushdown.

If this is set to true, the system will provide a list of all required columns in the `init` stage through
the `duckdb_init_get_column_count` and `duckdb_init_get_column_index` functions.
If this is set to false (the default), the system will expect all columns to be projected.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_supports_projection_pushdown(
  duckdb_table_function table_function,
  bool pushdown
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `table_function`: The table function
* `pushdown`: True if the table function supports projection pushdown, false otherwise.

<br>

###### `duckdb_register_table_function` {#docs:current:clients:c:table_functions::duckdb_register_table_function}

Register the table function object within the given connection.

The function requires at least a name, a bind function, an init function and a main function.

If the function is incomplete or a function with this name already exists, `DuckDBError` is returned.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
duckdb_state duckdb_register_table_function(
  duckdb_connection con,
  duckdb_table_function function
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `con`: The connection to register it in.
* `function`: The table function object to register.

####### Return Value {#docs:current:clients:c:table_functions::return-value}

Whether or not the registration was successful.

<br>

###### `duckdb_bind_get_extra_info` {#docs:current:clients:c:table_functions::duckdb_bind_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void *duckdb_bind_get_extra_info(
  duckdb_bind_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The extra info

<br>

###### `duckdb_table_function_get_client_context` {#docs:current:clients:c:table_functions::duckdb_table_function_get_client_context}

Retrieves the client context of the bind info of a table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_table_function_get_client_context(
  duckdb_bind_info info,
  duckdb_client_context *out_context
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The bind info object of the table function.
* `out_context`: The client context of the bind info. Must be destroyed with `duckdb_destroy_client_context`.

<br>

###### `duckdb_bind_add_result_column` {#docs:current:clients:c:table_functions::duckdb_bind_add_result_column}

Adds a result column to the output of the table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_bind_add_result_column(
  duckdb_bind_info info,
  const char *name,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The table function's bind info.
* `name`: The column name.
* `type`: The logical column type.

<br>

###### `duckdb_bind_get_parameter_count` {#docs:current:clients:c:table_functions::duckdb_bind_get_parameter_count}

Retrieves the number of regular (non-named) parameters to the function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
idx_t duckdb_bind_get_parameter_count(
  duckdb_bind_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The number of parameters

<br>

###### `duckdb_bind_get_parameter` {#docs:current:clients:c:table_functions::duckdb_bind_get_parameter}

Retrieves the parameter at the given index.

The result must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
duckdb_value duckdb_bind_get_parameter(
  duckdb_bind_info info,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object
* `index`: The index of the parameter to get

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The value of the parameter. Must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_bind_get_named_parameter` {#docs:current:clients:c:table_functions::duckdb_bind_get_named_parameter}

Retrieves a named parameter with the given name.

The result must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
duckdb_value duckdb_bind_get_named_parameter(
  duckdb_bind_info info,
  const char *name
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object
* `name`: The name of the parameter

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The value of the parameter. Must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_bind_set_bind_data` {#docs:current:clients:c:table_functions::duckdb_bind_set_bind_data}

Sets the user-provided bind data in the bind object of the table function.
This object can be retrieved again during execution.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_bind_set_bind_data(
  duckdb_bind_info info,
  void *bind_data,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The bind info of the table function.
* `bind_data`: The bind data object.
* `destroy`: The callback to destroy the bind data (if any).

<br>

###### `duckdb_bind_set_cardinality` {#docs:current:clients:c:table_functions::duckdb_bind_set_cardinality}

Sets the cardinality estimate for the table function, used for optimization.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_bind_set_cardinality(
  duckdb_bind_info info,
  idx_t cardinality,
  bool is_exact
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The bind info object.
* `cardinality`: The cardinality estimate.
* `is_exact`: Whether the cardinality estimate is exact or an approximation.

<br>

###### `duckdb_bind_set_error` {#docs:current:clients:c:table_functions::duckdb_bind_set_error}

Report that an error has occurred while calling bind on a table function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_bind_set_error(
  duckdb_bind_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object
* `error`: The error message

<br>

###### `duckdb_init_get_extra_info` {#docs:current:clients:c:table_functions::duckdb_init_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void *duckdb_init_get_extra_info(
  duckdb_init_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The extra info

<br>

###### `duckdb_init_get_bind_data` {#docs:current:clients:c:table_functions::duckdb_init_get_bind_data}

Gets the bind data set by `duckdb_bind_set_bind_data` during the bind.

Note that the bind data should be considered read-only.
For tracking state, use the init data instead.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void *duckdb_init_get_bind_data(
  duckdb_init_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The bind data object

<br>

###### `duckdb_init_set_init_data` {#docs:current:clients:c:table_functions::duckdb_init_set_init_data}

Sets the user-provided init data in the init object. This object can be retrieved again during execution.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_init_set_init_data(
  duckdb_init_info info,
  void *init_data,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object
* `init_data`: The init data object.
* `destroy`: The callback that will be called to destroy the init data (if any)

<br>

###### `duckdb_init_get_column_count` {#docs:current:clients:c:table_functions::duckdb_init_get_column_count}

Returns the number of projected columns.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
idx_t duckdb_init_get_column_count(
  duckdb_init_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The number of projected columns.

<br>

###### `duckdb_init_get_column_index` {#docs:current:clients:c:table_functions::duckdb_init_get_column_index}

Returns the column index of the projected column at the specified position.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
idx_t duckdb_init_get_column_index(
  duckdb_init_info info,
  idx_t column_index
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object
* `column_index`: The position at which to get the projected column index, from `0` (inclusive) to `duckdb_init_get_column_count(info)` (exclusive)

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The column index of the projected column.

<br>

###### `duckdb_init_set_max_threads` {#docs:current:clients:c:table_functions::duckdb_init_set_max_threads}

Sets how many threads can process this table function in parallel (default: 1).

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_init_set_max_threads(
  duckdb_init_info info,
  idx_t max_threads
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object
* `max_threads`: The maximum number of threads that can process this table function

<br>

###### `duckdb_init_set_error` {#docs:current:clients:c:table_functions::duckdb_init_set_error}

Report that an error has occurred while calling init.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_init_set_error(
  duckdb_init_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object
* `error`: The error message

<br>

###### `duckdb_function_get_extra_info` {#docs:current:clients:c:table_functions::duckdb_function_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void *duckdb_function_get_extra_info(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The extra info

<br>

###### `duckdb_function_get_bind_data` {#docs:current:clients:c:table_functions::duckdb_function_get_bind_data}

Gets the table function's bind data set by `duckdb_bind_set_bind_data`.

Note that the bind data is read-only.
For tracking state, use the init data instead.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void *duckdb_function_get_bind_data(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The function info object.

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The bind data object.

<br>

###### `duckdb_function_get_init_data` {#docs:current:clients:c:table_functions::duckdb_function_get_init_data}

Gets the init data set by `duckdb_init_set_init_data` during the init.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void *duckdb_function_get_init_data(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The init data object

<br>

###### `duckdb_function_get_local_init_data` {#docs:current:clients:c:table_functions::duckdb_function_get_local_init_data}

Gets the thread-local init data set by `duckdb_init_set_init_data` during the thread-local init.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void *duckdb_function_get_local_init_data(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:table_functions::return-value}

The init data object

<br>

###### `duckdb_function_set_error` {#docs:current:clients:c:table_functions::duckdb_function_set_error}

Report that an error has occurred while executing the function.

####### Syntax {#docs:current:clients:c:table_functions::syntax}

```c
void duckdb_function_set_error(
  duckdb_function_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:table_functions::parameters}

* `info`: The info object
* `error`: The error message

<br>

### Replacement Scans {#docs:current:clients:c:replacement_scans}



The replacement scan API can be used to register a callback that is called when a table is read that does not exist in the catalog. For example, when a query such as `SELECT * FROM my_table` is executed and `my_table` does not exist, the replacement scan callback will be called with `my_table` as parameter. The replacement scan can then insert a table function with a specific parameter to replace the read of the table.

#### API Reference Overview {#docs:current:clients:c:replacement_scans::api-reference-overview}



```c
void duckdb_add_replacement_scan(duckdb_database db, duckdb_replacement_callback_t replacement, void *extra_data, duckdb_delete_callback_t delete_callback);
void duckdb_replacement_scan_set_function_name(duckdb_replacement_scan_info info, const char *function_name);
void duckdb_replacement_scan_add_parameter(duckdb_replacement_scan_info info, duckdb_value parameter);
void duckdb_replacement_scan_set_error(duckdb_replacement_scan_info info, const char *error);
```


###### `duckdb_add_replacement_scan` {#docs:current:clients:c:replacement_scans::duckdb_add_replacement_scan}

Add a replacement scan definition to the specified database.

####### Syntax {#docs:current:clients:c:replacement_scans::syntax}

```c
void duckdb_add_replacement_scan(
  duckdb_database db,
  duckdb_replacement_callback_t replacement,
  void *extra_data,
  duckdb_delete_callback_t delete_callback
);
```


####### Parameters {#docs:current:clients:c:replacement_scans::parameters}

* `db`: The database object to add the replacement scan to
* `replacement`: The replacement scan callback
* `extra_data`: Extra data that is passed back into the specified callback
* `delete_callback`: The delete callback to call on the extra data, if any

<br>

###### `duckdb_replacement_scan_set_function_name` {#docs:current:clients:c:replacement_scans::duckdb_replacement_scan_set_function_name}

Sets the replacement function name. If this function is called in the replacement callback,
the replacement scan is performed. If it is not called, the replacement scan is not performed.

####### Syntax {#docs:current:clients:c:replacement_scans::syntax}

```c
void duckdb_replacement_scan_set_function_name(
  duckdb_replacement_scan_info info,
  const char *function_name
);
```


####### Parameters {#docs:current:clients:c:replacement_scans::parameters}

* `info`: The info object
* `function_name`: The function name to substitute.

<br>

###### `duckdb_replacement_scan_add_parameter` {#docs:current:clients:c:replacement_scans::duckdb_replacement_scan_add_parameter}

Adds a parameter to the replacement scan function.

####### Syntax {#docs:current:clients:c:replacement_scans::syntax}

```c
void duckdb_replacement_scan_add_parameter(
  duckdb_replacement_scan_info info,
  duckdb_value parameter
);
```


####### Parameters {#docs:current:clients:c:replacement_scans::parameters}

* `info`: The info object
* `parameter`: The parameter to add.

<br>

###### `duckdb_replacement_scan_set_error` {#docs:current:clients:c:replacement_scans::duckdb_replacement_scan_set_error}

Report that an error has occurred while executing the replacement scan.

####### Syntax {#docs:current:clients:c:replacement_scans::syntax}

```c
void duckdb_replacement_scan_set_error(
  duckdb_replacement_scan_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:replacement_scans::parameters}

* `info`: The info object
* `error`: The error message

<br>

### Complete API {#docs:current:clients:c:api}



This page contains the reference for DuckDB's C API.

> The reference contains several deprecation notices. These mark methods that may be removed in the future, so their long-term availability is not guaranteed. DuckDB's developers plan to carry out deprecations slowly because several of the deprecated methods do not yet have a fully functional alternative: a method will not be removed before its alternative is available, and even then there will be a grace period of a few minor versions before removal. The methods are already marked as deprecated in v1.0 to denote that they are not part of the v1.0 stable API, which contains the methods that are available long-term.

#### API Reference Overview {#docs:current:clients:c:api::api-reference-overview}



##### Open Connect {#docs:current:clients:c:api::open-connect}

```c
duckdb_instance_cache duckdb_create_instance_cache();
duckdb_state duckdb_get_or_create_from_cache(duckdb_instance_cache instance_cache, const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_destroy_instance_cache(duckdb_instance_cache *instance_cache);
duckdb_state duckdb_open(const char *path, duckdb_database *out_database);
duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
void duckdb_connection_get_client_context(duckdb_connection connection, duckdb_client_context *out_context);
void duckdb_connection_get_arrow_options(duckdb_connection connection, duckdb_arrow_options *out_arrow_options);
idx_t duckdb_client_context_get_connection_id(duckdb_client_context context);
void duckdb_destroy_client_context(duckdb_client_context *context);
void duckdb_destroy_arrow_options(duckdb_arrow_options *arrow_options);
const char *duckdb_library_version();
duckdb_value duckdb_get_table_names(duckdb_connection connection, const char *query, bool qualified);
```


##### Configuration {#docs:current:clients:c:api::configuration}

```c
duckdb_state duckdb_create_config(duckdb_config *out_config);
size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);
```


##### Error Data {#docs:current:clients:c:api::error-data}

```c
duckdb_error_data duckdb_create_error_data(duckdb_error_type type, const char *message);
void duckdb_destroy_error_data(duckdb_error_data *error_data);
duckdb_error_type duckdb_error_data_error_type(duckdb_error_data error_data);
const char *duckdb_error_data_message(duckdb_error_data error_data);
bool duckdb_error_data_has_error(duckdb_error_data error_data);
```


##### Query Execution {#docs:current:clients:c:api::query-execution}

```c
duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);
void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
duckdb_arrow_options duckdb_result_get_arrow_options(duckdb_result *result);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);
duckdb_error_type duckdb_result_error_type(duckdb_result *result);
```


##### Result Functions {#docs:current:clients:c:api::result-functions}

```c
duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);
bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);
```


##### Safe Fetch Functions {#docs:current:clients:c:api::safe-fetch-functions}

```c
bool duckdb_value_boolean(duckdb_result *result, idx_t col, idx_t row);
int8_t duckdb_value_int8(duckdb_result *result, idx_t col, idx_t row);
int16_t duckdb_value_int16(duckdb_result *result, idx_t col, idx_t row);
int32_t duckdb_value_int32(duckdb_result *result, idx_t col, idx_t row);
int64_t duckdb_value_int64(duckdb_result *result, idx_t col, idx_t row);
duckdb_hugeint duckdb_value_hugeint(duckdb_result *result, idx_t col, idx_t row);
duckdb_uhugeint duckdb_value_uhugeint(duckdb_result *result, idx_t col, idx_t row);
duckdb_decimal duckdb_value_decimal(duckdb_result *result, idx_t col, idx_t row);
uint8_t duckdb_value_uint8(duckdb_result *result, idx_t col, idx_t row);
uint16_t duckdb_value_uint16(duckdb_result *result, idx_t col, idx_t row);
uint32_t duckdb_value_uint32(duckdb_result *result, idx_t col, idx_t row);
uint64_t duckdb_value_uint64(duckdb_result *result, idx_t col, idx_t row);
float duckdb_value_float(duckdb_result *result, idx_t col, idx_t row);
double duckdb_value_double(duckdb_result *result, idx_t col, idx_t row);
duckdb_date duckdb_value_date(duckdb_result *result, idx_t col, idx_t row);
duckdb_time duckdb_value_time(duckdb_result *result, idx_t col, idx_t row);
duckdb_timestamp duckdb_value_timestamp(duckdb_result *result, idx_t col, idx_t row);
duckdb_interval duckdb_value_interval(duckdb_result *result, idx_t col, idx_t row);
char *duckdb_value_varchar(duckdb_result *result, idx_t col, idx_t row);
duckdb_string duckdb_value_string(duckdb_result *result, idx_t col, idx_t row);
char *duckdb_value_varchar_internal(duckdb_result *result, idx_t col, idx_t row);
duckdb_string duckdb_value_string_internal(duckdb_result *result, idx_t col, idx_t row);
duckdb_blob duckdb_value_blob(duckdb_result *result, idx_t col, idx_t row);
bool duckdb_value_is_null(duckdb_result *result, idx_t col, idx_t row);
```


##### Helpers {#docs:current:clients:c:api::helpers}

```c
void *duckdb_malloc(size_t size);
void duckdb_free(void *ptr);
idx_t duckdb_vector_size();
bool duckdb_string_is_inlined(duckdb_string_t string);
uint32_t duckdb_string_t_length(duckdb_string_t string);
const char *duckdb_string_t_data(duckdb_string_t *string);
```


##### Date Time Timestamp Helpers {#docs:current:clients:c:api::date-time-timestamp-helpers}

```c
duckdb_date_struct duckdb_from_date(duckdb_date date);
duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);
bool duckdb_is_finite_timestamp_s(duckdb_timestamp_s ts);
bool duckdb_is_finite_timestamp_ms(duckdb_timestamp_ms ts);
bool duckdb_is_finite_timestamp_ns(duckdb_timestamp_ns ts);
```


##### Hugeint Helpers {#docs:current:clients:c:api::hugeint-helpers}

```c
double duckdb_hugeint_to_double(duckdb_hugeint val);
duckdb_hugeint duckdb_double_to_hugeint(double val);
```


##### Unsigned Hugeint Helpers {#docs:current:clients:c:api::unsigned-hugeint-helpers}

```c
double duckdb_uhugeint_to_double(duckdb_uhugeint val);
duckdb_uhugeint duckdb_double_to_uhugeint(double val);
```


##### Decimal Helpers {#docs:current:clients:c:api::decimal-helpers}

```c
duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);
double duckdb_decimal_to_double(duckdb_decimal val);
```


##### Prepared Statements {#docs:current:clients:c:api::prepared-statements}

```c
duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement *out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);
const char *duckdb_parameter_name(duckdb_prepared_statement prepared_statement, idx_t index);
duckdb_type duckdb_param_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_logical_type duckdb_param_logical_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_state duckdb_clear_bindings(duckdb_prepared_statement prepared_statement);
duckdb_statement_type duckdb_prepared_statement_type(duckdb_prepared_statement statement);
idx_t duckdb_prepared_statement_column_count(duckdb_prepared_statement prepared_statement);
const char *duckdb_prepared_statement_column_name(duckdb_prepared_statement prepared_statement, idx_t col_idx);
duckdb_logical_type duckdb_prepared_statement_column_logical_type(duckdb_prepared_statement prepared_statement, idx_t col_idx);
duckdb_type duckdb_prepared_statement_column_type(duckdb_prepared_statement prepared_statement, idx_t col_idx);
```


##### Bind Values to Prepared Statements {#docs:current:clients:c:api::bind-values-to-prepared-statements}

```c
duckdb_state duckdb_bind_value(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_value val);
duckdb_state duckdb_bind_parameter_index(duckdb_prepared_statement prepared_statement, idx_t *param_idx_out, const char *name);
duckdb_state duckdb_bind_boolean(duckdb_prepared_statement prepared_statement, idx_t param_idx, bool val);
duckdb_state duckdb_bind_int8(duckdb_prepared_statement prepared_statement, idx_t param_idx, int8_t val);
duckdb_state duckdb_bind_int16(duckdb_prepared_statement prepared_statement, idx_t param_idx, int16_t val);
duckdb_state duckdb_bind_int32(duckdb_prepared_statement prepared_statement, idx_t param_idx, int32_t val);
duckdb_state duckdb_bind_int64(duckdb_prepared_statement prepared_statement, idx_t param_idx, int64_t val);
duckdb_state duckdb_bind_hugeint(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_hugeint val);
duckdb_state duckdb_bind_uhugeint(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_uhugeint val);
duckdb_state duckdb_bind_decimal(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_decimal val);
duckdb_state duckdb_bind_uint8(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint8_t val);
duckdb_state duckdb_bind_uint16(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint16_t val);
duckdb_state duckdb_bind_uint32(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint32_t val);
duckdb_state duckdb_bind_uint64(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint64_t val);
duckdb_state duckdb_bind_float(duckdb_prepared_statement prepared_statement, idx_t param_idx, float val);
duckdb_state duckdb_bind_double(duckdb_prepared_statement prepared_statement, idx_t param_idx, double val);
duckdb_state duckdb_bind_date(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_date val);
duckdb_state duckdb_bind_time(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_time val);
duckdb_state duckdb_bind_timestamp(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_timestamp val);
duckdb_state duckdb_bind_timestamp_tz(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_timestamp val);
duckdb_state duckdb_bind_interval(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_interval val);
duckdb_state duckdb_bind_varchar(duckdb_prepared_statement prepared_statement, idx_t param_idx, const char *val);
duckdb_state duckdb_bind_varchar_length(duckdb_prepared_statement prepared_statement, idx_t param_idx, const char *val, idx_t length);
duckdb_state duckdb_bind_blob(duckdb_prepared_statement prepared_statement, idx_t param_idx, const void *data, idx_t length);
duckdb_state duckdb_bind_null(duckdb_prepared_statement prepared_statement, idx_t param_idx);
```


##### Execute Prepared Statements {#docs:current:clients:c:api::execute-prepared-statements}

```c
duckdb_state duckdb_execute_prepared(duckdb_prepared_statement prepared_statement, duckdb_result *out_result);
duckdb_state duckdb_execute_prepared_streaming(duckdb_prepared_statement prepared_statement, duckdb_result *out_result);
```


##### Extract Statements {#docs:current:clients:c:api::extract-statements}

```c
idx_t duckdb_extract_statements(duckdb_connection connection, const char *query, duckdb_extracted_statements *out_extracted_statements);
duckdb_state duckdb_prepare_extracted_statement(duckdb_connection connection, duckdb_extracted_statements extracted_statements, idx_t index, duckdb_prepared_statement *out_prepared_statement);
const char *duckdb_extract_statements_error(duckdb_extracted_statements extracted_statements);
void duckdb_destroy_extracted(duckdb_extracted_statements *extracted_statements);
```


##### Pending Result Interface {#docs:current:clients:c:api::pending-result-interface}

```c
duckdb_state duckdb_pending_prepared(duckdb_prepared_statement prepared_statement, duckdb_pending_result *out_result);
duckdb_state duckdb_pending_prepared_streaming(duckdb_prepared_statement prepared_statement, duckdb_pending_result *out_result);
void duckdb_destroy_pending(duckdb_pending_result *pending_result);
const char *duckdb_pending_error(duckdb_pending_result pending_result);
duckdb_pending_state duckdb_pending_execute_task(duckdb_pending_result pending_result);
duckdb_pending_state duckdb_pending_execute_check_state(duckdb_pending_result pending_result);
duckdb_state duckdb_execute_pending(duckdb_pending_result pending_result, duckdb_result *out_result);
bool duckdb_pending_execution_is_finished(duckdb_pending_state pending_state);
```


##### Value Interface {#docs:current:clients:c:api::value-interface}

```c
void duckdb_destroy_value(duckdb_value *value);
duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_bool(bool input);
duckdb_value duckdb_create_int8(int8_t input);
duckdb_value duckdb_create_uint8(uint8_t input);
duckdb_value duckdb_create_int16(int16_t input);
duckdb_value duckdb_create_uint16(uint16_t input);
duckdb_value duckdb_create_int32(int32_t input);
duckdb_value duckdb_create_uint32(uint32_t input);
duckdb_value duckdb_create_uint64(uint64_t input);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_hugeint(duckdb_hugeint input);
duckdb_value duckdb_create_uhugeint(duckdb_uhugeint input);
duckdb_value duckdb_create_bignum(duckdb_bignum input);
duckdb_value duckdb_create_decimal(duckdb_decimal input);
duckdb_value duckdb_create_float(float input);
duckdb_value duckdb_create_double(double input);
duckdb_value duckdb_create_date(duckdb_date input);
duckdb_value duckdb_create_time(duckdb_time input);
duckdb_value duckdb_create_time_ns(duckdb_time_ns input);
duckdb_value duckdb_create_time_tz_value(duckdb_time_tz value);
duckdb_value duckdb_create_timestamp(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_tz(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_s(duckdb_timestamp_s input);
duckdb_value duckdb_create_timestamp_ms(duckdb_timestamp_ms input);
duckdb_value duckdb_create_timestamp_ns(duckdb_timestamp_ns input);
duckdb_value duckdb_create_interval(duckdb_interval input);
duckdb_value duckdb_create_blob(const uint8_t *data, idx_t length);
duckdb_value duckdb_create_bit(duckdb_bit input);
duckdb_value duckdb_create_uuid(duckdb_uhugeint input);
bool duckdb_get_bool(duckdb_value val);
int8_t duckdb_get_int8(duckdb_value val);
uint8_t duckdb_get_uint8(duckdb_value val);
int16_t duckdb_get_int16(duckdb_value val);
uint16_t duckdb_get_uint16(duckdb_value val);
int32_t duckdb_get_int32(duckdb_value val);
uint32_t duckdb_get_uint32(duckdb_value val);
int64_t duckdb_get_int64(duckdb_value val);
uint64_t duckdb_get_uint64(duckdb_value val);
duckdb_hugeint duckdb_get_hugeint(duckdb_value val);
duckdb_uhugeint duckdb_get_uhugeint(duckdb_value val);
duckdb_bignum duckdb_get_bignum(duckdb_value val);
duckdb_decimal duckdb_get_decimal(duckdb_value val);
float duckdb_get_float(duckdb_value val);
double duckdb_get_double(duckdb_value val);
duckdb_date duckdb_get_date(duckdb_value val);
duckdb_time duckdb_get_time(duckdb_value val);
duckdb_time_ns duckdb_get_time_ns(duckdb_value val);
duckdb_time_tz duckdb_get_time_tz(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp_tz(duckdb_value val);
duckdb_timestamp_s duckdb_get_timestamp_s(duckdb_value val);
duckdb_timestamp_ms duckdb_get_timestamp_ms(duckdb_value val);
duckdb_timestamp_ns duckdb_get_timestamp_ns(duckdb_value val);
duckdb_interval duckdb_get_interval(duckdb_value val);
duckdb_logical_type duckdb_get_value_type(duckdb_value val);
duckdb_blob duckdb_get_blob(duckdb_value val);
duckdb_bit duckdb_get_bit(duckdb_value val);
duckdb_uhugeint duckdb_get_uuid(duckdb_value val);
char *duckdb_get_varchar(duckdb_value value);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_map_value(duckdb_logical_type map_type, duckdb_value *keys, duckdb_value *values, idx_t entry_count);
duckdb_value duckdb_create_union_value(duckdb_logical_type union_type, idx_t tag_index, duckdb_value value);
idx_t duckdb_get_map_size(duckdb_value value);
duckdb_value duckdb_get_map_key(duckdb_value value, idx_t index);
duckdb_value duckdb_get_map_value(duckdb_value value, idx_t index);
bool duckdb_is_null_value(duckdb_value value);
duckdb_value duckdb_create_null_value();
idx_t duckdb_get_list_size(duckdb_value value);
duckdb_value duckdb_get_list_child(duckdb_value value, idx_t index);
duckdb_value duckdb_create_enum_value(duckdb_logical_type type, uint64_t value);
uint64_t duckdb_get_enum_value(duckdb_value value);
duckdb_value duckdb_get_struct_child(duckdb_value value, idx_t index);
char *duckdb_value_to_string(duckdb_value value);
```


##### Logical Type Interface {#docs:current:clients:c:api::logical-type-interface}

```c
duckdb_logical_type duckdb_create_logical_type(duckdb_type type);
char *duckdb_logical_type_get_alias(duckdb_logical_type type);
void duckdb_logical_type_set_alias(duckdb_logical_type type, const char *alias);
duckdb_logical_type duckdb_create_list_type(duckdb_logical_type type);
duckdb_logical_type duckdb_create_array_type(duckdb_logical_type type, idx_t array_size);
duckdb_logical_type duckdb_create_map_type(duckdb_logical_type key_type, duckdb_logical_type value_type);
duckdb_logical_type duckdb_create_union_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_struct_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_enum_type(const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_decimal_type(uint8_t width, uint8_t scale);
duckdb_type duckdb_get_type_id(duckdb_logical_type type);
uint8_t duckdb_decimal_width(duckdb_logical_type type);
uint8_t duckdb_decimal_scale(duckdb_logical_type type);
duckdb_type duckdb_decimal_internal_type(duckdb_logical_type type);
duckdb_type duckdb_enum_internal_type(duckdb_logical_type type);
uint32_t duckdb_enum_dictionary_size(duckdb_logical_type type);
char *duckdb_enum_dictionary_value(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_list_type_child_type(duckdb_logical_type type);
duckdb_logical_type duckdb_array_type_child_type(duckdb_logical_type type);
idx_t duckdb_array_type_array_size(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_key_type(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_value_type(duckdb_logical_type type);
idx_t duckdb_struct_type_child_count(duckdb_logical_type type);
char *duckdb_struct_type_child_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_struct_type_child_type(duckdb_logical_type type, idx_t index);
idx_t duckdb_union_type_member_count(duckdb_logical_type type);
char *duckdb_union_type_member_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_union_type_member_type(duckdb_logical_type type, idx_t index);
void duckdb_destroy_logical_type(duckdb_logical_type *type);
duckdb_state duckdb_register_logical_type(duckdb_connection con, duckdb_logical_type type, duckdb_create_type_info info);
```


##### Data Chunk Interface {#docs:current:clients:c:api::data-chunk-interface}

```c
duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);
void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);
```


##### Vector Interface {#docs:current:clients:c:api::vector-interface}

```c
duckdb_vector duckdb_create_vector(duckdb_logical_type type, idx_t capacity);
void duckdb_destroy_vector(duckdb_vector *vector);
duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);
void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
duckdb_vector duckdb_array_vector_get_child(duckdb_vector vector);
void duckdb_slice_vector(duckdb_vector vector, duckdb_selection_vector sel, idx_t len);
void duckdb_vector_copy_sel(duckdb_vector src, duckdb_vector dst, duckdb_selection_vector sel, idx_t src_count, idx_t src_offset, idx_t dst_offset);
void duckdb_vector_reference_value(duckdb_vector vector, duckdb_value value);
void duckdb_vector_reference_vector(duckdb_vector to_vector, duckdb_vector from_vector);
```


##### Validity Mask Functions {#docs:current:clients:c:api::validity-mask-functions}

```c
bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);
```
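
The validity functions above operate on a `uint64_t *` mask such as the one returned by `duckdb_vector_get_validity`. As a rough illustration of the bit layout they work with, the sketch below assumes one bit per row, 64 rows per mask entry, a set bit meaning "valid" and a `NULL` mask meaning that every row is valid. The `sketch_*` functions are hypothetical stand-ins written for this example; they are not DuckDB's implementation, and real code should call the `duckdb_validity_*` functions instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for duckdb_validity_row_is_valid.
// Assumption: bit (row % 64) of entry (row / 64) is set when the row is valid.
static bool sketch_row_is_valid(const uint64_t *validity, uint64_t row) {
    if (validity == NULL) {
        return true; // assumption: a missing mask means all rows are valid
    }
    return ((validity[row / 64] >> (row % 64)) & 1u) != 0;
}

// Hypothetical stand-in for duckdb_validity_set_row_validity.
static void sketch_set_row_validity(uint64_t *validity, uint64_t row, bool valid) {
    uint64_t bit = (uint64_t)1 << (row % 64);
    if (valid) {
        validity[row / 64] |= bit;  // mark the row as valid
    } else {
        validity[row / 64] &= ~bit; // mark the row as NULL
    }
}
```

Under these assumptions, marking row 70 as invalid clears bit 6 of the second mask entry and leaves every other row untouched.
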


##### Scalar Functions {#docs:current:clients:c:api::scalar-functions}

```c
duckdb_scalar_function duckdb_create_scalar_function();
void duckdb_destroy_scalar_function(duckdb_scalar_function *scalar_function);
void duckdb_scalar_function_set_name(duckdb_scalar_function scalar_function, const char *name);
void duckdb_scalar_function_set_varargs(duckdb_scalar_function scalar_function, duckdb_logical_type type);
void duckdb_scalar_function_set_special_handling(duckdb_scalar_function scalar_function);
void duckdb_scalar_function_set_volatile(duckdb_scalar_function scalar_function);
void duckdb_scalar_function_add_parameter(duckdb_scalar_function scalar_function, duckdb_logical_type type);
void duckdb_scalar_function_set_return_type(duckdb_scalar_function scalar_function, duckdb_logical_type type);
void duckdb_scalar_function_set_extra_info(duckdb_scalar_function scalar_function, void *extra_info, duckdb_delete_callback_t destroy);
void duckdb_scalar_function_set_bind(duckdb_scalar_function scalar_function, duckdb_scalar_function_bind_t bind);
void duckdb_scalar_function_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
void duckdb_scalar_function_set_bind_data_copy(duckdb_bind_info info, duckdb_copy_callback_t copy);
void duckdb_scalar_function_bind_set_error(duckdb_bind_info info, const char *error);
void duckdb_scalar_function_set_function(duckdb_scalar_function scalar_function, duckdb_scalar_function_t function);
duckdb_state duckdb_register_scalar_function(duckdb_connection con, duckdb_scalar_function scalar_function);
void *duckdb_scalar_function_get_extra_info(duckdb_function_info info);
void *duckdb_scalar_function_bind_get_extra_info(duckdb_bind_info info);
void *duckdb_scalar_function_get_bind_data(duckdb_function_info info);
void duckdb_scalar_function_get_client_context(duckdb_bind_info info, duckdb_client_context *out_context);
void duckdb_scalar_function_set_error(duckdb_function_info info, const char *error);
duckdb_scalar_function_set duckdb_create_scalar_function_set(const char *name);
void duckdb_destroy_scalar_function_set(duckdb_scalar_function_set *scalar_function_set);
duckdb_state duckdb_add_scalar_function_to_set(duckdb_scalar_function_set set, duckdb_scalar_function function);
duckdb_state duckdb_register_scalar_function_set(duckdb_connection con, duckdb_scalar_function_set set);
idx_t duckdb_scalar_function_bind_get_argument_count(duckdb_bind_info info);
duckdb_expression duckdb_scalar_function_bind_get_argument(duckdb_bind_info info, idx_t index);
```


##### Selection Vector Interface {#docs:current:clients:c:api::selection-vector-interface}

```c
duckdb_selection_vector duckdb_create_selection_vector(idx_t size);
void duckdb_destroy_selection_vector(duckdb_selection_vector sel);
sel_t *duckdb_selection_vector_get_data_ptr(duckdb_selection_vector sel);
```


##### Aggregate Functions {#docs:current:clients:c:api::aggregate-functions}

```c
duckdb_aggregate_function duckdb_create_aggregate_function();
void duckdb_destroy_aggregate_function(duckdb_aggregate_function *aggregate_function);
void duckdb_aggregate_function_set_name(duckdb_aggregate_function aggregate_function, const char *name);
void duckdb_aggregate_function_add_parameter(duckdb_aggregate_function aggregate_function, duckdb_logical_type type);
void duckdb_aggregate_function_set_return_type(duckdb_aggregate_function aggregate_function, duckdb_logical_type type);
void duckdb_aggregate_function_set_functions(duckdb_aggregate_function aggregate_function, duckdb_aggregate_state_size state_size, duckdb_aggregate_init_t state_init, duckdb_aggregate_update_t update, duckdb_aggregate_combine_t combine, duckdb_aggregate_finalize_t finalize);
void duckdb_aggregate_function_set_destructor(duckdb_aggregate_function aggregate_function, duckdb_aggregate_destroy_t destroy);
duckdb_state duckdb_register_aggregate_function(duckdb_connection con, duckdb_aggregate_function aggregate_function);
void duckdb_aggregate_function_set_special_handling(duckdb_aggregate_function aggregate_function);
void duckdb_aggregate_function_set_extra_info(duckdb_aggregate_function aggregate_function, void *extra_info, duckdb_delete_callback_t destroy);
void *duckdb_aggregate_function_get_extra_info(duckdb_function_info info);
void duckdb_aggregate_function_set_error(duckdb_function_info info, const char *error);
duckdb_aggregate_function_set duckdb_create_aggregate_function_set(const char *name);
void duckdb_destroy_aggregate_function_set(duckdb_aggregate_function_set *aggregate_function_set);
duckdb_state duckdb_add_aggregate_function_to_set(duckdb_aggregate_function_set set, duckdb_aggregate_function function);
duckdb_state duckdb_register_aggregate_function_set(duckdb_connection con, duckdb_aggregate_function_set set);
```


##### Table Functions {#docs:current:clients:c:api::table-functions}

```c
duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type type);
void duckdb_table_function_add_named_parameter(duckdb_table_function table_function, const char *name, duckdb_logical_type type);
void duckdb_table_function_set_extra_info(duckdb_table_function table_function, void *extra_info, duckdb_delete_callback_t destroy);
void duckdb_table_function_set_bind(duckdb_table_function table_function, duckdb_table_function_bind_t bind);
void duckdb_table_function_set_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_local_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_function(duckdb_table_function table_function, duckdb_table_function_t function);
void duckdb_table_function_supports_projection_pushdown(duckdb_table_function table_function, bool pushdown);
duckdb_state duckdb_register_table_function(duckdb_connection con, duckdb_table_function function);
```


##### Table Function Bind {#docs:current:clients:c:api::table-function-bind}

```c
void *duckdb_bind_get_extra_info(duckdb_bind_info info);
void duckdb_table_function_get_client_context(duckdb_bind_info info, duckdb_client_context *out_context);
void duckdb_bind_add_result_column(duckdb_bind_info info, const char *name, duckdb_logical_type type);
idx_t duckdb_bind_get_parameter_count(duckdb_bind_info info);
duckdb_value duckdb_bind_get_parameter(duckdb_bind_info info, idx_t index);
duckdb_value duckdb_bind_get_named_parameter(duckdb_bind_info info, const char *name);
void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
void duckdb_bind_set_cardinality(duckdb_bind_info info, idx_t cardinality, bool is_exact);
void duckdb_bind_set_error(duckdb_bind_info info, const char *error);
```


##### Table Function Init {#docs:current:clients:c:api::table-function-init}

```c
void *duckdb_init_get_extra_info(duckdb_init_info info);
void *duckdb_init_get_bind_data(duckdb_init_info info);
void duckdb_init_set_init_data(duckdb_init_info info, void *init_data, duckdb_delete_callback_t destroy);
idx_t duckdb_init_get_column_count(duckdb_init_info info);
idx_t duckdb_init_get_column_index(duckdb_init_info info, idx_t column_index);
void duckdb_init_set_max_threads(duckdb_init_info info, idx_t max_threads);
void duckdb_init_set_error(duckdb_init_info info, const char *error);
```


##### Table Function {#docs:current:clients:c:api::table-function}

```c
void *duckdb_function_get_extra_info(duckdb_function_info info);
void *duckdb_function_get_bind_data(duckdb_function_info info);
void *duckdb_function_get_init_data(duckdb_function_info info);
void *duckdb_function_get_local_init_data(duckdb_function_info info);
void duckdb_function_set_error(duckdb_function_info info, const char *error);
```


##### Replacement Scans {#docs:current:clients:c:api::replacement-scans}

```c
void duckdb_add_replacement_scan(duckdb_database db, duckdb_replacement_callback_t replacement, void *extra_data, duckdb_delete_callback_t delete_callback);
void duckdb_replacement_scan_set_function_name(duckdb_replacement_scan_info info, const char *function_name);
void duckdb_replacement_scan_add_parameter(duckdb_replacement_scan_info info, duckdb_value parameter);
void duckdb_replacement_scan_set_error(duckdb_replacement_scan_info info, const char *error);
```


##### Profiling Info {#docs:current:clients:c:api::profiling-info}

```c
duckdb_profiling_info duckdb_get_profiling_info(duckdb_connection connection);
duckdb_value duckdb_profiling_info_get_value(duckdb_profiling_info info, const char *key);
duckdb_value duckdb_profiling_info_get_metrics(duckdb_profiling_info info);
idx_t duckdb_profiling_info_get_child_count(duckdb_profiling_info info);
duckdb_profiling_info duckdb_profiling_info_get_child(duckdb_profiling_info info, idx_t index);
```


##### Appender {#docs:current:clients:c:api::appender}

```c
duckdb_state duckdb_appender_create(duckdb_connection connection, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_ext(duckdb_connection connection, const char *catalog, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_query(duckdb_connection connection, const char *query, idx_t column_count, duckdb_logical_type *types, const char *table_name, const char **column_names, duckdb_appender *out_appender);
idx_t duckdb_appender_column_count(duckdb_appender appender);
duckdb_logical_type duckdb_appender_column_type(duckdb_appender appender, idx_t col_idx);
const char *duckdb_appender_error(duckdb_appender appender);
duckdb_error_data duckdb_appender_error_data(duckdb_appender appender);
duckdb_state duckdb_appender_flush(duckdb_appender appender);
duckdb_state duckdb_appender_close(duckdb_appender appender);
duckdb_state duckdb_appender_destroy(duckdb_appender *appender);
duckdb_state duckdb_appender_add_column(duckdb_appender appender, const char *name);
duckdb_state duckdb_appender_clear_columns(duckdb_appender appender);
duckdb_state duckdb_appender_begin_row(duckdb_appender appender);
duckdb_state duckdb_appender_end_row(duckdb_appender appender);
duckdb_state duckdb_append_default(duckdb_appender appender);
duckdb_state duckdb_append_default_to_chunk(duckdb_appender appender, duckdb_data_chunk chunk, idx_t col, idx_t row);
duckdb_state duckdb_append_bool(duckdb_appender appender, bool value);
duckdb_state duckdb_append_int8(duckdb_appender appender, int8_t value);
duckdb_state duckdb_append_int16(duckdb_appender appender, int16_t value);
duckdb_state duckdb_append_int32(duckdb_appender appender, int32_t value);
duckdb_state duckdb_append_int64(duckdb_appender appender, int64_t value);
duckdb_state duckdb_append_hugeint(duckdb_appender appender, duckdb_hugeint value);
duckdb_state duckdb_append_uint8(duckdb_appender appender, uint8_t value);
duckdb_state duckdb_append_uint16(duckdb_appender appender, uint16_t value);
duckdb_state duckdb_append_uint32(duckdb_appender appender, uint32_t value);
duckdb_state duckdb_append_uint64(duckdb_appender appender, uint64_t value);
duckdb_state duckdb_append_uhugeint(duckdb_appender appender, duckdb_uhugeint value);
duckdb_state duckdb_append_float(duckdb_appender appender, float value);
duckdb_state duckdb_append_double(duckdb_appender appender, double value);
duckdb_state duckdb_append_date(duckdb_appender appender, duckdb_date value);
duckdb_state duckdb_append_time(duckdb_appender appender, duckdb_time value);
duckdb_state duckdb_append_timestamp(duckdb_appender appender, duckdb_timestamp value);
duckdb_state duckdb_append_interval(duckdb_appender appender, duckdb_interval value);
duckdb_state duckdb_append_varchar(duckdb_appender appender, const char *val);
duckdb_state duckdb_append_varchar_length(duckdb_appender appender, const char *val, idx_t length);
duckdb_state duckdb_append_blob(duckdb_appender appender, const void *data, idx_t length);
duckdb_state duckdb_append_null(duckdb_appender appender);
duckdb_state duckdb_append_value(duckdb_appender appender, duckdb_value value);
duckdb_state duckdb_append_data_chunk(duckdb_appender appender, duckdb_data_chunk chunk);
```


##### Table Description {#docs:current:clients:c:api::table-description}

```c
duckdb_state duckdb_table_description_create(duckdb_connection connection, const char *schema, const char *table, duckdb_table_description *out);
duckdb_state duckdb_table_description_create_ext(duckdb_connection connection, const char *catalog, const char *schema, const char *table, duckdb_table_description *out);
void duckdb_table_description_destroy(duckdb_table_description *table_description);
const char *duckdb_table_description_error(duckdb_table_description table_description);
duckdb_state duckdb_column_has_default(duckdb_table_description table_description, idx_t index, bool *out);
char *duckdb_table_description_get_column_name(duckdb_table_description table_description, idx_t index);
```


##### Arrow Interface {#docs:current:clients:c:api::arrow-interface}

```c
duckdb_error_data duckdb_to_arrow_schema(duckdb_arrow_options arrow_options, duckdb_logical_type *types, const char **names, idx_t column_count, struct ArrowSchema *out_schema);
duckdb_error_data duckdb_data_chunk_to_arrow(duckdb_arrow_options arrow_options, duckdb_data_chunk chunk, struct ArrowArray *out_arrow_array);
duckdb_error_data duckdb_schema_from_arrow(duckdb_connection connection, struct ArrowSchema *schema, duckdb_arrow_converted_schema *out_types);
duckdb_error_data duckdb_data_chunk_from_arrow(duckdb_connection connection, struct ArrowArray *arrow_array, duckdb_arrow_converted_schema converted_schema, duckdb_data_chunk *out_chunk);
void duckdb_destroy_arrow_converted_schema(duckdb_arrow_converted_schema *arrow_converted_schema);
duckdb_state duckdb_query_arrow(duckdb_connection connection, const char *query, duckdb_arrow *out_result);
duckdb_state duckdb_query_arrow_schema(duckdb_arrow result, duckdb_arrow_schema *out_schema);
duckdb_state duckdb_prepared_arrow_schema(duckdb_prepared_statement prepared, duckdb_arrow_schema *out_schema);
void duckdb_result_arrow_array(duckdb_result result, duckdb_data_chunk chunk, duckdb_arrow_array *out_array);
duckdb_state duckdb_query_arrow_array(duckdb_arrow result, duckdb_arrow_array *out_array);
idx_t duckdb_arrow_column_count(duckdb_arrow result);
idx_t duckdb_arrow_row_count(duckdb_arrow result);
idx_t duckdb_arrow_rows_changed(duckdb_arrow result);
const char *duckdb_query_arrow_error(duckdb_arrow result);
void duckdb_destroy_arrow(duckdb_arrow *result);
void duckdb_destroy_arrow_stream(duckdb_arrow_stream *stream_p);
duckdb_state duckdb_execute_prepared_arrow(duckdb_prepared_statement prepared_statement, duckdb_arrow *out_result);
duckdb_state duckdb_arrow_scan(duckdb_connection connection, const char *table_name, duckdb_arrow_stream arrow);
duckdb_state duckdb_arrow_array_scan(duckdb_connection connection, const char *table_name, duckdb_arrow_schema arrow_schema, duckdb_arrow_array arrow_array, duckdb_arrow_stream *out_stream);
```


##### Threading Information {#docs:current:clients:c:api::threading-information}

```c
void duckdb_execute_tasks(duckdb_database database, idx_t max_tasks);
duckdb_task_state duckdb_create_task_state(duckdb_database database);
void duckdb_execute_tasks_state(duckdb_task_state state);
idx_t duckdb_execute_n_tasks_state(duckdb_task_state state, idx_t max_tasks);
void duckdb_finish_execution(duckdb_task_state state);
bool duckdb_task_state_is_finished(duckdb_task_state state);
void duckdb_destroy_task_state(duckdb_task_state state);
bool duckdb_execution_is_finished(duckdb_connection con);
```


##### Streaming Result Interface {#docs:current:clients:c:api::streaming-result-interface}

```c
duckdb_data_chunk duckdb_stream_fetch_chunk(duckdb_result result);
duckdb_data_chunk duckdb_fetch_chunk(duckdb_result result);
```


##### Cast Functions {#docs:current:clients:c:api::cast-functions}

```c
duckdb_cast_function duckdb_create_cast_function();
void duckdb_cast_function_set_source_type(duckdb_cast_function cast_function, duckdb_logical_type source_type);
void duckdb_cast_function_set_target_type(duckdb_cast_function cast_function, duckdb_logical_type target_type);
void duckdb_cast_function_set_implicit_cast_cost(duckdb_cast_function cast_function, int64_t cost);
void duckdb_cast_function_set_function(duckdb_cast_function cast_function, duckdb_cast_function_t function);
void duckdb_cast_function_set_extra_info(duckdb_cast_function cast_function, void *extra_info, duckdb_delete_callback_t destroy);
void *duckdb_cast_function_get_extra_info(duckdb_function_info info);
duckdb_cast_mode duckdb_cast_function_get_cast_mode(duckdb_function_info info);
void duckdb_cast_function_set_error(duckdb_function_info info, const char *error);
void duckdb_cast_function_set_row_error(duckdb_function_info info, const char *error, idx_t row, duckdb_vector output);
duckdb_state duckdb_register_cast_function(duckdb_connection con, duckdb_cast_function cast_function);
void duckdb_destroy_cast_function(duckdb_cast_function *cast_function);
```


##### Expression Interface {#docs:current:clients:c:api::expression-interface}

```c
void duckdb_destroy_expression(duckdb_expression *expr);
duckdb_logical_type duckdb_expression_return_type(duckdb_expression expr);
bool duckdb_expression_is_foldable(duckdb_expression expr);
duckdb_error_data duckdb_expression_fold(duckdb_client_context context, duckdb_expression expr, duckdb_value *out_value);
```


###### `duckdb_create_instance_cache` {#docs:current:clients:c:api::duckdb_create_instance_cache}

Creates a new database instance cache.
An instance cache is required when a client or program opens (or reopens) the same database file more than once within the same
process. It must be destroyed with `duckdb_destroy_instance_cache`.


####### Return Value {#docs:current:clients:c:api::return-value}

The database instance cache.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_instance_cache duckdb_create_instance_cache();
```

<br>

###### `duckdb_get_or_create_from_cache` {#docs:current:clients:c:api::duckdb_get_or_create_from_cache}

Creates a new database instance in the instance cache, or retrieves an existing database instance.
The database must be closed with `duckdb_close`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_get_or_create_from_cache(
  duckdb_instance_cache instance_cache,
  const char *path,
  duckdb_database *out_database,
  duckdb_config config,
  char **out_error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `instance_cache`: The instance cache in which to create the database, or from which to take the database.
* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open or retrieve an in-memory database.
* `out_database`: The resulting cached database.
* `config`: (Optional) configuration used to create the database.
* `out_error`: If set and the function returns `DuckDBError`, this contains the error message.
Note that the error message must be freed using `duckdb_free`.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_instance_cache` {#docs:current:clients:c:api::duckdb_destroy_instance_cache}

Destroys an existing database instance cache and de-allocates its memory.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_instance_cache(
  duckdb_instance_cache *instance_cache
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `instance_cache`: The instance cache to destroy.

<br>

###### `duckdb_open` {#docs:current:clients:c:api::duckdb_open}

Creates a new database or opens an existing database file stored at the given path.
If no path is given, a new in-memory database is created instead.
The database must be closed with `duckdb_close`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_open(
  const char *path,
  duckdb_database *out_database
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open an in-memory database.
* `out_database`: The result database object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_open_ext` {#docs:current:clients:c:api::duckdb_open_ext}

Extended version of `duckdb_open`. Creates a new database or opens an existing database file stored at the given path.
The database must be closed with `duckdb_close`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_open_ext(
  const char *path,
  duckdb_database *out_database,
  duckdb_config config,
  char **out_error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open an in-memory database.
* `out_database`: The result database object.
* `config`: (Optional) configuration used to start up the database.
* `out_error`: If set and the function returns `DuckDBError`, this contains the error message.
Note that the error message must be freed using `duckdb_free`.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_close` {#docs:current:clients:c:api::duckdb_close}

Closes the specified database and de-allocates all memory allocated for that database.
This should be called after you are done with any database allocated through `duckdb_open` or `duckdb_open_ext`.
Note that failing to call `duckdb_close` (e.g., because the program crashed) will not cause data corruption.
Still, it is recommended to always close a database object once you are done with it.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_close(
  duckdb_database *database
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `database`: The database object to shut down.

<br>

###### `duckdb_connect` {#docs:current:clients:c:api::duckdb_connect}

Opens a connection to a database. A connection is required to query the database, and it stores the transactional state
associated with it.
The instantiated connection should be closed using `duckdb_disconnect`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_connect(
  duckdb_database database,
  duckdb_connection *out_connection
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `database`: The database object to connect to.
* `out_connection`: The result connection object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_interrupt` {#docs:current:clients:c:api::duckdb_interrupt}

Interrupts the query currently running on the connection.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_interrupt(
  duckdb_connection connection
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection to interrupt.

<br>

###### `duckdb_query_progress` {#docs:current:clients:c:api::duckdb_query_progress}

Gets the progress of the query currently running on the connection.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_query_progress_type duckdb_query_progress(
  duckdb_connection connection
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The working connection.

####### Return Value {#docs:current:clients:c:api::return-value}

-1 if no progress is available; otherwise, the progress as a percentage.

<br>

###### `duckdb_disconnect` {#docs:current:clients:c:api::duckdb_disconnect}

Closes the specified connection and de-allocates all memory allocated for that connection.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_disconnect(
  duckdb_connection *connection
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection to close.

<br>

###### `duckdb_connection_get_client_context` {#docs:current:clients:c:api::duckdb_connection_get_client_context}

Retrieves the client context of the connection.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_connection_get_client_context(
  duckdb_connection connection,
  duckdb_client_context *out_context
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection.
* `out_context`: The client context of the connection. Must be destroyed with `duckdb_destroy_client_context`.

<br>

###### `duckdb_connection_get_arrow_options` {#docs:current:clients:c:api::duckdb_connection_get_arrow_options}

Retrieves the arrow options of the connection.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_connection_get_arrow_options(
  duckdb_connection connection,
  duckdb_arrow_options *out_arrow_options
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

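* `connection`: The connection.
* `out_arrow_options`: The arrow options of the connection. Must be destroyed with `duckdb_destroy_arrow_options`.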

<br>

###### `duckdb_client_context_get_connection_id` {#docs:current:clients:c:api::duckdb_client_context_get_connection_id}

Returns the connection id of the client context.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_client_context_get_connection_id(
  duckdb_client_context context
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `context`: The client context.

####### Return Value {#docs:current:clients:c:api::return-value}

The connection id of the client context.

<br>

###### `duckdb_destroy_client_context` {#docs:current:clients:c:api::duckdb_destroy_client_context}

Destroys the client context and deallocates its memory.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_client_context(
  duckdb_client_context *context
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `context`: The client context to destroy.

<br>

###### `duckdb_destroy_arrow_options` {#docs:current:clients:c:api::duckdb_destroy_arrow_options}

Destroys the arrow options and deallocates its memory.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_arrow_options(
  duckdb_arrow_options *arrow_options
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `arrow_options`: The arrow options to destroy.

<br>

###### `duckdb_library_version` {#docs:current:clients:c:api::duckdb_library_version}

Returns the version of the linked DuckDB library, with a version suffix for development versions.

Usually used for developing C extensions that must return this for a compatibility check.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_library_version();
```

<br>

###### `duckdb_get_table_names` {#docs:current:clients:c:api::duckdb_get_table_names}

Get the list of (fully qualified) table names of the query.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_get_table_names(
  duckdb_connection connection,
  const char *query,
  bool qualified
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection for which to get the table names.
* `query`: The query for which to get the table names.
* `qualified`: If true, returns fully qualified table names (`catalog.schema.table`); otherwise, returns only the
unescaped table names.

####### Return Value {#docs:current:clients:c:api::return-value}

A `duckdb_value` of type `VARCHAR[]` containing the (fully qualified) table names of the query. Must be destroyed
with `duckdb_destroy_value`.

<br>

###### `duckdb_create_config` {#docs:current:clients:c:api::duckdb_create_config}

Initializes an empty configuration object that can be used to provide start-up options for the DuckDB instance
through `duckdb_open_ext`.
The `duckdb_config` must be destroyed using `duckdb_destroy_config`.

This will always succeed unless there is a malloc failure.

Note that `duckdb_destroy_config` should always be called on the resulting config, even if the function returns
`DuckDBError`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_create_config(
  duckdb_config *out_config
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `out_config`: The result configuration object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_config_count` {#docs:current:clients:c:api::duckdb_config_count}

This returns the total number of configuration options available for use with `duckdb_get_config_flag`.

This should not be called in a loop as it internally loops over all the options.


####### Return Value {#docs:current:clients:c:api::return-value}

The number of config options available.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
size_t duckdb_config_count();
```

<br>

###### `duckdb_get_config_flag` {#docs:current:clients:c:api::duckdb_get_config_flag}

Obtains a human-readable name and description of a specific configuration option. This can be used, e.g., to
display configuration options. This will succeed unless `index` is out of range (i.e., `>= duckdb_config_count`).

The result name or description MUST NOT be freed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_get_config_flag(
  size_t index,
  const char **out_name,
  const char **out_description
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `index`: The index of the configuration option (from 0 up to, but not including, `duckdb_config_count`).
* `out_name`: A name of the configuration flag.
* `out_description`: A description of the configuration flag.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_set_config` {#docs:current:clients:c:api::duckdb_set_config}

Sets the specified option for the specified configuration. The configuration option is indicated by name.
To obtain a list of config options, see `duckdb_get_config_flag`.

In the source code, configuration options are defined in `config.cpp`.

This can fail if either the name is invalid, or if the value provided for the option is invalid.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_set_config(
  duckdb_config config,
  const char *name,
  const char *option
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `config`: The configuration object to set the option on.
* `name`: The name of the configuration flag to set.
* `option`: The value to set the configuration flag to.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_config` {#docs:current:clients:c:api::duckdb_destroy_config}

Destroys the specified configuration object and de-allocates all memory allocated for the object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_config(
  duckdb_config *config
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `config`: The configuration object to destroy.

<br>

###### `duckdb_create_error_data` {#docs:current:clients:c:api::duckdb_create_error_data}

Creates a `duckdb_error_data` object.
Must be destroyed with `duckdb_destroy_error_data`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_error_data duckdb_create_error_data(
  duckdb_error_type type,
  const char *message
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The error type.
* `message`: The error message.

####### Return Value {#docs:current:clients:c:api::return-value}

The error data.

<br>

###### `duckdb_destroy_error_data` {#docs:current:clients:c:api::duckdb_destroy_error_data}

Destroys the error data and deallocates its memory.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_error_data(
  duckdb_error_data *error_data
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `error_data`: The error data to destroy.

<br>

###### `duckdb_error_data_error_type` {#docs:current:clients:c:api::duckdb_error_data_error_type}

Returns the `duckdb_error_type` of the error data.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_error_type duckdb_error_data_error_type(
  duckdb_error_data error_data
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `error_data`: The error data.

####### Return Value {#docs:current:clients:c:api::return-value}

The error type.

<br>

###### `duckdb_error_data_message` {#docs:current:clients:c:api::duckdb_error_data_message}

Returns the error message of the error data. Must not be freed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_error_data_message(
  duckdb_error_data error_data
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `error_data`: The error data.

####### Return Value {#docs:current:clients:c:api::return-value}

The error message.

<br>

###### `duckdb_error_data_has_error` {#docs:current:clients:c:api::duckdb_error_data_has_error}

Returns whether the error data contains an error or not.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
bool duckdb_error_data_has_error(
  duckdb_error_data error_data
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `error_data`: The error data.

####### Return Value {#docs:current:clients:c:api::return-value}

`true` if the error data contains an error, `false` otherwise.

<br>

###### `duckdb_query` {#docs:current:clients:c:api::duckdb_query}

Executes a SQL query within a connection and stores the full (materialized) result in the `out_result` pointer.
If the query fails to execute, `DuckDBError` is returned and the error message can be retrieved by calling
`duckdb_result_error`.

Note that after running `duckdb_query`, `duckdb_destroy_result` must be called on the result object even if the
query fails, otherwise the error stored within the result will not be freed correctly.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_query(
  duckdb_connection connection,
  const char *query,
  duckdb_result *out_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection to perform the query in.
* `query`: The SQL query to run.
* `out_result`: The query result.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_result` {#docs:current:clients:c:api::duckdb_destroy_result}

Closes the result and de-allocates all memory allocated for that result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_result(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result to destroy.

<br>

###### `duckdb_column_name` {#docs:current:clients:c:api::duckdb_column_name}

Returns the column name of the specified column. The result does not need to be freed; the column names are
automatically destroyed when the result is destroyed.

Returns `NULL` if the column is out of range.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_column_name(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the column name from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:api::return-value}

The column name of the specified column.

<br>

###### `duckdb_column_type` {#docs:current:clients:c:api::duckdb_column_type}

Returns the column type of the specified column.

Returns `DUCKDB_TYPE_INVALID` if the column is out of range.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_type duckdb_column_type(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the column type from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:api::return-value}

The column type of the specified column.

<br>

###### `duckdb_result_statement_type` {#docs:current:clients:c:api::duckdb_result_statement_type}

Returns the statement type of the statement that was executed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_statement_type duckdb_result_statement_type(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the statement type from.

####### Return Value {#docs:current:clients:c:api::return-value}

A `duckdb_statement_type` value, or `DUCKDB_STATEMENT_TYPE_INVALID`.

<br>

###### `duckdb_column_logical_type` {#docs:current:clients:c:api::duckdb_column_logical_type}

Returns the logical column type of the specified column.

The logical type returned by this call should be destroyed with `duckdb_destroy_logical_type`.

Returns `NULL` if the column is out of range.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_column_logical_type(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the column type from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical column type of the specified column.

<br>

###### `duckdb_result_get_arrow_options` {#docs:current:clients:c:api::duckdb_result_get_arrow_options}

Returns the arrow options associated with the given result. These options are definitions of how the arrow arrays/schema
should be produced.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_arrow_options duckdb_result_get_arrow_options(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch arrow options from.

####### Return Value {#docs:current:clients:c:api::return-value}

The arrow options associated with the given result. This must be destroyed with
`duckdb_destroy_arrow_options`.

<br>

###### `duckdb_column_count` {#docs:current:clients:c:api::duckdb_column_count}

Returns the number of columns present in the result object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_column_count(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of columns present in the result object.

<br>

###### `duckdb_row_count` {#docs:current:clients:c:api::duckdb_row_count}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of rows present in the result object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            idx_t duckdb_row_count(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of rows present in the result object.

<br>

###### `duckdb_rows_changed` {#docs:current:clients:c:api::duckdb_rows_changed}

Returns the number of rows changed by the query stored in the result. This is relevant only for `INSERT`/`UPDATE`/`DELETE`
queries. For other queries, the number of rows changed will be 0.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            idx_t duckdb_rows_changed(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of rows changed.

<br>

###### `duckdb_column_data` {#docs:current:clients:c:api::duckdb_column_data}

> **Deprecated.** This method has been deprecated. Prefer using `duckdb_result_get_chunk` instead.

Returns the data of a specific column of a result in columnar format.

The function returns a dense array which contains the result data. The exact type stored in the array depends on the
corresponding duckdb_type (as provided by `duckdb_column_type`). For the exact type by which the data should be
accessed, see the comments in [the types section](#types) or the `DUCKDB_TYPE` enum.

For example, for a column of type `DUCKDB_TYPE_INTEGER`, rows can be accessed in the following manner:
```c
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
printf("Data for row %d: %d\n", row, data[row]);
```

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            void *duckdb_column_data(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the column data from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:api::return-value}

The column data of the specified column.

<br>

###### `duckdb_nullmask_data` {#docs:current:clients:c:api::duckdb_nullmask_data}

> **Deprecated.** This method has been deprecated. Prefer using `duckdb_result_get_chunk` instead.

Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row
whether or not the corresponding row is `NULL`. If a row is `NULL`, the values present in the array provided
by `duckdb_column_data` are undefined.

```c
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
bool *nullmask = duckdb_nullmask_data(&result, 0);
if (nullmask[row]) {
    printf("Data for row %d: NULL\n", row);
} else {
    printf("Data for row %d: %d\n", row, data[row]);
}
```
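
The two columnar accessors are typically used together. As a minimal sketch of that pattern, the helper below sums the non-`NULL` entries of an `INTEGER` column. It is written against plain arrays so that it stands alone: `sum_non_null` is a hypothetical helper, and its `data`, `nullmask` and `row_count` arguments are assumed to hold the values returned by `duckdb_column_data`, `duckdb_nullmask_data` and `duckdb_row_count`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the DuckDB C API.
 * Sums the non-NULL entries of an INTEGER column. The arrays are assumed to
 * come from duckdb_column_data and duckdb_nullmask_data for the same column. */
int64_t sum_non_null(const int32_t *data, const bool *nullmask, size_t row_count) {
    int64_t total = 0;
    for (size_t row = 0; row < row_count; row++) {
        /* Entries flagged in the nullmask hold undefined data, so skip them. */
        if (!nullmask[row]) {
            total += data[row];
        }
    }
    return total;
}
```

Accumulating into an `int64_t` avoids overflow when many `int32_t` values are added.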

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool *duckdb_nullmask_data(
  duckdb_result *result,
  idx_t col
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the nullmask from.
* `col`: The column index.

####### Return Value {#docs:current:clients:c:api::return-value}

The nullmask of the specified column.

<br>

###### `duckdb_result_error` {#docs:current:clients:c:api::duckdb_result_error}

Returns the error message contained within the result. The error is only set if `duckdb_query` returns `DuckDBError`.

The result of this function must not be freed. It will be cleaned up when `duckdb_destroy_result` is called.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            const char *duckdb_result_error(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the error from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error of the result.

<br>

###### `duckdb_result_error_type` {#docs:current:clients:c:api::duckdb_result_error_type}

Returns the result error type contained within the result. The error is only set if `duckdb_query` returns
`DuckDBError`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_error_type duckdb_result_error_type(
  duckdb_result *result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the error from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error type of the result.

<br>

###### `duckdb_result_get_chunk` {#docs:current:clients:c:api::duckdb_result_get_chunk}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result is exhausted.

The result must be destroyed with `duckdb_destroy_data_chunk`.

This function supersedes all `duckdb_value` functions, as well as the `duckdb_column_data` and `duckdb_nullmask_data`
functions. It results in significantly better performance, and should be preferred in newer code-bases.

If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be
mixed with the legacy result functions).

Use `duckdb_result_chunk_count` to figure out how many chunks there are in the result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_data_chunk duckdb_result_get_chunk(
  duckdb_result result,
  idx_t chunk_index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the data chunk from.
* `chunk_index`: The chunk index to fetch from.

####### Return Value {#docs:current:clients:c:api::return-value}

The resulting data chunk. Returns `NULL` if the chunk index is out of bounds.

<br>

###### `duckdb_result_is_streaming` {#docs:current:clients:c:api::duckdb_result_is_streaming}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Checks if the type of the internal result is StreamQueryResult.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_result_is_streaming(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to check.

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not the result object is of the type `StreamQueryResult`.

<br>

###### `duckdb_result_chunk_count` {#docs:current:clients:c:api::duckdb_result_chunk_count}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of data chunks present in the result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            idx_t duckdb_result_chunk_count(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

Number of data chunks present in the result.

<br>

###### `duckdb_result_return_type` {#docs:current:clients:c:api::duckdb_result_return_type}

Returns the `return_type` of the given result, or `DUCKDB_RETURN_TYPE_INVALID` on error.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_result_type duckdb_result_return_type(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

The `return_type` of the result.

<br>

###### `duckdb_value_boolean` {#docs:current:clients:c:api::duckdb_value_boolean}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The boolean value at the specified location, or false if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_value_boolean(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_int8` {#docs:current:clients:c:api::duckdb_value_int8}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The int8_t value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            int8_t duckdb_value_int8(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_int16` {#docs:current:clients:c:api::duckdb_value_int16}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The int16_t value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            int16_t duckdb_value_int16(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_int32` {#docs:current:clients:c:api::duckdb_value_int32}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The int32_t value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            int32_t duckdb_value_int32(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_int64` {#docs:current:clients:c:api::duckdb_value_int64}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The int64_t value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            int64_t duckdb_value_int64(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_hugeint` {#docs:current:clients:c:api::duckdb_value_hugeint}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb_hugeint value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_hugeint duckdb_value_hugeint(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_uhugeint` {#docs:current:clients:c:api::duckdb_value_uhugeint}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb_uhugeint value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_uhugeint duckdb_value_uhugeint(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_decimal` {#docs:current:clients:c:api::duckdb_value_decimal}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb_decimal value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_decimal duckdb_value_decimal(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_uint8` {#docs:current:clients:c:api::duckdb_value_uint8}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The uint8_t value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            uint8_t duckdb_value_uint8(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_uint16` {#docs:current:clients:c:api::duckdb_value_uint16}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The uint16_t value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            uint16_t duckdb_value_uint16(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_uint32` {#docs:current:clients:c:api::duckdb_value_uint32}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The uint32_t value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            uint32_t duckdb_value_uint32(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_uint64` {#docs:current:clients:c:api::duckdb_value_uint64}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The uint64_t value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            uint64_t duckdb_value_uint64(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_float` {#docs:current:clients:c:api::duckdb_value_float}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The float value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            float duckdb_value_float(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_double` {#docs:current:clients:c:api::duckdb_value_double}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The double value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            double duckdb_value_double(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_date` {#docs:current:clients:c:api::duckdb_value_date}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb_date value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_date duckdb_value_date(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_time` {#docs:current:clients:c:api::duckdb_value_time}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb_time value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_time duckdb_value_time(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_timestamp` {#docs:current:clients:c:api::duckdb_value_timestamp}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb_timestamp value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_timestamp duckdb_value_timestamp(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_interval` {#docs:current:clients:c:api::duckdb_value_interval}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb_interval value at the specified location, or 0 if the value cannot be converted.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_interval duckdb_value_interval(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_varchar` {#docs:current:clients:c:api::duckdb_value_varchar}

> **Deprecated.** This method has been deprecated. Use `duckdb_value_string` instead. This function does not work correctly if the string contains null bytes.


####### Return Value {#docs:current:clients:c:api::return-value}

The text value at the specified location as a null-terminated string, or nullptr if the value cannot be
converted. The result must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            char *duckdb_value_varchar(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_string` {#docs:current:clients:c:api::duckdb_value_string}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Nested types and other complex types are not supported.
The resulting field `string.data` must be freed with `duckdb_free`.


####### Return Value {#docs:current:clients:c:api::return-value}

The string value at the specified location. Attempts to cast the result value to string.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_string duckdb_value_string(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_varchar_internal` {#docs:current:clients:c:api::duckdb_value_varchar_internal}

> **Deprecated.** This method has been deprecated. Use `duckdb_value_string_internal` instead. This function does not work correctly if the string contains null bytes.


####### Return Value {#docs:current:clients:c:api::return-value}

The char* value at the specified location. ONLY works on VARCHAR columns and does not auto-cast.
If the column is NOT a VARCHAR column this function will return NULL.

The result must NOT be freed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            char *duckdb_value_varchar_internal(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_string_internal` {#docs:current:clients:c:api::duckdb_value_string_internal}

> **Deprecated.** This method has been deprecated. This function does not work correctly if the string contains null bytes.

####### Return Value {#docs:current:clients:c:api::return-value}

The char* value at the specified location. ONLY works on VARCHAR columns and does not auto-cast.
If the column is NOT a VARCHAR column this function will return NULL.

The result must NOT be freed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_string duckdb_value_string_internal(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_blob` {#docs:current:clients:c:api::duckdb_value_blob}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_blob` value at the specified location. Returns a blob with `blob.data` set to `nullptr` if the
value cannot be converted. The resulting field `blob.data` must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_blob duckdb_value_blob(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_value_is_null` {#docs:current:clients:c:api::duckdb_value_is_null}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.


####### Return Value {#docs:current:clients:c:api::return-value}

Returns true if the value at the specified index is NULL, and false otherwise.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_value_is_null(
  duckdb_result *result,
  idx_t col,
  idx_t row
);
```

<br>

###### `duckdb_malloc` {#docs:current:clients:c:api::duckdb_malloc}

Allocate `size` bytes of memory using the duckdb internal malloc function. Any memory allocated in this manner
should be freed using `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            void *duckdb_malloc(
  size_t size
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `size`: The number of bytes to allocate.

####### Return Value {#docs:current:clients:c:api::return-value}

A pointer to the allocated memory region.

<br>

###### `duckdb_free` {#docs:current:clients:c:api::duckdb_free}

Free a value returned from `duckdb_malloc`, `duckdb_value_varchar`, `duckdb_value_blob`, or
`duckdb_value_string`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            void duckdb_free(
  void *ptr
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `ptr`: The memory region to de-allocate.

<br>

###### `duckdb_vector_size` {#docs:current:clients:c:api::duckdb_vector_size}

The internal vector size used by DuckDB.
This is the number of tuples that will fit into a data chunk created by `duckdb_create_data_chunk`.


####### Return Value {#docs:current:clients:c:api::return-value}

The vector size.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            idx_t duckdb_vector_size(

);
```

<br>

###### `duckdb_string_is_inlined` {#docs:current:clients:c:api::duckdb_string_is_inlined}

Returns whether or not the `duckdb_string_t` value is inlined.
This means that the data of the string does not have a separate allocation.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_string_is_inlined(
  duckdb_string_t string
);
```

<br>

###### `duckdb_string_t_length` {#docs:current:clients:c:api::duckdb_string_t_length}

Get the string length of a `duckdb_string_t`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            uint32_t duckdb_string_t_length(
  duckdb_string_t string
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `string`: The string to get the length of.

####### Return Value {#docs:current:clients:c:api::return-value}

The length.

<br>

###### `duckdb_string_t_data` {#docs:current:clients:c:api::duckdb_string_t_data}

Get a pointer to the string data of a `duckdb_string_t`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            const char *duckdb_string_t_data(
  duckdb_string_t *string
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `string`: The string to get the pointer to.

####### Return Value {#docs:current:clients:c:api::return-value}

The pointer.
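
To make "inlined" concrete, here is a minimal sketch of how a short string can live inside the value itself while a long string is referenced through a pointer. The `sketch_string_t` type and the `sketch_*` helpers are hypothetical stand-ins modelled on the `duckdb_string_t` definition in `duckdb.h`. The 12-byte inline limit and the field layout are assumptions to verify against the header of the DuckDB version in use.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in modelled on duckdb_string_t; not the real type. */
#define SKETCH_INLINE_LENGTH 12

typedef struct {
    union {
        struct {
            uint32_t length;
            char prefix[4];
            char *ptr;
        } pointer;
        struct {
            uint32_t length;
            char inlined[SKETCH_INLINE_LENGTH];
        } inlined;
    } value;
} sketch_string_t;

/* Build a sketch string for `text`; long strings are referenced, not copied. */
sketch_string_t sketch_make_string(char *text) {
    sketch_string_t s;
    uint32_t length = (uint32_t)strlen(text);
    memset(&s, 0, sizeof(s));
    if (length <= SKETCH_INLINE_LENGTH) {
        s.value.inlined.length = length;
        memcpy(s.value.inlined.inlined, text, length);
    } else {
        s.value.pointer.length = length;
        memcpy(s.value.pointer.prefix, text, sizeof(s.value.pointer.prefix));
        s.value.pointer.ptr = text;
    }
    return s;
}

/* Mirrors the idea behind duckdb_string_is_inlined. */
bool sketch_string_is_inlined(sketch_string_t s) {
    return s.value.inlined.length <= SKETCH_INLINE_LENGTH;
}

/* Mirrors the idea behind duckdb_string_t_length. */
uint32_t sketch_string_length(sketch_string_t s) {
    return s.value.inlined.length;
}

/* Mirrors the idea behind duckdb_string_t_data. The data is not guaranteed
 * to be null-terminated, so always pair it with the length. */
const char *sketch_string_data(sketch_string_t *s) {
    if (sketch_string_is_inlined(*s)) {
        return s->value.inlined.inlined;
    }
    return s->value.pointer.ptr;
}
```

In this sketch the data accessor takes a pointer because the bytes of an inlined string are stored inside the struct itself. A pointer into a by-value copy would dangle once the function returned.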

<br>

###### `duckdb_from_date` {#docs:current:clients:c:api::duckdb_from_date}

Decompose a `duckdb_date` object into year, month and day (stored as `duckdb_date_struct`).

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_date_struct duckdb_from_date(
  duckdb_date date
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `date`: The date object, as obtained from a `DUCKDB_TYPE_DATE` column.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_date_struct` with the decomposed elements.
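
The decomposition can be illustrated with plain integer arithmetic. The sketch below assumes that a `duckdb_date` holds the number of days since 1970-01-01 and converts such a day count into year, month and day in the proleptic Gregorian calendar. `sketch_date_struct` and `sketch_from_days` are hypothetical names, and the code is a well-known days-to-civil-date algorithm rather than DuckDB's own implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for duckdb_date_struct. */
typedef struct {
    int32_t year;
    int8_t month;
    int8_t day;
} sketch_date_struct;

/* Convert days since 1970-01-01 into a calendar date. */
sketch_date_struct sketch_from_days(int32_t days) {
    /* Shift the epoch to 0000-03-01 so that leap days fall at the end of a year. */
    int64_t z = (int64_t)days + 719468;
    int64_t era = (z >= 0 ? z : z - 146096) / 146097; /* 400-year cycle */
    int64_t doe = z - era * 146097;                   /* day of era: 0..146096 */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; /* 0..399 */
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* day of year: 0..365 */
    int64_t mp = (5 * doy + 2) / 153;                 /* month index from March: 0..11 */
    int64_t month = mp < 10 ? mp + 3 : mp - 9;        /* 1..12 */

    sketch_date_struct result;
    result.day = (int8_t)(doy - (153 * mp + 2) / 5 + 1);
    result.month = (int8_t)month;
    /* January and February belong to the next civil year in the shifted calendar. */
    result.year = (int32_t)(yoe + era * 400 + (month <= 2 ? 1 : 0));
    return result;
}
```

For instance, a day count of 0 maps to 1970-01-01 and a day count of -1 maps to 1969-12-31.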

<br>

###### `duckdb_to_date` {#docs:current:clients:c:api::duckdb_to_date}

Re-compose a `duckdb_date` from year, month and day (`duckdb_date_struct`).

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_date duckdb_to_date(
  duckdb_date_struct date
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `date`: The year, month and day stored in a `duckdb_date_struct`.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_date` element.
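
As a rough illustration of the re-composition, the sketch below turns a year, month and day into a day count relative to 1970-01-01, assuming that this is the quantity a `duckdb_date` stores. `sketch_date_struct` and `sketch_to_days` are hypothetical names, and the arithmetic is a standard civil-date-to-days algorithm, not DuckDB's implementation. No validation of the input date is performed.

```c
#include <stdint.h>

/* Hypothetical stand-in for duckdb_date_struct. */
typedef struct {
    int32_t year;
    int8_t month;
    int8_t day;
} sketch_date_struct;

/* Convert a proleptic Gregorian calendar date into days since 1970-01-01. */
int32_t sketch_to_days(sketch_date_struct date) {
    /* Treat January and February as months 10 and 11 of the previous year. */
    int64_t y = (int64_t)date.year - (date.month <= 2 ? 1 : 0);
    int64_t era = (y >= 0 ? y : y - 399) / 400;  /* 400-year cycle */
    int64_t yoe = y - era * 400;                 /* year of era: 0..399 */
    int64_t mp = date.month > 2 ? date.month - 3 : date.month + 9; /* 0..11 */
    int64_t doy = (153 * mp + 2) / 5 + date.day - 1;               /* 0..365 */
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* 0..146096 */
    return (int32_t)(era * 146097 + doe - 719468);
}
```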

<br>

###### `duckdb_is_finite_date` {#docs:current:clients:c:api::duckdb_is_finite_date}

Test a `duckdb_date` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_is_finite_date(
  duckdb_date date
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `date`: The date object, as obtained from a `DUCKDB_TYPE_DATE` column.

####### Return Value {#docs:current:clients:c:api::return-value}

True if the date is finite, false if it is ±infinity.

<br>

###### `duckdb_from_time` {#docs:current:clients:c:api::duckdb_from_time}

Decompose a `duckdb_time` object into hour, minute, second and microsecond (stored as `duckdb_time_struct`).

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_time_struct duckdb_from_time(
  duckdb_time time
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `time`: The time object, as obtained from a `DUCKDB_TYPE_TIME` column.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_time_struct` with the decomposed elements.
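
The arithmetic behind this decomposition is simple division. The sketch below assumes that a `duckdb_time` holds the number of microseconds since 00:00:00 and splits that count into its components. `sketch_time_struct` and `sketch_from_micros` are hypothetical names that only mirror the shape of `duckdb_time_struct`, and the input is assumed to be a valid time of day.

```c
#include <stdint.h>

/* Hypothetical stand-in for duckdb_time_struct. */
typedef struct {
    int8_t hour;
    int8_t min;
    int8_t sec;
    int32_t micros;
} sketch_time_struct;

/* Split microseconds since midnight into hour, minute, second and microsecond. */
sketch_time_struct sketch_from_micros(int64_t micros) {
    const int64_t MICROS_PER_SEC = 1000000LL;
    const int64_t MICROS_PER_MIN = 60LL * MICROS_PER_SEC;
    const int64_t MICROS_PER_HOUR = 60LL * MICROS_PER_MIN;

    sketch_time_struct result;
    result.hour = (int8_t)(micros / MICROS_PER_HOUR);
    micros %= MICROS_PER_HOUR;
    result.min = (int8_t)(micros / MICROS_PER_MIN);
    micros %= MICROS_PER_MIN;
    result.sec = (int8_t)(micros / MICROS_PER_SEC);
    result.micros = (int32_t)(micros % MICROS_PER_SEC);
    return result;
}
```

For example, 49,530,250,000 microseconds corresponds to 13:45:30.250000.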

<br>

###### `duckdb_create_time_tz` {#docs:current:clients:c:api::duckdb_create_time_tz}

Create a `duckdb_time_tz` object from micros and a timezone offset.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_time_tz duckdb_create_time_tz(
  int64_t micros,
  int32_t offset
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `micros`: The microsecond component of the time.
* `offset`: The timezone offset component of the time.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_time_tz` element.

<br>

###### `duckdb_from_time_tz` {#docs:current:clients:c:api::duckdb_from_time_tz}

Decompose a `TIME_TZ` object into micros and a timezone offset.

Use `duckdb_from_time` to further decompose the micros into hour, minute, second and microsecond.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_time_tz_struct duckdb_from_time_tz(
  duckdb_time_tz micros
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `micros`: The time object, as obtained from a `DUCKDB_TYPE_TIME_TZ` column.

<br>

###### `duckdb_to_time` {#docs:current:clients:c:api::duckdb_to_time}

Re-compose a `duckdb_time` from hour, minute, second and microsecond (`duckdb_time_struct`).

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_time duckdb_to_time(
  duckdb_time_struct time
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `time`: The hour, minute, second and microsecond in a `duckdb_time_struct`.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_time` element.

<br>

###### `duckdb_from_timestamp` {#docs:current:clients:c:api::duckdb_from_timestamp}

Decompose a `duckdb_timestamp` object into a `duckdb_timestamp_struct`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_timestamp_struct duckdb_from_timestamp(
  duckdb_timestamp ts
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `ts`: The timestamp object, as obtained from a `DUCKDB_TYPE_TIMESTAMP` column.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_timestamp_struct` with the decomposed elements.
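
Conceptually, a timestamp splits into a date part and a time-of-day part. The sketch below assumes that a `duckdb_timestamp` holds microseconds since 1970-01-01 00:00:00 and separates that count into whole days and microseconds within the day. `sketch_timestamp_parts` and `sketch_split_timestamp` are hypothetical names, not part of the C API. The point of interest is the floor division, which keeps the time-of-day part non-negative for timestamps before the epoch.

```c
#include <stdint.h>

/* Hypothetical result type: days since the epoch plus microseconds within that day. */
typedef struct {
    int64_t days;
    int64_t micros_of_day;
} sketch_timestamp_parts;

/* Split microseconds since 1970-01-01 00:00:00 into a day count and a time of day. */
sketch_timestamp_parts sketch_split_timestamp(int64_t micros) {
    const int64_t MICROS_PER_DAY = 86400000000LL;

    sketch_timestamp_parts parts;
    parts.days = micros / MICROS_PER_DAY;
    parts.micros_of_day = micros % MICROS_PER_DAY;
    /* C division truncates toward zero, so adjust negative remainders to floor. */
    if (parts.micros_of_day < 0) {
        parts.micros_of_day += MICROS_PER_DAY;
        parts.days -= 1;
    }
    return parts;
}
```

The two parts can then be decomposed further in the same way as a date and a time.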

<br>

###### `duckdb_to_timestamp` {#docs:current:clients:c:api::duckdb_to_timestamp}

Re-compose a `duckdb_timestamp` from a `duckdb_timestamp_struct`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_timestamp duckdb_to_timestamp(
  duckdb_timestamp_struct ts
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `ts`: The de-composed elements in a `duckdb_timestamp_struct`.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_timestamp` element.

<br>

###### `duckdb_is_finite_timestamp` {#docs:current:clients:c:api::duckdb_is_finite_timestamp}

Test a `duckdb_timestamp` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_is_finite_timestamp(
  duckdb_timestamp ts
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `ts`: The duckdb_timestamp object, as obtained from a `DUCKDB_TYPE_TIMESTAMP` column.

####### Return Value {#docs:current:clients:c:api::return-value}

True if the timestamp is finite, false if it is ±infinity.

<br>

###### `duckdb_is_finite_timestamp_s` {#docs:current:clients:c:api::duckdb_is_finite_timestamp_s}

Test a `duckdb_timestamp_s` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_is_finite_timestamp_s(
  duckdb_timestamp_s ts
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `ts`: The duckdb_timestamp_s object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_S` column.

####### Return Value {#docs:current:clients:c:api::return-value}

True if the timestamp is finite, false if it is ±infinity.

<br>

###### `duckdb_is_finite_timestamp_ms` {#docs:current:clients:c:api::duckdb_is_finite_timestamp_ms}

Test a `duckdb_timestamp_ms` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_is_finite_timestamp_ms(
  duckdb_timestamp_ms ts
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `ts`: The duckdb_timestamp_ms object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_MS` column.

####### Return Value {#docs:current:clients:c:api::return-value}

True if the timestamp is finite, false if it is ±infinity.

<br>

###### `duckdb_is_finite_timestamp_ns` {#docs:current:clients:c:api::duckdb_is_finite_timestamp_ns}

Test a `duckdb_timestamp_ns` to see if it is a finite value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            bool duckdb_is_finite_timestamp_ns(
  duckdb_timestamp_ns ts
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `ts`: The duckdb_timestamp_ns object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_NS` column.

####### Return Value {#docs:current:clients:c:api::return-value}

True if the timestamp is finite, false if it is ±infinity.

<br>

###### `duckdb_hugeint_to_double` {#docs:current:clients:c:api::duckdb_hugeint_to_double}

Converts a duckdb_hugeint object (as obtained from a `DUCKDB_TYPE_HUGEINT` column) into a double.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            double duckdb_hugeint_to_double(
  duckdb_hugeint val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: The hugeint value.

####### Return Value {#docs:current:clients:c:api::return-value}

The converted `double` element.
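
To show what such a conversion involves, here is a minimal sketch that assumes a `duckdb_hugeint` is a 128-bit two's-complement integer split into an unsigned `lower` half and a signed `upper` half. `sketch_hugeint` and `sketch_hugeint_to_double` are hypothetical names, and this is not DuckDB's implementation. Negative values are negated before conversion so that small magnitudes such as -1 are not lost to rounding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for duckdb_hugeint: value = upper * 2^64 + lower. */
typedef struct {
    uint64_t lower;
    int64_t upper;
} sketch_hugeint;

/* Convert a 128-bit two's-complement integer to the nearest representable double
 * (up to the rounding of the individual steps). */
double sketch_hugeint_to_double(sketch_hugeint v) {
    const double two_pow_64 = 18446744073709551616.0;
    bool negative = v.upper < 0;
    uint64_t lo = v.lower;
    uint64_t hi = (uint64_t)v.upper;
    if (negative) {
        /* Two's-complement negation across both halves, done in unsigned arithmetic. */
        lo = ~lo + 1;
        hi = ~hi + (lo == 0 ? 1 : 0);
    }
    double magnitude = (double)hi * two_pow_64 + (double)lo;
    return negative ? -magnitude : magnitude;
}
```

A `double` has a 53-bit significand, so values with more significant bits than that are rounded during the conversion.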

<br>

###### `duckdb_double_to_hugeint` {#docs:current:clients:c:api::duckdb_double_to_hugeint}

Converts a double value to a duckdb_hugeint object.

If the conversion fails because the double value is too big, the result will be 0.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            duckdb_hugeint duckdb_double_to_hugeint(
  double val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: The double value.

####### Return Value {#docs:current:clients:c:api::return-value}

The converted `duckdb_hugeint` element.

<br>

###### `duckdb_uhugeint_to_double` {#docs:current:clients:c:api::duckdb_uhugeint_to_double}

Converts a duckdb_uhugeint object (as obtained from a `DUCKDB_TYPE_UHUGEINT` column) into a double.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
            double duckdb_uhugeint_to_double(
  duckdb_uhugeint val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: The uhugeint value.

####### Return Value {#docs:current:clients:c:api::return-value}

The converted `double` element.

<br>

###### `duckdb_double_to_uhugeint` {#docs:current:clients:c:api::duckdb_double_to_uhugeint}

Converts a double value to a duckdb_uhugeint object.

If the conversion fails because the double value is too big, the result will be 0.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_uhugeint duckdb_double_to_uhugeint(
  double val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: The double value.

####### Return Value {#docs:current:clients:c:api::return-value}

The converted `duckdb_uhugeint` element.

<br>

###### `duckdb_double_to_decimal` {#docs:current:clients:c:api::duckdb_double_to_decimal}

Converts a double value to a duckdb_decimal object.

If the conversion fails because the double value is too big, or the width/scale are invalid, the result will be 0.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_decimal duckdb_double_to_decimal(
  double val,
  uint8_t width,
  uint8_t scale
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: The double value.
* `width`: The width of the resulting decimal.
* `scale`: The scale of the resulting decimal.

####### Return Value {#docs:current:clients:c:api::return-value}

The converted `duckdb_decimal` element.

<br>

###### `duckdb_decimal_to_double` {#docs:current:clients:c:api::duckdb_decimal_to_double}

Converts a duckdb_decimal object (as obtained from a `DUCKDB_TYPE_DECIMAL` column) into a double.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
double duckdb_decimal_to_double(
  duckdb_decimal val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: The decimal value.

####### Return Value {#docs:current:clients:c:api::return-value}

The converted `double` element.
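
A decimal is stored as an unscaled integer together with a scale, so the conversion amounts to dividing the unscaled value by 10 raised to the scale. The standalone sketch below illustrates that relationship without calling DuckDB. The types `demo_hugeint` and `demo_decimal` and the functions `demo_decimal_to_double` and `demo_decimal_check` are hypothetical, and the field layout is an assumption about `duckdb_decimal` that should be checked against `duckdb.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for duckdb_hugeint (assumed layout). */
typedef struct {
    uint64_t lower;
    int64_t upper;
} demo_hugeint;

/* Hypothetical stand-in for duckdb_decimal: an unscaled integer plus the
   width (total digits) and scale (digits after the decimal point). */
typedef struct {
    uint8_t width;
    uint8_t scale;
    demo_hugeint value;
} demo_decimal;

double demo_decimal_to_double(demo_decimal val) {
    const double two_pow_64 = 18446744073709551616.0;
    double unscaled;
    if (val.value.upper >= 0) {
        unscaled = (double)val.value.upper * two_pow_64 + (double)val.value.lower;
    } else {
        /* Negate the 128-bit value first so small negatives keep precision. */
        uint64_t lo = ~val.value.lower + 1u;
        uint64_t hi = ~(uint64_t)val.value.upper + (lo == 0 ? 1u : 0u);
        unscaled = -((double)hi * two_pow_64 + (double)lo);
    }
    /* Divide by 10^scale to move the decimal point. */
    double divisor = 1.0;
    for (uint8_t i = 0; i < val.scale; i++) {
        divisor *= 10.0;
    }
    return unscaled / divisor;
}

/* Small self-check that doubles as a usage example: DECIMAL(4,2) value 12.50. */
void demo_decimal_check(void) {
    demo_decimal price = {4, 2, {1250, 0}};
    assert(demo_decimal_to_double(price) == 12.5);
}
```

Because most decimal fractions have no exact binary representation, the result is in general only the nearest `double` to the stored decimal.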

<br>

###### `duckdb_prepare` {#docs:current:clients:c:api::duckdb_prepare}

Create a prepared statement object from a query.

Note that after calling `duckdb_prepare`, the prepared statement should always be destroyed using
`duckdb_destroy_prepare`, even if the prepare fails.

If the prepare fails, `duckdb_prepare_error` can be called to obtain the reason why the prepare failed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_prepare(
  duckdb_connection connection,
  const char *query,
  duckdb_prepared_statement *out_prepared_statement
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection object
* `query`: The SQL query to prepare
* `out_prepared_statement`: The resulting prepared statement object

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_prepare` {#docs:current:clients:c:api::duckdb_destroy_prepare}

Closes the prepared statement and de-allocates all memory allocated for the statement.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_prepare(
  duckdb_prepared_statement *prepared_statement
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to destroy.

<br>

###### `duckdb_prepare_error` {#docs:current:clients:c:api::duckdb_prepare_error}

Returns the error message associated with the given prepared statement.
If the prepared statement has no error message, this returns `nullptr` instead.

The error message should not be freed. It will be de-allocated when `duckdb_destroy_prepare` is called.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_prepare_error(
  duckdb_prepared_statement prepared_statement
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to obtain the error from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error message, or `nullptr` if there is none.

<br>

###### `duckdb_nparams` {#docs:current:clients:c:api::duckdb_nparams}

Returns the number of parameters that can be provided to the given prepared statement.

Returns 0 if the query was not successfully prepared.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_nparams(
  duckdb_prepared_statement prepared_statement
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to obtain the number of parameters for.

<br>

###### `duckdb_parameter_name` {#docs:current:clients:c:api::duckdb_parameter_name}

Returns the name used to identify the parameter.
The returned string should be freed using `duckdb_free`.

Returns NULL if the index is out of range for the provided prepared statement.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_parameter_name(
  duckdb_prepared_statement prepared_statement,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to get the parameter name from.
* `index`: The index of the parameter.

<br>

###### `duckdb_param_type` {#docs:current:clients:c:api::duckdb_param_type}

Returns the parameter type for the parameter at the given index.

Returns `DUCKDB_TYPE_INVALID` if the parameter index is out of range or the statement was not successfully prepared.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_type duckdb_param_type(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement.
* `param_idx`: The parameter index.

####### Return Value {#docs:current:clients:c:api::return-value}

The parameter type.

<br>

###### `duckdb_param_logical_type` {#docs:current:clients:c:api::duckdb_param_logical_type}

Returns the logical type for the parameter at the given index.

Returns `nullptr` if the parameter index is out of range or the statement was not successfully prepared.

The return type of this call should be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_param_logical_type(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement.
* `param_idx`: The parameter index.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type of the parameter.

<br>

###### `duckdb_clear_bindings` {#docs:current:clients:c:api::duckdb_clear_bindings}

Clears the parameters bound to the prepared statement.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_clear_bindings(
  duckdb_prepared_statement prepared_statement
);
```

<br>

###### `duckdb_prepared_statement_type` {#docs:current:clients:c:api::duckdb_prepared_statement_type}

Returns the statement type of the statement to be executed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_statement_type duckdb_prepared_statement_type(
  duckdb_prepared_statement statement
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `statement`: The prepared statement.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_statement_type` value, or `DUCKDB_STATEMENT_TYPE_INVALID`.

<br>

###### `duckdb_prepared_statement_column_count` {#docs:current:clients:c:api::duckdb_prepared_statement_column_count}

Returns the number of columns present in the result of the prepared statement. If any of the column types are invalid,
the result will be 1.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_prepared_statement_column_count(
  duckdb_prepared_statement prepared_statement
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of columns present in the result of the prepared statement.

<br>

###### `duckdb_prepared_statement_column_name` {#docs:current:clients:c:api::duckdb_prepared_statement_column_name}

Returns the name of the specified column of the result of the prepared_statement.
The returned string should be freed using `duckdb_free`.

Returns `nullptr` if the column is out of range.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_prepared_statement_column_name(
  duckdb_prepared_statement prepared_statement,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement.
* `col_idx`: The column index.

####### Return Value {#docs:current:clients:c:api::return-value}

The column name of the specified column.

<br>

###### `duckdb_prepared_statement_column_logical_type` {#docs:current:clients:c:api::duckdb_prepared_statement_column_logical_type}

Returns the logical type of the specified column of the result of the prepared_statement.

Returns `DUCKDB_TYPE_INVALID` if the column is out of range.
The return type of this call should be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_prepared_statement_column_logical_type(
  duckdb_prepared_statement prepared_statement,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to fetch the column type from.
* `col_idx`: The column index.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type of the specified column.

<br>

###### `duckdb_prepared_statement_column_type` {#docs:current:clients:c:api::duckdb_prepared_statement_column_type}

Returns the column type of the specified column of the result of the prepared_statement.

Returns `DUCKDB_TYPE_INVALID` if the column is out of range.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_type duckdb_prepared_statement_column_type(
  duckdb_prepared_statement prepared_statement,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to fetch the column type from.
* `col_idx`: The column index.

####### Return Value {#docs:current:clients:c:api::return-value}

The type of the specified column.

<br>

###### `duckdb_bind_value` {#docs:current:clients:c:api::duckdb_bind_value}

Binds a value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_value(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_value val
);
```

<br>

###### `duckdb_bind_parameter_index` {#docs:current:clients:c:api::duckdb_bind_parameter_index}

Retrieves the index of the parameter for the prepared statement, identified by name.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_parameter_index(
  duckdb_prepared_statement prepared_statement,
  idx_t *param_idx_out,
  const char *name
);
```

<br>

###### `duckdb_bind_boolean` {#docs:current:clients:c:api::duckdb_bind_boolean}

Binds a bool value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_boolean(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  bool val
);
```

<br>

###### `duckdb_bind_int8` {#docs:current:clients:c:api::duckdb_bind_int8}

Binds an int8_t value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_int8(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  int8_t val
);
```

<br>

###### `duckdb_bind_int16` {#docs:current:clients:c:api::duckdb_bind_int16}

Binds an int16_t value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_int16(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  int16_t val
);
```

<br>

###### `duckdb_bind_int32` {#docs:current:clients:c:api::duckdb_bind_int32}

Binds an int32_t value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_int32(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  int32_t val
);
```

<br>

###### `duckdb_bind_int64` {#docs:current:clients:c:api::duckdb_bind_int64}

Binds an int64_t value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_int64(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  int64_t val
);
```

<br>

###### `duckdb_bind_hugeint` {#docs:current:clients:c:api::duckdb_bind_hugeint}

Binds a duckdb_hugeint value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_hugeint(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_hugeint val
);
```

<br>

###### `duckdb_bind_uhugeint` {#docs:current:clients:c:api::duckdb_bind_uhugeint}

Binds a duckdb_uhugeint value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_uhugeint(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_uhugeint val
);
```

<br>

###### `duckdb_bind_decimal` {#docs:current:clients:c:api::duckdb_bind_decimal}

Binds a duckdb_decimal value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_decimal(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_decimal val
);
```

<br>

###### `duckdb_bind_uint8` {#docs:current:clients:c:api::duckdb_bind_uint8}

Binds a uint8_t value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_uint8(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  uint8_t val
);
```

<br>

###### `duckdb_bind_uint16` {#docs:current:clients:c:api::duckdb_bind_uint16}

Binds a uint16_t value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_uint16(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  uint16_t val
);
```

<br>

###### `duckdb_bind_uint32` {#docs:current:clients:c:api::duckdb_bind_uint32}

Binds a uint32_t value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_uint32(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  uint32_t val
);
```

<br>

###### `duckdb_bind_uint64` {#docs:current:clients:c:api::duckdb_bind_uint64}

Binds a uint64_t value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_uint64(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  uint64_t val
);
```

<br>

###### `duckdb_bind_float` {#docs:current:clients:c:api::duckdb_bind_float}

Binds a float value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_float(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  float val
);
```

<br>

###### `duckdb_bind_double` {#docs:current:clients:c:api::duckdb_bind_double}

Binds a double value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_double(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  double val
);
```

<br>

###### `duckdb_bind_date` {#docs:current:clients:c:api::duckdb_bind_date}

Binds a duckdb_date value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_date(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_date val
);
```

<br>

###### `duckdb_bind_time` {#docs:current:clients:c:api::duckdb_bind_time}

Binds a duckdb_time value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_time(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_time val
);
```

<br>

###### `duckdb_bind_timestamp` {#docs:current:clients:c:api::duckdb_bind_timestamp}

Binds a duckdb_timestamp value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_timestamp(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_timestamp val
);
```

<br>

###### `duckdb_bind_timestamp_tz` {#docs:current:clients:c:api::duckdb_bind_timestamp_tz}

Binds a duckdb_timestamp value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_timestamp_tz(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_timestamp val
);
```

<br>

###### `duckdb_bind_interval` {#docs:current:clients:c:api::duckdb_bind_interval}

Binds a duckdb_interval value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_interval(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  duckdb_interval val
);
```

<br>

###### `duckdb_bind_varchar` {#docs:current:clients:c:api::duckdb_bind_varchar}

Binds a null-terminated varchar value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_varchar(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  const char *val
);
```

<br>

###### `duckdb_bind_varchar_length` {#docs:current:clients:c:api::duckdb_bind_varchar_length}

Binds a varchar value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_varchar_length(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  const char *val,
  idx_t length
);
```

<br>

###### `duckdb_bind_blob` {#docs:current:clients:c:api::duckdb_bind_blob}

Binds a blob value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_blob(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx,
  const void *data,
  idx_t length
);
```

<br>

###### `duckdb_bind_null` {#docs:current:clients:c:api::duckdb_bind_null}

Binds a NULL value to the prepared statement at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_bind_null(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx
);
```

<br>

###### `duckdb_execute_prepared` {#docs:current:clients:c:api::duckdb_execute_prepared}

Executes the prepared statement with the given bound parameters, and returns a materialized query result.

This method can be called multiple times for each prepared statement, and the parameters can be modified
between calls to this function.

Note that the result must be freed with `duckdb_destroy_result`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_execute_prepared(
  duckdb_prepared_statement prepared_statement,
  duckdb_result *out_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to execute.
* `out_result`: The query result.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_execute_prepared_streaming` {#docs:current:clients:c:api::duckdb_execute_prepared_streaming}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Executes the prepared statement with the given bound parameters, and returns an optionally-streaming query result.
To determine if the resulting query was in fact streamed, use `duckdb_result_is_streaming`.

This method can be called multiple times for each prepared statement, and the parameters can be modified
between calls to this function.

Note that the result must be freed with `duckdb_destroy_result`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_execute_prepared_streaming(
  duckdb_prepared_statement prepared_statement,
  duckdb_result *out_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to execute.
* `out_result`: The query result.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_extract_statements` {#docs:current:clients:c:api::duckdb_extract_statements}

Extract all statements from a query.
Note that after calling `duckdb_extract_statements`, the extracted statements should always be destroyed using
`duckdb_destroy_extracted`, even if no statements were extracted.

If the extract fails, `duckdb_extract_statements_error` can be called to obtain the reason why the extract failed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_extract_statements(
  duckdb_connection connection,
  const char *query,
  duckdb_extracted_statements *out_extracted_statements
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection object
* `query`: The SQL query to extract
* `out_extracted_statements`: The resulting extracted statements object

####### Return Value {#docs:current:clients:c:api::return-value}

The number of extracted statements or 0 on failure.

<br>

###### `duckdb_prepare_extracted_statement` {#docs:current:clients:c:api::duckdb_prepare_extracted_statement}

Prepare an extracted statement.
Note that after calling `duckdb_prepare_extracted_statement`, the prepared statement should always be destroyed using
`duckdb_destroy_prepare`, even if the prepare fails.

If the prepare fails, `duckdb_prepare_error` can be called to obtain the reason why the prepare failed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_prepare_extracted_statement(
  duckdb_connection connection,
  duckdb_extracted_statements extracted_statements,
  idx_t index,
  duckdb_prepared_statement *out_prepared_statement
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection object
* `extracted_statements`: The extracted statements object
* `index`: The index of the extracted statement to prepare
* `out_prepared_statement`: The resulting prepared statement object

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_extract_statements_error` {#docs:current:clients:c:api::duckdb_extract_statements_error}

Returns the error message contained within the extracted statements.
The result of this function must not be freed. It will be cleaned up when `duckdb_destroy_extracted` is called.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_extract_statements_error(
  duckdb_extracted_statements extracted_statements
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `extracted_statements`: The extracted statements to fetch the error from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error of the extracted statements.

<br>

###### `duckdb_destroy_extracted` {#docs:current:clients:c:api::duckdb_destroy_extracted}

De-allocates all memory allocated for the extracted statements.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_extracted(
  duckdb_extracted_statements *extracted_statements
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `extracted_statements`: The extracted statements to destroy.

<br>

###### `duckdb_pending_prepared` {#docs:current:clients:c:api::duckdb_pending_prepared}

Executes the prepared statement with the given bound parameters, and returns a pending result.
The pending result represents an intermediate structure for a query that is not yet fully executed.
The pending result can be used to incrementally execute a query, returning control to the client between tasks.

Note that after calling `duckdb_pending_prepared`, the pending result should always be destroyed using
`duckdb_destroy_pending`, even if this function returns DuckDBError.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_pending_prepared(
  duckdb_prepared_statement prepared_statement,
  duckdb_pending_result *out_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to execute.
* `out_result`: The pending query result.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_pending_prepared_streaming` {#docs:current:clients:c:api::duckdb_pending_prepared_streaming}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Executes the prepared statement with the given bound parameters, and returns a pending result.
This pending result will create a streaming duckdb_result when executed.
The pending result represents an intermediate structure for a query that is not yet fully executed.

Note that after calling `duckdb_pending_prepared_streaming`, the pending result should always be destroyed using
`duckdb_destroy_pending`, even if this function returns DuckDBError.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_pending_prepared_streaming(
  duckdb_prepared_statement prepared_statement,
  duckdb_pending_result *out_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to execute.
* `out_result`: The pending query result.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_destroy_pending` {#docs:current:clients:c:api::duckdb_destroy_pending}

Closes the pending result and de-allocates all memory allocated for the result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_pending(
  duckdb_pending_result *pending_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `pending_result`: The pending result to destroy.

<br>

###### `duckdb_pending_error` {#docs:current:clients:c:api::duckdb_pending_error}

Returns the error message contained within the pending result.

The result of this function must not be freed. It will be cleaned up when `duckdb_destroy_pending` is called.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_pending_error(
  duckdb_pending_result pending_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `pending_result`: The pending result to fetch the error from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error of the pending result.

<br>

###### `duckdb_pending_execute_task` {#docs:current:clients:c:api::duckdb_pending_execute_task}

Executes a single task within the query, returning whether or not the query is ready.

If this returns DUCKDB_PENDING_RESULT_READY, the duckdb_execute_pending function can be called to obtain the result.
If this returns DUCKDB_PENDING_RESULT_NOT_READY, the duckdb_pending_execute_task function should be called again.
If this returns DUCKDB_PENDING_ERROR, an error occurred during execution.

The error message can be obtained by calling duckdb_pending_error on the pending_result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_pending_state duckdb_pending_execute_task(
  duckdb_pending_result pending_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `pending_result`: The pending result to execute a task within.

####### Return Value {#docs:current:clients:c:api::return-value}

The state of the pending result after the execution.
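
The three return states described above suggest a simple polling loop. The sketch below shows only that control flow, using stand-in types so that it compiles without DuckDB: `demo_pending_state`, `demo_pending_result`, `demo_pending_execute_task`, `demo_run_until_ready` and `demo_pending_check` are all hypothetical, and the mock task simply counts down. With the real API, the ready branch would be followed by `duckdb_execute_pending`, the error branch by `duckdb_pending_error`, and both by `duckdb_destroy_pending`.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for duckdb_pending_state, limited to the states named above. */
typedef enum {
    DEMO_PENDING_RESULT_READY,
    DEMO_PENDING_RESULT_NOT_READY,
    DEMO_PENDING_ERROR
} demo_pending_state;

/* Mock pending result: a query that needs `tasks_left` more tasks,
   or one that fails as soon as it is polled. */
typedef struct {
    int tasks_left;
    bool broken;
} demo_pending_result;

/* Mock of "execute a single task and report the new state". */
demo_pending_state demo_pending_execute_task(demo_pending_result *pending) {
    if (pending->broken) {
        return DEMO_PENDING_ERROR;
    }
    if (pending->tasks_left > 0) {
        pending->tasks_left--;
    }
    return pending->tasks_left == 0 ? DEMO_PENDING_RESULT_READY
                                    : DEMO_PENDING_RESULT_NOT_READY;
}

/* Polls until the result is ready (returns true) or an error occurs
   (returns false). `tasks_run` counts how many tasks were executed. */
bool demo_run_until_ready(demo_pending_result *pending, int *tasks_run) {
    for (;;) {
        demo_pending_state state = demo_pending_execute_task(pending);
        (*tasks_run)++;
        if (state == DEMO_PENDING_RESULT_READY) {
            return true;
        }
        if (state == DEMO_PENDING_ERROR) {
            return false;
        }
        /* NOT_READY: the caller has control here between tasks, e.g., to
           check for cancellation before polling again. */
    }
}

/* Small self-check that doubles as a usage example. */
void demo_pending_check(void) {
    demo_pending_result pending = {3, false};
    int tasks_run = 0;
    assert(demo_run_until_ready(&pending, &tasks_run));
    assert(tasks_run == 3);
}
```

Running one task per iteration is what lets a client interleave its own work with query execution instead of blocking until the query completes.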

<br>

###### `duckdb_pending_execute_check_state` {#docs:current:clients:c:api::duckdb_pending_execute_check_state}

If this returns DUCKDB_PENDING_RESULT_READY, the duckdb_execute_pending function can be called to obtain the result.
If this returns DUCKDB_PENDING_RESULT_NOT_READY, the duckdb_pending_execute_check_state function should be called again.
If this returns DUCKDB_PENDING_ERROR, an error occurred during execution.

The error message can be obtained by calling duckdb_pending_error on the pending_result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_pending_state duckdb_pending_execute_check_state(
  duckdb_pending_result pending_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `pending_result`: The pending result.

####### Return Value {#docs:current:clients:c:api::return-value}

The state of the pending result.

<br>

###### `duckdb_execute_pending` {#docs:current:clients:c:api::duckdb_execute_pending}

Fully execute a pending query result, returning the final query result.

If `duckdb_pending_execute_task` has been called until `DUCKDB_PENDING_RESULT_READY` was returned, this will return quickly.
Otherwise, all remaining tasks must be executed first.

Note that the result must be freed with `duckdb_destroy_result`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_execute_pending(
  duckdb_pending_result pending_result,
  duckdb_result *out_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `pending_result`: The pending result to execute.
* `out_result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_pending_execution_is_finished` {#docs:current:clients:c:api::duckdb_pending_execution_is_finished}

Returns whether a `duckdb_pending_state` is finished executing. For example, if `pending_state` is
`DUCKDB_PENDING_RESULT_READY`, this function will return true.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
bool duckdb_pending_execution_is_finished(
  duckdb_pending_state pending_state
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `pending_state`: The pending state on which to decide whether to finish execution.

####### Return Value {#docs:current:clients:c:api::return-value}

A boolean indicating whether pending execution should be considered finished.

<br>

###### `duckdb_destroy_value` {#docs:current:clients:c:api::duckdb_destroy_value}

Destroys the value and de-allocates all memory allocated for that type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_value(
  duckdb_value *value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The value to destroy.

<br>

###### `duckdb_create_varchar` {#docs:current:clients:c:api::duckdb_create_varchar}

Creates a value from a null-terminated string.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_varchar(
  const char *text
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `text`: The null-terminated string

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_varchar_length` {#docs:current:clients:c:api::duckdb_create_varchar_length}

Creates a value from a string with an explicit length

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_varchar_length(
  const char *text,
  idx_t length
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `text`: The text, which does not need to be null-terminated
* `length`: The length of the text in bytes

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_bool` {#docs:current:clients:c:api::duckdb_create_bool}

Creates a value from a boolean

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_bool(
  bool input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The boolean value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_int8` {#docs:current:clients:c:api::duckdb_create_int8}

Creates a value from an int8_t (a tinyint)

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_int8(
  int8_t input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The tinyint value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uint8` {#docs:current:clients:c:api::duckdb_create_uint8}

Creates a value from a uint8_t (a utinyint)

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_uint8(
  uint8_t input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The utinyint value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_int16` {#docs:current:clients:c:api::duckdb_create_int16}

Creates a value from an int16_t (a smallint)

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_int16(
  int16_t input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The smallint value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uint16` {#docs:current:clients:c:api::duckdb_create_uint16}

Creates a value from a uint16_t (a usmallint)

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_uint16(
  uint16_t input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The usmallint value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_int32` {#docs:current:clients:c:api::duckdb_create_int32}

Creates a value from an int32_t (an integer)

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_int32(
  int32_t input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The integer value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uint32` {#docs:current:clients:c:api::duckdb_create_uint32}

Creates a value from a uint32_t (a uinteger)

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_uint32(
  uint32_t input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The uinteger value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uint64` {#docs:current:clients:c:api::duckdb_create_uint64}

Creates a value from a uint64_t (a ubigint)

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_uint64(
  uint64_t input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The ubigint value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_int64` {#docs:current:clients:c:api::duckdb_create_int64}

Creates a value from an int64_t (a bigint)

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_int64(
  int64_t val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: The bigint value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_hugeint` {#docs:current:clients:c:api::duckdb_create_hugeint}

Creates a value from a hugeint

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_hugeint(
  duckdb_hugeint input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The hugeint value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uhugeint` {#docs:current:clients:c:api::duckdb_create_uhugeint}

Creates a value from a uhugeint

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_uhugeint(
  duckdb_uhugeint input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The uhugeint value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_bignum` {#docs:current:clients:c:api::duckdb_create_bignum}

Creates a BIGNUM value from a duckdb_bignum

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_bignum(
  duckdb_bignum input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_bignum value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_decimal` {#docs:current:clients:c:api::duckdb_create_decimal}

Creates a DECIMAL value from a duckdb_decimal

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_decimal(
  duckdb_decimal input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_decimal value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_float` {#docs:current:clients:c:api::duckdb_create_float}

Creates a value from a float

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_float(
  float input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The float value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_double` {#docs:current:clients:c:api::duckdb_create_double}

Creates a value from a double

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_double(
  double input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The double value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_date` {#docs:current:clients:c:api::duckdb_create_date}

Creates a value from a date

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_date(
  duckdb_date input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The date value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_time` {#docs:current:clients:c:api::duckdb_create_time}

Creates a value from a time

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_time(
  duckdb_time input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The time value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_time_ns` {#docs:current:clients:c:api::duckdb_create_time_ns}

Creates a value from a time_ns

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_time_ns(
  duckdb_time_ns input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The time_ns value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_time_tz_value` {#docs:current:clients:c:api::duckdb_create_time_tz_value}

Creates a value from a time_tz.
Not to be confused with `duckdb_create_time_tz`, which creates a `duckdb_time_tz`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_time_tz_value(
  duckdb_time_tz value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The time_tz value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp` {#docs:current:clients:c:api::duckdb_create_timestamp}

Creates a TIMESTAMP value from a duckdb_timestamp

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_timestamp(
  duckdb_timestamp input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_timestamp value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp_tz` {#docs:current:clients:c:api::duckdb_create_timestamp_tz}

Creates a TIMESTAMP_TZ value from a duckdb_timestamp

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_timestamp_tz(
  duckdb_timestamp input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_timestamp value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp_s` {#docs:current:clients:c:api::duckdb_create_timestamp_s}

Creates a TIMESTAMP_S value from a duckdb_timestamp_s

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_timestamp_s(
  duckdb_timestamp_s input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_timestamp_s value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp_ms` {#docs:current:clients:c:api::duckdb_create_timestamp_ms}

Creates a TIMESTAMP_MS value from a duckdb_timestamp_ms

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_timestamp_ms(
  duckdb_timestamp_ms input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_timestamp_ms value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_timestamp_ns` {#docs:current:clients:c:api::duckdb_create_timestamp_ns}

Creates a TIMESTAMP_NS value from a duckdb_timestamp_ns

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_timestamp_ns(
  duckdb_timestamp_ns input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_timestamp_ns value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_interval` {#docs:current:clients:c:api::duckdb_create_interval}

Creates a value from an interval

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_interval(
  duckdb_interval input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The interval value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_blob` {#docs:current:clients:c:api::duckdb_create_blob}

Creates a value from a blob

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_blob(
  const uint8_t *data,
  idx_t length
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `data`: The blob data
* `length`: The length of the blob data

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_bit` {#docs:current:clients:c:api::duckdb_create_bit}

Creates a BIT value from a duckdb_bit

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_bit(
  duckdb_bit input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_bit value

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_create_uuid` {#docs:current:clients:c:api::duckdb_create_uuid}

Creates a UUID value from a uhugeint

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_uuid(
  duckdb_uhugeint input
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `input`: The duckdb_uhugeint containing the UUID

####### Return Value {#docs:current:clients:c:api::return-value}

The value. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_get_bool` {#docs:current:clients:c:api::duckdb_get_bool}

Returns the boolean value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
bool duckdb_get_bool(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a boolean

####### Return Value {#docs:current:clients:c:api::return-value}

A boolean, or false if the value cannot be converted

<br>

###### `duckdb_get_int8` {#docs:current:clients:c:api::duckdb_get_int8}

Returns the int8_t value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
int8_t duckdb_get_int8(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a tinyint

####### Return Value {#docs:current:clients:c:api::return-value}

An int8_t, or MinValue<int8> if the value cannot be converted

<br>

###### `duckdb_get_uint8` {#docs:current:clients:c:api::duckdb_get_uint8}

Returns the uint8_t value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint8_t duckdb_get_uint8(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a utinyint

####### Return Value {#docs:current:clients:c:api::return-value}

A uint8_t, or MinValue<uint8> if the value cannot be converted

<br>

###### `duckdb_get_int16` {#docs:current:clients:c:api::duckdb_get_int16}

Returns the int16_t value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
int16_t duckdb_get_int16(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a smallint

####### Return Value {#docs:current:clients:c:api::return-value}

An int16_t, or MinValue<int16> if the value cannot be converted

<br>

###### `duckdb_get_uint16` {#docs:current:clients:c:api::duckdb_get_uint16}

Returns the uint16_t value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint16_t duckdb_get_uint16(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a usmallint

####### Return Value {#docs:current:clients:c:api::return-value}

A uint16_t, or MinValue<uint16> if the value cannot be converted

<br>

###### `duckdb_get_int32` {#docs:current:clients:c:api::duckdb_get_int32}

Returns the int32_t value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
int32_t duckdb_get_int32(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing an integer

####### Return Value {#docs:current:clients:c:api::return-value}

An int32_t, or MinValue<int32> if the value cannot be converted

<br>

###### `duckdb_get_uint32` {#docs:current:clients:c:api::duckdb_get_uint32}

Returns the uint32_t value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint32_t duckdb_get_uint32(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a uinteger

####### Return Value {#docs:current:clients:c:api::return-value}

A uint32_t, or MinValue<uint32> if the value cannot be converted

<br>

###### `duckdb_get_int64` {#docs:current:clients:c:api::duckdb_get_int64}

Returns the int64_t value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
int64_t duckdb_get_int64(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a bigint

####### Return Value {#docs:current:clients:c:api::return-value}

An int64_t, or MinValue<int64> if the value cannot be converted

<br>

###### `duckdb_get_uint64` {#docs:current:clients:c:api::duckdb_get_uint64}

Returns the uint64_t value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint64_t duckdb_get_uint64(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a ubigint

####### Return Value {#docs:current:clients:c:api::return-value}

A uint64_t, or MinValue<uint64> if the value cannot be converted

<br>

###### `duckdb_get_hugeint` {#docs:current:clients:c:api::duckdb_get_hugeint}

Returns the hugeint value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_hugeint duckdb_get_hugeint(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a hugeint

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_hugeint, or MinValue<hugeint> if the value cannot be converted

<br>

###### `duckdb_get_uhugeint` {#docs:current:clients:c:api::duckdb_get_uhugeint}

Returns the uhugeint value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_uhugeint duckdb_get_uhugeint(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a uhugeint

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_uhugeint, or MinValue<uhugeint> if the value cannot be converted

<br>

###### `duckdb_get_bignum` {#docs:current:clients:c:api::duckdb_get_bignum}

Returns the duckdb_bignum value of the given value.
The `data` field must be destroyed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_bignum duckdb_get_bignum(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a BIGNUM

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_bignum. The `data` field must be destroyed with `duckdb_free`.

<br>

###### `duckdb_get_decimal` {#docs:current:clients:c:api::duckdb_get_decimal}

Returns the duckdb_decimal value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_decimal duckdb_get_decimal(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a DECIMAL

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_decimal, or MinValue<decimal> if the value cannot be converted

<br>

###### `duckdb_get_float` {#docs:current:clients:c:api::duckdb_get_float}

Returns the float value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
float duckdb_get_float(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a float

####### Return Value {#docs:current:clients:c:api::return-value}

A float, or NAN if the value cannot be converted

<br>

###### `duckdb_get_double` {#docs:current:clients:c:api::duckdb_get_double}

Returns the double value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
double duckdb_get_double(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a double

####### Return Value {#docs:current:clients:c:api::return-value}

A double, or NAN if the value cannot be converted

<br>

###### `duckdb_get_date` {#docs:current:clients:c:api::duckdb_get_date}

Returns the date value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_date duckdb_get_date(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a date

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_date, or MinValue<date> if the value cannot be converted

<br>

###### `duckdb_get_time` {#docs:current:clients:c:api::duckdb_get_time}

Returns the time value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_time duckdb_get_time(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a time

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_time, or MinValue<time> if the value cannot be converted

<br>

###### `duckdb_get_time_ns` {#docs:current:clients:c:api::duckdb_get_time_ns}

Returns the time_ns value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_time_ns duckdb_get_time_ns(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a time_ns

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_time_ns, or MinValue<time_ns> if the value cannot be converted

<br>

###### `duckdb_get_time_tz` {#docs:current:clients:c:api::duckdb_get_time_tz}

Returns the time_tz value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_time_tz duckdb_get_time_tz(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a time_tz

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_time_tz, or MinValue<time_tz> if the value cannot be converted

<br>

###### `duckdb_get_timestamp` {#docs:current:clients:c:api::duckdb_get_timestamp}

Returns the TIMESTAMP value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_timestamp duckdb_get_timestamp(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a TIMESTAMP

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_timestamp, or MinValue<timestamp> if the value cannot be converted

<br>

###### `duckdb_get_timestamp_tz` {#docs:current:clients:c:api::duckdb_get_timestamp_tz}

Returns the TIMESTAMP_TZ value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_timestamp duckdb_get_timestamp_tz(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a TIMESTAMP_TZ

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_timestamp, or MinValue<timestamp_tz> if the value cannot be converted

<br>

###### `duckdb_get_timestamp_s` {#docs:current:clients:c:api::duckdb_get_timestamp_s}

Returns the duckdb_timestamp_s value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_timestamp_s duckdb_get_timestamp_s(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a TIMESTAMP_S

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_timestamp_s, or MinValue<timestamp_s> if the value cannot be converted

<br>

###### `duckdb_get_timestamp_ms` {#docs:current:clients:c:api::duckdb_get_timestamp_ms}

Returns the duckdb_timestamp_ms value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_timestamp_ms duckdb_get_timestamp_ms(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a TIMESTAMP_MS

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_timestamp_ms, or MinValue<timestamp_ms> if the value cannot be converted

<br>

###### `duckdb_get_timestamp_ns` {#docs:current:clients:c:api::duckdb_get_timestamp_ns}

Returns the duckdb_timestamp_ns value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_timestamp_ns duckdb_get_timestamp_ns(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a TIMESTAMP_NS

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_timestamp_ns, or MinValue<timestamp_ns> if the value cannot be converted

<br>

###### `duckdb_get_interval` {#docs:current:clients:c:api::duckdb_get_interval}

Returns the interval value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_interval duckdb_get_interval(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing an interval

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_interval, or MinValue<interval> if the value cannot be converted

<br>

###### `duckdb_get_value_type` {#docs:current:clients:c:api::duckdb_get_value_type}

Returns the type of the given value. The type is valid as long as the value is not destroyed.
The type itself must not be destroyed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_get_value_type(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_logical_type.

<br>

###### `duckdb_get_blob` {#docs:current:clients:c:api::duckdb_get_blob}

Returns the blob value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_blob duckdb_get_blob(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a blob

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_blob

<br>

###### `duckdb_get_bit` {#docs:current:clients:c:api::duckdb_get_bit}

Returns the duckdb_bit value of the given value.
The `data` field must be destroyed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_bit duckdb_get_bit(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a BIT

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_bit

<br>

###### `duckdb_get_uuid` {#docs:current:clients:c:api::duckdb_get_uuid}

Returns a duckdb_uhugeint representing the UUID value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_uhugeint duckdb_get_uuid(
  duckdb_value val
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `val`: A duckdb_value containing a UUID

####### Return Value {#docs:current:clients:c:api::return-value}

A duckdb_uhugeint representing the UUID value

<br>

###### `duckdb_get_varchar` {#docs:current:clients:c:api::duckdb_get_varchar}

Obtains a string representation of the given value.
The result must be destroyed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
char *duckdb_get_varchar(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The value

####### Return Value {#docs:current:clients:c:api::return-value}

The string value. This must be destroyed with `duckdb_free`.

<br>

###### `duckdb_create_struct_value` {#docs:current:clients:c:api::duckdb_create_struct_value}

Creates a struct value from a type and an array of values. Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_struct_value(
  duckdb_logical_type type,
  duckdb_value *values
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The type of the struct
* `values`: The values for the struct fields

####### Return Value {#docs:current:clients:c:api::return-value}

The struct value, or nullptr, if any child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.

<br>

###### `duckdb_create_list_value` {#docs:current:clients:c:api::duckdb_create_list_value}

Creates a list value from a child (element) type and an array of values of length `value_count`.
Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_list_value(
  duckdb_logical_type type,
  duckdb_value *values,
  idx_t value_count
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The child (element) type of the list
* `values`: The values for the list
* `value_count`: The number of values in the list

####### Return Value {#docs:current:clients:c:api::return-value}

The list value, or nullptr, if the child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.

<br>

###### `duckdb_create_array_value` {#docs:current:clients:c:api::duckdb_create_array_value}

Creates an array value from a child (element) type and an array of values of length `value_count`.
Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_array_value(
  duckdb_logical_type type,
  duckdb_value *values,
  idx_t value_count
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The child (element) type of the array
* `values`: The values for the array
* `value_count`: The number of values in the array

####### Return Value {#docs:current:clients:c:api::return-value}

The array value, or nullptr, if the child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.

<br>

###### `duckdb_create_map_value` {#docs:current:clients:c:api::duckdb_create_map_value}

Creates a map value from a map type and two arrays, one for the keys and one for the values, each of length
`entry_count`. Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_map_value(
  duckdb_logical_type map_type,
  duckdb_value *keys,
  duckdb_value *values,
  idx_t entry_count
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `map_type`: The map type
* `keys`: The keys of the map
* `values`: The values of the map
* `entry_count`: The number of entries (key-value pairs) in the map

####### Return Value {#docs:current:clients:c:api::return-value}

The map value, or nullptr, if the parameters are invalid.

<br>

###### `duckdb_create_union_value` {#docs:current:clients:c:api::duckdb_create_union_value}

Creates a union value from a union type, a tag index, and a value.
Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_union_value(
  duckdb_logical_type union_type,
  idx_t tag_index,
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `union_type`: The union type
* `tag_index`: The index of the tag of the union
* `value`: The value of the union for that tag

####### Return Value {#docs:current:clients:c:api::return-value}

The union value, or nullptr, if the parameters are invalid.

<br>

###### `duckdb_get_map_size` {#docs:current:clients:c:api::duckdb_get_map_size}

Returns the number of elements in a MAP value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_get_map_size(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The MAP value.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of elements in the map.

<br>

###### `duckdb_get_map_key` {#docs:current:clients:c:api::duckdb_get_map_key}

Returns the MAP key at index as a duckdb_value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_get_map_key(
  duckdb_value value,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The MAP value.
* `index`: The index of the key.

####### Return Value {#docs:current:clients:c:api::return-value}

The key as a duckdb_value.

<br>

###### `duckdb_get_map_value` {#docs:current:clients:c:api::duckdb_get_map_value}

Returns the MAP value at index as a duckdb_value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_get_map_value(
  duckdb_value value,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The MAP value.
* `index`: The index of the value.

####### Return Value {#docs:current:clients:c:api::return-value}

The value as a duckdb_value.

<br>

###### `duckdb_is_null_value` {#docs:current:clients:c:api::duckdb_is_null_value}

Returns whether the value's type is SQLNULL or not.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
bool duckdb_is_null_value(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The value to check.

####### Return Value {#docs:current:clients:c:api::return-value}

True, if the value's type is SQLNULL, otherwise false.

<br>

###### `duckdb_create_null_value` {#docs:current:clients:c:api::duckdb_create_null_value}

Creates a value of type SQLNULL.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_null_value();
```

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_value` representing SQLNULL. This must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_get_list_size` {#docs:current:clients:c:api::duckdb_get_list_size}

Returns the number of elements in a LIST value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_get_list_size(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The LIST value.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of elements in the list.

<br>

###### `duckdb_get_list_child` {#docs:current:clients:c:api::duckdb_get_list_child}

Returns the LIST child at index as a duckdb_value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_get_list_child(
  duckdb_value value,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The LIST value.
* `index`: The index of the child.

####### Return Value {#docs:current:clients:c:api::return-value}

The child as a duckdb_value.

<br>

###### `duckdb_create_enum_value` {#docs:current:clients:c:api::duckdb_create_enum_value}

Creates an enum value from a type and a value. Must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_create_enum_value(
  duckdb_logical_type type,
  uint64_t value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The type of the enum
* `value`: The value for the enum

####### Return Value {#docs:current:clients:c:api::return-value}

The enum value, or nullptr.

<br>

###### `duckdb_get_enum_value` {#docs:current:clients:c:api::duckdb_get_enum_value}

Returns the enum value of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint64_t duckdb_get_enum_value(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: A duckdb_value containing an enum

####### Return Value {#docs:current:clients:c:api::return-value}

A `uint64_t`, or `MinValue<uint64>` (i.e., 0) if the value cannot be converted.

<br>

###### `duckdb_get_struct_child` {#docs:current:clients:c:api::duckdb_get_struct_child}

Returns the STRUCT child at index as a duckdb_value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_get_struct_child(
  duckdb_value value,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: The STRUCT value.
* `index`: The index of the child.

####### Return Value {#docs:current:clients:c:api::return-value}

The child as a duckdb_value.

<br>

###### `duckdb_value_to_string` {#docs:current:clients:c:api::duckdb_value_to_string}

Returns the SQL string representation of the given value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
char *duckdb_value_to_string(
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `value`: A duckdb_value.

####### Return Value {#docs:current:clients:c:api::return-value}

The SQL string representation as a null-terminated string. The result must be freed with `duckdb_free`.

<br>

###### `duckdb_create_logical_type` {#docs:current:clients:c:api::duckdb_create_logical_type}

Creates a `duckdb_logical_type` from a primitive type.
The resulting logical type must be destroyed with `duckdb_destroy_logical_type`.

Returns an invalid logical type, if type is: `DUCKDB_TYPE_INVALID`, `DUCKDB_TYPE_DECIMAL`, `DUCKDB_TYPE_ENUM`,
`DUCKDB_TYPE_LIST`, `DUCKDB_TYPE_STRUCT`, `DUCKDB_TYPE_MAP`, `DUCKDB_TYPE_ARRAY`, or `DUCKDB_TYPE_UNION`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_create_logical_type(
  duckdb_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The primitive type to create.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type.

<br>

###### `duckdb_logical_type_get_alias` {#docs:current:clients:c:api::duckdb_logical_type_get_alias}

Returns the alias of a duckdb_logical_type, if set, else `nullptr`.
The result must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
char *duckdb_logical_type_get_alias(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type

####### Return Value {#docs:current:clients:c:api::return-value}

The alias or `nullptr`

<br>

###### `duckdb_logical_type_set_alias` {#docs:current:clients:c:api::duckdb_logical_type_set_alias}

Sets the alias of a duckdb_logical_type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_logical_type_set_alias(
  duckdb_logical_type type,
  const char *alias
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type
* `alias`: The alias to set

<br>

###### `duckdb_create_list_type` {#docs:current:clients:c:api::duckdb_create_list_type}

Creates a LIST type from its child type.
The resulting type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_create_list_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The child type of the list

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type.

<br>

###### `duckdb_create_array_type` {#docs:current:clients:c:api::duckdb_create_array_type}

Creates an ARRAY type from its child type and its fixed array size.
The resulting type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_create_array_type(
  duckdb_logical_type type,
  idx_t array_size
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The child type of the array.
* `array_size`: The number of elements in the array.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type.

<br>

###### `duckdb_create_map_type` {#docs:current:clients:c:api::duckdb_create_map_type}

Creates a MAP type from its key type and value type.
The resulting type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_create_map_type(
  duckdb_logical_type key_type,
  duckdb_logical_type value_type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `key_type`: The map's key type.
* `value_type`: The map's value type.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type.

<br>

###### `duckdb_create_union_type` {#docs:current:clients:c:api::duckdb_create_union_type}

Creates a UNION type from the passed arrays.
The resulting type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_create_union_type(
  duckdb_logical_type *member_types,
  const char **member_names,
  idx_t member_count
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `member_types`: The array of union member types.
* `member_names`: The union member names.
* `member_count`: The number of union members.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type.

<br>

###### `duckdb_create_struct_type` {#docs:current:clients:c:api::duckdb_create_struct_type}

Creates a STRUCT type based on the member types and names.
The resulting type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_create_struct_type(
  duckdb_logical_type *member_types,
  const char **member_names,
  idx_t member_count
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `member_types`: The array of types of the struct members.
* `member_names`: The array of names of the struct members.
* `member_count`: The number of members of the struct.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type.

<br>

###### `duckdb_create_enum_type` {#docs:current:clients:c:api::duckdb_create_enum_type}

Creates an ENUM type from the passed member name array.
The resulting type should be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_create_enum_type(
  const char **member_names,
  idx_t member_count
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `member_names`: The array of names that the enum should consist of.
* `member_count`: The number of elements that were specified in the array.

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type.

<br>

###### `duckdb_create_decimal_type` {#docs:current:clients:c:api::duckdb_create_decimal_type}

Creates a DECIMAL type with the specified width and scale.
The resulting type should be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_create_decimal_type(
  uint8_t width,
  uint8_t scale
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `width`: The width of the decimal type
* `scale`: The scale of the decimal type

####### Return Value {#docs:current:clients:c:api::return-value}

The logical type.

<br>

###### `duckdb_get_type_id` {#docs:current:clients:c:api::duckdb_get_type_id}

Retrieves the enum `duckdb_type` of a `duckdb_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_type duckdb_get_type_id(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_type` id.

<br>

###### `duckdb_decimal_width` {#docs:current:clients:c:api::duckdb_decimal_width}

Retrieves the width of a decimal type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint8_t duckdb_decimal_width(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The width of the decimal type

<br>

###### `duckdb_decimal_scale` {#docs:current:clients:c:api::duckdb_decimal_scale}

Retrieves the scale of a decimal type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint8_t duckdb_decimal_scale(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The scale of the decimal type

<br>

###### `duckdb_decimal_internal_type` {#docs:current:clients:c:api::duckdb_decimal_internal_type}

Retrieves the internal storage type of a decimal type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_type duckdb_decimal_internal_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The internal type of the decimal type

<br>

###### `duckdb_enum_internal_type` {#docs:current:clients:c:api::duckdb_enum_internal_type}

Retrieves the internal storage type of an enum type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_type duckdb_enum_internal_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The internal type of the enum type

<br>

###### `duckdb_enum_dictionary_size` {#docs:current:clients:c:api::duckdb_enum_dictionary_size}

Retrieves the dictionary size of the enum type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint32_t duckdb_enum_dictionary_size(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The dictionary size of the enum type

<br>

###### `duckdb_enum_dictionary_value` {#docs:current:clients:c:api::duckdb_enum_dictionary_value}

Retrieves the dictionary value at the specified position from the enum.

The result must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
char *duckdb_enum_dictionary_value(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object
* `index`: The index in the dictionary

####### Return Value {#docs:current:clients:c:api::return-value}

The string value of the enum type. Must be freed with `duckdb_free`.

<br>

###### `duckdb_list_type_child_type` {#docs:current:clients:c:api::duckdb_list_type_child_type}

Retrieves the child type of the given LIST type. Also accepts MAP types.
The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_list_type_child_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type, either LIST or MAP.

####### Return Value {#docs:current:clients:c:api::return-value}

The child type of the LIST or MAP type.

<br>

###### `duckdb_array_type_child_type` {#docs:current:clients:c:api::duckdb_array_type_child_type}

Retrieves the child type of the given ARRAY type.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_array_type_child_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type. Must be ARRAY.

####### Return Value {#docs:current:clients:c:api::return-value}

The child type of the ARRAY type.

<br>

###### `duckdb_array_type_array_size` {#docs:current:clients:c:api::duckdb_array_type_array_size}

Retrieves the array size of the given array type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_array_type_array_size(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The fixed number of elements the values of this array type can store.

<br>

###### `duckdb_map_type_key_type` {#docs:current:clients:c:api::duckdb_map_type_key_type}

Retrieves the key type of the given map type.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_map_type_key_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The key type of the map type. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_map_type_value_type` {#docs:current:clients:c:api::duckdb_map_type_value_type}

Retrieves the value type of the given map type.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_map_type_value_type(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The value type of the map type. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_struct_type_child_count` {#docs:current:clients:c:api::duckdb_struct_type_child_count}

Returns the number of children of a struct type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_struct_type_child_count(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object

####### Return Value {#docs:current:clients:c:api::return-value}

The number of children of a struct type.

<br>

###### `duckdb_struct_type_child_name` {#docs:current:clients:c:api::duckdb_struct_type_child_name}

Retrieves the name of the struct child.

The result must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
char *duckdb_struct_type_child_name(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object
* `index`: The child index

####### Return Value {#docs:current:clients:c:api::return-value}

The name of the struct child. Must be freed with `duckdb_free`.

<br>

###### `duckdb_struct_type_child_type` {#docs:current:clients:c:api::duckdb_struct_type_child_type}

Retrieves the child type of the given struct type at the specified index.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_struct_type_child_type(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object
* `index`: The child index

####### Return Value {#docs:current:clients:c:api::return-value}

The child type of the struct type. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_union_type_member_count` {#docs:current:clients:c:api::duckdb_union_type_member_count}

Returns the number of members that the union type has.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_union_type_member_count(
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type (union) object

####### Return Value {#docs:current:clients:c:api::return-value}

The number of members of a union type.

<br>

###### `duckdb_union_type_member_name` {#docs:current:clients:c:api::duckdb_union_type_member_name}

Retrieves the name of the union member.

The result must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
char *duckdb_union_type_member_name(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object
* `index`: The child index

####### Return Value {#docs:current:clients:c:api::return-value}

The name of the union member. Must be freed with `duckdb_free`.

<br>

###### `duckdb_union_type_member_type` {#docs:current:clients:c:api::duckdb_union_type_member_type}

Retrieves the child type of the given union member at the specified index.

The result must be freed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_union_type_member_type(
  duckdb_logical_type type,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type object
* `index`: The child index

####### Return Value {#docs:current:clients:c:api::return-value}

The child type of the union member. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_destroy_logical_type` {#docs:current:clients:c:api::duckdb_destroy_logical_type}

Destroys the logical type and de-allocates all memory allocated for that type.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_logical_type(
  duckdb_logical_type *type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type to destroy.

<br>

###### `duckdb_register_logical_type` {#docs:current:clients:c:api::duckdb_register_logical_type}

Registers a custom type within the given connection.
The type must have an alias.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_register_logical_type(
  duckdb_connection con,
  duckdb_logical_type type,
  duckdb_create_type_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `con`: The connection to use
* `type`: The custom type to register

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not the registration was successful.

<br>

###### `duckdb_create_data_chunk` {#docs:current:clients:c:api::duckdb_create_data_chunk}

Creates an empty data chunk with the specified column types.
The result must be destroyed with `duckdb_destroy_data_chunk`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_data_chunk duckdb_create_data_chunk(
  duckdb_logical_type *types,
  idx_t column_count
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `types`: An array of column types. Column types cannot contain ANY or INVALID types.
* `column_count`: The number of columns.

####### Return Value {#docs:current:clients:c:api::return-value}

The data chunk.

<br>

###### `duckdb_destroy_data_chunk` {#docs:current:clients:c:api::duckdb_destroy_data_chunk}

Destroys the data chunk and de-allocates all memory allocated for that chunk.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_data_chunk(
  duckdb_data_chunk *chunk
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `chunk`: The data chunk to destroy.

<br>

###### `duckdb_data_chunk_reset` {#docs:current:clients:c:api::duckdb_data_chunk_reset}

Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.
After calling this method, you must call `duckdb_vector_get_validity` and `duckdb_vector_get_data` again to obtain the current
data and validity pointers.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_data_chunk_reset(
  duckdb_data_chunk chunk
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `chunk`: The data chunk to reset.

<br>

###### `duckdb_data_chunk_get_column_count` {#docs:current:clients:c:api::duckdb_data_chunk_get_column_count}

Retrieves the number of columns in a data chunk.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_data_chunk_get_column_count(
  duckdb_data_chunk chunk
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `chunk`: The data chunk to get the data from

####### Return Value {#docs:current:clients:c:api::return-value}

The number of columns in the data chunk

<br>

###### `duckdb_data_chunk_get_vector` {#docs:current:clients:c:api::duckdb_data_chunk_get_vector}

Retrieves the vector at the specified column index in the data chunk.

The pointer to the vector is valid for as long as the chunk is alive.
It does NOT need to be destroyed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_vector duckdb_data_chunk_get_vector(
  duckdb_data_chunk chunk,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `chunk`: The data chunk to get the data from
* `col_idx`: The index of the column whose vector is retrieved

####### Return Value {#docs:current:clients:c:api::return-value}

The vector

<br>

###### `duckdb_data_chunk_get_size` {#docs:current:clients:c:api::duckdb_data_chunk_get_size}

Retrieves the current number of tuples in a data chunk.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_data_chunk_get_size(
  duckdb_data_chunk chunk
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `chunk`: The data chunk to get the data from

####### Return Value {#docs:current:clients:c:api::return-value}

The number of tuples in the data chunk

<br>

###### `duckdb_data_chunk_set_size` {#docs:current:clients:c:api::duckdb_data_chunk_set_size}

Sets the current number of tuples in a data chunk.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_data_chunk_set_size(
  duckdb_data_chunk chunk,
  idx_t size
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `chunk`: The data chunk to set the size in
* `size`: The number of tuples in the data chunk

<br>

###### `duckdb_create_vector` {#docs:current:clients:c:api::duckdb_create_vector}

Creates a flat vector. Must be destroyed with `duckdb_destroy_vector`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_vector duckdb_create_vector(
  duckdb_logical_type type,
  idx_t capacity
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `type`: The logical type of the vector.
* `capacity`: The capacity of the vector.

####### Return Value {#docs:current:clients:c:api::return-value}

The vector.

<br>

###### `duckdb_destroy_vector` {#docs:current:clients:c:api::duckdb_destroy_vector}

Destroys the vector and de-allocates its memory.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_vector(
  duckdb_vector *vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: A pointer to the vector.

<br>

###### `duckdb_vector_get_column_type` {#docs:current:clients:c:api::duckdb_vector_get_column_type}

Retrieves the column type of the specified vector.

The result must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_vector_get_column_type(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector to get the data from

####### Return Value {#docs:current:clients:c:api::return-value}

The type of the vector

<br>

###### `duckdb_vector_get_data` {#docs:current:clients:c:api::duckdb_vector_get_data}

Retrieves the data pointer of the vector.

The data pointer can be used to read or write values from the vector.
How to read or write values depends on the type of the vector.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_vector_get_data(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector to get the data from

####### Return Value {#docs:current:clients:c:api::return-value}

The data pointer

<br>

###### `duckdb_vector_get_validity` {#docs:current:clients:c:api::duckdb_vector_get_validity}

Retrieves the validity mask pointer of the specified vector.

If all values are valid, this function MIGHT return NULL!

The validity mask is a bitset that signifies null-ness within the data chunk.
It is a series of uint64_t values, where each uint64_t value contains validity for 64 tuples.
The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).

Validity of a specific value can be obtained like this:

    idx_t entry_idx = row_idx / 64;
    idx_t idx_in_entry = row_idx % 64;
    bool is_valid = validity_mask[entry_idx] & ((uint64_t) 1 << idx_in_entry);

Alternatively, the (slower) duckdb_validity_row_is_valid function can be used.
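The lookup described above can be sketched as a small standalone helper. This is a minimal sketch and not DuckDB's implementation: the names `row_is_valid` and `count_valid_rows` are hypothetical, and `size_t` stands in for `idx_t` so that the snippet compiles without `duckdb.h`. The bit is built from a 64-bit constant so that shifts by 32 or more positions are well defined, and a NULL mask pointer is treated as "all rows are valid".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Returns true if the row is valid (i.e., not NULL).
// A NULL mask pointer means that no mask is present, so every row is valid.
bool row_is_valid(const uint64_t *validity_mask, size_t row_idx) {
    if (validity_mask == NULL) {
        return true;
    }
    size_t entry_idx = row_idx / 64;    // which uint64_t entry holds the bit
    size_t idx_in_entry = row_idx % 64; // position of the bit within that entry
    return (validity_mask[entry_idx] & ((uint64_t)1 << idx_in_entry)) != 0;
}

// Counts the valid rows among the first row_count rows.
size_t count_valid_rows(const uint64_t *validity_mask, size_t row_count) {
    size_t valid = 0;
    for (size_t row = 0; row < row_count; row++) {
        if (row_is_valid(validity_mask, row)) {
            valid++;
        }
    }
    return valid;
}
```

The pointer returned by `duckdb_vector_get_validity` can be passed directly as `validity_mask`, including the case where it is NULL.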

####### Syntax {#docs:current:clients:c:api::syntax}

```c
uint64_t *duckdb_vector_get_validity(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector to get the data from

####### Return Value {#docs:current:clients:c:api::return-value}

The pointer to the validity mask, or NULL if no validity mask is present

<br>

###### `duckdb_vector_ensure_validity_writable` {#docs:current:clients:c:api::duckdb_vector_ensure_validity_writable}

Ensures the validity mask is writable by allocating it.

After this function is called, `duckdb_vector_get_validity` will ALWAYS return non-NULL.
This allows NULL values to be written to the vector, regardless of whether a validity mask was present before.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_vector_ensure_validity_writable(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector to alter

<br>

###### `duckdb_vector_assign_string_element` {#docs:current:clients:c:api::duckdb_vector_assign_string_element}

Assigns a string element in the vector at the specified location.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_vector_assign_string_element(
  duckdb_vector vector,
  idx_t index,
  const char *str
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector to alter
* `index`: The row position in the vector to assign the string to
* `str`: The null-terminated string

<br>

###### `duckdb_vector_assign_string_element_len` {#docs:current:clients:c:api::duckdb_vector_assign_string_element_len}

Assigns a string element in the vector at the specified location. You may also use this function to assign BLOBs.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_vector_assign_string_element_len(
  duckdb_vector vector,
  idx_t index,
  const char *str,
  idx_t str_len
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector to alter
* `index`: The row position in the vector to assign the string to
* `str`: The string
* `str_len`: The length of the string (in bytes)

<br>

###### `duckdb_list_vector_get_child` {#docs:current:clients:c:api::duckdb_list_vector_get_child}

Retrieves the child vector of a list vector.

The resulting vector is valid as long as the parent vector is valid.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_vector duckdb_list_vector_get_child(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector

####### Return Value {#docs:current:clients:c:api::return-value}

The child vector

<br>

###### `duckdb_list_vector_get_size` {#docs:current:clients:c:api::duckdb_list_vector_get_size}

Returns the size of the child vector of the list.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_list_vector_get_size(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector

####### Return Value {#docs:current:clients:c:api::return-value}

The size of the child list

<br>

###### `duckdb_list_vector_set_size` {#docs:current:clients:c:api::duckdb_list_vector_set_size}

Sets the total size of the underlying child-vector of a list vector.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_list_vector_set_size(
  duckdb_vector vector,
  idx_t size
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The list vector.
* `size`: The size of the child list.

####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb state. Returns DuckDBError if the vector is nullptr.

<br>

###### `duckdb_list_vector_reserve` {#docs:current:clients:c:api::duckdb_list_vector_reserve}

Sets the total capacity of the underlying child-vector of a list.

After calling this method, you must call `duckdb_vector_get_validity` and `duckdb_vector_get_data` again to obtain the current data and validity pointers, as pointers obtained earlier may no longer be valid.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_list_vector_reserve(
  duckdb_vector vector,
  idx_t required_capacity
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The list vector.
* `required_capacity`: The total capacity to reserve.

####### Return Value {#docs:current:clients:c:api::return-value}

The duckdb state. Returns DuckDBError if the vector is nullptr.
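
The requirement to re-fetch pointers after reserving can be pictured with a growable buffer. The sketch below is a plain C model, not DuckDB code: `child_buffer`, `child_reserve` and `child_append` are hypothetical names, and the idea that reserving may move the underlying storage is an assumption used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the child storage of a list vector. */
typedef struct {
    int64_t *data;     /* element buffer, may move when capacity grows */
    uint64_t size;     /* number of elements in use */
    uint64_t capacity; /* number of elements the buffer can hold */
} child_buffer;

/* Ensures room for at least `required_capacity` elements.
 * Returns 0 on success and -1 if the allocation fails. */
static int child_reserve(child_buffer *child, uint64_t required_capacity) {
    assert(child != NULL);
    if (required_capacity <= child->capacity) {
        return 0; /* already large enough, nothing moves */
    }
    uint64_t new_capacity = child->capacity * 2;
    if (new_capacity < required_capacity) {
        new_capacity = required_capacity;
    }
    int64_t *grown = realloc(child->data, (size_t)new_capacity * sizeof(int64_t));
    if (grown == NULL) {
        return -1;
    }
    child->data = grown; /* pointers saved before this call may be stale */
    child->capacity = new_capacity;
    return 0;
}

/* Appends one element: reserve first, re-fetch the pointer, then grow the size. */
static int child_append(child_buffer *child, int64_t value) {
    if (child_reserve(child, child->size + 1) != 0) {
        return -1;
    }
    int64_t *data = child->data; /* fetched after reserving, never before */
    data[child->size] = value;
    child->size += 1;
    return 0;
}
```

In this model, `child_append` follows the same order the text prescribes: reserve capacity, obtain the current data pointer, write, and only then update the size.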

<br>

###### `duckdb_struct_vector_get_child` {#docs:current:clients:c:api::duckdb_struct_vector_get_child}

Retrieves the child vector of a struct vector.
The resulting vector is valid as long as the parent vector is valid.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_vector duckdb_struct_vector_get_child(
  duckdb_vector vector,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector
* `index`: The child index

####### Return Value {#docs:current:clients:c:api::return-value}

The child vector

<br>

###### `duckdb_array_vector_get_child` {#docs:current:clients:c:api::duckdb_array_vector_get_child}

Retrieves the child vector of an array vector.
The resulting vector is valid as long as the parent vector is valid.
The resulting vector has the size of the parent vector multiplied by the array size.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_vector duckdb_array_vector_get_child(
  duckdb_vector vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector

####### Return Value {#docs:current:clients:c:api::return-value}

The child vector
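
Since the child vector holds the parent size multiplied by the array size, the elements of one row can be located with simple index arithmetic. The sketch below works on a plain `int64_t` buffer and assumes that the arrays of consecutive rows are stored back to back in the child. `array_child_index` and `array_row_sum` are hypothetical helpers, not part of the C API.

```c
#include <assert.h>
#include <stdint.h>

/* Position of element `elem` of the array stored in `row` inside the flat
 * child buffer, for a fixed `array_size`. */
static uint64_t array_child_index(uint64_t row, uint64_t array_size, uint64_t elem) {
    assert(elem < array_size);
    return row * array_size + elem;
}

/* Sums the elements of the array stored in `row`. */
static int64_t array_row_sum(const int64_t *child, uint64_t row, uint64_t array_size) {
    int64_t sum = 0;
    for (uint64_t i = 0; i < array_size; i++) {
        sum += child[array_child_index(row, array_size, i)];
    }
    return sum;
}
```

With an array size of 3, row 1 occupies child positions 3, 4 and 5.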

<br>

###### `duckdb_slice_vector` {#docs:current:clients:c:api::duckdb_slice_vector}

Slice a vector with a selection vector.
The length of the selection vector must be less than or equal to the length of the vector.
Turns the vector into a dictionary vector.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_slice_vector(
  duckdb_vector vector,
  duckdb_selection_vector sel,
  idx_t len
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The vector to slice.
* `sel`: The selection vector.
* `len`: The length of the selection vector.
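
Conceptually, slicing leaves the underlying values in place and adds a level of indirection through the selection vector. The following plain C model illustrates that idea only. `sliced_view`, `slice_model` and `sliced_get` are hypothetical names, and the use of `uint32_t` for selection entries is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a sliced (dictionary-style) vector. */
typedef struct {
    const int32_t *data; /* underlying values, not copied */
    const uint32_t *sel; /* maps logical row -> position in `data` */
    uint64_t len;        /* number of logical rows after slicing */
} sliced_view;

/* Builds a view over `data` that exposes `len` rows chosen by `sel`. */
static sliced_view slice_model(const int32_t *data, const uint32_t *sel, uint64_t len) {
    sliced_view view = {data, sel, len};
    return view;
}

/* Reads logical row `row` by following the selection entry. */
static int32_t sliced_get(const sliced_view *view, uint64_t row) {
    assert(row < view->len);
    return view->data[view->sel[row]];
}
```

For example, slicing the values `{10, 20, 30, 40}` with the selection `{3, 1}` yields a two-row view that reads 40 and then 20.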

<br>

###### `duckdb_vector_copy_sel` {#docs:current:clients:c:api::duckdb_vector_copy_sel}

Copy the src vector to the dst with a selection vector that identifies which indices to copy.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_vector_copy_sel(
  duckdb_vector src,
  duckdb_vector dst,
  duckdb_selection_vector sel,
  idx_t src_count,
  idx_t src_offset,
  idx_t dst_offset
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `src`: The vector to copy from.
* `dst`: The vector to copy to.
* `sel`: The selection vector. The length of the selection vector should not be more than the length of the src vector.
* `src_count`: The number of entries from the selection vector to copy. Think of this as the effective length of the selection vector starting from index 0.
* `src_offset`: The offset in the selection vector to copy from (important: the actual number of items copied = src_count - src_offset).
* `dst_offset`: The offset in the dst vector to start copying to.
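
The interplay of `src_count`, `src_offset` and `dst_offset` is easiest to see on plain arrays. The sketch below models the copy semantics described by the parameters and does not call DuckDB. `copy_sel_model` is a hypothetical helper, and the element and selection types (`int32_t`, `uint32_t`) are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Copies src[sel[i]] for i in [src_offset, src_count) into dst, starting at
 * dst_offset. Returns the number of items copied (src_count - src_offset). */
static uint64_t copy_sel_model(const int32_t *src, int32_t *dst, const uint32_t *sel,
                               uint64_t src_count, uint64_t src_offset,
                               uint64_t dst_offset) {
    assert(src_offset <= src_count);
    uint64_t copied = src_count - src_offset;
    for (uint64_t i = 0; i < copied; i++) {
        dst[dst_offset + i] = src[sel[src_offset + i]];
    }
    return copied;
}
```

With the source values `{10, 20, 30, 40, 50}`, the selection `{4, 2, 0}`, `src_count = 3`, `src_offset = 1` and `dst_offset = 2`, two items are copied: the value 30 lands at `dst[2]` and the value 10 at `dst[3]`.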

<br>

###### `duckdb_vector_reference_value` {#docs:current:clients:c:api::duckdb_vector_reference_value}

Copies the value from `value` to `vector`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_vector_reference_value(
  duckdb_vector vector,
  duckdb_value value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `vector`: The receiving vector.
* `value`: The value to copy into the vector.

<br>

###### `duckdb_vector_reference_vector` {#docs:current:clients:c:api::duckdb_vector_reference_vector}

Changes `to_vector` to reference `from_vector`. Afterwards, the vectors share ownership of the data.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_vector_reference_vector(
  duckdb_vector to_vector,
  duckdb_vector from_vector
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `to_vector`: The receiving vector.
* `from_vector`: The vector to reference.

<br>

###### `duckdb_validity_row_is_valid` {#docs:current:clients:c:api::duckdb_validity_row_is_valid}

Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
bool duckdb_validity_row_is_valid(
  uint64_t *validity,
  idx_t row
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `validity`: The validity mask, as obtained through `duckdb_vector_get_validity`
* `row`: The row index

####### Return Value {#docs:current:clients:c:api::return-value}

true if the row is valid, false otherwise

<br>

###### `duckdb_validity_set_row_validity` {#docs:current:clients:c:api::duckdb_validity_set_row_validity}

In a validity mask, sets a specific row to either valid or invalid.

Note that `duckdb_vector_ensure_validity_writable` should be called before calling `duckdb_vector_get_validity`,
to ensure that there is a validity mask to write to.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_validity_set_row_validity(
  uint64_t *validity,
  idx_t row,
  bool valid
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `validity`: The validity mask, as obtained through `duckdb_vector_get_validity`.
* `row`: The row index
* `valid`: Whether or not to set the row to valid, or invalid
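
The write path can be modeled on a plain `uint64_t` array as well. In the sketch below, `mask_create_all_valid` stands in for the guarantee that a writable mask exists before writing, and `mask_set_row` mirrors the semantics described for setting a row to valid or invalid. Both names are hypothetical, and the code is a model of the bit layout rather than DuckDB's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocates a mask covering `count` rows with every row marked valid.
 * Returns NULL if the allocation fails. The caller frees the mask. */
static uint64_t *mask_create_all_valid(uint64_t count) {
    size_t words = (size_t)((count + 63) / 64);
    if (words == 0) {
        words = 1; /* keep the allocation non-empty */
    }
    uint64_t *mask = malloc(words * sizeof(uint64_t));
    if (mask != NULL) {
        memset(mask, 0xFF, words * sizeof(uint64_t)); /* all bits 1 = all valid */
    }
    return mask;
}

/* Sets the bit of `row` to 1 (valid) or 0 (invalid, i.e., NULL). */
static void mask_set_row(uint64_t *mask, uint64_t row, bool valid) {
    assert(mask != NULL); /* a mask must exist before it can be written */
    uint64_t bit = (uint64_t)1 << (row % 64);
    if (valid) {
        mask[row / 64] |= bit;
    } else {
        mask[row / 64] &= ~bit;
    }
}
```

Marking row 70 as invalid clears bit 6 of the second 64-bit word, because 70 / 64 = 1 and 70 % 64 = 6.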

<br>

###### `duckdb_validity_set_row_invalid` {#docs:current:clients:c:api::duckdb_validity_set_row_invalid}

In a validity mask, sets a specific row to invalid.

Equivalent to `duckdb_validity_set_row_validity` with valid set to false.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_validity_set_row_invalid(
  uint64_t *validity,
  idx_t row
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `validity`: The validity mask
* `row`: The row index

<br>

###### `duckdb_validity_set_row_valid` {#docs:current:clients:c:api::duckdb_validity_set_row_valid}

In a validity mask, sets a specific row to valid.

Equivalent to `duckdb_validity_set_row_validity` with valid set to true.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_validity_set_row_valid(
  uint64_t *validity,
  idx_t row
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `validity`: The validity mask
* `row`: The row index

<br>

###### `duckdb_create_scalar_function` {#docs:current:clients:c:api::duckdb_create_scalar_function}

Creates a new empty scalar function.

The return value must be destroyed with `duckdb_destroy_scalar_function`.


####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_scalar_function duckdb_create_scalar_function();
```

####### Return Value {#docs:current:clients:c:api::return-value}

The scalar function object.

<br>

###### `duckdb_destroy_scalar_function` {#docs:current:clients:c:api::duckdb_destroy_scalar_function}

Destroys the given scalar function object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_scalar_function(
  duckdb_scalar_function *scalar_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function to destroy

<br>

###### `duckdb_scalar_function_set_name` {#docs:current:clients:c:api::duckdb_scalar_function_set_name}

Sets the name of the given scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_name(
  duckdb_scalar_function scalar_function,
  const char *name
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function
* `name`: The name of the scalar function

<br>

###### `duckdb_scalar_function_set_varargs` {#docs:current:clients:c:api::duckdb_scalar_function_set_varargs}

Sets the parameters of the given scalar function to varargs. Does not require adding parameters with `duckdb_scalar_function_add_parameter`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_varargs(
  duckdb_scalar_function scalar_function,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function.
* `type`: The type of the arguments. Cannot contain INVALID.

<br>

###### `duckdb_scalar_function_set_special_handling` {#docs:current:clients:c:api::duckdb_scalar_function_set_special_handling}

Sets the scalar function's null-handling behavior to special.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_special_handling(
  duckdb_scalar_function scalar_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function.

<br>

###### `duckdb_scalar_function_set_volatile` {#docs:current:clients:c:api::duckdb_scalar_function_set_volatile}

Sets the function stability of the scalar function to VOLATILE, indicating that the function should be re-run for every row.
This limits the optimizations that can be performed for the function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_volatile(
  duckdb_scalar_function scalar_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function.

<br>

###### `duckdb_scalar_function_add_parameter` {#docs:current:clients:c:api::duckdb_scalar_function_add_parameter}

Adds a parameter to the scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_add_parameter(
  duckdb_scalar_function scalar_function,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function.
* `type`: The parameter type. Cannot contain INVALID.

<br>

###### `duckdb_scalar_function_set_return_type` {#docs:current:clients:c:api::duckdb_scalar_function_set_return_type}

Sets the return type of the scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_return_type(
  duckdb_scalar_function scalar_function,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function
* `type`: The return type. Cannot contain INVALID or ANY.

<br>

###### `duckdb_scalar_function_set_extra_info` {#docs:current:clients:c:api::duckdb_scalar_function_set_extra_info}

Assigns extra information to the scalar function that can be fetched during binding, etc.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_extra_info(
  duckdb_scalar_function scalar_function,
  void *extra_info,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function
* `extra_info`: The extra information
* `destroy`: The callback that will be called to destroy the extra information (if any)

<br>

###### `duckdb_scalar_function_set_bind` {#docs:current:clients:c:api::duckdb_scalar_function_set_bind}

Sets the (optional) bind function of the scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_bind(
  duckdb_scalar_function scalar_function,
  duckdb_scalar_function_bind_t bind
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function.
* `bind`: The bind function.

<br>

###### `duckdb_scalar_function_set_bind_data` {#docs:current:clients:c:api::duckdb_scalar_function_set_bind_data}

Sets the user-provided bind data in the bind object of the scalar function.
The bind data object can be retrieved again during execution.
In most cases, you also need to set the copy-callback of your bind data via `duckdb_scalar_function_set_bind_data_copy`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_bind_data(
  duckdb_bind_info info,
  void *bind_data,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info of the scalar function.
* `bind_data`: The bind data object.
* `destroy`: The callback to destroy the bind data (if any).

<br>

###### `duckdb_scalar_function_set_bind_data_copy` {#docs:current:clients:c:api::duckdb_scalar_function_set_bind_data_copy}

Sets the copy-callback for the user-provided bind data in the bind object of the scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_bind_data_copy(
  duckdb_bind_info info,
  duckdb_copy_callback_t copy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info of the scalar function.
* `copy`: The callback to copy the bind data (if any).

<br>

###### `duckdb_scalar_function_bind_set_error` {#docs:current:clients:c:api::duckdb_scalar_function_bind_set_error}

Report that an error has occurred while calling bind on a scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_bind_set_error(
  duckdb_bind_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info object.
* `error`: The error message.

<br>

###### `duckdb_scalar_function_set_function` {#docs:current:clients:c:api::duckdb_scalar_function_set_function}

Sets the main function of the scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_function(
  duckdb_scalar_function scalar_function,
  duckdb_scalar_function_t function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `scalar_function`: The scalar function
* `function`: The function

<br>

###### `duckdb_register_scalar_function` {#docs:current:clients:c:api::duckdb_register_scalar_function}

Register the scalar function object within the given connection.

The function requires at least a name, a function and a return type.

If the function is incomplete or a function with this name already exists, `DuckDBError` is returned.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_register_scalar_function(
  duckdb_connection con,
  duckdb_scalar_function scalar_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `con`: The connection to register it in.
* `scalar_function`: The scalar function to register.

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not the registration was successful.

<br>

###### `duckdb_scalar_function_get_extra_info` {#docs:current:clients:c:api::duckdb_scalar_function_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_scalar_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_scalar_function_get_extra_info(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object.

####### Return Value {#docs:current:clients:c:api::return-value}

The extra info.

<br>

###### `duckdb_scalar_function_bind_get_extra_info` {#docs:current:clients:c:api::duckdb_scalar_function_bind_get_extra_info}

Retrieves the extra info of the function as set in the bind info.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_scalar_function_bind_get_extra_info(
  duckdb_bind_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object.

####### Return Value {#docs:current:clients:c:api::return-value}

The extra info.

<br>

###### `duckdb_scalar_function_get_bind_data` {#docs:current:clients:c:api::duckdb_scalar_function_get_bind_data}

Gets the scalar function's bind data set by `duckdb_scalar_function_set_bind_data`.
Note that the bind data is read-only.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_scalar_function_get_bind_data(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The function info.

####### Return Value {#docs:current:clients:c:api::return-value}

The bind data object.

<br>

###### `duckdb_scalar_function_get_client_context` {#docs:current:clients:c:api::duckdb_scalar_function_get_client_context}

Retrieves the client context of the bind info of a scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_get_client_context(
  duckdb_bind_info info,
  duckdb_client_context *out_context
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info object of the scalar function.
* `out_context`: The client context of the bind info. Must be destroyed with `duckdb_destroy_client_context`.

<br>

###### `duckdb_scalar_function_set_error` {#docs:current:clients:c:api::duckdb_scalar_function_set_error}

Report that an error has occurred while executing the scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_scalar_function_set_error(
  duckdb_function_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object.
* `error`: The error message

<br>

###### `duckdb_create_scalar_function_set` {#docs:current:clients:c:api::duckdb_create_scalar_function_set}

Creates a new empty scalar function set.

The return value must be destroyed with `duckdb_destroy_scalar_function_set`.


####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_scalar_function_set duckdb_create_scalar_function_set(
  const char *name
);
```

####### Parameters {#docs:current:clients:c:api::parameters}

* `name`: The name of the scalar function set.

####### Return Value {#docs:current:clients:c:api::return-value}

The scalar function set object.

<br>

###### `duckdb_destroy_scalar_function_set` {#docs:current:clients:c:api::duckdb_destroy_scalar_function_set}

Destroys the given scalar function set object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_scalar_function_set(
  duckdb_scalar_function_set *scalar_function_set
);
```

<br>

###### `duckdb_add_scalar_function_to_set` {#docs:current:clients:c:api::duckdb_add_scalar_function_to_set}

Adds the scalar function as a new overload to the scalar function set.

Returns DuckDBError if the function could not be added, for example if the overload already exists.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_add_scalar_function_to_set(
  duckdb_scalar_function_set set,
  duckdb_scalar_function function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `set`: The scalar function set
* `function`: The function to add

<br>

###### `duckdb_register_scalar_function_set` {#docs:current:clients:c:api::duckdb_register_scalar_function_set}

Register the scalar function set within the given connection.

The set requires at least a single valid overload.

If the set is incomplete or a function with this name already exists, `DuckDBError` is returned.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_register_scalar_function_set(
  duckdb_connection con,
  duckdb_scalar_function_set set
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `con`: The connection to register it in.
* `set`: The function set to register

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not the registration was successful.

<br>

###### `duckdb_scalar_function_bind_get_argument_count` {#docs:current:clients:c:api::duckdb_scalar_function_bind_get_argument_count}

Returns the number of input arguments of the scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_scalar_function_bind_get_argument_count(
  duckdb_bind_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of input arguments.

<br>

###### `duckdb_scalar_function_bind_get_argument` {#docs:current:clients:c:api::duckdb_scalar_function_bind_get_argument}

Returns the input argument at the given index of the scalar function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_expression duckdb_scalar_function_bind_get_argument(
  duckdb_bind_info info,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info.
* `index`: The argument index.

####### Return Value {#docs:current:clients:c:api::return-value}

The input argument at index. Must be destroyed with `duckdb_destroy_expression`.

<br>

###### `duckdb_create_selection_vector` {#docs:current:clients:c:api::duckdb_create_selection_vector}

Creates a new selection vector of size `size`.
Must be destroyed with `duckdb_destroy_selection_vector`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_selection_vector duckdb_create_selection_vector(
  idx_t size
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `size`: The size of the selection vector.

####### Return Value {#docs:current:clients:c:api::return-value}

The selection vector.

<br>

###### `duckdb_destroy_selection_vector` {#docs:current:clients:c:api::duckdb_destroy_selection_vector}

Destroys the selection vector and de-allocates its memory.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_selection_vector(
  duckdb_selection_vector sel
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `sel`: The selection vector.

<br>

###### `duckdb_selection_vector_get_data_ptr` {#docs:current:clients:c:api::duckdb_selection_vector_get_data_ptr}

Access the data pointer of a selection vector.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
sel_t *duckdb_selection_vector_get_data_ptr(
  duckdb_selection_vector sel
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `sel`: The selection vector.

####### Return Value {#docs:current:clients:c:api::return-value}

The data pointer.

<br>

###### `duckdb_create_aggregate_function` {#docs:current:clients:c:api::duckdb_create_aggregate_function}

Creates a new empty aggregate function.

The return value should be destroyed with `duckdb_destroy_aggregate_function`.


####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_aggregate_function duckdb_create_aggregate_function();
```

####### Return Value {#docs:current:clients:c:api::return-value}

The aggregate function object.

<br>

###### `duckdb_destroy_aggregate_function` {#docs:current:clients:c:api::duckdb_destroy_aggregate_function}

Destroys the given aggregate function object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_aggregate_function(
  duckdb_aggregate_function *aggregate_function
);
```

<br>

###### `duckdb_aggregate_function_set_name` {#docs:current:clients:c:api::duckdb_aggregate_function_set_name}

Sets the name of the given aggregate function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_aggregate_function_set_name(
  duckdb_aggregate_function aggregate_function,
  const char *name
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `aggregate_function`: The aggregate function
* `name`: The name of the aggregate function

<br>

###### `duckdb_aggregate_function_add_parameter` {#docs:current:clients:c:api::duckdb_aggregate_function_add_parameter}

Adds a parameter to the aggregate function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_aggregate_function_add_parameter(
  duckdb_aggregate_function aggregate_function,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `aggregate_function`: The aggregate function.
* `type`: The parameter type. Cannot contain INVALID.

<br>

###### `duckdb_aggregate_function_set_return_type` {#docs:current:clients:c:api::duckdb_aggregate_function_set_return_type}

Sets the return type of the aggregate function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_aggregate_function_set_return_type(
  duckdb_aggregate_function aggregate_function,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `aggregate_function`: The aggregate function.
* `type`: The return type. Cannot contain INVALID or ANY.

<br>

###### `duckdb_aggregate_function_set_functions` {#docs:current:clients:c:api::duckdb_aggregate_function_set_functions}

Sets the main functions of the aggregate function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_aggregate_function_set_functions(
  duckdb_aggregate_function aggregate_function,
  duckdb_aggregate_state_size state_size,
  duckdb_aggregate_init_t state_init,
  duckdb_aggregate_update_t update,
  duckdb_aggregate_combine_t combine,
  duckdb_aggregate_finalize_t finalize
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `aggregate_function`: The aggregate function
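* `state_size`: The callback that returns the size of the aggregate state.
* `state_init`: The callback that initializes a state.
* `update`: The callback that updates states with new input.
* `combine`: The callback that combines states.
* `finalize`: The callback that finalizes states into the result.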
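
To show how the five callbacks relate to each other, the following sketch models a SUM aggregate in plain C. It does not use DuckDB's callback typedefs: all names (`sum_state`, `sum_state_size`, `sum_init`, `sum_update`, `sum_combine`, `sum_finalize`, `sum_two_partitions`) and the simplified single-state signatures are assumptions made for illustration. The real callbacks receive additional arguments and work on batches of states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aggregate state for a SUM over 64-bit integers. */
typedef struct {
    int64_t total;
} sum_state;

/* state_size: tells the caller how much memory one state needs. */
static size_t sum_state_size(void) {
    return sizeof(sum_state);
}

/* state_init: puts freshly allocated state memory into a defined start state. */
static void sum_init(void *state) {
    ((sum_state *)state)->total = 0;
}

/* update: folds a batch of input values into the state. */
static void sum_update(void *state, const int64_t *input, size_t count) {
    sum_state *s = (sum_state *)state;
    for (size_t i = 0; i < count; i++) {
        s->total += input[i];
    }
}

/* combine: merges a partial result (`source`) into `target`. */
static void sum_combine(const void *source, void *target) {
    ((sum_state *)target)->total += ((const sum_state *)source)->total;
}

/* finalize: turns the state into the aggregate result. */
static int64_t sum_finalize(const void *state) {
    return ((const sum_state *)state)->total;
}

/* Plays the role of the engine: aggregates two partitions independently,
 * then combines the partial states and finalizes the result. */
static int64_t sum_two_partitions(const int64_t *a, size_t na,
                                  const int64_t *b, size_t nb) {
    void *left = malloc(sum_state_size());
    void *right = malloc(sum_state_size());
    assert(left != NULL && right != NULL);
    sum_init(left);
    sum_init(right);
    sum_update(left, a, na);
    sum_update(right, b, nb);
    sum_combine(right, left);
    int64_t result = sum_finalize(left);
    free(left);
    free(right);
    return result;
}
```

The driver function illustrates why `combine` exists: partial states built from separate chunks of input can be merged before `finalize` produces the final value.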

<br>

###### `duckdb_aggregate_function_set_destructor` {#docs:current:clients:c:api::duckdb_aggregate_function_set_destructor}

Sets the (optional) state destructor callback of the aggregate function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_aggregate_function_set_destructor(
  duckdb_aggregate_function aggregate_function,
  duckdb_aggregate_destroy_t destroy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `aggregate_function`: The aggregate function
* `destroy`: state destroy callback

<br>

###### `duckdb_register_aggregate_function` {#docs:current:clients:c:api::duckdb_register_aggregate_function}

Register the aggregate function object within the given connection.

The function requires at least a name, functions and a return type.

If the function is incomplete or a function with this name already exists, `DuckDBError` is returned.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_register_aggregate_function(
  duckdb_connection con,
  duckdb_aggregate_function aggregate_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `con`: The connection to register it in.
* `aggregate_function`: The aggregate function to register.

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not the registration was successful.

<br>

###### `duckdb_aggregate_function_set_special_handling` {#docs:current:clients:c:api::duckdb_aggregate_function_set_special_handling}

Sets the NULL handling of the aggregate function to SPECIAL_HANDLING.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_aggregate_function_set_special_handling(
  duckdb_aggregate_function aggregate_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `aggregate_function`: The aggregate function

<br>

###### `duckdb_aggregate_function_set_extra_info` {#docs:current:clients:c:api::duckdb_aggregate_function_set_extra_info}

Assigns extra information to the aggregate function that can be fetched during binding, etc.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_aggregate_function_set_extra_info(
  duckdb_aggregate_function aggregate_function,
  void *extra_info,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `aggregate_function`: The aggregate function
* `extra_info`: The extra information
* `destroy`: The callback that will be called to destroy the extra information (if any)

<br>

###### `duckdb_aggregate_function_get_extra_info` {#docs:current:clients:c:api::duckdb_aggregate_function_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_aggregate_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_aggregate_function_get_extra_info(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The extra info

<br>

###### `duckdb_aggregate_function_set_error` {#docs:current:clients:c:api::duckdb_aggregate_function_set_error}

Report that an error has occurred while executing the aggregate function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_aggregate_function_set_error(
  duckdb_function_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `error`: The error message

<br>

###### `duckdb_create_aggregate_function_set` {#docs:current:clients:c:api::duckdb_create_aggregate_function_set}

Creates a new empty aggregate function set.

The return value should be destroyed with `duckdb_destroy_aggregate_function_set`.


####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_aggregate_function_set duckdb_create_aggregate_function_set(
  const char *name
);
```

####### Parameters {#docs:current:clients:c:api::parameters}

* `name`: The name of the aggregate function set.

####### Return Value {#docs:current:clients:c:api::return-value}

The aggregate function set object.

<br>

###### `duckdb_destroy_aggregate_function_set` {#docs:current:clients:c:api::duckdb_destroy_aggregate_function_set}

Destroys the given aggregate function set object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_aggregate_function_set(
  duckdb_aggregate_function_set *aggregate_function_set
);
```

<br>

###### `duckdb_add_aggregate_function_to_set` {#docs:current:clients:c:api::duckdb_add_aggregate_function_to_set}

Adds the aggregate function as a new overload to the aggregate function set.

Returns DuckDBError if the function could not be added, for example if the overload already exists.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_add_aggregate_function_to_set(
  duckdb_aggregate_function_set set,
  duckdb_aggregate_function function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `set`: The aggregate function set
* `function`: The function to add

<br>

###### `duckdb_register_aggregate_function_set` {#docs:current:clients:c:api::duckdb_register_aggregate_function_set}

Register the aggregate function set within the given connection.

The set requires at least a single valid overload.

If the set is incomplete or a function with this name already exists, `DuckDBError` is returned.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_register_aggregate_function_set(
  duckdb_connection con,
  duckdb_aggregate_function_set set
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `con`: The connection to register it in.
* `set`: The function set to register

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not the registration was successful.

<br>

###### `duckdb_create_table_function` {#docs:current:clients:c:api::duckdb_create_table_function}

Creates a new empty table function.

The return value should be destroyed with `duckdb_destroy_table_function`.


####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_table_function duckdb_create_table_function();
```

####### Return Value {#docs:current:clients:c:api::return-value}

The table function object.

<br>

###### `duckdb_destroy_table_function` {#docs:current:clients:c:api::duckdb_destroy_table_function}

Destroys the given table function object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_table_function(
  duckdb_table_function *table_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function to destroy

<br>

###### `duckdb_table_function_set_name` {#docs:current:clients:c:api::duckdb_table_function_set_name}

Sets the name of the given table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_set_name(
  duckdb_table_function table_function,
  const char *name
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function
* `name`: The name of the table function

<br>

###### `duckdb_table_function_add_parameter` {#docs:current:clients:c:api::duckdb_table_function_add_parameter}

Adds a parameter to the table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_add_parameter(
  duckdb_table_function table_function,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function.
* `type`: The parameter type. Cannot contain INVALID.

<br>

###### `duckdb_table_function_add_named_parameter` {#docs:current:clients:c:api::duckdb_table_function_add_named_parameter}

Adds a named parameter to the table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_add_named_parameter(
  duckdb_table_function table_function,
  const char *name,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function.
* `name`: The parameter name.
* `type`: The parameter type. Cannot contain INVALID.

<br>

###### `duckdb_table_function_set_extra_info` {#docs:current:clients:c:api::duckdb_table_function_set_extra_info}

Assigns extra information to the table function that can be fetched during binding, etc.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_set_extra_info(
  duckdb_table_function table_function,
  void *extra_info,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function
* `extra_info`: The extra information
* `destroy`: The callback that will be called to destroy the extra information (if any)

<br>

###### `duckdb_table_function_set_bind` {#docs:current:clients:c:api::duckdb_table_function_set_bind}

Sets the bind function of the table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_set_bind(
  duckdb_table_function table_function,
  duckdb_table_function_bind_t bind
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function
* `bind`: The bind function

<br>

###### `duckdb_table_function_set_init` {#docs:current:clients:c:api::duckdb_table_function_set_init}

Sets the init function of the table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_set_init(
  duckdb_table_function table_function,
  duckdb_table_function_init_t init
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function
* `init`: The init function

<br>

###### `duckdb_table_function_set_local_init` {#docs:current:clients:c:api::duckdb_table_function_set_local_init}

Sets the thread-local init function of the table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_set_local_init(
  duckdb_table_function table_function,
  duckdb_table_function_init_t init
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function
* `init`: The init function

<br>

###### `duckdb_table_function_set_function` {#docs:current:clients:c:api::duckdb_table_function_set_function}

Sets the main function of the table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_set_function(
  duckdb_table_function table_function,
  duckdb_table_function_t function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function
* `function`: The function

<br>

###### `duckdb_table_function_supports_projection_pushdown` {#docs:current:clients:c:api::duckdb_table_function_supports_projection_pushdown}

Sets whether or not the given table function supports projection pushdown.

If this is set to true, the system will provide a list of all required columns in the `init` stage through
the `duckdb_init_get_column_count` and `duckdb_init_get_column_index` functions.
If this is set to false (the default), the system will expect all columns to be projected.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_supports_projection_pushdown(
  duckdb_table_function table_function,
  bool pushdown
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_function`: The table function
* `pushdown`: True if the table function supports projection pushdown, false otherwise.

<br>

###### `duckdb_register_table_function` {#docs:current:clients:c:api::duckdb_register_table_function}

Register the table function object within the given connection.

The function requires at least a name, a bind function, an init function and a main function.

If the function is incomplete or a function with this name already exists, `DuckDBError` is returned.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_register_table_function(
  duckdb_connection con,
  duckdb_table_function function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `con`: The connection to register it in.
* `function`: The table function object to register.

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not the registration was successful.

<br>

###### `duckdb_bind_get_extra_info` {#docs:current:clients:c:api::duckdb_bind_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_bind_get_extra_info(
  duckdb_bind_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The extra info

<br>

###### `duckdb_table_function_get_client_context` {#docs:current:clients:c:api::duckdb_table_function_get_client_context}

Retrieves the client context of the bind info of a table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_function_get_client_context(
  duckdb_bind_info info,
  duckdb_client_context *out_context
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info object of the table function.
* `out_context`: The client context of the bind info. Must be destroyed with `duckdb_destroy_client_context`.

<br>

###### `duckdb_bind_add_result_column` {#docs:current:clients:c:api::duckdb_bind_add_result_column}

Adds a result column to the output of the table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_bind_add_result_column(
  duckdb_bind_info info,
  const char *name,
  duckdb_logical_type type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The table function's bind info.
* `name`: The column name.
* `type`: The logical column type.

<br>

###### `duckdb_bind_get_parameter_count` {#docs:current:clients:c:api::duckdb_bind_get_parameter_count}

Retrieves the number of regular (non-named) parameters to the function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_bind_get_parameter_count(
  duckdb_bind_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The number of parameters

<br>

###### `duckdb_bind_get_parameter` {#docs:current:clients:c:api::duckdb_bind_get_parameter}

Retrieves the parameter at the given index.

The result must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_bind_get_parameter(
  duckdb_bind_info info,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `index`: The index of the parameter to get

####### Return Value {#docs:current:clients:c:api::return-value}

The value of the parameter. Must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_bind_get_named_parameter` {#docs:current:clients:c:api::duckdb_bind_get_named_parameter}

Retrieves a named parameter with the given name.

The result must be destroyed with `duckdb_destroy_value`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_bind_get_named_parameter(
  duckdb_bind_info info,
  const char *name
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `name`: The name of the parameter

####### Return Value {#docs:current:clients:c:api::return-value}

The value of the parameter. Must be destroyed with `duckdb_destroy_value`.

<br>

###### `duckdb_bind_set_bind_data` {#docs:current:clients:c:api::duckdb_bind_set_bind_data}

Sets the user-provided bind data in the bind object of the table function.
This object can be retrieved again during execution.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_bind_set_bind_data(
  duckdb_bind_info info,
  void *bind_data,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info of the table function.
* `bind_data`: The bind data object.
* `destroy`: The callback to destroy the bind data (if any).

<br>

###### `duckdb_bind_set_cardinality` {#docs:current:clients:c:api::duckdb_bind_set_cardinality}

Sets the cardinality estimate for the table function, used for optimization.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_bind_set_cardinality(
  duckdb_bind_info info,
  idx_t cardinality,
  bool is_exact
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The bind info object.
* `cardinality`: The cardinality estimate.
* `is_exact`: Whether the cardinality estimate is exact or an approximation.

<br>

###### `duckdb_bind_set_error` {#docs:current:clients:c:api::duckdb_bind_set_error}

Report that an error has occurred while calling bind on a table function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_bind_set_error(
  duckdb_bind_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `error`: The error message

<br>

###### `duckdb_init_get_extra_info` {#docs:current:clients:c:api::duckdb_init_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_init_get_extra_info(
  duckdb_init_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The extra info

<br>

###### `duckdb_init_get_bind_data` {#docs:current:clients:c:api::duckdb_init_get_bind_data}

Gets the bind data set by `duckdb_bind_set_bind_data` during the bind.

Note that the bind data should be considered as read-only.
For tracking state, use the init data instead.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_init_get_bind_data(
  duckdb_init_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The bind data object

<br>

###### `duckdb_init_set_init_data` {#docs:current:clients:c:api::duckdb_init_set_init_data}

Sets the user-provided init data in the init object. This object can be retrieved again during execution.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_init_set_init_data(
  duckdb_init_info info,
  void *init_data,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `init_data`: The init data object.
* `destroy`: The callback that will be called to destroy the init data (if any)

<br>

###### `duckdb_init_get_column_count` {#docs:current:clients:c:api::duckdb_init_get_column_count}

Returns the number of projected columns.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_init_get_column_count(
  duckdb_init_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The number of projected columns.

<br>

###### `duckdb_init_get_column_index` {#docs:current:clients:c:api::duckdb_init_get_column_index}

Returns the column index of the projected column at the specified position.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_init_get_column_index(
  duckdb_init_info info,
  idx_t column_index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `column_index`: The position in the projection list at which to get the projected column index, from `0` to `duckdb_init_get_column_count(info) - 1`.

####### Return Value {#docs:current:clients:c:api::return-value}

The column index of the projected column.

<br>

###### `duckdb_init_set_max_threads` {#docs:current:clients:c:api::duckdb_init_set_max_threads}

Sets how many threads can process this table function in parallel (default: 1).

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_init_set_max_threads(
  duckdb_init_info info,
  idx_t max_threads
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `max_threads`: The maximum number of threads that can process this table function

<br>

###### `duckdb_init_set_error` {#docs:current:clients:c:api::duckdb_init_set_error}

Report that an error has occurred while calling init.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_init_set_error(
  duckdb_init_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `error`: The error message

<br>

###### `duckdb_function_get_extra_info` {#docs:current:clients:c:api::duckdb_function_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_function_get_extra_info(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The extra info

<br>

###### `duckdb_function_get_bind_data` {#docs:current:clients:c:api::duckdb_function_get_bind_data}

Gets the table function's bind data set by `duckdb_bind_set_bind_data`.

Note that the bind data is read-only.
For tracking state, use the init data instead.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_function_get_bind_data(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The function info object.

####### Return Value {#docs:current:clients:c:api::return-value}

The bind data object.

<br>

###### `duckdb_function_get_init_data` {#docs:current:clients:c:api::duckdb_function_get_init_data}

Gets the init data set by `duckdb_init_set_init_data` during the init.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_function_get_init_data(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The init data object

<br>

###### `duckdb_function_get_local_init_data` {#docs:current:clients:c:api::duckdb_function_get_local_init_data}

Gets the thread-local init data set by `duckdb_init_set_init_data` during the local_init.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_function_get_local_init_data(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object

####### Return Value {#docs:current:clients:c:api::return-value}

The init data object

<br>

###### `duckdb_function_set_error` {#docs:current:clients:c:api::duckdb_function_set_error}

Report that an error has occurred while executing the function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_function_set_error(
  duckdb_function_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `error`: The error message

<br>

###### `duckdb_add_replacement_scan` {#docs:current:clients:c:api::duckdb_add_replacement_scan}

Add a replacement scan definition to the specified database.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_add_replacement_scan(
  duckdb_database db,
  duckdb_replacement_callback_t replacement,
  void *extra_data,
  duckdb_delete_callback_t delete_callback
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `db`: The database object to add the replacement scan to
* `replacement`: The replacement scan callback
* `extra_data`: Extra data that is passed back into the specified callback
* `delete_callback`: The delete callback to call on the extra data, if any

<br>

###### `duckdb_replacement_scan_set_function_name` {#docs:current:clients:c:api::duckdb_replacement_scan_set_function_name}

Sets the replacement function name. If this function is called in the replacement callback,
the replacement scan is performed. If it is not called, the replacement scan is not performed.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_replacement_scan_set_function_name(
  duckdb_replacement_scan_info info,
  const char *function_name
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `function_name`: The function name to substitute.

<br>

###### `duckdb_replacement_scan_add_parameter` {#docs:current:clients:c:api::duckdb_replacement_scan_add_parameter}

Adds a parameter to the replacement scan function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_replacement_scan_add_parameter(
  duckdb_replacement_scan_info info,
  duckdb_value parameter
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `parameter`: The parameter to add.

<br>

###### `duckdb_replacement_scan_set_error` {#docs:current:clients:c:api::duckdb_replacement_scan_set_error}

Report that an error has occurred while executing the replacement scan.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_replacement_scan_set_error(
  duckdb_replacement_scan_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object
* `error`: The error message

<br>

###### `duckdb_get_profiling_info` {#docs:current:clients:c:api::duckdb_get_profiling_info}

Returns the root node of the profiling information. Returns `nullptr` if profiling is not enabled.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_profiling_info duckdb_get_profiling_info(
  duckdb_connection connection
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: A connection object.

####### Return Value {#docs:current:clients:c:api::return-value}

A profiling information object.

<br>

###### `duckdb_profiling_info_get_value` {#docs:current:clients:c:api::duckdb_profiling_info_get_value}

Returns the value of the metric of the current profiling info node. Returns `nullptr` if the metric does
not exist or is not enabled. Currently, the value holds a string, which you can retrieve
by calling the corresponding function: `char *duckdb_get_varchar(duckdb_value value)`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_profiling_info_get_value(
  duckdb_profiling_info info,
  const char *key
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: A profiling information object.
* `key`: The name of the requested metric.

####### Return Value {#docs:current:clients:c:api::return-value}

The value of the metric. Must be freed with `duckdb_destroy_value`.

<br>

###### `duckdb_profiling_info_get_metrics` {#docs:current:clients:c:api::duckdb_profiling_info_get_metrics}

Returns the key-value metric map of this profiling node as a MAP duckdb_value.
The individual elements are accessible via the duckdb_value MAP functions.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_value duckdb_profiling_info_get_metrics(
  duckdb_profiling_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: A profiling information object.

####### Return Value {#docs:current:clients:c:api::return-value}

The key-value metric map as a MAP duckdb_value.

<br>

###### `duckdb_profiling_info_get_child_count` {#docs:current:clients:c:api::duckdb_profiling_info_get_child_count}

Returns the number of children in the current profiling info node.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_profiling_info_get_child_count(
  duckdb_profiling_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: A profiling information object.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of children in the current node.

<br>

###### `duckdb_profiling_info_get_child` {#docs:current:clients:c:api::duckdb_profiling_info_get_child}

Returns the child node at the specified index.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_profiling_info duckdb_profiling_info_get_child(
  duckdb_profiling_info info,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: A profiling information object.
* `index`: The index of the child node.

####### Return Value {#docs:current:clients:c:api::return-value}

The child node at the specified index.

<br>

###### `duckdb_appender_create` {#docs:current:clients:c:api::duckdb_appender_create}

Creates an appender object.

Note that the object must be destroyed with `duckdb_appender_destroy`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_create(
  duckdb_connection connection,
  const char *schema,
  const char *table,
  duckdb_appender *out_appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection context to create the appender in.
* `schema`: The schema of the table to append to, or `nullptr` for the default schema.
* `table`: The table name to append to.
* `out_appender`: The resulting appender object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_create_ext` {#docs:current:clients:c:api::duckdb_appender_create_ext}

Creates an appender object.

Note that the object must be destroyed with `duckdb_appender_destroy`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_create_ext(
  duckdb_connection connection,
  const char *catalog,
  const char *schema,
  const char *table,
  duckdb_appender *out_appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection context to create the appender in.
* `catalog`: The catalog of the table to append to, or `nullptr` for the default catalog.
* `schema`: The schema of the table to append to, or `nullptr` for the default schema.
* `table`: The table name to append to.
* `out_appender`: The resulting appender object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_create_query` {#docs:current:clients:c:api::duckdb_appender_create_query}

Creates an appender object that executes the given query with any data appended to it.

Note that the object must be destroyed with `duckdb_appender_destroy`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_create_query(
  duckdb_connection connection,
  const char *query,
  idx_t column_count,
  duckdb_logical_type *types,
  const char *table_name,
  const char **column_names,
  duckdb_appender *out_appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection context to create the appender in.
* `query`: The query to execute, can be an INSERT, DELETE, UPDATE or MERGE INTO statement.
* `column_count`: The number of columns to append.
* `types`: The types of the columns to append.
* `table_name`: (optionally) the table name used to refer to the appended data, defaults to "appended_data".
* `column_names`: (optionally) the list of column names, defaults to "col1", "col2", ...
* `out_appender`: The resulting appender object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_column_count` {#docs:current:clients:c:api::duckdb_appender_column_count}

Returns the number of columns that belong to the appender.
If there is no active column list, then this equals the table's physical columns.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_appender_column_count(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to get the column count from.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of columns in the data chunks.

<br>

###### `duckdb_appender_column_type` {#docs:current:clients:c:api::duckdb_appender_column_type}

Returns the type of the column at the specified index. This is either a type in the active column list, or the same type
as a column in the receiving table.

Note: The resulting type must be destroyed with `duckdb_destroy_logical_type`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_appender_column_type(
  duckdb_appender appender,
  idx_t col_idx
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to get the column type from.
* `col_idx`: The index of the column to get the type of.

####### Return Value {#docs:current:clients:c:api::return-value}

The `duckdb_logical_type` of the column.

<br>

###### `duckdb_appender_error` {#docs:current:clients:c:api::duckdb_appender_error}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
> Use `duckdb_appender_error_data` instead.

Returns the error message associated with the appender.
If the appender has no error message, this returns `nullptr` instead.

The error message should not be freed. It will be de-allocated when `duckdb_appender_destroy` is called.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_appender_error(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to get the error from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error message, or `nullptr` if there is none.

<br>

###### `duckdb_appender_error_data` {#docs:current:clients:c:api::duckdb_appender_error_data}

Returns the error data associated with the appender.
Must be destroyed with `duckdb_destroy_error_data`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_error_data duckdb_appender_error_data(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to get the error data from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error data.

<br>

###### `duckdb_appender_flush` {#docs:current:clients:c:api::duckdb_appender_flush}

Flushes the appender to the table, forcing the cache of the appender to be cleared. If flushing the data triggers a
constraint violation or any other error, then all data is invalidated, and this function returns `DuckDBError`.
After such a failure, it is not possible to append more values. Call `duckdb_appender_error_data` to obtain the error
data, followed by `duckdb_appender_destroy` to destroy the invalidated appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_flush(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to flush.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_close` {#docs:current:clients:c:api::duckdb_appender_close}

Closes the appender by flushing all intermediate states and closing it for further appends. If flushing the data
triggers a constraint violation or any other error, then all data is invalidated, and this function returns `DuckDBError`.
Call `duckdb_appender_error_data` to obtain the error data, followed by `duckdb_appender_destroy` to destroy the
invalidated appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_close(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to flush and close.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_destroy` {#docs:current:clients:c:api::duckdb_appender_destroy}

Closes the appender by flushing all intermediate states to the table and destroying it. By destroying it, this function
de-allocates all memory associated with the appender. If flushing the data triggers a constraint violation,
then all data is invalidated, and this function returns `DuckDBError`. Due to the destruction of the appender, it is no
longer possible to obtain the specific error message with `duckdb_appender_error`. Therefore, call `duckdb_appender_close`
before destroying the appender if you need insight into the specific error.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_destroy(
  duckdb_appender *appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to flush, close and destroy.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_add_column` {#docs:current:clients:c:api::duckdb_appender_add_column}

Appends a column to the active column list of the appender. Immediately flushes all previous data.

The active column list specifies all columns that are expected when flushing the data. Any non-active columns are filled
with their default values, or NULL.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_add_column(
  duckdb_appender appender,
  const char *name
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to add the column to.
* `name`: The name of the column to add to the active column list.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_clear_columns` {#docs:current:clients:c:api::duckdb_appender_clear_columns}

Removes all columns from the active column list of the appender, resetting the appender to treat all columns as active.
Immediately flushes all previous data.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_clear_columns(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to clear the columns from.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_appender_begin_row` {#docs:current:clients:c:api::duckdb_appender_begin_row}

A no-op function, provided for backwards compatibility. It does nothing; only `duckdb_appender_end_row` is required.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_begin_row(
  duckdb_appender appender
);
```

<br>

###### `duckdb_appender_end_row` {#docs:current:clients:c:api::duckdb_appender_end_row}

Finishes the current row of appends. After `duckdb_appender_end_row` is called, the next row can be appended.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_appender_end_row(
  duckdb_appender appender
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_append_default` {#docs:current:clients:c:api::duckdb_append_default}

Append a DEFAULT value (NULL if DEFAULT not available for column) to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_default(
  duckdb_appender appender
);
```

<br>

###### `duckdb_append_default_to_chunk` {#docs:current:clients:c:api::duckdb_append_default_to_chunk}

Append a DEFAULT value, at the specified row and column, (NULL if DEFAULT not available for column) to the chunk created
from the specified appender. The default value of the column must be a constant value. Non-deterministic expressions
like `nextval('seq')` or `random()` are not supported.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_default_to_chunk(
  duckdb_appender appender,
  duckdb_data_chunk chunk,
  idx_t col,
  idx_t row
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to get the default value from.
* `chunk`: The data chunk to append the default value to.
* `col`: The chunk column index to append the default value to.
* `row`: The chunk row index to append the default value to.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_append_bool` {#docs:current:clients:c:api::duckdb_append_bool}

Append a bool value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_bool(
  duckdb_appender appender,
  bool value
);
```

<br>

###### `duckdb_append_int8` {#docs:current:clients:c:api::duckdb_append_int8}

Append an int8_t value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_int8(
  duckdb_appender appender,
  int8_t value
);
```

<br>

###### `duckdb_append_int16` {#docs:current:clients:c:api::duckdb_append_int16}

Append an int16_t value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_int16(
  duckdb_appender appender,
  int16_t value
);
```

<br>

###### `duckdb_append_int32` {#docs:current:clients:c:api::duckdb_append_int32}

Append an int32_t value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_int32(
  duckdb_appender appender,
  int32_t value
);
```

<br>

###### `duckdb_append_int64` {#docs:current:clients:c:api::duckdb_append_int64}

Append an int64_t value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_int64(
  duckdb_appender appender,
  int64_t value
);
```

<br>

###### `duckdb_append_hugeint` {#docs:current:clients:c:api::duckdb_append_hugeint}

Append a duckdb_hugeint value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_hugeint(
  duckdb_appender appender,
  duckdb_hugeint value
);
```

<br>

###### `duckdb_append_uint8` {#docs:current:clients:c:api::duckdb_append_uint8}

Append a uint8_t value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_uint8(
  duckdb_appender appender,
  uint8_t value
);
```

<br>

###### `duckdb_append_uint16` {#docs:current:clients:c:api::duckdb_append_uint16}

Append a uint16_t value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_uint16(
  duckdb_appender appender,
  uint16_t value
);
```

<br>

###### `duckdb_append_uint32` {#docs:current:clients:c:api::duckdb_append_uint32}

Append a uint32_t value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_uint32(
  duckdb_appender appender,
  uint32_t value
);
```

<br>

###### `duckdb_append_uint64` {#docs:current:clients:c:api::duckdb_append_uint64}

Append a uint64_t value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_uint64(
  duckdb_appender appender,
  uint64_t value
);
```

<br>

###### `duckdb_append_uhugeint` {#docs:current:clients:c:api::duckdb_append_uhugeint}

Append a duckdb_uhugeint value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_uhugeint(
  duckdb_appender appender,
  duckdb_uhugeint value
);
```

<br>

###### `duckdb_append_float` {#docs:current:clients:c:api::duckdb_append_float}

Append a float value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_float(
  duckdb_appender appender,
  float value
);
```

<br>

###### `duckdb_append_double` {#docs:current:clients:c:api::duckdb_append_double}

Append a double value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_double(
  duckdb_appender appender,
  double value
);
```

<br>

###### `duckdb_append_date` {#docs:current:clients:c:api::duckdb_append_date}

Append a duckdb_date value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_date(
  duckdb_appender appender,
  duckdb_date value
);
```

<br>

###### `duckdb_append_time` {#docs:current:clients:c:api::duckdb_append_time}

Append a duckdb_time value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_time(
  duckdb_appender appender,
  duckdb_time value
);
```

<br>

###### `duckdb_append_timestamp` {#docs:current:clients:c:api::duckdb_append_timestamp}

Append a duckdb_timestamp value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_timestamp(
  duckdb_appender appender,
  duckdb_timestamp value
);
```

<br>

###### `duckdb_append_interval` {#docs:current:clients:c:api::duckdb_append_interval}

Append a duckdb_interval value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_interval(
  duckdb_appender appender,
  duckdb_interval value
);
```

<br>

###### `duckdb_append_varchar` {#docs:current:clients:c:api::duckdb_append_varchar}

Append a varchar value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_varchar(
  duckdb_appender appender,
  const char *val
);
```

<br>

###### `duckdb_append_varchar_length` {#docs:current:clients:c:api::duckdb_append_varchar_length}

Append a varchar value of the given length to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_varchar_length(
  duckdb_appender appender,
  const char *val,
  idx_t length
);
```

<br>

###### `duckdb_append_blob` {#docs:current:clients:c:api::duckdb_append_blob}

Append a blob value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_blob(
  duckdb_appender appender,
  const void *data,
  idx_t length
);
```

<br>

###### `duckdb_append_null` {#docs:current:clients:c:api::duckdb_append_null}

Append a NULL value to the appender (of any type).

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_null(
  duckdb_appender appender
);
```

<br>

###### `duckdb_append_value` {#docs:current:clients:c:api::duckdb_append_value}

Append a duckdb_value to the appender.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_value(
  duckdb_appender appender,
  duckdb_value value
);
```

<br>

###### `duckdb_append_data_chunk` {#docs:current:clients:c:api::duckdb_append_data_chunk}

Appends a pre-filled data chunk to the specified appender.
Attempts casting if the data chunk types do not match the active appender types.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_append_data_chunk(
  duckdb_appender appender,
  duckdb_data_chunk chunk
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `appender`: The appender to append to.
* `chunk`: The data chunk to append.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_table_description_create` {#docs:current:clients:c:api::duckdb_table_description_create}

Creates a table description object. Note that `duckdb_table_description_destroy` should always be called on the
resulting table_description, even if the function returns `DuckDBError`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_table_description_create(
  duckdb_connection connection,
  const char *schema,
  const char *table,
  duckdb_table_description *out
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection context.
* `schema`: The schema of the table, or `nullptr` for the default schema.
* `table`: The table name.
* `out`: The resulting table description object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_table_description_create_ext` {#docs:current:clients:c:api::duckdb_table_description_create_ext}

Creates a table description object. Note that `duckdb_table_description_destroy` must be called on the resulting
table_description, even if the function returns `DuckDBError`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_table_description_create_ext(
  duckdb_connection connection,
  const char *catalog,
  const char *schema,
  const char *table,
  duckdb_table_description *out
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection context.
* `catalog`: The catalog (database) name of the table, or `nullptr` for the default catalog.
* `schema`: The schema of the table, or `nullptr` for the default schema.
* `table`: The table name.
* `out`: The resulting table description object.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_table_description_destroy` {#docs:current:clients:c:api::duckdb_table_description_destroy}

Destroy the TableDescription object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_table_description_destroy(
  duckdb_table_description *table_description
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_description`: The table_description to destroy.

<br>

###### `duckdb_table_description_error` {#docs:current:clients:c:api::duckdb_table_description_error}

Returns the error message associated with the given table_description.
If the table_description has no error message, this returns `nullptr` instead.
The error message should not be freed. It will be de-allocated when `duckdb_table_description_destroy` is called.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_table_description_error(
  duckdb_table_description table_description
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_description`: The table_description to get the error from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error message, or `nullptr` if there is none.

<br>

###### `duckdb_column_has_default` {#docs:current:clients:c:api::duckdb_column_has_default}

Checks whether the column at index `index` of the table has a DEFAULT expression.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_column_has_default(
  duckdb_table_description table_description,
  idx_t index,
  bool *out
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_description`: The table_description to query.
* `index`: The index of the column to query.
* `out`: The out-parameter used to store the result.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_table_description_get_column_name` {#docs:current:clients:c:api::duckdb_table_description_get_column_name}

Obtains the name of the column at index `index`.
The returned string must be freed with `duckdb_free`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
char *duckdb_table_description_get_column_name(
  duckdb_table_description table_description,
  idx_t index
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `table_description`: The table_description to query.
* `index`: The index of the column to query.

####### Return Value {#docs:current:clients:c:api::return-value}

The column name.

<br>

###### `duckdb_to_arrow_schema` {#docs:current:clients:c:api::duckdb_to_arrow_schema}

Transforms a DuckDB Schema into an Arrow Schema.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_error_data duckdb_to_arrow_schema(
  duckdb_arrow_options arrow_options,
  duckdb_logical_type *types,
  const char **names,
  idx_t column_count,
  struct ArrowSchema *out_schema
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `arrow_options`: The Arrow settings used to produce the Arrow output.
* `types`: The DuckDB logical types for each column in the schema.
* `names`: The names for each column in the schema.
* `column_count`: The number of columns that exist in the schema.
* `out_schema`: The resulting arrow schema. Must be destroyed with `out_schema->release(out_schema)`.

####### Return Value {#docs:current:clients:c:api::return-value}

The error data. Must be destroyed with `duckdb_destroy_error_data`.

<br>

###### `duckdb_data_chunk_to_arrow` {#docs:current:clients:c:api::duckdb_data_chunk_to_arrow}

Transforms a DuckDB data chunk into an Arrow array.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_error_data duckdb_data_chunk_to_arrow(
  duckdb_arrow_options arrow_options,
  duckdb_data_chunk chunk,
  struct ArrowArray *out_arrow_array
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `arrow_options`: The Arrow settings used to produce the Arrow output.
* `chunk`: The DuckDB data chunk to convert.
* `out_arrow_array`: The output Arrow structure that will hold the converted data. Must be released with
`out_arrow_array->release(out_arrow_array)`.

####### Return Value {#docs:current:clients:c:api::return-value}

The error data. Must be destroyed with `duckdb_destroy_error_data`.

<br>

###### `duckdb_schema_from_arrow` {#docs:current:clients:c:api::duckdb_schema_from_arrow}

Transforms an Arrow Schema into a DuckDB Schema.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_error_data duckdb_schema_from_arrow(
  duckdb_connection connection,
  struct ArrowSchema *schema,
  duckdb_arrow_converted_schema *out_types
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection to get the transformation settings from.
* `schema`: The input Arrow schema. Must be released with `schema->release(schema)`.
* `out_types`: The Arrow converted schema with extra information about the arrow types. Must be destroyed with
`duckdb_destroy_arrow_converted_schema`.

####### Return Value {#docs:current:clients:c:api::return-value}

The error data. Must be destroyed with `duckdb_destroy_error_data`.

<br>

###### `duckdb_data_chunk_from_arrow` {#docs:current:clients:c:api::duckdb_data_chunk_from_arrow}

Transforms an Arrow array into a DuckDB data chunk. The data chunk will retain ownership of the underlying Arrow data.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_error_data duckdb_data_chunk_from_arrow(
  duckdb_connection connection,
  struct ArrowArray *arrow_array,
  duckdb_arrow_converted_schema converted_schema,
  duckdb_data_chunk *out_chunk
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection to get the transformation settings from.
* `arrow_array`: The input Arrow array. Data ownership is passed on to DuckDB's DataChunk, the underlying object
does not need to be released and won't have ownership of the data.
* `converted_schema`: The Arrow converted schema with extra information about the arrow types.
* `out_chunk`: The resulting DuckDB data chunk. Must be destroyed with `duckdb_destroy_data_chunk`.

####### Return Value {#docs:current:clients:c:api::return-value}

The error data. Must be destroyed with `duckdb_destroy_error_data`.

<br>

###### `duckdb_destroy_arrow_converted_schema` {#docs:current:clients:c:api::duckdb_destroy_arrow_converted_schema}

Destroys the arrow converted schema and de-allocates all memory allocated for that arrow converted schema.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_arrow_converted_schema(
  duckdb_arrow_converted_schema *arrow_converted_schema
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `arrow_converted_schema`: The arrow converted schema to destroy.

<br>

###### `duckdb_query_arrow` {#docs:current:clients:c:api::duckdb_query_arrow}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Executes a SQL query within a connection and stores the full (materialized) result in an arrow structure.
If the query fails to execute, `DuckDBError` is returned and the error message can be retrieved by calling
`duckdb_query_arrow_error`.

Note that after running `duckdb_query_arrow`, `duckdb_destroy_arrow` must be called on the result object even if the
query fails, otherwise the error stored within the result will not be freed correctly.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_query_arrow(
  duckdb_connection connection,
  const char *query,
  duckdb_arrow *out_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection to perform the query in.
* `query`: The SQL query to run.
* `out_result`: The query result.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_query_arrow_schema` {#docs:current:clients:c:api::duckdb_query_arrow_schema}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Fetch the internal arrow schema from the arrow result. Remember to call release on the respective
ArrowSchema object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_query_arrow_schema(
  duckdb_arrow result,
  duckdb_arrow_schema *out_schema
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result to fetch the schema from.
* `out_schema`: The output schema.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_prepared_arrow_schema` {#docs:current:clients:c:api::duckdb_prepared_arrow_schema}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Fetch the internal arrow schema from the prepared statement. Remember to call release on the respective
ArrowSchema object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_prepared_arrow_schema(
  duckdb_prepared_statement prepared,
  duckdb_arrow_schema *out_schema
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared`: The prepared statement to fetch the schema from.
* `out_schema`: The output schema.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_result_arrow_array` {#docs:current:clients:c:api::duckdb_result_arrow_array}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Convert a data chunk into an arrow struct array. Remember to call release on the respective
ArrowArray object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_result_arrow_array(
  duckdb_result result,
  duckdb_data_chunk chunk,
  duckdb_arrow_array *out_array
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object the data chunk has been fetched from.
* `chunk`: The data chunk to convert.
* `out_array`: The output array.

<br>

###### `duckdb_query_arrow_array` {#docs:current:clients:c:api::duckdb_query_arrow_array}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Fetch an internal arrow struct array from the arrow result. Remember to call release on the respective
ArrowArray object.

This function can be called multiple times to get the next chunks, and each call frees the previous `out_array`.
Therefore, consume the `out_array` before calling this function again.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_query_arrow_array(
  duckdb_arrow result,
  duckdb_arrow_array *out_array
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result to fetch the array from.
* `out_array`: The output array.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_arrow_column_count` {#docs:current:clients:c:api::duckdb_arrow_column_count}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of columns present in the arrow result object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_arrow_column_count(
  duckdb_arrow result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of columns present in the result object.

<br>

###### `duckdb_arrow_row_count` {#docs:current:clients:c:api::duckdb_arrow_row_count}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of rows present in the arrow result object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_arrow_row_count(
  duckdb_arrow result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of rows present in the result object.

<br>

###### `duckdb_arrow_rows_changed` {#docs:current:clients:c:api::duckdb_arrow_rows_changed}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of rows changed by the query stored in the arrow result. This is relevant only for
INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_arrow_rows_changed(
  duckdb_arrow result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object.

####### Return Value {#docs:current:clients:c:api::return-value}

The number of rows changed.

<br>

###### `duckdb_query_arrow_error` {#docs:current:clients:c:api::duckdb_query_arrow_error}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Returns the error message contained within the result. The error is only set if `duckdb_query_arrow` returns
`DuckDBError`.

The error message should not be freed. It will be de-allocated when `duckdb_destroy_arrow` is called.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
const char *duckdb_query_arrow_error(
  duckdb_arrow result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the error from.

####### Return Value {#docs:current:clients:c:api::return-value}

The error of the result.

<br>

###### `duckdb_destroy_arrow` {#docs:current:clients:c:api::duckdb_destroy_arrow}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Closes the result and de-allocates all memory allocated for the arrow result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_arrow(
  duckdb_arrow *result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result to destroy.

<br>

###### `duckdb_destroy_arrow_stream` {#docs:current:clients:c:api::duckdb_destroy_arrow_stream}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Releases the arrow array stream and de-allocates its memory.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_arrow_stream(
  duckdb_arrow_stream *stream_p
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `stream_p`: The arrow array stream to destroy.

<br>

###### `duckdb_execute_prepared_arrow` {#docs:current:clients:c:api::duckdb_execute_prepared_arrow}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Executes the prepared statement with the given bound parameters, and returns an arrow query result.
Note that after running `duckdb_execute_prepared_arrow`, `duckdb_destroy_arrow` must be called on the result object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_execute_prepared_arrow(
  duckdb_prepared_statement prepared_statement,
  duckdb_arrow *out_result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `prepared_statement`: The prepared statement to execute.
* `out_result`: The query result.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_arrow_scan` {#docs:current:clients:c:api::duckdb_arrow_scan}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Scans the Arrow stream and creates a view with the given name.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_arrow_scan(
  duckdb_connection connection,
  const char *table_name,
  duckdb_arrow_stream arrow
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection on which to execute the scan.
* `table_name`: Name of the temporary view to create.
* `arrow`: Arrow stream wrapper.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_arrow_array_scan` {#docs:current:clients:c:api::duckdb_arrow_array_scan}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Scans the Arrow array and creates a view with the given name.
Note that after running `duckdb_arrow_array_scan`, `duckdb_destroy_arrow_stream` must be called on the out stream.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_arrow_array_scan(
  duckdb_connection connection,
  const char *table_name,
  duckdb_arrow_schema arrow_schema,
  duckdb_arrow_array arrow_array,
  duckdb_arrow_stream *out_stream
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `connection`: The connection on which to execute the scan.
* `table_name`: Name of the temporary view to create.
* `arrow_schema`: Arrow schema wrapper.
* `arrow_array`: Arrow array wrapper.
* `out_stream`: Output array stream that wraps around the passed schema, for releasing/deleting once done.

####### Return Value {#docs:current:clients:c:api::return-value}

`DuckDBSuccess` on success or `DuckDBError` on failure.

<br>

###### `duckdb_execute_tasks` {#docs:current:clients:c:api::duckdb_execute_tasks}

Execute DuckDB tasks on this thread.

Will return after `max_tasks` have been executed, or if there are no more tasks present.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_execute_tasks(
  duckdb_database database,
  idx_t max_tasks
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `database`: The database object to execute tasks for.
* `max_tasks`: The maximum number of tasks to execute.

<br>

###### `duckdb_create_task_state` {#docs:current:clients:c:api::duckdb_create_task_state}

Creates a task state that can be used with `duckdb_execute_tasks_state` to execute tasks until
`duckdb_finish_execution` is called on the state.

`duckdb_destroy_task_state` must be called on the result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_task_state duckdb_create_task_state(
  duckdb_database database
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `database`: The database object to create the task state for

####### Return Value {#docs:current:clients:c:api::return-value}

The task state that can be used with `duckdb_execute_tasks_state`.

<br>

###### `duckdb_execute_tasks_state` {#docs:current:clients:c:api::duckdb_execute_tasks_state}

Execute DuckDB tasks on this thread.

The thread keeps executing tasks until `duckdb_finish_execution` is called on the state.
Multiple threads can share the same `duckdb_task_state`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_execute_tasks_state(
  duckdb_task_state state
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `state`: The task state of the executor

<br>

###### `duckdb_execute_n_tasks_state` {#docs:current:clients:c:api::duckdb_execute_n_tasks_state}

Execute DuckDB tasks on this thread.

The thread keeps executing tasks until either `duckdb_finish_execution` is called on the state,
`max_tasks` tasks have been executed, or there are no more tasks to be executed.

Multiple threads can share the same `duckdb_task_state`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
idx_t duckdb_execute_n_tasks_state(
  duckdb_task_state state,
  idx_t max_tasks
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `state`: The task state of the executor
* `max_tasks`: The maximum amount of tasks to execute

####### Return Value {#docs:current:clients:c:api::return-value}

The number of tasks that were actually executed.
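
The three stopping conditions can be modeled without DuckDB. The following is a minimal C++ sketch under stated assumptions: `ToyTaskState`, `toy_finish_execution` and `toy_execute_n_tasks` are hypothetical names that only mimic the documented contract of `duckdb_execute_n_tasks_state` and `duckdb_finish_execution`. They are not part of DuckDB, and the model ignores threading.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a task state; this is NOT DuckDB's implementation.
struct ToyTaskState {
    std::deque<std::function<void()>> tasks; // pending tasks
    bool finished = false;                   // set by toy_finish_execution
};

// Models the role of duckdb_finish_execution: ask executors to stop.
inline void toy_finish_execution(ToyTaskState &state) {
    state.finished = true;
}

// Models the documented contract of duckdb_execute_n_tasks_state:
// stop when execution was finished, when max_tasks tasks have run,
// or when no tasks remain. Returns the number of tasks executed.
inline std::size_t toy_execute_n_tasks(ToyTaskState &state, std::size_t max_tasks) {
    std::size_t executed = 0;
    while (!state.finished && executed < max_tasks && !state.tasks.empty()) {
        std::function<void()> task = std::move(state.tasks.front());
        state.tasks.pop_front();
        task();
        ++executed;
    }
    return executed;
}
```

In this model, calling `toy_execute_n_tasks(state, 2)` on a state holding five tasks runs two of them and returns `2`. After `toy_finish_execution(state)`, further calls return `0` even if tasks remain.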

<br>

###### `duckdb_finish_execution` {#docs:current:clients:c:api::duckdb_finish_execution}

Finishes execution on the given task state.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_finish_execution(
  duckdb_task_state state
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `state`: The task state to finish execution

<br>

###### `duckdb_task_state_is_finished` {#docs:current:clients:c:api::duckdb_task_state_is_finished}

Checks whether the provided `duckdb_task_state` has finished execution.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
bool duckdb_task_state_is_finished(
  duckdb_task_state state
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `state`: The task state to inspect

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not `duckdb_finish_execution` has been called on the task state.

<br>

###### `duckdb_destroy_task_state` {#docs:current:clients:c:api::duckdb_destroy_task_state}

Destroys the task state returned from `duckdb_create_task_state`.

Note that this should not be called while there is an active `duckdb_execute_tasks_state` running
on the task state.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_task_state(
  duckdb_task_state state
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `state`: The task state to clean up

<br>

###### `duckdb_execution_is_finished` {#docs:current:clients:c:api::duckdb_execution_is_finished}

Returns true if the execution of the current query is finished.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
bool duckdb_execution_is_finished(
  duckdb_connection con
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `con`: The connection on which to check

<br>

###### `duckdb_stream_fetch_chunk` {#docs:current:clients:c:api::duckdb_stream_fetch_chunk}

> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.

Fetches a data chunk from the (streaming) `duckdb_result`. This function should be called repeatedly until the result is
exhausted.

The returned data chunk must be destroyed with `duckdb_destroy_data_chunk`.

This function can only be used on `duckdb_result`s created with `duckdb_pending_prepared_streaming`.

If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be
mixed with the legacy result functions or the materialized result functions).

It is not known beforehand how many chunks will be returned by this result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_data_chunk duckdb_stream_fetch_chunk(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the data chunk from.

####### Return Value {#docs:current:clients:c:api::return-value}

The resulting data chunk. Returns `NULL` if the result has an error.

<br>

###### `duckdb_fetch_chunk` {#docs:current:clients:c:api::duckdb_fetch_chunk}

Fetches a data chunk from a `duckdb_result`. This function should be called repeatedly until the result is exhausted.

The returned data chunk must be destroyed with `duckdb_destroy_data_chunk`.

It is not known beforehand how many chunks will be returned by this result.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_data_chunk duckdb_fetch_chunk(
  duckdb_result result
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `result`: The result object to fetch the data chunk from.

####### Return Value {#docs:current:clients:c:api::return-value}

The resulting data chunk. Returns `NULL` if the result has an error.

<br>

###### `duckdb_create_cast_function` {#docs:current:clients:c:api::duckdb_create_cast_function}

Creates a new cast function object.


####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_cast_function duckdb_create_cast_function(
);
```

####### Return Value {#docs:current:clients:c:api::return-value}

The cast function object.

<br>

###### `duckdb_cast_function_set_source_type` {#docs:current:clients:c:api::duckdb_cast_function_set_source_type}

Sets the source type of the cast function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_cast_function_set_source_type(
  duckdb_cast_function cast_function,
  duckdb_logical_type source_type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `cast_function`: The cast function object.
* `source_type`: The source type to set.

<br>

###### `duckdb_cast_function_set_target_type` {#docs:current:clients:c:api::duckdb_cast_function_set_target_type}

Sets the target type of the cast function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_cast_function_set_target_type(
  duckdb_cast_function cast_function,
  duckdb_logical_type target_type
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `cast_function`: The cast function object.
* `target_type`: The target type to set.

<br>

###### `duckdb_cast_function_set_implicit_cast_cost` {#docs:current:clients:c:api::duckdb_cast_function_set_implicit_cast_cost}

Sets the "cost" of implicitly casting the source type to the target type using this function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_cast_function_set_implicit_cast_cost(
  duckdb_cast_function cast_function,
  int64_t cost
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `cast_function`: The cast function object.
* `cost`: The cost to set.

<br>

###### `duckdb_cast_function_set_function` {#docs:current:clients:c:api::duckdb_cast_function_set_function}

Sets the actual cast function to use.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_cast_function_set_function(
  duckdb_cast_function cast_function,
  duckdb_cast_function_t function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `cast_function`: The cast function object.
* `function`: The function to set.

<br>

###### `duckdb_cast_function_set_extra_info` {#docs:current:clients:c:api::duckdb_cast_function_set_extra_info}

Assigns extra information to the cast function that can be fetched during execution, etc.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_cast_function_set_extra_info(
  duckdb_cast_function cast_function,
  void *extra_info,
  duckdb_delete_callback_t destroy
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `cast_function`: The cast function object.
* `extra_info`: The extra information.
* `destroy`: The callback that will be called to destroy the extra information (if any).

<br>

###### `duckdb_cast_function_get_extra_info` {#docs:current:clients:c:api::duckdb_cast_function_get_extra_info}

Retrieves the extra info of the function as set in `duckdb_cast_function_set_extra_info`.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void *duckdb_cast_function_get_extra_info(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object.

####### Return Value {#docs:current:clients:c:api::return-value}

The extra info.

<br>

###### `duckdb_cast_function_get_cast_mode` {#docs:current:clients:c:api::duckdb_cast_function_get_cast_mode}

Get the cast execution mode from the given function info.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_cast_mode duckdb_cast_function_get_cast_mode(
  duckdb_function_info info
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object.

####### Return Value {#docs:current:clients:c:api::return-value}

The cast mode.

<br>

###### `duckdb_cast_function_set_error` {#docs:current:clients:c:api::duckdb_cast_function_set_error}

Report that an error has occurred while executing the cast function.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_cast_function_set_error(
  duckdb_function_info info,
  const char *error
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object.
* `error`: The error message.

<br>

###### `duckdb_cast_function_set_row_error` {#docs:current:clients:c:api::duckdb_cast_function_set_row_error}

Report that an error has occurred while executing the cast function, setting the corresponding output row to NULL.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_cast_function_set_row_error(
  duckdb_function_info info,
  const char *error,
  idx_t row,
  duckdb_vector output
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `info`: The info object.
* `error`: The error message.
* `row`: The index of the row within the output vector to set to NULL.
* `output`: The output vector.
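
A cast callback typically has to choose between failing the whole cast and nulling out a single row. The sketch below models that decision in plain C++, assuming that a try-style cast mode should use row-level errors while a normal cast should abort. `ToyCastMode`, `ToyCastResult` and the helper functions are hypothetical stand-ins, not DuckDB types; a real callback would call `duckdb_cast_function_set_row_error` or `duckdb_cast_function_set_error` instead of writing to a struct.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical stand-in for the cast mode reported to a cast callback.
enum class ToyCastMode { Normal, Try };

// Hypothetical model of an output vector plus error state.
struct ToyCastResult {
    std::vector<int> values; // output values
    std::vector<bool> valid; // false models a NULL output row
    std::string error;       // first error message, if any
    bool ok = true;          // false if the whole cast was aborted
};

// Parses a base-10 integer; returns false for empty or malformed input.
inline bool toy_parse_int(const std::string &text, int &out) {
    try {
        std::size_t pos = 0;
        int value = std::stoi(text, &pos);
        if (pos != text.size()) {
            return false; // trailing characters
        }
        out = value;
        return true;
    } catch (const std::exception &) {
        return false; // not a number, or out of range
    }
}

inline ToyCastResult toy_cast_varchar_to_int(const std::vector<std::string> &input, ToyCastMode mode) {
    ToyCastResult result;
    result.values.assign(input.size(), 0);
    result.valid.assign(input.size(), false);
    for (std::size_t row = 0; row < input.size(); row++) {
        int value = 0;
        if (toy_parse_int(input[row], value)) {
            result.values[row] = value;
            result.valid[row] = true;
            continue;
        }
        if (result.error.empty()) {
            result.error = "Could not convert '" + input[row] + "' to an integer";
        }
        if (mode == ToyCastMode::Try) {
            continue; // row-level error: the row stays NULL, the cast goes on
        }
        result.ok = false; // function-level error: abort the cast
        return result;
    }
    return result;
}
```

With the input `{"1", "x", "3"}`, the `Try` mode in this model produces a NULL in the second row and keeps going, whereas the `Normal` mode stops at the second row and reports failure.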

<br>

###### `duckdb_register_cast_function` {#docs:current:clients:c:api::duckdb_register_cast_function}

Registers a cast function within the given connection.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_state duckdb_register_cast_function(
  duckdb_connection con,
  duckdb_cast_function cast_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `con`: The connection to use.
* `cast_function`: The cast function to register.

####### Return Value {#docs:current:clients:c:api::return-value}

Whether or not the registration was successful.

<br>

###### `duckdb_destroy_cast_function` {#docs:current:clients:c:api::duckdb_destroy_cast_function}

Destroys the cast function object.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_cast_function(
  duckdb_cast_function *cast_function
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `cast_function`: The cast function object.

<br>

###### `duckdb_destroy_expression` {#docs:current:clients:c:api::duckdb_destroy_expression}

Destroys the expression and de-allocates its memory.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
void duckdb_destroy_expression(
  duckdb_expression *expr
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `expr`: A pointer to the expression.

<br>

###### `duckdb_expression_return_type` {#docs:current:clients:c:api::duckdb_expression_return_type}

Returns the return type of an expression.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_logical_type duckdb_expression_return_type(
  duckdb_expression expr
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `expr`: The expression.

####### Return Value {#docs:current:clients:c:api::return-value}

The return type. Must be destroyed with `duckdb_destroy_logical_type`.

<br>

###### `duckdb_expression_is_foldable` {#docs:current:clients:c:api::duckdb_expression_is_foldable}

Returns whether the expression is foldable into a value or not.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
bool duckdb_expression_is_foldable(
  duckdb_expression expr
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `expr`: The expression.

####### Return Value {#docs:current:clients:c:api::return-value}

True, if the expression is foldable, else false.

<br>

###### `duckdb_expression_fold` {#docs:current:clients:c:api::duckdb_expression_fold}

Folds an expression creating a folded value.

####### Syntax {#docs:current:clients:c:api::syntax}

```c
duckdb_error_data duckdb_expression_fold(
  duckdb_client_context context,
  duckdb_expression expr,
  duckdb_value *out_value
);
```


####### Parameters {#docs:current:clients:c:api::parameters}

* `context`: The client context.
* `expr`: The expression. Must be foldable.
* `out_value`: The folded value, if folding was successful. Must be destroyed with `duckdb_destroy_value`.

####### Return Value {#docs:current:clients:c:api::return-value}

The error data. Must be destroyed with `duckdb_destroy_error_data`.

<br>

## C++ API {#docs:current:clients:cpp}

> **Installation.** To use the DuckDB C++ API, download the [`libduckdb` archive](https://duckdb.org/install/index.html?environment=c) for your platform.
>
> The latest stable version of the DuckDB C++ API is 1.5.2.

> **Warning.** DuckDB's C++ API is internal.
> It is not guaranteed to be stable and can change without notice.
> If you would like to build an application on DuckDB, we recommend using the [C API](#docs:current:clients:c:overview).

#### Installation {#docs:current:clients:cpp::installation}

The DuckDB C++ API can be installed as part of the `libduckdb` packages. Please see the [installation page](https://duckdb.org/install) for details.

#### Basic API Usage {#docs:current:clients:cpp::basic-api-usage}

DuckDB implements a custom C++ API. This is built around the abstractions of a database instance (`DuckDB` class), multiple `Connection` instances to the database instance and `QueryResult` instances as the result of queries. The header file for the C++ API is `duckdb.hpp`.

##### Startup & Shutdown {#docs:current:clients:cpp::startup--shutdown}

To use DuckDB, you must first initialize a `DuckDB` instance using its constructor. The first parameter of `DuckDB()` is the path of the database file to read from and write to. The special value `nullptr` can be used to create an **in-memory database**. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process). The second parameter to the `DuckDB` constructor is an optional `DBConfig` object. In `DBConfig`, you can set various database parameters, for example the read/write mode or memory limits. The `DuckDB` constructor may throw exceptions, for example if the database file is not usable.

With the `DuckDB` instance, you can create one or many `Connection` instances using the `Connection()` constructor. While connections should be thread-safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection if you are in a multithreaded environment.

```cpp
DuckDB db(nullptr);
Connection con(db);
```

##### Querying {#docs:current:clients:cpp::querying}

Connections expose the `Query()` method to send a SQL query string to DuckDB from C++. `Query()` fully materializes the query result as a `MaterializedQueryResult` in memory before returning, at which point the query result can be consumed. There is also a streaming API for queries, see further below.

```cpp
// create a table
con.Query("CREATE TABLE integers (i INTEGER, j INTEGER)");

// insert three rows into the table
con.Query("INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL)");

auto result = con.Query("SELECT * FROM integers");
if (result->HasError()) {
    cerr << result->GetError() << endl;
} else {
    cout << result->ToString() << endl;
}
```

The `MaterializedQueryResult` instance first of all indicates whether the query was successful. `Query()` does not throw exceptions under normal circumstances. Instead, invalid queries or other issues cause `HasError()` to return `true` on the query result instance, in which case the error message is available as a string through `GetError()`, as in the example above. The methods `GetErrorType()` and `GetErrorObject()` are also available for any `QueryResult` instance and may aid in more explicit error handling.

```cpp
auto result = con.Query("INSERT INTO integers VALUES (1, 2)");
if (result->HasError()) {
    auto errorType = result->GetErrorType();
    switch (errorType) {
    case duckdb::ExceptionType::CONSTRAINT: {
        // Example handling
        auto errorObject = result->GetErrorObject();
        errorObject.ConvertErrorToJSON(); 
        std::cout << errorObject.Message() << std::endl;
        break;
    }
    // More handling
    }
} else {
    // Normal code
}
```

If successful, other fields are set: the type of statement that was just executed (e.g., `StatementType::INSERT_STATEMENT`) is contained in `statement_type`. The high-level (“Logical type”/“SQL type”) types of the result set columns are in `types`. The names of the result columns are in the `names` string vector. In case multiple result sets are returned, for example because the query contained multiple statements, the result sets can be chained using the `next` field.

DuckDB also supports prepared statements in the C++ API with the `Prepare()` method. This returns an instance of `PreparedStatement`. This instance can be used to execute the prepared statement with parameters. Below is an example:

```cpp
std::unique_ptr<PreparedStatement> prepare = con.Prepare("SELECT count(*) FROM a WHERE i = $1");
std::unique_ptr<QueryResult> result = prepare->Execute(12);
```

> **Warning.** Do **not** use prepared statements to insert large amounts of data into DuckDB. See the [data import documentation](#docs:current:data:overview) for better options.

##### UDF API {#docs:current:clients:cpp::udf-api}

The UDF API allows the definition of user-defined functions. It is exposed in `duckdb::Connection` through the methods `CreateScalarFunction()`, `CreateVectorizedFunction()`, and variants.
These methods create UDFs in the temporary schema (`TEMP_SCHEMA`) of the owner connection, which is the only connection allowed to use and change them.

###### CreateScalarFunction {#docs:current:clients:cpp::createscalarfunction}

The user can write an ordinary scalar function, register it with `CreateScalarFunction()` and afterwards use the UDF in a `SELECT` statement, for instance:

```cpp
bool bigger_than_four(int value) {
    return value > 4;
}

connection.CreateScalarFunction<bool, int>("bigger_than_four", &bigger_than_four);

connection.Query("SELECT bigger_than_four(i) FROM (VALUES (3), (5)) tbl(i)")->Print();
```

The `CreateScalarFunction()` methods automatically create vectorized scalar UDFs so they are as efficient as built-in functions. There are two variants of this method interface:

**1.**

```cpp
template<typename TR, typename... Args>
void CreateScalarFunction(string name, TR (*udf_func)(Args...))
```

* template parameters:
    * **TR** is the return type of the UDF function.
    * **Args** are the argument types of the UDF function, up to 3 (this method only supports functions up to ternary ones).
* **name** is the name to register the UDF function.
* **udf_func** is a pointer to the UDF function.

This method automatically discovers from the template typenames the corresponding LogicalTypes:

* `bool` → `LogicalType::BOOLEAN`
* `int8_t` → `LogicalType::TINYINT`
* `int16_t` → `LogicalType::SMALLINT`
* `int32_t` → `LogicalType::INTEGER`
* `int64_t` → `LogicalType::BIGINT`
* `float` → `LogicalType::FLOAT`
* `double` → `LogicalType::DOUBLE`
* `string_t` → `LogicalType::VARCHAR`
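
One way to picture how template typenames can be turned into logical types is a compile-time lookup table built from template specializations. The sketch below is a self-contained model of that idea only: `ToyLogicalType`, `ToyTypeOf` and `toy_signature` are hypothetical names, `string_t` is left out because it is a DuckDB type, and DuckDB's actual implementation may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the logical types listed above.
enum class ToyLogicalType { BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE };

// The primary template is declared but not defined,
// so unsupported C++ types fail to compile.
template <typename T>
struct ToyTypeOf;

template <> struct ToyTypeOf<bool> {
    static constexpr ToyLogicalType get() { return ToyLogicalType::BOOLEAN; }
};
template <> struct ToyTypeOf<std::int8_t> {
    static constexpr ToyLogicalType get() { return ToyLogicalType::TINYINT; }
};
template <> struct ToyTypeOf<std::int16_t> {
    static constexpr ToyLogicalType get() { return ToyLogicalType::SMALLINT; }
};
template <> struct ToyTypeOf<std::int32_t> {
    static constexpr ToyLogicalType get() { return ToyLogicalType::INTEGER; }
};
template <> struct ToyTypeOf<std::int64_t> {
    static constexpr ToyLogicalType get() { return ToyLogicalType::BIGINT; }
};
template <> struct ToyTypeOf<float> {
    static constexpr ToyLogicalType get() { return ToyLogicalType::FLOAT; }
};
template <> struct ToyTypeOf<double> {
    static constexpr ToyLogicalType get() { return ToyLogicalType::DOUBLE; }
};

// Deduces TR and Args from a function pointer and returns the
// argument types followed by the return type.
template <typename TR, typename... Args>
std::vector<ToyLogicalType> toy_signature(TR (*)(Args...)) {
    return {ToyTypeOf<Args>::get()..., ToyTypeOf<TR>::get()};
}

// Example UDF body, analogous to bigger_than_four above.
inline bool toy_bigger_than_four(std::int32_t value) {
    return value > 4;
}
```

In this model, `toy_signature(&toy_bigger_than_four)` yields `INTEGER` followed by `BOOLEAN`, which mirrors how a call such as `CreateScalarFunction<bool, int>` needs no explicit `LogicalType` arguments.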

In DuckDB, a single primitive type may back several `LogicalType`s, e.g., `int32_t` is used for `INTEGER`, `TIME` and `DATE`. To disambiguate, users can use the following overloaded method.

**2.**

```cpp
template<typename TR, typename... Args>
void CreateScalarFunction(string name, vector<LogicalType> args, LogicalType ret_type, TR (*udf_func)(Args...))
```

An example of use would be:

```cpp
int32_t udf_date(int32_t a) {
    return a;
}

con.Query("CREATE TABLE dates (d DATE)");
con.Query("INSERT INTO dates VALUES ('1992-01-01')");

con.CreateScalarFunction<int32_t, int32_t>("udf_date", {LogicalType::DATE}, LogicalType::DATE, &udf_date);

con.Query("SELECT udf_date(d) FROM dates")->Print();
```

* template parameters:
    * **TR** is the return type of the UDF function.
    * **Args** are the argument types of the UDF function, up to 3 (this method only supports functions up to ternary ones).
* **name** is the name to register the UDF function.
* **args** are the LogicalType arguments that the function uses, which should match the template Args types.
* **ret_type** is the return LogicalType of the function, which should match the template TR type.
* **udf_func** is a pointer to the UDF function.

This function checks the template types against the LogicalTypes passed as arguments and they must match as follows:

* `LogicalTypeId::BOOLEAN` → `bool`
* `LogicalTypeId::TINYINT` → `int8_t`
* `LogicalTypeId::SMALLINT` → `int16_t`
* `LogicalTypeId::DATE`, `LogicalTypeId::TIME`, `LogicalTypeId::INTEGER` → `int32_t`
* `LogicalTypeId::BIGINT`, `LogicalTypeId::TIMESTAMP` → `int64_t`
* `LogicalTypeId::FLOAT`, `LogicalTypeId::DOUBLE`, `LogicalTypeId::DECIMAL` → `double`
* `LogicalTypeId::VARCHAR`, `LogicalTypeId::CHAR`, `LogicalTypeId::BLOB` → `string_t`
* `LogicalTypeId::VARBINARY` → `blob_t`

###### CreateVectorizedFunction {#docs:current:clients:cpp::createvectorizedfunction}

The `CreateVectorizedFunction()` methods register a vectorized UDF such as:

```cpp
/*
* This vectorized function copies the input values to the result vector
*/
template<typename TYPE>
static void udf_vectorized(DataChunk &args, ExpressionState &state, Vector &result) {
    // set the result vector type
    result.vector_type = VectorType::FLAT_VECTOR;
    // get a raw array from the result
    auto result_data = FlatVector::GetData<TYPE>(result);

    // get the sole input vector
    auto &input = args.data[0];
    // now get an orrified vector
    VectorData vdata;
    input.Orrify(args.size(), vdata);

    // get a raw array from the orrified input
    auto input_data = (TYPE *)vdata.data;

    // handling the data
    for (idx_t i = 0; i < args.size(); i++) {
        auto idx = vdata.sel->get_index(i);
        if ((*vdata.nullmask)[idx]) {
            continue;
        }
        result_data[i] = input_data[idx];
    }
}

con.Query("CREATE TABLE integers (i INTEGER)");
con.Query("INSERT INTO integers VALUES (1), (2), (3), (999)");

con.CreateVectorizedFunction<int, int>("udf_vectorized_int", &udf_vectorized<int>);

con.Query("SELECT udf_vectorized_int(i) FROM integers")->Print();
```

The vectorized UDF is passed as a _scalar_function_t_:

```cpp
typedef std::function<void(DataChunk &args, ExpressionState &expr, Vector &result)> scalar_function_t;
```

* **args** is a [DataChunk](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/common/types/data_chunk.hpp) that holds a set of input vectors for the UDF that all have the same length.
* **expr** is an [ExpressionState](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/execution/expression_executor_state.hpp) that provides information about the query's expression state.
* **result** is a [Vector](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/common/types/vector.hpp) to store the result values.

There are different vector types to handle in a Vectorized UDF:

* ConstantVector
* DictionaryVector
* FlatVector
* ListVector
* StringVector
* StructVector
* SequenceVector
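
Writing a UDF against a single unified view avoids handling each layout separately. The following is a simplified, self-contained model of that idea, assuming a view that consists of physical values, a validity flag per value and a selection of physical indexes. `ToyUnifiedView`, `toy_flat`, `toy_constant` and `toy_copy_udf` are hypothetical names and do not correspond to DuckDB classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a unified view over an input vector.
struct ToyUnifiedView {
    std::vector<std::int32_t> data; // physical values
    std::vector<bool> valid;        // validity per physical value (false models NULL)
    std::vector<std::size_t> sel;   // logical row -> physical index
};

// Flat layout: one physical value per row.
inline ToyUnifiedView toy_flat(const std::vector<std::int32_t> &values, const std::vector<bool> &valid) {
    ToyUnifiedView view;
    view.data = values;
    view.valid = valid;
    for (std::size_t i = 0; i < values.size(); i++) {
        view.sel.push_back(i);
    }
    return view;
}

// Constant layout: a single physical value shared by all rows.
inline ToyUnifiedView toy_constant(std::int32_t value, std::size_t count) {
    ToyUnifiedView view;
    view.data.push_back(value);
    view.valid.push_back(true);
    view.sel.assign(count, 0);
    return view;
}

// Copy UDF written once against the unified view: NULL in, NULL out.
inline void toy_copy_udf(const ToyUnifiedView &input, std::vector<std::int32_t> &result,
                         std::vector<bool> &result_valid) {
    const std::size_t count = input.sel.size();
    result.assign(count, 0);
    result_valid.assign(count, false);
    for (std::size_t row = 0; row < count; row++) {
        const std::size_t idx = input.sel[row];
        if (!input.valid[idx]) {
            continue; // leave the result row marked as NULL
        }
        result[row] = input.data[idx];
        result_valid[row] = true;
    }
}
```

The same `toy_copy_udf` loop handles both layouts: a flat input with a NULL in the second row yields a NULL in the second result row, and a constant input repeats its single value for every row.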

The general API of the `CreateVectorizedFunction()` method is as follows:

**1.**

```cpp
template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)
```

* template parameters:
    * **TR** is the return type of the UDF function.
    * **Args** are the argument types of the UDF function, up to 3.
* **name** is the name to register the UDF function.
* **udf_func** is a _vectorized_ UDF function.
* **varargs** is the type of varargs to support, or `LogicalTypeId::INVALID` (default value) if the function does not accept variable length arguments.

This method automatically discovers from the template typenames the corresponding LogicalTypes:

* `bool` → `LogicalType::BOOLEAN`
* `int8_t` → `LogicalType::TINYINT`
* `int16_t` → `LogicalType::SMALLINT`
* `int32_t` → `LogicalType::INTEGER`
* `int64_t` → `LogicalType::BIGINT`
* `float` → `LogicalType::FLOAT`
* `double` → `LogicalType::DOUBLE`
* `string_t` → `LogicalType::VARCHAR`

**2.**

```cpp
template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, vector<LogicalType> args, LogicalType ret_type, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)
```

## CLI {#clients:cli}

### Command Line Client {#docs:current:clients:cli:overview}

> **Installation.** To use the DuckDB CLI client, visit the [CLI installation page](https://duckdb.org/install/index.html?environment=cli).
>
> The latest stable version of the DuckDB command line client is 1.5.2.

#### Installation {#docs:current:clients:cli:overview::installation}

The DuckDB CLI (Command Line Interface) is a single, dependency-free executable. It is precompiled for Windows, macOS and Linux for both the stable version and for nightly builds produced by GitHub Actions. Please see the [installation page](https://duckdb.org/install) under the CLI tab for download links.

The DuckDB CLI is based on the SQLite command line shell, so CLI-client-specific functionality is similar to what is described in the [SQLite documentation](https://www.sqlite.org/cli.html) (although DuckDB's SQL syntax follows PostgreSQL conventions with a [few exceptions](#docs:current:sql:dialect:postgresql_compatibility)).

> DuckDB has a [tldr page](https://tldr.inbrowser.app/pages/common/duckdb), which summarizes the most common uses of the CLI client.
> If you have [tldr](https://github.com/tldr-pages/tldr) installed, you can display it by running `tldr duckdb`.

#### Getting Started {#docs:current:clients:cli:overview::getting-started}

Once the CLI executable has been downloaded, unzip it and save it to any directory.
Navigate to that directory in a terminal and enter the command `duckdb` to run the executable.
If in a PowerShell or POSIX shell environment, use the command `./duckdb` instead.

#### Usage {#docs:current:clients:cli:overview::usage}

The typical usage of the `duckdb` command is the following:

```batch
duckdb ⟨OPTIONS⟩ ⟨FILENAME⟩
```

##### Options {#docs:current:clients:cli:overview::options}

The `⟨OPTIONS⟩` part encodes [arguments for the CLI client](#docs:current:clients:cli:arguments). Common options include:

* `-csv`: sets the output mode to CSV
* `-json`: sets the output mode to JSON
* `-readonly`: open the database in read-only mode (see [concurrency in DuckDB](#docs:current:connect:concurrency::handling-concurrency))

For a full list of options, see the [command line arguments page](#docs:current:clients:cli:arguments).

##### In-Memory vs. Persistent Database {#docs:current:clients:cli:overview::in-memory-vs-persistent-database}

When no `⟨FILENAME⟩` argument is provided, the DuckDB CLI will open a temporary [in-memory database](#docs:current:connect:overview::in-memory-database).
You will see DuckDB's version number, the information on the connection and a prompt starting with a `D`.

```batch
duckdb
```

```text
DuckDB v1.5.2 ({{ site.current_duckdb_codename }}) 8a5851971f
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D
```

To open or create a [persistent database](#docs:current:connect:overview::persistent-database), simply include a path as a command line argument:

```batch
duckdb my_database.duckdb
```

##### Running SQL Statements in the CLI {#docs:current:clients:cli:overview::running-sql-statements-in-the-cli}

Once the CLI has been opened, enter a SQL statement followed by a semicolon, then hit enter and it will be executed. Results will be displayed in a table in the terminal. If a semicolon is omitted, hitting enter will allow for multi-line SQL statements to be entered.

```sql
SELECT 'quack' AS my_column;
```

| my_column |
|-----------|
| quack     |

The CLI supports all of DuckDB's rich [SQL syntax](#docs:current:sql:introduction) including `SELECT`, `CREATE` and `ALTER` statements.

##### Editor Features {#docs:current:clients:cli:overview::editor-features}

The CLI supports [autocompletion](#docs:current:clients:cli:autocomplete), and has sophisticated [editor features](#docs:current:clients:cli:editing) and [syntax highlighting](#docs:current:clients:cli:syntax_highlighting) on macOS, Linux and Windows.

##### Exiting the CLI {#docs:current:clients:cli:overview::exiting-the-cli}

To exit the CLI, press `Ctrl`+`D` if your platform supports it. Otherwise, press `Ctrl`+`C` or use the `.exit` command. If you used a persistent database, DuckDB will automatically checkpoint (save the latest edits to disk) and close. This will remove the `.wal` file (the [write-ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging)) and consolidate all of your data into the single-file database.

##### Dot Commands {#docs:current:clients:cli:overview::dot-commands}

In addition to SQL syntax, special [dot commands](#docs:current:clients:cli:dot_commands) may be entered into the CLI client. To use one of these commands, begin the line with a period (`.`) immediately followed by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line.

Frequently-used configurations can be stored in the file `~/.duckdbrc`, which will be loaded when starting the CLI client. See the [Configuring the CLI](#docs:current:clients:cli:overview::configuring-the-cli) section below for further information on these options.

> **Tip.** To prevent the DuckDB CLI client from reading the `~/.duckdbrc` file, start it as follows:
> ```batch
> duckdb -init /dev/null
> ```

Below, we summarize a few important dot commands. To see all available commands, see the [dot commands page](#docs:current:clients:cli:dot_commands) or use the `.help` command.

###### Opening Database Files {#docs:current:clients:cli:overview::opening-database-files}

In addition to connecting to a database when opening the CLI, a new database connection can be made by using the `.open` command. If no additional parameters are supplied, a new in-memory database connection is created. This database will not be persisted when the CLI connection is closed.

```text
.open
```

The `.open` command accepts several options. Its final parameter is the path to a persistent database to open (or to create if it does not exist yet). The special string `:memory:` can also be used to open a temporary in-memory database.

```text
.open persistent.duckdb
```

> **Warning.** `.open` closes the current database.
> To keep the current database, while adding a new database, use the [`ATTACH` statement](#docs:current:sql:statements:attach).

One important option accepted by `.open` is the `--readonly` flag, which disallows any editing of the database. To open a database in read-only mode, it must already exist. This also means that a new in-memory database cannot be opened in read-only mode, since in-memory databases are created upon connection.

```text
.open --readonly preexisting.duckdb
```

The `--sql` option allows setting the database path using a SQL expression:

```text
.open --sql "getenv('MY_DB_PATH')"
```

###### Output Formats {#docs:current:clients:cli:overview::output-formats}

The `.mode` [dot command](#docs:current:clients:cli:dot_commands::mode) may be used to change the appearance of the tables returned in the terminal output.
These include the default `duckbox` mode, `csv` and `json` mode for ingestion by other tools, `markdown` and `latex` for documents and `insert` mode for generating SQL statements.

###### Writing Results to a File {#docs:current:clients:cli:overview::writing-results-to-a-file}

By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using either the `.output` or `.once` commands.
For details, see the documentation for the [output dot command](#docs:current:clients:cli:dot_commands::output-writing-results-to-a-file).

###### Reading SQL from a File {#docs:current:clients:cli:overview::reading-sql-from-a-file}

The DuckDB CLI can read both SQL commands and dot commands from an external file instead of the terminal using the `.read` command. This allows for a number of commands to be run in sequence and allows command sequences to be saved and reused.

The `.read` command requires only one argument: the path to the file containing the SQL statements and/or dot commands to execute. After running the commands in the file, control returns to the terminal. Output from the execution of that file is governed by the same `.output` and `.once` commands discussed previously. This allows the output to be displayed in the terminal, as in the first example below, or written to another file, as in the second example.

In this example, the file `select_example.sql` is located in the same directory as duckdb.exe and contains the following SQL statement:

```sql
SELECT *
FROM generate_series(5);
```

To execute it from the CLI, the `.read` command is used.

```text
.read select_example.sql
```

The output below is returned to the terminal by default. The formatting of the table can be adjusted using the `.mode` command, and the destination can be changed using the `.output` or `.once` commands.

```text
| generate_series |
|----------------:|
| 0               |
| 1               |
| 2               |
| 3               |
| 4               |
| 5               |
```

Multiple commands, including both SQL and dot commands, can also be run in a single `.read` command. In this example, the file `write_markdown_to_file.sql` is located in the same directory as duckdb.exe and contains the following commands:

```sql
.mode markdown
.output series.md
SELECT *
FROM generate_series(5);
```

To execute it from the CLI, the `.read` command is used as before.

```text
.read write_markdown_to_file.sql
```

In this case, no output is returned to the terminal. Instead, the file `series.md` is created (or replaced if it already existed) with the markdown-formatted results shown here:

```text
| generate_series |
|----------------:|
| 0               |
| 1               |
| 2               |
| 3               |
| 4               |
| 5               |
```



#### Configuring the CLI {#docs:current:clients:cli:overview::configuring-the-cli}

Several dot commands can be used to configure the CLI.
On startup, the CLI reads and executes all commands in the file `~/.duckdbrc`, including dot commands and SQL statements.
This allows you to store the configuration state of the CLI.
You may also point to a different initialization file using the `-init` flag.

##### Setting a Custom Prompt {#docs:current:clients:cli:overview::setting-a-custom-prompt}

As an example, a file in the same directory as the DuckDB CLI named `prompt.sql` will change the DuckDB prompt to be a duck head and run a SQL statement.
Note that the duck head is built with Unicode characters and does not work in all terminal environments (e.g., in Windows, unless running with WSL and using the Windows Terminal).

```text
.prompt "{color:yellow1}{sql:select current_database()} ⚫◗ "
```

Or a simpler version without colors:

```text
.prompt "{sql:select current_database()} ⚫◗ "
```

To invoke that file on initialization, use this command:

```batch
duckdb -init prompt.sql
```

This outputs:

```text
-- Loading resources from prompt.sql
v⟨version⟩ ⟨git_hash⟩
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
⚫◗
```

#### Non-Interactive Usage {#docs:current:clients:cli:overview::non-interactive-usage}

To read and process a file and exit immediately, redirect the file contents into `duckdb`:

```batch
duckdb < select_example.sql
```

To execute a command with SQL text passed in directly from the command line, call `duckdb` with two arguments: the database location (or `:memory:`), and a string with the SQL statement to execute.

```batch
duckdb :memory: "SELECT 42 AS the_answer"
```

#### Loading Extensions {#docs:current:clients:cli:overview::loading-extensions}

To load extensions, use DuckDB's SQL `INSTALL` and `LOAD` commands as you would other SQL statements.

```sql
INSTALL fts;
LOAD fts;
```

For details, see the [Extension docs](#docs:current:extensions:overview).

#### Reading from stdin and Writing to stdout {#docs:current:clients:cli:overview::reading-from-stdin-and-writing-to-stdout}

When in a Unix environment, it can be useful to pipe data between multiple commands.
DuckDB is able to read data from stdin as well as write to stdout using the file location of stdin (`/dev/stdin`) and stdout (`/dev/stdout`) within SQL commands, as pipes act very similarly to file handles.

This command will create an example CSV:

```sql
COPY (SELECT 42 AS woot UNION ALL SELECT 43 AS woot) TO 'test.csv' (HEADER);
```

First, read a file and pipe it to the `duckdb` CLI executable. Use the `-c` argument to pass a SQL command that uses `/dev/stdin` as a file location. Since no database file is specified, the CLI runs the command against an in-memory database.

```batch
cat test.csv | duckdb -c "SELECT * FROM read_csv('/dev/stdin')"
```

| woot |
|-----:|
| 42   |
| 43   |

To write back to stdout, use the `COPY` statement with the `/dev/stdout` file location.

```batch
cat test.csv | \
    duckdb -c "COPY (SELECT * FROM read_csv('/dev/stdin')) TO '/dev/stdout' WITH (FORMAT csv, HEADER)"
```

```csv
woot
42
43
```

#### Reading Environment Variables {#docs:current:clients:cli:overview::reading-environment-variables}

The `getenv` function can read environment variables.

##### Examples {#docs:current:clients:cli:overview::examples}

To retrieve the home directory's path from the `HOME` environment variable, use:

```sql
SELECT getenv('HOME') AS home;
```

|       home       |
|------------------|
| /Users/user_name |

The output of the `getenv` function can be used to set [configuration options](#docs:current:configuration:overview). For example, to set the `NULL` order based on the environment variable `DEFAULT_NULL_ORDER`, use:

```sql
SET default_null_order = getenv('DEFAULT_NULL_ORDER');
```

##### Restrictions for Reading Environment Variables {#docs:current:clients:cli:overview::restrictions-for-reading-environment-variables}

The `getenv` function can only be run when the [`enable_external_access`](#docs:current:configuration:overview::configuration-reference) option is set to `true` (the default setting).
It is only available in the CLI client and is not supported in other DuckDB clients.

#### Prepared Statements {#docs:current:clients:cli:overview::prepared-statements}

The DuckDB CLI supports executing [prepared statements](#docs:current:sql:query_syntax:prepared_statements) in addition to regular `SELECT` statements.
To create and execute a prepared statement in the CLI client, use the `PREPARE` and `EXECUTE` statements.

#### Query Completion ETA {#docs:current:clients:cli:overview::query-completion-eta}

The DuckDB CLI displays an estimated time to completion for running queries and reports the total execution time once a query finishes.

While a query is executing, the progress bar shows the estimated time remaining. The estimate is computed with a [Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter), which yields more accurate predictions than simple linear extrapolation.

##### How It Works {#docs:current:clients:cli:overview::how-it-works}

DuckDB calculates the estimated time to completion through the following process:

1. Progress Monitoring: DuckDB's internal progress API reports the estimated completion percentage for the running query
2. Statistical Filtering: A Kalman filter smooths noisy progress measurements and accounts for execution variability
3. Continuous Refinement: The system continuously updates predicted completion time as new progress data becomes available, improving accuracy throughout execution

The Kalman filter adapts to changing execution conditions such as memory pressure, I/O bottlenecks, or network delays. Because of this adaptive approach, the estimated completion time does not always decrease steadily: it can increase when query execution becomes less predictable.
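
As a rough illustration of the smoothing step, the following Python sketch applies a scalar Kalman filter to the observed progress rate and derives a remaining-time estimate from it. It is not DuckDB's implementation: the class `ProgressRateFilter`, the function `estimate_remaining_seconds`, the constant-rate model and the noise parameters are all assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class ProgressRateFilter:
    """Scalar Kalman filter tracking the progress rate (fraction per second)."""

    rate: float = 0.0                # current estimate of the progress rate
    variance: float = 1.0            # uncertainty of that estimate
    process_noise: float = 1e-4      # assumed drift of the true rate per step
    measurement_noise: float = 1e-2  # assumed noise of one observed rate

    def update(self, observed_rate: float) -> float:
        # Predict: the rate is modeled as constant, so only uncertainty grows.
        self.variance += self.process_noise
        # Correct: blend the prediction with the new measurement.
        gain = self.variance / (self.variance + self.measurement_noise)
        self.rate += gain * (observed_rate - self.rate)
        self.variance *= 1.0 - gain
        return self.rate


def estimate_remaining_seconds(samples):
    """Return one ETA per sample, given (elapsed_seconds, fraction_done) pairs."""
    kalman = ProgressRateFilter()
    etas = []
    prev_time, prev_fraction = 0.0, 0.0
    for elapsed, fraction in samples:
        dt = elapsed - prev_time
        if dt > 0:
            kalman.update((fraction - prev_fraction) / dt)
        prev_time, prev_fraction = elapsed, fraction
        if kalman.rate > 0:
            etas.append((1.0 - fraction) / kalman.rate)
        else:
            etas.append(float("inf"))
    return etas


if __name__ == "__main__":
    # Hypothetical measurements: progress slows down after the second sample.
    samples = [(1.0, 0.10), (2.0, 0.20), (3.0, 0.22), (4.0, 0.24)]
    for (elapsed, fraction), eta in zip(samples, estimate_remaining_seconds(samples)):
        print(f"t={elapsed:.0f}s done={fraction:.0%} eta={eta:.1f}s")
```

In this sample run the observed rate drops after the second measurement, so the estimate rises instead of falling, which mirrors the behavior described above.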

##### Factors Affecting the Accuracy of Query Completion ETA {#docs:current:clients:cli:overview::factors-affecting-the-accuracy-of-query-completion-eta}

Completion time estimates may be less reliable under these conditions:

System resource constraints:

* Memory pressure causing disk swapping
* High CPU load from competing processes
* Disk I/O bottlenecks

Query execution characteristics:

* Variable execution phases (initial setup versus main processing)
* Network-dependent operations with inconsistent latency
* Queries with unpredictable branching logic
* Operations on remote data sources
* External function calls
* Highly skewed data distributions

### Command Line Arguments {#docs:current:clients:cli:arguments}

The table below summarizes DuckDB's command line options.
To list all command line options, use the command:

```batch
duckdb -help
```

For a list of dot commands available in the CLI shell, see the [Dot Commands page](#docs:current:clients:cli:dot_commands).



| Argument          | Description                                                                                                   |
| ----------------- | ------------------------------------------------------------------------------------------------------------- |
| `-append`         | Append the database to the end of the file                                                                    |
| `-ascii`          | Set [output mode](#docs:current:clients:cli:output_formats) to `ascii`                            |
| `-bail`           | Stop after hitting an error                                                                                   |
| `-batch`          | Force batch I/O                                                                                               |
| `-box`            | Set [output mode](#docs:current:clients:cli:output_formats) to `box`                              |
| `-column`         | Set [output mode](#docs:current:clients:cli:output_formats) to `column`                           |
| `-cmd COMMAND`    | Run `COMMAND` before reading `stdin`                                                                          |
| `-c COMMAND`      | Run `COMMAND` and exit                                                                                        |
| `-csv`            | Set [output mode](#docs:current:clients:cli:output_formats) to `csv`                              |
| `-echo`           | Print commands before execution                                                                               |
| `-f FILENAME`     | Run the script in `FILENAME` and exit. Note that the `~/.duckdbrc` is read and executed first (if it exists)  |
| `-init FILENAME`  | Run the script in `FILENAME` upon startup (instead of `~/.duckdbrc`)                                          |
| `-header`         | Turn headers on                                                                                               |
| `-help`           | Show this message                                                                                             |
| `-html`           | Set [output mode](#docs:current:clients:cli:output_formats) to `html`                             |
| `-interactive`    | Force interactive I/O                                                                                         |
| `-json`           | Set [output mode](#docs:current:clients:cli:output_formats) to `json`                             |
| `-line`           | Set [output mode](#docs:current:clients:cli:output_formats) to `line`                             |
| `-list`           | Set [output mode](#docs:current:clients:cli:output_formats) to `list`                             |
| `-markdown`       | Set [output mode](#docs:current:clients:cli:output_formats) to `markdown`                         |
| `-newline SEP`    | Set output row separator. Default: `\n`                                                                       |
| `-nofollow`       | Refuse to open symbolic links to database files                                                               |
| `-noheader`       | Turn headers off                                                                                              |
| `-no-stdin`       | Exit after processing options instead of reading stdin                                                        |
| `-nullvalue TEXT` | Set text string for `NULL` values. Default: `NULL`                                                            |
| `-quote`          | Set [output mode](#docs:current:clients:cli:output_formats) to `quote`                            |
| `-readonly`       | Open the database read-only. This option also supports attaching to remote databases via HTTPS                                                                                   |
| `-s COMMAND`      | Run `COMMAND` and exit                                                                                        |
| `-separator SEP`  | Set output column separator to `SEP`. Default: `|`                                                            |
| `-storage-version VER` | Database storage compatibility version to use.                                                           |
| `-table`          | Set [output mode](#docs:current:clients:cli:output_formats) to `table`                            |
| `-ui`             | Loads and starts the [DuckDB UI](#docs:current:core_extensions:ui). If the UI is not yet installed, it installs the `ui` extension |
| `-unsigned`       | Allow loading of [unsigned extensions](#docs:current:extensions:overview::unsigned-extensions). This option is intended to be used for developing extensions. Consult the [Securing DuckDB page](#docs:current:operations_manual:securing_duckdb:securing_extensions) for guidelines on how to set up DuckDB in a secure manner |
| `-version`        | Show DuckDB version                                                                                           |



#### Passing a Sequence of Arguments {#docs:current:clients:cli:arguments::passing-a-sequence-of-arguments}

Note that the CLI arguments are processed in order, similarly to the behavior of the SQLite CLI.
For example:

```batch
duckdb -csv -c 'SELECT 42 AS hello' -json -c 'SELECT 84 AS world'
```

Returns the following:

```text
hello
42
[{"world":84}]
```
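
The left-to-right behavior can be modeled with a small Python sketch. It is a toy, not the CLI's implementation: the function `run_cli_args` is hypothetical, it understands only `-csv`, `-json` and `-c`, and it uses Python's built-in `sqlite3` module as a stand-in engine so that the example runs without DuckDB.

```python
import csv
import io
import json
import sqlite3


def run_cli_args(args):
    """Process arguments left to right; each -c uses the mode set before it."""
    connection = sqlite3.connect(":memory:")  # stand-in engine, not DuckDB
    mode = "csv"  # arbitrary starting mode for this sketch
    chunks = []
    position = 0
    while position < len(args):
        arg = args[position]
        if arg in ("-csv", "-json"):
            # A mode switch only affects commands that come after it.
            mode = arg.lstrip("-")
        elif arg == "-c":
            position += 1
            cursor = connection.execute(args[position])
            columns = [column[0] for column in cursor.description]
            rows = cursor.fetchall()
            if mode == "csv":
                buffer = io.StringIO()
                writer = csv.writer(buffer, lineterminator="\n")
                writer.writerow(columns)
                writer.writerows(rows)
                chunks.append(buffer.getvalue())
            else:
                records = [dict(zip(columns, row)) for row in rows]
                chunks.append(json.dumps(records, separators=(",", ":")) + "\n")
        else:
            raise ValueError(f"unsupported argument in this sketch: {arg}")
        position += 1
    return "".join(chunks)


print(
    run_cli_args(["-csv", "-c", "SELECT 42 AS hello", "-json", "-c", "SELECT 84 AS world"]),
    end="",
)
```

The call at the end prints the same three lines as the example above, because the first query runs while the mode is `csv` and the second runs after the switch to `json`.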

### Dot Commands {#docs:current:clients:cli:dot_commands}

Dot commands are available in the DuckDB CLI client. To use one of these commands, begin the line with a period (`.`) immediately followed by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line. To see available commands, use the `.help` command.
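
The quoting rules above resemble POSIX shell word splitting, which the following Python sketch uses as an approximation. The function `parse_dot_command` is hypothetical and relies on `shlex.split` from the standard library; the CLI's own parser may treat edge cases such as backslashes differently.

```python
import shlex


def parse_dot_command(line):
    """Split a dot command line into (name, arguments)."""
    if not line.startswith("."):
        # Leading whitespace or a missing period means this is not a dot command.
        raise ValueError("a dot command must start with a period in the first column")
    tokens = shlex.split(line[1:])
    if not tokens or line[1:2].isspace():
        raise ValueError("the command name must immediately follow the period")
    # Quoted arguments arrive as single tokens with the quotes removed.
    return tokens[0], tokens[1:]


print(parse_dot_command(".open --readonly 'my database.duckdb'"))
print(parse_dot_command('.print "hello world" again'))
```

Here the quoted path and the quoted string each stay a single argument, while unquoted words are split on spaces.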

#### List of Dot Commands {#docs:current:clients:cli:dot_commands::list-of-dot-commands}



| Command                                                               | Description                                                                                                                                                                |
| --------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `.bail ⟨on/off⟩`{:.language-sql .highlight}                           | Stop after hitting an error. Default: `off`                                                                                                                                |
| `.binary ⟨on/off⟩`{:.language-sql .highlight}                         | Turn binary output `on` or `off`. Default: `off`                                                                                                                           |
| `.cd ⟨DIRECTORY⟩`{:.language-sql .highlight}                          | Change the working directory to `DIRECTORY`                                                                                                                                |
| `.changes ⟨on/off⟩`{:.language-sql .highlight}                        | Show number of rows changed by SQL                                                                                                                                         |
| `.columns`{:.language-sql .highlight}                                 | Column-wise rendering of query results                                                                                                                                     |
| `.constant ⟨COLOR⟩`{:.language-sql .highlight}                        | Sets the syntax highlighting color used for constant values                                                                                                                |
| `.constantcode ⟨CODE⟩`{:.language-sql .highlight}                     | Sets the syntax highlighting terminal code used for constant values                                                                                                        |
| `.databases`{:.language-sql .highlight}                               | List names and files of attached databases                                                                                                                                 |
| `.dump ⟨TABLE⟩`{:.language-sql .highlight}                            | Render database content as SQL. `TABLE` is a [`LIKE` pattern](#docs:current:sql:functions:pattern_matching) for the tables to dump                            |
| `.echo ⟨on/off⟩`{:.language-sql .highlight}                           | Turn command echo `on` or `off`                                                                                                                                            |
| `.exit ⟨CODE⟩`{:.language-sql .highlight}                             | Exit this program with return-code `CODE`                                                                                                                                  |
| `.headers ⟨on/off⟩`{:.language-sql .highlight}                        | Turn display of headers `on` or `off`. Does not apply to duckbox mode                                                                                                      |
| `.help ⟨-all⟩ ⟨PATTERN⟩`{:.language-sql .highlight}                   | Show help text for `PATTERN`. Use `.help shortcuts` to display keyboard shortcuts                                                                                          |
| `.highlight ⟨on/off⟩`{:.language-sql .highlight}                      | Toggle syntax highlighting in the shell `on` / `off`. See the [query syntax highlighting section](#docs:current:clients:cli:dot_commands::configuring-the-query-syntax-highlighter) for more details |
| `.highlight_colors ⟨COMPONENT⟩ ⟨COLOR⟩`{:.language-sql .highlight}    | Configure the color of each component in result tables (duckbox only). See the [result syntax highlighting section](#docs:current:clients:cli:dot_commands::configuring-the-query-syntax-highlighter) for more details |
| `.highlight_mode ⟨mixed/dark/light⟩`{:.language-sql .highlight}       | Toggle the highlight mode. See the [dark/light mode section](#docs:current:clients:cli:friendly_cli::darklight-mode) for details                               |
| `.highlight_results ⟨on/off⟩`{:.language-sql .highlight}              | Toggle highlighting in result tables `on` / `off` (duckbox only). See the [result syntax highlighting section](#docs:current:clients:cli:dot_commands::configuring-the-query-syntax-highlighter) for more details |
| `.import ⟨FILE⟩ ⟨TABLE⟩`{:.language-sql .highlight}                   | Import data from `FILE` into `TABLE`. Supports `--csv`, `--json`, `--parquet` options                                                                                     |
| `.indexes ⟨TABLE⟩`{:.language-sql .highlight}                         | Show names of indexes                                                                                                                                                      |
| `.keyword ⟨COLOR⟩`{:.language-sql .highlight}                         | Sets the syntax highlighting color used for keywords                                                                                                                       |
| `.keywordcode ⟨CODE⟩`{:.language-sql .highlight}                      | Sets the syntax highlighting terminal code used for keywords                                                                                                               |
| `.large_number_rendering ⟨all/footer/off⟩`{:.language-sql .highlight} | Toggle readable rendering of large numbers (duckbox only, default: `footer`)                                                                                               |
| `.last`{:.language-sql .highlight}                                    | Render the last result without truncating. Useful for navigating with the pager                                                                                            |
| `.log ⟨FILE/off⟩`{:.language-sql .highlight}                          | Turn logging `on` or `off`. `FILE` can be `stderr` / `stdout`                                                                                                              |
| `.maxrows ⟨COUNT⟩`{:.language-sql .highlight}                         | Sets the maximum number of rows for display. Only for [duckbox mode](#docs:current:clients:cli:output_formats)                                                |
| `.maxwidth ⟨COUNT⟩`{:.language-sql .highlight}                        | Sets the maximum width in characters. 0 defaults to terminal width. Only for [duckbox mode](#docs:current:clients:cli:output_formats)                         |
| `.mode ⟨MODE⟩ ⟨TABLE⟩`{:.language-sql .highlight}                     | Set [output mode](#docs:current:clients:cli:output_formats)                                                                                                   |
| `.multiline`{:.language-sql .highlight}                               | Set multi-line mode (default)                                                                                                                                              |
| `.nullvalue ⟨STRING⟩`{:.language-sql .highlight}                      | Use `STRING` in place of `NULL` values. Default: `NULL`                                                                                                                    |
| `.once ⟨OPTIONS⟩ ⟨FILE⟩`{:.language-sql .highlight}                   | Output for the next SQL command only to `FILE`                                                                                                                             |
| `.open ⟨OPTIONS⟩ ⟨FILE⟩`{:.language-sql .highlight}                   | Close existing database and reopen `FILE`. Options: `--new`, `--nofollow`, `--readonly`, `--sql`                                                                           |
| `.output ⟨FILE⟩`{:.language-sql .highlight}                           | Send output to `FILE` or `stdout` if `FILE` is omitted                                                                                                                     |
| `.pager ⟨OPTIONS⟩`{:.language-sql .highlight}                         | Control pager usage for output. See the [paging section](#docs:current:clients:cli:output_formats::paging) for details                                         |
| `.print ⟨STRING...⟩`{:.language-sql .highlight}                       | Print literal `STRING`                                                                                                                                                     |
| `.progress_bar ⟨COMPONENT⟩ `{:.language-sql .highlight}               | Set the progress bar component styles                                                                                                                                      |
| `.prompt ⟨OPTIONS⟩ ⟨CONTINUE⟩`{:.language-sql .highlight}             | Replace the standard prompts                                                                                                                                               |
| `.quit`{:.language-sql .highlight}                                    | Exit this program                                                                                                                                                          |
| `.read ⟨FILE⟩`{:.language-sql .highlight}                             | Read input from `FILE`                                                                                                                                                     |
| `.rows`{:.language-sql .highlight}                                    | Row-wise rendering of query results (default)                                                                                                                              |
| `.safe_mode`{:.language-sql .highlight}                               | Activates [safe mode](#docs:current:clients:cli:safe_mode)                                                                                                    |
| `.schema ⟨PATTERN⟩`{:.language-sql .highlight}                        | Show the `CREATE` statements matching `PATTERN`                                                                                                                            |
| `.separator ⟨COL⟩ ⟨ROW⟩`{:.language-sql .highlight}                   | Change the column and row separators                                                                                                                                       |
| `.shell ⟨CMD⟩ ⟨ARGS...⟩`{:.language-sql .highlight}                   | Run `CMD` with `ARGS...` in a system shell                                                                                                                                 |
| `.show`{:.language-sql .highlight}                                    | Show the current values for various settings                                                                                                                               |
| `.singleline`{:.language-sql .highlight}                              | Set single-line mode                                                                                                                                                       |
| `.startup_text ⟨none/version/all⟩`{:.language-sql .highlight}         | Controls the start-up text displayed when launching the CLI. Set this as the first line in `~/.duckdbrc`                                                                   |
| `.system ⟨CMD⟩ ⟨ARGS...⟩`{:.language-sql .highlight}                  | Run `CMD` with `ARGS...` in a system shell                                                                                                                                 |
| `.tables ⟨TABLE⟩`{:.language-sql .highlight}                          | List tables [matching `LIKE` pattern](#docs:current:sql:functions:pattern_matching) `TABLE` with column names, types and row counts, grouped by database and schema |
| `.timer ⟨on/off⟩`{:.language-sql .highlight}                          | Turn SQL timer `on` or `off`. SQL statements separated by `;` but _not_ separated via newline are measured together                                                        |
| `.width ⟨NUM1⟩ ⟨NUM2⟩ ...`{:.language-sql .highlight}                 | Set minimum column widths for columnar output                                                                                                                              |

#### Using the `.help` Command {#docs:current:clients:cli:dot_commands::using-the-help-command}

The `.help` text may be filtered by passing in a text string as the first argument.

```sql
.help m
```

```sql
.maxrows COUNT           Sets the maximum number of rows for display (default: 40). Only for duckbox mode.
.maxwidth COUNT          Sets the maximum width in characters. 0 defaults to terminal width. Only for duckbox mode.
.mode MODE ?TABLE?       Set output mode
```

#### `.output`: Writing Results to a File {#docs:current:clients:cli:dot_commands::output-writing-results-to-a-file}

By default, the DuckDB CLI sends results to the terminal's standard output. This can be changed using either the `.output` or the `.once` command, passing the desired output file location as a parameter. The `.once` command redirects only the next result and then reverts to standard output, while `.output` redirects all subsequent output to that file. Note that each result overwrites the entire file at that destination. To revert to standard output, enter `.output` with no file parameter.

In this example, the output format is set to `markdown`, the destination is set to a Markdown file, and DuckDB writes the output of the SQL statement to that file. Output is then reverted to standard output using `.output` with no parameter.

```sql
.mode markdown
.output my_results.md
SELECT 'taking flight' AS output_column;
.output
SELECT 'back to the terminal' AS displayed_column;
```

The file `my_results.md` will then contain:

```text
| output_column |
| ------------- |
| taking flight |
```

The terminal will then display:

```text
| displayed_column     |
| -------------------- |
| back to the terminal |
```

A common output format is CSV, or comma-separated values. DuckDB supports [SQL syntax to export data as CSV or Parquet](#docs:current:sql:statements:copy::copy-to), but the CLI-specific commands may be used to write a CSV instead if desired.

```sql
.mode csv
.once my_output_file.csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
```

The file `my_output_file.csv` will then contain:

```csv
col_1,col_2
1,2
10,20
```

By passing special options (flags) to the `.once` command, query results can also be sent to a temporary file and automatically opened in the user's default program. Use either the `-e` flag for a text file (opened in the default text editor), or the `-x` flag for a CSV file (opened in the default spreadsheet editor). This is useful for more detailed inspection of query results, especially if there is a relatively large result set. The `.excel` command is equivalent to `.once -x`.

```sql
.once -e
SELECT 'quack' AS hello;
```

The results then open in the default text file editor of the system, for example:

![](../images/cli_docs_output_to_text_editor.jpg)


> **Tip.** macOS users can copy the results to their clipboards using [`pbcopy`](https://ss64.com/mac/pbcopy.html) by using `.once` to output to `pbcopy` via a pipe: `.once |pbcopy`
>
> Combining this with the `.headers off` and `.mode lines` options can be particularly effective.

#### Querying the Database Schema {#docs:current:clients:cli:dot_commands::querying-the-database-schema}

All DuckDB clients support [querying the database schema with SQL](#docs:current:sql:meta:information_schema), but the CLI has additional [dot commands](#docs:current:clients:cli:dot_commands) that can make it easier to understand the contents of a database.
The `.tables` command will return a list of tables in the database. It has an optional argument that will filter the results according to a [`LIKE` pattern](#docs:current:sql:functions:pattern_matching::like).

```sql
CREATE TABLE swimmers AS SELECT 'duck' AS animal;
CREATE TABLE fliers AS SELECT 'duck' AS animal;
CREATE TABLE walkers AS SELECT 'duck' AS animal;
.tables
```

```sql
fliers    swimmers  walkers
```

For example, to filter to only tables that contain an `l`, use the `LIKE` pattern `%l%`.

```sql
.tables %l%
```

```sql
fliers   walkers
```
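
To make the `LIKE` semantics concrete, the following Python sketch mimics the filter outside DuckDB. It is only an illustration under stated assumptions: the helpers `like_to_regex` and `filter_tables` are hypothetical, only the `%` and `_` wildcards are handled (no `ESCAPE` clause), and matching is case-sensitive, which may differ from how the CLI evaluates the pattern.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a regular expression (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")   # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def filter_tables(tables, pattern):
    """Keep only the table names that match the whole LIKE pattern."""
    regex = like_to_regex(pattern)
    return [name for name in tables if regex.fullmatch(name)]


tables = ["fliers", "swimmers", "walkers"]
matching = filter_tables(tables, "%l%")
print(matching)  # ['fliers', 'walkers']
```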

The `.schema` command will show all of the SQL statements used to define the schema of the database.

```sql
.schema
```

```sql
CREATE TABLE fliers (animal VARCHAR);
CREATE TABLE swimmers (animal VARCHAR);
CREATE TABLE walkers (animal VARCHAR);
```

#### Dumping Database Content as SQL {#docs:current:clients:cli:dot_commands::dumping-database-content-as-sql}

The `.dump` command renders the database content as SQL statements, including both schema definitions and data. This is useful for creating backups or migrating data.

```sql
.dump
```

An optional `TABLE` argument filters the output using a [`LIKE` pattern](#docs:current:sql:functions:pattern_matching::like). Multiple patterns can be provided as additional arguments.

```sql
.dump %swim%
```

The `--newlines` option allows unescaped newline characters in the output:

```sql
.dump --newlines
```

#### Progress Bar {#docs:current:clients:cli:dot_commands::progress-bar}

The DuckDB CLI client's progress bar supports customization through components.

The `.progress_bar` command supports `--add` and `--clear` parameters for adding and removing components. 

For details on specific usage, see the examples below.

##### Configuring the Progress Bar Display {#docs:current:clients:cli:dot_commands::configuring-the-progress-bar-display}

To check if the progress bar is enabled: 

```sql
SELECT * FROM duckdb_settings() WHERE name = 'enable_progress_bar';
```

To check the current minimum amount of time (in milliseconds) a query needs to take before displaying a progress bar: 

```sql
SELECT * FROM duckdb_settings() WHERE name = 'progress_bar_time';
```

To set the minimum amount of time a query needs to take before the progress bar is displayed to 100 milliseconds:

```sql
SET progress_bar_time = 100;
```

To add a progress bar component that displays the current time in red text:

```sql
.progress_bar --add "{align:right}{min_size:20}{color:red}Time: {sql:select (current_time::varchar).split('.')[1]}{color:reset} "
```

![](../images/progress_bar/duckdb_progressbar_time.gif)



> `.progress_bar --add` commands are additive: issuing multiple `--add` calls stacks additional components on the progress bar.

To add a progress bar component that displays the file cache RAM usage in blue text:

```sql
.progress_bar --add "{align:right}{min_size:20}{color:blue}External Cache Usage: {sql:select format_bytes(memory_usage_bytes) from duckdb_memory() where tag='EXTERNAL_FILE_CACHE'}{color:reset} "
```

![](../images/progress_bar/duckdb_progressbar_cache_usage.gif)


To reset all existing progress bar components:

```sql
.progress_bar --clear
```

#### Syntax Highlighters {#docs:current:clients:cli:dot_commands::syntax-highlighters}

The DuckDB CLI client has a syntax highlighter for the SQL queries and another for the duckbox-formatted result tables.

##### Configuring the Query Syntax Highlighter {#docs:current:clients:cli:dot_commands::configuring-the-query-syntax-highlighter}

By default the shell includes support for syntax highlighting.
The CLI's syntax highlighter can be configured using the following commands.

To turn off the highlighter:

```sql
.highlight off
```

To turn on the highlighter:

```sql
.highlight on
```

To configure the color used to highlight constants:

```sql
.constant [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]
```

```sql
.constantcode ⟨terminal_code⟩
```

For example:

```sql
.constantcode \033[31m
```

To configure the color used to highlight keywords:

```sql
.keyword [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]
```

```sql
.keywordcode ⟨terminal_code⟩
```

For example:

```sql
.keywordcode \033[31m
```

##### Configuring the Result Syntax Highlighter {#docs:current:clients:cli:dot_commands::configuring-the-result-syntax-highlighter}

By default, the result highlighting makes a few small modifications:

* Column names are bold.
* `NULL` values are grayed out.
* Layout elements are grayed out.

The highlighting of each of the components can be customized using the `.highlight_colors` command.
For example:

```sql
.highlight_colors layout red
.highlight_colors column_type yellow
.highlight_colors column_name yellow bold_underline
.highlight_colors numeric_value cyan underline
.highlight_colors temporal_value red bold
.highlight_colors string_value green bold
.highlight_colors footer gray
```

The result highlighting can be disabled using `.highlight_results off`.

#### Shorthands {#docs:current:clients:cli:dot_commands::shorthands}

DuckDB's CLI allows using shorthands for dot commands.
Once a sequence of characters can be unambiguously completed to a dot command or an argument, the CLI (silently) autocompletes them.
For example:

```sql
.mo ma
```

Is equivalent to:

```sql
.mode markdown
```

> **Tip.** Avoid using shorthands in SQL scripts to improve readability and ensure that the scripts are future-proof.
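
The idea of unambiguous completion can be sketched in a few lines of Python. This is not the CLI's implementation: the `resolve` helper and the short `commands` list are hypothetical, and the rule that an exact match wins over longer candidates is an assumption made for the illustration. The `modes` list is taken from the output formats documented for `.mode`.

```python
def resolve(prefix, candidates):
    """Return the single candidate that `prefix` abbreviates, or None.

    None is returned when the prefix is unknown or ambiguous.
    Assumption: an exact match takes precedence (e.g., `json` over `jsonlines`).
    """
    if prefix in candidates:
        return prefix
    matches = [c for c in candidates if c.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical subset of dot commands, for illustration only.
commands = ["maxrows", "maxwidth", "mode", "once", "output"]
modes = [
    "ascii", "box", "csv", "column", "duckbox", "html", "insert", "json",
    "jsonlines", "latex", "line", "list", "markdown", "quote", "table",
    "tabs", "tcl", "trash",
]

expanded = (resolve("mo", commands), resolve("ma", modes))
print(expanded)                  # ('mode', 'markdown')
print(resolve("ma", commands))   # None: both maxrows and maxwidth match
```

An ambiguous prefix such as `ma` for the command name cannot be completed, which is one reason spelled-out commands are the safer choice in scripts.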

#### Importing Data {#docs:current:clients:cli:dot_commands::importing-data}

The `.import` command imports data from a file into a DuckDB table. It uses DuckDB's reader functions (`read_csv`, `read_json`, `read_parquet`) and supports automatic schema detection. If the target table does not exist, it is automatically created.

The file format can be specified explicitly using `--csv`, `--json`, or `--parquet`. If no format is specified, the format is inferred from the file extension.

```sql
.import data.csv my_table
```

Additional parameters can be passed to the underlying reader function using the `--⟨parameter⟩ ⟨value⟩`{:.language-sql .highlight} syntax:

```sql
.import data.csv my_table --delimiter "|" --header false
```

To import a JSON file:

```sql
.import data.json my_table --json
```

To import a Parquet file:

```sql
.import data.parquet my_table
```

### Output Formats {#docs:current:clients:cli:output_formats}

The `.mode` [dot command](#docs:current:clients:cli:dot_commands) may be used to change the appearance of the tables returned in the terminal output. Beyond customizing the appearance, these modes are useful for presenting DuckDB output elsewhere by redirecting the terminal [output to a file](#docs:current:clients:cli:dot_commands::output-writing-results-to-a-file). For example, the `insert` mode builds a series of SQL statements that can be used to insert the data at a later point.
The `markdown` mode is particularly useful for building documentation and the `latex` mode is useful for writing academic papers.

> **Warning.** Unicode handling in Windows Terminal
>
> When long results are displayed in Windows Terminal, the [more system utility](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/more)
> is used by default to scroll through the results. This utility has incomplete Unicode support, so,
> depending on the output data, it can display Unicode characters in garbled form.
>
> We suggest using the [third-party less utility](https://en.wikipedia.org/wiki/Less_(Unix)) instead,
> which is installed by default with [Git for Windows](https://git-scm.com/install/windows).
> It can be enabled as follows:
>
> ```sql
> .pager '"C:\Program Files\Git\usr\bin\less.exe" -R'
> ```

#### List of Output Formats {#docs:current:clients:cli:output_formats::list-of-output-formats}



| Mode                                        | Description                                                    |
| ------------------------------------------- | -------------------------------------------------------------- |
| `ascii`                                     | Columns/rows delimited by 0x1F and 0x1E                        |
| `box`                                       | Tables using Unicode box-drawing characters                    |
| `csv`                                       | Comma-separated values                                         |
| `column`                                    | Output in columns (See `.width`)                               |
| `duckbox`                                   | Tables with extensive features (default)                       |
| `html`                                      | HTML `<table>` code                                            |
| `insert ⟨TABLE⟩`{:.language-sql .highlight} | SQL insert statements for `⟨TABLE⟩`{:.language-sql .highlight} |
| `json`                                      | Results in a JSON array                                        |
| `jsonlines`                                 | Results in NDJSON (newline-delimited JSON)                     |
| `latex`                                     | LaTeX tabular environment code                                 |
| `line`                                      | One value per line                                             |
| `list`                                      | Values delimited by `|`                                        |
| `markdown`                                  | Markdown table format                                          |
| `quote`                                     | Escape answers as for SQL                                      |
| `table`                                     | ASCII-art table                                                |
| `tabs`                                      | Tab-separated values                                           |
| `tcl`                                       | TCL list elements                                              |
| `trash`                                     | No output                                                      |



#### Changing the Output Format {#docs:current:clients:cli:output_formats::changing-the-output-format}

Use the `.mode` dot command without an argument to show the output format currently in use.

```sql
.mode
```

```text
current output mode: duckbox
```

Use the `.mode` dot command with an argument to set the output format.

```sql
.mode markdown
SELECT 'quacking intensifies' AS incoming_ducks;
```

```text
|    incoming_ducks    |
|----------------------|
| quacking intensifies |
```

The output appearance can also be adjusted with the `.separator` command. If using an export mode that relies on a separator (`csv` or `tabs` for example), the separator will be reset when the mode is changed. For example, `.mode csv` will set the separator to a comma (`,`). Using `.separator "|"` will then convert the output to be pipe-separated.

```sql
.mode csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
```

```csv
col_1,col_2
1,2
10,20
```

```sql
.separator "|"
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
```

```csv
col_1|col_2
1|2
10|20
```
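
Pipe-separated output like the above can be read back by most CSV parsers once the delimiter is specified. As a minimal sketch, the following Python snippet uses the standard `csv` module; the output is inlined as a string for illustration, whereas in practice it would come from a file written with `.output` or `.once`.

```python
import csv
import io

# The pipe-separated result shown above, inlined for illustration.
pipe_separated = "col_1|col_2\n1|2\n10|20\n"

# Tell the parser that fields are separated by "|" instead of ",".
reader = csv.DictReader(io.StringIO(pipe_separated), delimiter="|")

# CSV carries no type information, so the values are converted explicitly.
rows = [{name: int(value) for name, value in row.items()} for row in reader]
print(rows)  # [{'col_1': 1, 'col_2': 2}, {'col_1': 10, 'col_2': 20}]
```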

#### Paging {#docs:current:clients:cli:output_formats::paging}

The CLI supports paging for large result sets using the `.pager` command. When enabled, results that exceed the terminal size are displayed in a pager (such as `less`) for easier navigation.

The pager has three modes:

* `automatic` (default) – The pager is triggered when the result exceeds the row or column threshold.
* `on` – The pager is always used for output.
* `off` – The pager is disabled.

```sql
.pager on
```

```sql
.pager off
```

```sql
.pager automatic
```

In automatic mode, the thresholds for triggering the pager can be configured:

```sql
.pager set_row_threshold 50
.pager set_column_threshold 5
```

A custom pager command can be set by passing it as an argument:

```sql
.pager less -RS
```

The default pager command can also be configured via the `DUCKDB_PAGER` or `PAGER` environment variables.

#### `duckbox` Mode {#docs:current:clients:cli:output_formats::duckbox-mode}

By default, DuckDB renders query results in `duckbox` mode, which is a feature-rich ASCII-art style output format.

The duckbox mode supports the `large_number_rendering` option, which allows human-readable rendering of large numbers. It has three levels:

- `off` – All numbers are printed using regular formatting.
- `footer` (default) – Large numbers are augmented with the human-readable format. Only applies to single-row results.
- `all` – All large numbers are replaced with the human-readable format.

See the following examples:

```sql
.large_number_rendering off
SELECT pi() * 1_000_000_000 AS x;
```

```text
┌───────────────────┐
│         x         │
│      double       │
├───────────────────┤
│ 3141592653.589793 │
└───────────────────┘
```

```sql
.large_number_rendering footer
SELECT pi() * 1_000_000_000 AS x;
```

```text
┌───────────────────┐
│         x         │
│      double       │
├───────────────────┤
│ 3141592653.589793 │
│  (3.14 billion)   │
└───────────────────┘
```

```sql
.large_number_rendering all
SELECT pi() * 1_000_000_000 AS x;
```

```text
┌──────────────┐
│      x       │
│    double    │
├──────────────┤
│ 3.14 billion │
└──────────────┘
```

### Editing {#docs:current:clients:cli:editing}

> The linenoise-based CLI editor is available for macOS, Linux and Windows.

DuckDB's CLI uses a line-editing library based on [linenoise](https://github.com/antirez/linenoise), which has shortcuts that are based on the [Emacs mode of readline](https://readline.kablamo.org/emacs.html). Below is a list of available commands. You can also view these shortcuts from within the CLI using `.help shortcuts`.

#### Moving {#docs:current:clients:cli:editing::moving}

| Key            | Action                                                                 |
| -------------- | ---------------------------------------------------------------------- |
| `Left`         | Move back a character                                                  |
| `Right`        | Move forward a character                                               |
| `Up`           | Move up a line. When on the first line, move to previous history entry |
| `Down`         | Move down a line. When on the last line, move to next history entry    |
| `Home`         | Move to beginning of buffer                                            |
| `End`          | Move to end of buffer                                                  |
| `Ctrl`+`Left`  | Move back a word                                                       |
| `Ctrl`+`Right` | Move forward a word                                                    |
| `Ctrl`+`A`     | Move to beginning of buffer                                            |
| `Ctrl`+`B`     | Move back a character                                                  |
| `Ctrl`+`E`     | Move to end of buffer                                                  |
| `Ctrl`+`F`     | Move forward a character                                               |
| `Alt`+`Left`   | Move back a word                                                       |
| `Alt`+`Right`  | Move forward a word                                                    |

#### History {#docs:current:clients:cli:editing::history}

| Key        | Action                         |
| ---------- | ------------------------------ |
| `Ctrl`+`P` | Move to previous history entry |
| `Ctrl`+`N` | Move to next history entry     |
| `Ctrl`+`R` | Search the history             |
| `Ctrl`+`S` | Search the history             |
| `Alt`+`<`  | Move to first history entry    |
| `Alt`+`>`  | Move to last history entry     |
| `Alt`+`N`  | Search the history             |
| `Alt`+`P`  | Search the history             |

#### Changing Text {#docs:current:clients:cli:editing::changing-text}

| Key               | Action                                                   |
| ----------------- | -------------------------------------------------------- |
| `Backspace`       | Delete previous character                                |
| `Delete`          | Delete next character                                    |
| `Ctrl`+`D`        | Delete next character. When buffer is empty, end editing |
| `Ctrl`+`H`        | Delete previous character                                |
| `Ctrl`+`K`        | Delete everything after the cursor                       |
| `Ctrl`+`T`        | Swap current and next character                          |
| `Ctrl`+`U`        | Delete all text                                          |
| `Ctrl`+`W`        | Delete previous word                                     |
| `Alt`+`C`         | Convert next word to titlecase                           |
| `Alt`+`D`         | Delete next word                                         |
| `Alt`+`L`         | Convert next word to lowercase                           |
| `Alt`+`R`         | Delete all text                                          |
| `Alt`+`T`         | Swap current and next word                               |
| `Alt`+`U`         | Convert next word to uppercase                           |
| `Alt`+`Backspace` | Delete previous word                                     |
| `Alt`+`\`         | Delete spaces around cursor                              |

#### Completing {#docs:current:clients:cli:editing::completing}

| Key           | Action                                                 |
| ------------- | ------------------------------------------------------ |
| `Tab`         | Autocomplete. When autocompleting, cycle to next entry |
| `Shift`+`Tab` | When autocompleting, cycle to previous entry           |
| `Esc`+`Esc`   | When autocompleting, revert autocompletion             |

#### Miscellaneous {#docs:current:clients:cli:editing::miscellaneous}

| Key                    | Action                                                                             |
| ---------------------- | ---------------------------------------------------------------------------------- |
| `Enter`                | Execute query. If query is not complete, insert a newline at the end of the buffer |
| `Ctrl`+`J`             | Execute query. If query is not complete, insert a newline at the end of the buffer |
| `Ctrl`+`C`             | Cancel editing of current query                                                    |
| `Ctrl`+`G`             | Cancel editing of current query                                                    |
| `Ctrl`+`L`             | Clear screen                                                                       |
| `Ctrl`+`O`             | Cancel editing of current query                                                    |
| `Ctrl`+`X`             | Insert a newline after the cursor                                                  |
| `Ctrl`+`Q`, then click | Move cursor to mouse click position                                                |
| `Ctrl`+`Z`             | Suspend CLI and return to shell, use `fg` to re-open                               |

#### External Editor Mode {#docs:current:clients:cli:editing::external-editor-mode}

Use `.edit` or `\e` to open a query in an external text editor.

* When entered alone, it opens the previous command for editing.
* When used inside a multi-line command, it opens the current command in the editor.

The editor is taken from the first set environment variable among `DUCKDB_EDITOR`, `EDITOR` or `VISUAL` (in that order). If none are set, `vi` is used.
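
The lookup order described above can be sketched as follows. This Python snippet is an illustration rather than the CLI's actual code: the `pick_editor` helper is hypothetical, and treating an empty value as "not set" is an assumption.

```python
import os


def pick_editor(env=None):
    """Return the editor command following the documented lookup order.

    `env` defaults to the process environment; a plain dict can be passed
    to try out other configurations.
    """
    env = os.environ if env is None else env
    for name in ("DUCKDB_EDITOR", "EDITOR", "VISUAL"):
        value = env.get(name)
        if value:  # assumption: an empty value counts as unset
            return value
    return "vi"  # documented fallback when none of the variables is set


print(pick_editor({"EDITOR": "nano", "VISUAL": "code"}))  # nano
print(pick_editor({}))                                    # vi
```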

#### Using Read-Line {#docs:current:clients:cli:editing::using-read-line}

If you prefer, you can use [`rlwrap`](https://github.com/hanslub42/rlwrap) to use read-line directly with the shell. Then, use `Shift`+`Enter` to insert a newline and `Enter` to execute the query:

```batch
rlwrap --substitute-prompt="D " duckdb -batch
```

### Friendly CLI {#docs:current:clients:cli:friendly_cli}

Along with our [Friendly SQL](#docs:current:sql:dialect:friendly_sql), we provide
**friendly CLI** features.

#### Dark/Light Mode {#docs:current:clients:cli:friendly_cli::darklight-mode}

The CLI automatically detects whether the terminal is using a dark or light background and adjusts syntax highlighting colors accordingly. The mode can also be set manually using the `.highlight_mode` command:

```sql
.highlight_mode dark
```

```sql
.highlight_mode light
```

To use a mix of colors suitable for both dark and light backgrounds:

```sql
.highlight_mode mixed
```

#### 8-Bit Colors {#docs:current:clients:cli:friendly_cli::8-bit-colors}

Since DuckDB v1.5, the CLI supports 8-bit colors corresponding to [Xterm system colors](https://www.ditig.com/256-colors-cheat-sheet#xterm-system-colors):

```sql
.display_colors
```

```text
darkred1 red darkred2 red3 red4 red1 brightred indianred1 ...
```

#### Dynamic Prompt {#docs:current:clients:cli:friendly_cli::dynamic-prompt}

The default prompts are the following:

```text
-- macOS / Linux
{max_length:40}{color:38,5,208}{color:bold}{setting:current_database_and_schema}{color:reset} D 
-- Windows
{max_length:40}{color:green}{color:bold}{setting:current_database_and_schema}{color:reset} D 
```

#### Return the Result of the Last Query Using `_` {#docs:current:clients:cli:friendly_cli::return-the-result-of-the-last-query-using-_}

You can use the `_` (underscore) table to query the result of the last query:

```sql
SELECT 42 AS x;
```

```text
┌───────┐
│   x   │
│ int32 │
├───────┤
│    42 │
└───────┘
```

```sql
FROM _;
```

```text
┌───────┐
│   x   │
│ int32 │
├───────┤
│    42 │
└───────┘
```

If the last query did not return a result (e.g., because it performed an update operation), the CLI throws an error:

```console
Binder Error:
Failed to query last result "_": no result available
```

### Safe Mode {#docs:current:clients:cli:safe_mode}

The DuckDB CLI client supports “safe mode”.
In safe mode, the CLI is prevented from accessing external files other than the database file that it was initially connected to and prevented from interacting with the host file system.

This has the following effects:

* The following [dot commands](#docs:current:clients:cli:dot_commands) are disabled:
    * `.cd`
    * `.excel`
    * `.import`
    * `.log`
    * `.once`
    * `.open`
    * `.output`
    * `.read`
    * `.sh`
    * `.system`
* Auto-complete no longer scans the file system for files to suggest as auto-complete targets.
* The [`getenv` function](#docs:current:clients:cli:overview::reading-environment-variables) is disabled.
* The [`enable_external_access` option](#docs:current:configuration:overview::configuration-reference) is set to `false`. This implies that:
    * `ATTACH` cannot attach to a database in a file.
    * `COPY` cannot read from or write to files.
    * Functions such as `read_csv`, `read_parquet`, `read_json`, etc. cannot read from an external source.

Once safe mode is activated, it cannot be deactivated in the same DuckDB CLI session.

For more information on running DuckDB in secure environments, see the [“Securing DuckDB” page](#docs:current:operations_manual:securing_duckdb:overview).

### Autocomplete {#docs:current:clients:cli:autocomplete}

The shell offers context-aware autocomplete of SQL queries through the [`autocomplete` extension](#docs:current:core_extensions:autocomplete). Autocomplete is triggered by pressing `Tab`.

Multiple autocomplete suggestions can be present. You can cycle forwards through the suggestions by repeatedly pressing `Tab`, or backwards by pressing `Shift`+`Tab`. Autocompletion can be reverted by pressing `Esc` twice.

The shell autocompletes four different groups:

* Keywords
* Table names and table functions
* Column names and scalar functions
* File names

The shell looks at the position in the SQL statement to determine which of these autocompletions to trigger. For example:

```sql
SELECT s
```

```text
student_id
```

```sql
SELECT student_id F
```

```text
FROM
```

```sql
SELECT student_id FROM g
```

```text
grades
```

```sql
SELECT student_id FROM 'd
```

```text
'data/
```

```sql
SELECT student_id FROM 'data/
```

```text
'data/grades.csv
```

### Syntax Highlighting {#docs:current:clients:cli:syntax_highlighting}

> Syntax highlighting in the CLI is currently only available for macOS and Linux.

SQL queries written in the shell are automatically syntax-highlighted.

![Image showing syntax highlighting in the shell](../images/syntax_highlighting_screenshot.png)

There are several components of a query that are highlighted in different colors. The colors can be configured using [dot commands](#docs:current:clients:cli:dot_commands).
Syntax highlighting can also be disabled entirely using the `.highlight off` command.

Below is a list of components that can be configured.

|          Type           |   Command   |  Default color  |
|-------------------------|-------------|-----------------|
| Keywords                | `.keyword`  | `green`         |
| Constants and literals  | `.constant` | `yellow`        |
| Comments                | `.comment`  | `brightblack`   |
| Errors                  | `.error`    | `red`           |
| Continuation            | `.cont`     | `brightblack`   |
| Continuation (Selected) | `.cont_sel` | `green`         |

The components can be configured using either a supported color name (e.g., `.keyword red`), or by directly providing a terminal code to use for rendering (e.g., `.keywordcode \033[31m`). Below is a list of supported color names and their corresponding terminal codes.

|     Color     | Terminal code |
|---------------|---------------|
| red           | `\033[31m`    |
| green         | `\033[32m`    |
| yellow        | `\033[33m`    |
| blue          | `\033[34m`    |
| magenta       | `\033[35m`    |
| cyan          | `\033[36m`    |
| white         | `\033[37m`    |
| brightblack   | `\033[90m`    |
| brightred     | `\033[91m`    |
| brightgreen   | `\033[92m`    |
| brightyellow  | `\033[93m`    |
| brightblue    | `\033[94m`    |
| brightmagenta | `\033[95m`    |
| brightcyan    | `\033[96m`    |
| brightwhite   | `\033[97m`    |
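
The codes in this table are standard ANSI escape sequences: `\033` is the octal notation for the escape character, followed by `[`, a color number and `m`. The following Python sketch rebuilds the table and previews a color in the terminal. The names `COLOR_NUMBERS`, `terminal_code` and `colorize` are hypothetical helpers for this illustration, and `\033[0m` is the standard ANSI reset sequence rather than something specific to DuckDB.

```python
# Color numbers from the table above: 31-37 for regular, 90-97 for bright colors.
COLOR_NUMBERS = {
    "red": 31, "green": 32, "yellow": 33, "blue": 34,
    "magenta": 35, "cyan": 36, "white": 37,
    "brightblack": 90, "brightred": 91, "brightgreen": 92,
    "brightyellow": 93, "brightblue": 94, "brightmagenta": 95,
    "brightcyan": 96, "brightwhite": 97,
}

RESET = "\033[0m"  # standard ANSI sequence that resets all attributes


def terminal_code(color):
    """Return the terminal code for a supported color name."""
    return f"\033[{COLOR_NUMBERS[color]}m"


def colorize(text, color):
    """Wrap text in the color's terminal code and reset afterwards."""
    return terminal_code(color) + text + RESET


# Preview the default keyword color in an ANSI-capable terminal.
print(colorize("SELECT", "green"))
```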

For example, here is an alternative set of syntax highlighting colors:

```text
.keyword brightred
.constant brightwhite
.comment cyan
.error yellow
.cont blue
.cont_sel brightblue
```

If you wish to start up the CLI with a different set of colors every time, you can place these commands in the `~/.duckdbrc` file that is loaded on start-up of the CLI.

#### Error Highlighting {#docs:current:clients:cli:syntax_highlighting::error-highlighting}

The shell has support for highlighting certain errors. In particular, mismatched brackets and unclosed quotes are highlighted in red (or another color if specified). This highlighting is automatically disabled for large queries. In addition, it can be disabled manually using the `.render_errors off` command.

### Known Issues {#docs:current:clients:cli:known_issues}

#### Incorrect Memory Values on Old Linux Distributions and WSL 2 {#docs:current:clients:cli:known_issues::incorrect-memory-values-on-old-linux-distributions-and-wsl-2}

On Windows Subsystem for Linux 2 (WSL 2), when querying `max_memory` or `memory_limit` from `duckdb_settings()`, the values may be inaccurate on certain Ubuntu versions (e.g., 20.04 and 24.04). The issue also occurs on older distributions such as Red Hat Enterprise Linux 8 (RHEL 8).

Example:

```sql
FROM duckdb_settings() WHERE name LIKE '%mem%';
```

The output contains values larger than 1000 PiB:

```text
┌──────────────┬────────────┬─────────────────────────────────────────────┬────────────┬─────────┐
│     name     │   value    │                 description                 │ input_type │  scope  │
│   varchar    │  varchar   │                   varchar                   │  varchar   │ varchar │
├──────────────┼────────────┼─────────────────────────────────────────────┼────────────┼─────────┤
│ max_memory   │ 1638.3 PiB │ The maximum memory of the system (e.g. 1GB) │ VARCHAR    │ GLOBAL  │
│ memory_limit │ 1638.3 PiB │ The maximum memory of the system (e.g. 1GB) │ VARCHAR    │ GLOBAL  │
└──────────────┴────────────┴─────────────────────────────────────────────┴────────────┴─────────┘
```

## Go Client {#docs:current:clients:go}

> **Installation.** To use the DuckDB Go client, visit the [Go installation page](https://duckdb.org/install/index.html?environment=go).

The DuckDB Go client, [`duckdb-go`](https://github.com/duckdb/duckdb-go), allows using DuckDB via the `database/sql` interface.
For examples on how to use this interface, see the [official documentation](https://pkg.go.dev/database/sql) and [tutorial](https://go.dev/doc/tutorial/database-access).

#### Installation {#docs:current:clients:go::installation}

To install the `duckdb-go` client, run:

```batch
go get github.com/duckdb/duckdb-go/v2
```

#### Importing {#docs:current:clients:go::importing}

To import the DuckDB Go package, add the following entries to your imports:

```go
import (
	"database/sql"
	_ "github.com/duckdb/duckdb-go/v2"
)
```

#### Appender {#docs:current:clients:go::appender}

The DuckDB Go client supports the [DuckDB Appender API](#docs:current:data:appender) for bulk inserts. You can obtain a new Appender by supplying a DuckDB connection to `NewAppenderFromConn()`. For example:

```go
connector, err := duckdb.NewConnector("test.db", nil)
if err != nil {
  ...
}
conn, err := connector.Connect(context.Background())
if err != nil {
  ...
}
defer conn.Close()

// Retrieve an appender from the connection (the table 'test' must already exist).
appender, err := duckdb.NewAppenderFromConn(conn, "", "test")
if err != nil {
  ...
}
defer appender.Close()

err = appender.AppendRow(...)
if err != nil {
  ...
}

// Optional, if you want to access the appended rows immediately.
err = appender.Flush()
if err != nil {
  ...
}
```

#### Examples {#docs:current:clients:go::examples}

##### Simple Example {#docs:current:clients:go::simple-example}

An example for using the Go API is as follows:

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"log"

	_ "github.com/duckdb/duckdb-go/v2"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	_, err = db.Exec(`CREATE TABLE people (id INTEGER, name VARCHAR)`)
	if err != nil {
		log.Fatal(err)
	}
	_, err = db.Exec(`INSERT INTO people VALUES (42, 'John')`)
	if err != nil {
		log.Fatal(err)
	}

	var (
		id   int
		name string
	)
	row := db.QueryRow(`SELECT id, name FROM people`)
	err = row.Scan(&id, &name)
	if errors.Is(err, sql.ErrNoRows) {
		log.Println("no rows")
	} else if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("id: %d, name: %s\n", id, name)
}
```

##### More Examples {#docs:current:clients:go::more-examples}

For more examples, see the [examples in the `duckdb-go` repository](https://github.com/duckdb/duckdb-go/tree/main/examples).

#### Acknowledgements {#docs:current:clients:go::acknowledgements}

We would like to thank [Marc Boeker](https://github.com/marcboeker) for the initial implementation of the DuckDB Go client.

## Java (JDBC) Client {#docs:current:clients:java}

> **Installation.** To use the DuckDB Java (JDBC) client, visit the [Java installation page](https://duckdb.org/install/index.html?environment=java).

#### Installation {#docs:current:clients:java::installation}

The DuckDB Java JDBC API can be installed from [Maven Central](https://search.maven.org/artifact/org.duckdb/duckdb_jdbc). Please see the [installation page](https://duckdb.org/install) for details.

#### Basic API Usage {#docs:current:clients:java::basic-api-usage}

DuckDB's JDBC API implements the main parts of the standard Java Database Connectivity (JDBC) API, version 4.1. Describing JDBC is beyond the scope of this page; see the [official documentation](https://docs.oracle.com/javase/tutorial/jdbc/basics/index.html) for details. Below we focus on the DuckDB-specific parts.

Refer to the externally hosted [API Reference](https://javadoc.io/doc/org.duckdb/duckdb_jdbc) for more information about our extensions to the JDBC specification, or the below [Arrow Methods](#::arrow-methods).

##### Startup & Shutdown {#docs:current:clients:java::startup--shutdown}

In JDBC, database connections are created through the standard `java.sql.DriverManager` class.
The driver should auto-register in the `DriverManager`. If that does not work for some reason, you can enforce registration using the following statement:

```java
Class.forName("org.duckdb.DuckDBDriver");
```

To create a DuckDB connection, call `DriverManager` with the `jdbc:duckdb:` JDBC URL prefix, like so:

```java
import java.sql.Connection;
import java.sql.DriverManager;

Connection conn = DriverManager.getConnection("jdbc:duckdb:");
```

To use DuckDB-specific features such as the [Appender](#::appender), cast the object to a `DuckDBConnection`:

```java
import java.sql.DriverManager;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
```

When using the `jdbc:duckdb:` URL alone, an **in-memory database** is created. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Java program). If you would like to access or create a persistent database, append its file path to the URL prefix. For example, if your database is stored in `/tmp/my_database`, use the JDBC URL `jdbc:duckdb:/tmp/my_database` to create a connection to it.

It is possible to open a DuckDB database file in **read-only** mode. This is for example useful if multiple Java processes want to read the same database file at the same time. To open an existing database file in read-only mode, set the connection property `duckdb.read_only` like so:

```java
import java.util.Properties;

Properties readOnlyProperty = new Properties();
readOnlyProperty.setProperty("duckdb.read_only", "true");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", readOnlyProperty);
```

Additional connections can be created using the `DriverManager`. A more efficient mechanism is to call the `DuckDBConnection#duplicate()` method:

```java
Connection conn2 = ((DuckDBConnection) conn).duplicate();
```

Multiple connections are allowed, but mixing read-write and read-only connections is unsupported.

##### Configuring Connections {#docs:current:clients:java::configuring-connections}

Configuration options can be provided to change different settings of the database system. Note that many of these
settings can be changed later on using [`PRAGMA` statements](#docs:current:configuration:pragmas) as well.

```java
Properties connectionProperties = new Properties();
connectionProperties.setProperty("temp_directory", "/path/to/temp/dir/");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", connectionProperties);
```

##### Querying {#docs:current:clients:java::querying}

DuckDB supports the standard JDBC methods to send queries and retrieve result sets. First, create a `Statement` object from the `Connection`; this object can then be used to send queries using `execute()` and `executeQuery()`. `execute()` is meant for statements where no results are expected (e.g., `CREATE TABLE` or `UPDATE`), while `executeQuery()` is meant for queries that produce results (e.g., `SELECT`). Two examples follow. See also the JDBC [`Statement`](https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html) and [`ResultSet`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html) documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

Connection conn = DriverManager.getConnection("jdbc:duckdb:");

// create a table
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
// insert two items into the table
stmt.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");

try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
    while (rs.next()) {
        System.out.println(rs.getString(1));
        System.out.println(rs.getInt(3));
    }
}
stmt.close();
```

```text
jeans
1
hammer
2
```

DuckDB also supports prepared statements as per the JDBC API:

```java
import java.sql.PreparedStatement;

try (PreparedStatement stmt = conn.prepareStatement("INSERT INTO items VALUES (?, ?, ?);")) {
    stmt.setString(1, "chainsaw");
    stmt.setDouble(2, 500.0);
    stmt.setInt(3, 42);
    stmt.execute();
    // more calls to execute() possible
}
```

> **Warning.** Do *not* use prepared statements to insert large amounts of data into DuckDB. See the [data import documentation](#docs:current:data:overview) for better options.

##### Arrow Methods {#docs:current:clients:java::arrow-methods}

Refer to the [API Reference](https://javadoc.io/doc/org.duckdb/duckdb_jdbc/latest/org/duckdb/DuckDBResultSet.html#arrowExportStream(java.lang.Object,long)) for type signatures.

###### Arrow Export {#docs:current:clients:java::arrow-export}

The following demonstrates exporting an Arrow stream and consuming it using the Java Arrow bindings.

```java
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.duckdb.DuckDBResultSet;

try (var conn = DriverManager.getConnection("jdbc:duckdb:");
    var stmt = conn.prepareStatement("SELECT * FROM generate_series(2000)");
    var resultset = (DuckDBResultSet) stmt.executeQuery();
    var allocator = new RootAllocator()) {
    try (var reader = (ArrowReader) resultset.arrowExportStream(allocator, 256)) {
        while (reader.loadNextBatch()) {
            System.out.println(reader.getVectorSchemaRoot().getVector("generate_series"));
        }
    }
    stmt.close();
}
```

###### Arrow Import {#docs:current:clients:java::arrow-import}

The following demonstrates consuming an Arrow stream from the Java Arrow bindings.

```java
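import java.sql.DriverManager;
import org.apache.arrow.c.ArrowArrayStream;
import org.apache.arrow.c.Data;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.duckdb.DuckDBConnection;
import org.duckdb.DuckDBResultSet;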

// Arrow binding
try (var allocator = new RootAllocator();
     ArrowStreamReader reader = null; // placeholder: supply a real ArrowStreamReader here
     var arrow_array_stream = ArrowArrayStream.allocateNew(allocator)) {
    Data.exportArrayStream(allocator, reader, arrow_array_stream);

    // DuckDB setup
    try (var conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) {
        conn.registerArrowStream("asdf", arrow_array_stream);

        // run a query
        try (var stmt = conn.createStatement();
             var rs = (DuckDBResultSet) stmt.executeQuery("SELECT count(*) FROM asdf")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```

##### Streaming Results {#docs:current:clients:java::streaming-results}

Result streaming is opt-in in the JDBC driver: set the `jdbc_stream_results` configuration option to `true` before running a query. The easiest way to do that is to pass it in the `Properties` object.

```java
Properties props = new Properties();
props.setProperty(DuckDBDriver.JDBC_STREAM_RESULTS, String.valueOf(true));

Connection conn = DriverManager.getConnection("jdbc:duckdb:", props);
```

##### Appender {#docs:current:clients:java::appender}

The [Appender](#docs:current:data:appender) is available in the DuckDB JDBC driver via the `org.duckdb.DuckDBAppender` class.
The constructor of the class requires the schema name and the table name it is applied to.
The Appender is flushed when the `close()` method is called.

Example:

```java
import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
try (var stmt = conn.createStatement()) {
    stmt.execute("CREATE TABLE tbl (x BIGINT, y FLOAT, s VARCHAR)");
}

// using try-with-resources to automatically close the appender at the end of the scope
try (var appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "tbl")) {
    appender.beginRow();
    appender.append(10);
    appender.append(3.2);
    appender.append("hello");
    appender.endRow();
    appender.beginRow();
    appender.append(20);
    appender.append(-8.1);
    appender.append("world");
    appender.endRow();
}
```

##### Batch Writer {#docs:current:clients:java::batch-writer}

The DuckDB JDBC driver offers batch write functionality.
The batch writer supports prepared statements to mitigate the overhead of query parsing.

> The preferred method for bulk inserts is to use the [Appender](#::appender) due to its higher performance.
> However, when using the Appender is not possible, the batch writer is available as an alternative.

###### Batch Writer with Prepared Statements {#docs:current:clients:java::batch-writer-with-prepared-statements}

```java
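import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");

// create the target table first; a fresh in-memory database is empty
try (Statement create = conn.createStatement()) {
    create.execute("CREATE TABLE test (x INTEGER, y INTEGER, z INTEGER)");
}

PreparedStatement stmt = conn.prepareStatement("INSERT INTO test (x, y, z) VALUES (?, ?, ?);");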

stmt.setObject(1, 1);
stmt.setObject(2, 2);
stmt.setObject(3, 3);
stmt.addBatch();

stmt.setObject(1, 4);
stmt.setObject(2, 5);
stmt.setObject(3, 6);
stmt.addBatch();

stmt.executeBatch();
stmt.close();
```

###### Batch Writer with Vanilla Statements {#docs:current:clients:java::batch-writer-with-vanilla-statements}

The batch writer also supports vanilla SQL statements:

```java
import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
Statement stmt = conn.createStatement();

stmt.execute("CREATE TABLE test (x INTEGER, y INTEGER, z INTEGER)");

stmt.addBatch("INSERT INTO test (x, y, z) VALUES (1, 2, 3);");
stmt.addBatch("INSERT INTO test (x, y, z) VALUES (4, 5, 6);");

stmt.executeBatch();
stmt.close();
```

#### Troubleshooting {#docs:current:clients:java::troubleshooting}

##### Driver Class Not Found {#docs:current:clients:java::driver-class-not-found}

If the Java application is unable to find the DuckDB driver, it may throw the following error:

```console
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:duckdb:
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:706)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:252)
    ...
```

And when trying to load the class manually, it may result in this error:

```console
Exception in thread "main" java.lang.ClassNotFoundException: org.duckdb.DuckDBDriver
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:375)
    ...
```

These errors stem from the DuckDB Maven/Gradle dependency not being detected. To ensure that it is detected, force refresh the Maven configuration in your IDE.

## Node.js (Neo) {#clients:node_neo}

### Node.js Client (Neo) {#docs:current:clients:node_neo:overview}

> **Installation.** To use the DuckDB Node.js client, visit the [Node.js installation page](https://duckdb.org/install/index.html?environment=nodejs).

An API for using [DuckDB](https://duckdb.org/index.html) in [Node.js](https://nodejs.org/).

The primary package, [@duckdb/node-api](https://www.npmjs.com/package/@duckdb/node-api), is a high-level API meant for applications.
It depends on low-level bindings that adhere closely to [DuckDB's C API](#docs:current:clients:c:overview),
available separately as [@duckdb/node-bindings](https://www.npmjs.com/package/@duckdb/node-bindings).

#### Roadmap {#docs:current:clients:node_neo:overview::roadmap}

Some features are not yet complete:

* Binding and appending the MAP and UNION data types
* Appending default values row-by-row
* User-defined types & functions
* Profiling info
* Table description
* APIs for Arrow

See the [issues list on GitHub](https://github.com/duckdb/duckdb-node-neo/issues)
for the most up-to-date roadmap.

#### Platforms {#docs:current:clients:node_neo:overview::platforms}

The Node.js (Neo) client supports the following [platforms](#docs:current:dev:building:overview::supported-platforms):

* `linux_amd64`
* `linux_arm64`
* `osx_amd64`
* `osx_arm64`
* `windows_amd64`

The `windows_arm64` platform is currently not supported.

#### Examples {#docs:current:clients:node_neo:overview::examples}

##### Get Basic Information {#docs:current:clients:node_neo:overview::get-basic-information}

```ts
import duckdb from '@duckdb/node-api';

console.log(duckdb.version());

console.log(duckdb.configurationOptionDescriptions());
```

##### Connect {#docs:current:clients:node_neo:overview::connect}

```ts
import { DuckDBConnection } from '@duckdb/node-api';

const connection = await DuckDBConnection.create();
```

This uses the default instance.
For advanced usage, you can create instances explicitly.

##### Create Instance {#docs:current:clients:node_neo:overview::create-instance}

```ts
import { DuckDBInstance } from '@duckdb/node-api';
```

Create with an in-memory database:
```ts
const instance = await DuckDBInstance.create(':memory:');
```

Equivalent to the above:
```ts
const instance = await DuckDBInstance.create();
```

Read from and write to a database file, which is created if needed:
```ts
const instance = await DuckDBInstance.create('my_duckdb.db');
```

Set [configuration options](#docs:current:configuration:overview):
```ts
const instance = await DuckDBInstance.create('my_duckdb.db', {
  threads: '4'
});
```

##### Instance Cache {#docs:current:clients:node_neo:overview::instance-cache}

Multiple instances in the same process should not
attach the same database.

To prevent this, an instance cache can be used:
```ts
const instance = await DuckDBInstance.fromCache('my_duckdb.db');
```

This uses the default instance cache. For advanced usage, you can create
instance caches explicitly:
```ts
import { DuckDBInstanceCache } from '@duckdb/node-api';

const cache = new DuckDBInstanceCache();
const instance = await cache.getOrCreateInstance('my_duckdb.db');
```

##### Connect to Instance {#docs:current:clients:node_neo:overview::connect-to-instance}

```ts
const connection = await instance.connect();
```

##### Disconnect {#docs:current:clients:node_neo:overview::disconnect}

Connections will be disconnected automatically soon after their reference
is dropped, but you can also disconnect explicitly if and when you want:

```ts
connection.disconnectSync();
```

or, equivalently:

```ts
connection.closeSync();
```

##### Run SQL {#docs:current:clients:node_neo:overview::run-sql}

```ts
const result = await connection.run('from test_all_types()');
```

##### Parameterize SQL {#docs:current:clients:node_neo:overview::parameterize-sql}

```ts
import { INTEGER, LIST, VARCHAR, listValue } from '@duckdb/node-api';

const prepared = await connection.prepare('select $1, $2, $3');
prepared.bindVarchar(1, 'duck');
prepared.bindInteger(2, 42);
prepared.bindList(3, listValue([10, 11, 12]), LIST(INTEGER));
const result = await prepared.run();
```

or:

```ts
const prepared = await connection.prepare('select $a, $b, $c');
prepared.bind({
  'a': 'duck',
  'b': 42,
  'c': listValue([10, 11, 12]),
}, {
  'a': VARCHAR,
  'b': INTEGER,
  'c': LIST(INTEGER),
});
const result = await prepared.run();
```

or even:

```ts
const result = await connection.run('select $a, $b, $c', {
  'a': 'duck',
  'b': 42,
  'c': listValue([10, 11, 12]),
}, {
  'a': VARCHAR,
  'b': INTEGER,
  'c': LIST(INTEGER),
});
```

Unspecified types will be inferred:

```ts
const result = await connection.run('select $a, $b, $c', {
  'a': 'duck',
  'b': 42,
  'c': listValue([10, 11, 12]),
});
```

##### Specifying Values {#docs:current:clients:node_neo:overview::specifying-values}

Values of many data types are represented using one of the JS primitives
`boolean`, `number`, `bigint`, or `string`.
Also, any type can have `null` values.

Values of some data types need to be constructed using special functions.
These are:

| Type | Function |
| ---- | -------- |
| `ARRAY` | `arrayValue` |
| `BIT` | `bitValue` |
| `BLOB` | `blobValue` |
| `DATE` | `dateValue` |
| `DECIMAL` | `decimalValue` |
| `INTERVAL` | `intervalValue` |
| `LIST` | `listValue` |
| `MAP` | `mapValue` |
| `STRUCT` | `structValue` |
| `TIME` | `timeValue` |
| `TIMETZ` | `timeTZValue` |
| `TIMESTAMP` | `timestampValue` |
| `TIMESTAMPTZ` | `timestampTZValue` |
| `TIMESTAMP_S` | `timestampSecondsValue` |
| `TIMESTAMP_MS` | `timestampMillisValue` |
| `TIMESTAMP_NS` | `timestampNanosValue` |
| `UNION` | `unionValue` |
| `UUID` | `uuidValue` |

##### Stream Results {#docs:current:clients:node_neo:overview::stream-results}

Streaming results evaluate lazily when rows are read.

```ts
const result = await connection.stream('from range(10_000)');
```

##### Inspect Result Metadata {#docs:current:clients:node_neo:overview::inspect-result-metadata}

Get column names and types:
```ts
const columnNames = result.columnNames();
const columnTypes = result.columnTypes();
```

##### Read Result Data {#docs:current:clients:node_neo:overview::read-result-data}

Run and read all data:
```ts
const reader = await connection.runAndReadAll('from test_all_types()');
const rows = reader.getRows();
// OR: const columns = reader.getColumns();
```

Stream and read until at least some number of rows have been read:
```ts
const reader = await connection.streamAndReadUntil(
  'from range(5000)',
  1000
);
const rows = reader.getRows();
// rows.length === 2048. (Rows are read in chunks of 2048.)
```

Read rows incrementally:
```ts
const reader = await connection.streamAndRead('from range(5000)');
reader.readUntil(2000);
// reader.currentRowCount === 2048 (Rows are read in chunks of 2048.)
// reader.done === false
reader.readUntil(4000);
// reader.currentRowCount === 4096
// reader.done === false
reader.readUntil(6000);
// reader.currentRowCount === 5000
// reader.done === true
```
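
The row counts in the comments above follow from rows being fetched one whole chunk at a time. The sketch below models only that arithmetic and does not call the client library. `rowsBufferedAfter` is a hypothetical helper, and it assumes that every chunk except the last holds exactly 2048 rows, which matches the examples above but may not hold for every query.

```ts
// Hypothetical model of chunked reading; not part of @duckdb/node-api.
const CHUNK_SIZE = 2048; // chunk size mentioned in the comments above

// Number of rows buffered after asking for at least `target` rows from a
// result that contains `total` rows. Assumes all chunks but the last are full.
function rowsBufferedAfter(target: number, total: number): number {
  const chunksNeeded = Math.ceil(target / CHUNK_SIZE);
  return Math.min(total, chunksNeeded * CHUNK_SIZE);
}

// Mirrors the three readUntil() calls on a 5000-row result.
for (const target of [2000, 4000, 6000]) {
  console.log(target, rowsBufferedAfter(target, 5000));
}
// prints "2000 2048", then "4000 4096", then "6000 5000"
```

The same model accounts for the `streamAndReadUntil` example: asking for 1000 rows buffers one full chunk of 2048 rows.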

##### Get Result Data {#docs:current:clients:node_neo:overview::get-result-data}

Result data can be retrieved in a variety of forms:

```ts
const reader = await connection.runAndReadAll(
  'from range(3) select range::int as i, 10 + i as n'
);

const rows = reader.getRows();
// [ [0, 10], [1, 11], [2, 12] ]

const rowObjects = reader.getRowObjects();
// [ { i: 0, n: 10 }, { i: 1, n: 11 }, { i: 2, n: 12 } ]

const columns = reader.getColumns();
// [ [0, 1, 2], [10, 11, 12] ]

const columnsObject = reader.getColumnsObject();
// { i: [0, 1, 2], n: [10, 11, 12] }
```
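
The four forms above carry the same data in different layouts. As a minimal sketch of how they relate, the following plain TypeScript derives the other three layouts from the rows and the column names. The helper names are hypothetical and do not use the client library, since the reader already provides all four forms.

```ts
// Hypothetical helpers, for illustration only: the reader already returns
// all four shapes, so application code does not need to write these.
function toRowObjects(columnNames: string[], rows: unknown[][]): Record<string, unknown>[] {
  return rows.map((row) => {
    const rowObject: Record<string, unknown> = {};
    columnNames.forEach((columnName, i) => {
      rowObject[columnName] = row[i];
    });
    return rowObject;
  });
}

function toColumns(columnNames: string[], rows: unknown[][]): unknown[][] {
  // Column i collects the i-th value of every row.
  return columnNames.map((_, i) => rows.map((row) => row[i]));
}

function toColumnsObject(columnNames: string[], rows: unknown[][]): Record<string, unknown[]> {
  const columnsObject: Record<string, unknown[]> = {};
  toColumns(columnNames, rows).forEach((column, i) => {
    columnsObject[columnNames[i]] = column;
  });
  return columnsObject;
}

// Same data as in the example above.
const exampleNames = ['i', 'n'];
const exampleRows = [[0, 10], [1, 11], [2, 12]];

console.log(toRowObjects(exampleNames, exampleRows));
console.log(toColumns(exampleNames, exampleRows));
console.log(toColumnsObject(exampleNames, exampleRows));
```

Row-oriented forms are usually convenient for iterating over records, while column-oriented forms suit per-column processing.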

##### Convert Result Data {#docs:current:clients:node_neo:overview::convert-result-data}

By default, data values that cannot be represented as JS built-ins
are returned as specialized JS objects; see `Inspect Data Values` below.

To retrieve data in a different form, such as JS built-ins or values that
can be losslessly serialized to JSON, use the `JS` or `Json` forms of the
above result data methods.

Custom converters can be supplied as well. See the implementations of
[JSDuckDBValueConverter](https://github.com/duckdb/duckdb-node-neo/blob/main/api/src/JSDuckDBValueConverter.ts)
and [JsonDuckDBValueConverter](https://github.com/duckdb/duckdb-node-neo/blob/main/api/src/JsonDuckDBValueConverter.ts)
for how to do this.

Examples (using the `Json` forms):

```ts
const reader = await connection.runAndReadAll(
  'from test_all_types() select bigint, date, interval limit 2'
);

const rows = reader.getRowsJson();
// [
//   [
//     "-9223372036854775808",
//     "5877642-06-25 (BC)",
//     { "months": 0, "days": 0, "micros": "0" }
//   ],
//   [
//     "9223372036854775807",
//     "5881580-07-10",
//     { "months": 999, "days": 999, "micros": "999999999" }
//   ]
// ]

const rowObjects = reader.getRowObjectsJson();
// [
//   {
//     "bigint": "-9223372036854775808",
//     "date": "5877642-06-25 (BC)",
//     "interval": { "months": 0, "days": 0, "micros": "0" }
//   },
//   {
//     "bigint": "9223372036854775807",
//     "date": "5881580-07-10",
//     "interval": { "months": 999, "days": 999, "micros": "999999999" }
//   }
// ]

const columns = reader.getColumnsJson();
// [
//   [ "-9223372036854775808", "9223372036854775807" ],
//   [ "5877642-06-25 (BC)", "5881580-07-10" ],
//   [
//     { "months": 0, "days": 0, "micros": "0" },
//     { "months": 999, "days": 999, "micros": "999999999" }
//   ]
// ]

const columnsObject = reader.getColumnsObjectJson();
// {
//   "bigint": [ "-9223372036854775808", "9223372036854775807" ],
//   "date": [ "5877642-06-25 (BC)", "5881580-07-10" ],
//   "interval": [
//     { "months": 0, "days": 0, "micros": "0" },
//     { "months": 999, "days": 999, "micros": "999999999" }
//   ]
// }
```

These methods handle nested types as well:

```ts
const reader = await connection.runAndReadAll(
  'from test_all_types() select int_array, struct, map, "union" limit 2'
);

const rows = reader.getRowsJson();
// [
//   [
//     [],
//     { "a": null, "b": null },
//     [],
//     { "tag": "name", "value": "Frank" }
//   ],
//   [
//     [ 42, 999, null, null, -42],
//     { "a": 42, "b": "🦆🦆🦆🦆🦆🦆" },
//     [
//       { "key": "key1", "value": "🦆🦆🦆🦆🦆🦆" },
//       { "key": "key2", "value": "goose" }
//     ],
//     { "tag": "age", "value": 5 }
//   ]
// ]

const rowObjects = reader.getRowObjectsJson();
// [
//   {
//     "int_array": [],
//     "struct": { "a": null, "b": null },
//     "map": [],
//     "union": { "tag": "name", "value": "Frank" }
//   },
//   {
//     "int_array": [ 42, 999, null, null, -42 ],
//     "struct": { "a": 42, "b": "🦆🦆🦆🦆🦆🦆" },
//     "map": [
//       { "key": "key1", "value": "🦆🦆🦆🦆🦆🦆" },
//       { "key": "key2", "value": "goose" }
//     ],
//     "union": { "tag": "age", "value": 5 }
//   }
// ]

const columns = reader.getColumnsJson();
// [
//   [
//     [],
//     [42, 999, null, null, -42]
//   ],
//   [
//     { "a": null, "b": null },
//     { "a": 42, "b": "🦆🦆🦆🦆🦆🦆" }
//   ],
//   [
//     [],
//     [
//       { "key": "key1", "value": "🦆🦆🦆🦆🦆🦆" },
//       { "key": "key2", "value": "goose"}
//     ]
//   ],
//   [
//     { "tag": "name", "value": "Frank" },
//     { "tag": "age", "value": 5 }
//   ]
// ]

const columnsObject = reader.getColumnsObjectJson();
// {
//   "int_array": [
//     [],
//     [42, 999, null, null, -42]
//   ],
//   "struct": [
//     { "a": null, "b": null },
//     { "a": 42, "b": "🦆🦆🦆🦆🦆🦆" }
//   ],
//   "map": [
//     [],
//     [
//       { "key": "key1", "value": "🦆🦆🦆🦆🦆🦆" },
//       { "key": "key2", "value": "goose" }
//     ]
//   ],
//   "union": [
//     { "tag": "name", "value": "Frank" },
//     { "tag": "age", "value": 5 }
//   ]
// }
```
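
In the `Json` output above, `BIGINT` values appear as strings. A likely reason is that `JSON.stringify` cannot serialize `bigint` values and that a JS `number` cannot represent every 64-bit integer exactly. The sketch below demonstrates both limitations using plain JavaScript values. `toJsonSafe` is a hypothetical stand-in and is not the library's converter.

```ts
// Hypothetical illustration; the library's Json converter is not used here.
const maxBigint = BigInt('9223372036854775807'); // largest BIGINT value

// 1. JSON.stringify rejects bigint values with a TypeError.
let stringifyFailed = false;
try {
  JSON.stringify({ value: maxBigint });
} catch (error) {
  stringifyFailed = error instanceof TypeError;
}

// 2. Converting to a JS number silently rounds the value.
const rounded = Number(maxBigint);
const roundTripIsExact = BigInt(rounded) === maxBigint;

// 3. Converting bigint values to strings keeps every digit.
function toJsonSafe(value: unknown): unknown {
  return typeof value === 'bigint' ? value.toString() : value;
}

console.log(stringifyFailed);   // true
console.log(roundTripIsExact);  // false
console.log(JSON.stringify({ value: toJsonSafe(maxBigint) }));
```

Consumers of the JSON output can turn such strings back into exact values with `BigInt(...)` when needed.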

Column names and types can also be serialized to JSON:
```ts
const columnNamesAndTypes = reader.columnNamesAndTypesJson();
// {
//   "columnNames": [
//     "int_array",
//     "struct",
//     "map",
//     "union"
//   ],
//   "columnTypes": [
//     {
//       "typeId": 24,
//       "valueType": {
//         "typeId": 4
//       }
//     },
//     {
//       "typeId": 25,
//       "entryNames": [
//         "a",
//         "b"
//       ],
//       "entryTypes": [
//         {
//           "typeId": 4
//         },
//         {
//           "typeId": 17
//         }
//       ]
//     },
//     {
//       "typeId": 26,
//       "keyType": {
//         "typeId": 17
//       },
//       "valueType": {
//         "typeId": 17
//       }
//     },
//     {
//       "typeId": 28,
//       "memberTags": [
//         "name",
//         "age"
//       ],
//       "memberTypes": [
//         {
//           "typeId": 17
//         },
//         {
//           "typeId": 3
//         }
//       ]
//     }
//   ]
// }

const columnNameAndTypeObjects = reader.columnNameAndTypeObjectsJson();
// [
//   {
//     "columnName": "int_array",
//     "columnType": {
//       "typeId": 24,
//       "valueType": {
//         "typeId": 4
//       }
//     }
//   },
//   {
//     "columnName": "struct",
//     "columnType": {
//       "typeId": 25,
//       "entryNames": [
//         "a",
//         "b"
//       ],
//       "entryTypes": [
//         {
//           "typeId": 4
//         },
//         {
//           "typeId": 17
//         }
//       ]
//     }
//   },
//   {
//     "columnName": "map",
//     "columnType": {
//       "typeId": 26,
//       "keyType": {
//         "typeId": 17
//       },
//       "valueType": {
//         "typeId": 17
//       }
//     }
//   },
//   {
//     "columnName": "union",
//     "columnType": {
//       "typeId": 28,
//       "memberTags": [
//         "name",
//         "age"
//       ],
//       "memberTypes": [
//         {
//           "typeId": 17
//         },
//         {
//           "typeId": 3
//         }
//       ]
//     }
//   }
// ]
```
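
The numeric `typeId` values in this output are hard to read on their own. The following sketch renders a serialized type as text. It assumes that the IDs follow the `duckdb_type` enum of DuckDB's C API and lists only the IDs that occur in the output above. `TYPE_NAMES`, `TypeJson`, and `describeType` are hypothetical names for this example; in application code, the `DuckDBTypeId` enum of the client is the safer reference.

```ts
// Assumption: IDs follow the `duckdb_type` enum of DuckDB's C API.
// Only the IDs that occur in the output above are listed.
const TYPE_NAMES: Record<number, string> = {
  3: 'SMALLINT',
  4: 'INTEGER',
  17: 'VARCHAR',
  24: 'LIST',
  25: 'STRUCT',
  26: 'MAP',
  28: 'UNION',
};

// Shape of the serialized type objects shown above.
interface TypeJson {
  typeId: number;
  valueType?: TypeJson;
  keyType?: TypeJson;
  entryNames?: string[];
  entryTypes?: TypeJson[];
  memberTags?: string[];
  memberTypes?: TypeJson[];
}

// Hypothetical helper: render a serialized type as readable text.
function describeType(type: TypeJson): string {
  const typeName = TYPE_NAMES[type.typeId] ?? `typeId ${type.typeId}`;
  const pairs = (labels: string[], types: TypeJson[]): string =>
    labels.map((label, i) => `${label} ${describeType(types[i])}`).join(', ');

  if (type.entryNames && type.entryTypes) {
    return `${typeName}(${pairs(type.entryNames, type.entryTypes)})`;
  }
  if (type.memberTags && type.memberTypes) {
    return `${typeName}(${pairs(type.memberTags, type.memberTypes)})`;
  }
  if (type.keyType && type.valueType) {
    return `${typeName}(${describeType(type.keyType)}, ${describeType(type.valueType)})`;
  }
  if (type.valueType) {
    return `${typeName}(${describeType(type.valueType)})`;
  }
  return typeName;
}

// The "union" column type from the output above.
const unionType: TypeJson = {
  typeId: 28,
  memberTags: ['name', 'age'],
  memberTypes: [{ typeId: 17 }, { typeId: 3 }],
};
console.log(describeType(unionType));
```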

##### Fetch Chunks {#docs:current:clients:node_neo:overview::fetch-chunks}

Fetch all chunks:
```ts
const chunks = await result.fetchAllChunks();
```

Fetch one chunk at a time:
```ts
const chunks = [];
while (true) {
  const chunk = await result.fetchChunk();
  // Last chunk will have zero rows.
  if (chunk.rowCount === 0) {
    break;
  }
  chunks.push(chunk);
}
```

For materialized (non-streaming) results, chunks can be read by index:
```ts
const rowCount = result.rowCount;
const chunkCount = result.chunkCount;
for (let i = 0; i < chunkCount; i++) {
  const chunk = result.getChunk(i);
  // ...
}
```

Get chunk data:
```ts
const rows = chunk.getRows();

const rowObjects = chunk.getRowObjects(result.deduplicatedColumnNames());

const columns = chunk.getColumns();

const columnsObject =
  chunk.getColumnsObject(result.deduplicatedColumnNames());
```
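
The object-shaped forms above take deduplicated column names, presumably because a query can return several columns with the same name and object keys must be unique. The sketch below shows the collision with plain TypeScript. Both helpers are hypothetical, and the `:1` suffix format is invented for this example; the names produced by `deduplicatedColumnNames()` may differ.

```ts
// Hypothetical illustration of why duplicate column names need handling.
function toRowObject(columnNames: string[], row: unknown[]): Record<string, unknown> {
  const rowObject: Record<string, unknown> = {};
  columnNames.forEach((columnName, i) => {
    rowObject[columnName] = row[i];
  });
  return rowObject;
}

// Invented scheme: append ":1", ":2", ... to repeated names.
// The library's actual naming may differ.
function dedupeNames(columnNames: string[]): string[] {
  const seen = new Map<string, number>();
  return columnNames.map((columnName) => {
    const count = seen.get(columnName) ?? 0;
    seen.set(columnName, count + 1);
    return count === 0 ? columnName : `${columnName}:${count}`;
  });
}

const duplicateNames = ['a', 'a']; // e.g., from `select 1 as a, 2 as a`
const singleRow = [1, 2];

// With duplicate names, the second value overwrites the first.
console.log(toRowObject(duplicateNames, singleRow));

// With unique names, both values survive.
console.log(toRowObject(dedupeNames(duplicateNames), singleRow));
```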

Get chunk data (one value at a time):
```ts
const columns = [];
const columnCount = chunk.columnCount;
for (let columnIndex = 0; columnIndex < columnCount; columnIndex++) {
  const columnValues = [];
  const columnVector = chunk.getColumnVector(columnIndex);
  const itemCount = columnVector.itemCount;
  for (let itemIndex = 0; itemIndex < itemCount; itemIndex++) {
    const value = columnVector.getItem(itemIndex);
    columnValues.push(value);
  }
  columns.push(columnValues);
}
```

##### Inspect Data Types {#docs:current:clients:node_neo:overview::inspect-data-types}

```ts
import { DuckDBTypeId } from '@duckdb/node-api';

if (columnType.typeId === DuckDBTypeId.ARRAY) {
  const arrayValueType = columnType.valueType;
  const arrayLength = columnType.length;
}

if (columnType.typeId === DuckDBTypeId.DECIMAL) {
  const decimalWidth = columnType.width;
  const decimalScale = columnType.scale;
}

if (columnType.typeId === DuckDBTypeId.ENUM) {
  const enumValues = columnType.values;
}

if (columnType.typeId === DuckDBTypeId.LIST) {
  const listValueType = columnType.valueType;
}

if (columnType.typeId === DuckDBTypeId.MAP) {
  const mapKeyType = columnType.keyType;
  const mapValueType = columnType.valueType;
}

if (columnType.typeId === DuckDBTypeId.STRUCT) {
  const structEntryNames = columnType.names;
  const structEntryTypes = columnType.valueTypes;
}

if (columnType.typeId === DuckDBTypeId.UNION) {
  const unionMemberTags = columnType.memberTags;
  const unionMemberTypes = columnType.memberTypes;
}

// For the JSON type (https://duckdb.org/docs/data/json/json_type)
if (columnType.alias === 'JSON') {
  const json = JSON.parse(columnValue);
}
```

Every type implements `toString`.
The result is both human-friendly and readable by DuckDB in an appropriate expression.

```ts
const typeString = columnType.toString();
```

##### Inspect Data Values {#docs:current:clients:node_neo:overview::inspect-data-values}

```ts
import { DuckDBTypeId } from '@duckdb/node-api';

if (columnType.typeId === DuckDBTypeId.ARRAY) {
  const arrayItems = columnValue.items; // array of values
  const arrayString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.BIT) {
  const bools = columnValue.toBools(); // array of booleans
  const bits = columnValue.toBits(); // array of 0s and 1s
  const bitString = columnValue.toString(); // string of '0's and '1's
}

if (columnType.typeId === DuckDBTypeId.BLOB) {
  const blobBytes = columnValue.bytes; // Uint8Array
  const blobString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.DATE) {
  const dateDays = columnValue.days;
  const dateString = columnValue.toString();
  const { year, month, day } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.DECIMAL) {
  const decimalWidth = columnValue.width;
  const decimalScale = columnValue.scale;
  // Scaled-up value. Represented number is value/(10^scale).
  const decimalValue = columnValue.value; // bigint
  const decimalString = columnValue.toString();
  const decimalDouble = columnValue.toDouble();
}

if (columnType.typeId === DuckDBTypeId.INTERVAL) {
  const intervalMonths = columnValue.months;
  const intervalDays = columnValue.days;
  const intervalMicros = columnValue.micros; // bigint
  const intervalString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.LIST) {
  const listItems = columnValue.items; // array of values
  const listString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.MAP) {
  const mapEntries = columnValue.entries; // array of { key, value }
  const mapString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.STRUCT) {
  // { name1: value1, name2: value2, ... }
  const structEntries = columnValue.entries;
  const structString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP_MS) {
  const timestampMillis = columnValue.milliseconds; // bigint
  const timestampMillisString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP_NS) {
  const timestampNanos = columnValue.nanoseconds; // bigint
  const timestampNanosString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP_S) {
  const timestampSecs = columnValue.seconds; // bigint
  const timestampSecsString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP_TZ) {
  const timestampTZMicros = columnValue.micros; // bigint
  const timestampTZString = columnValue.toString();
  const {
    date: { year, month, day },
    time: { hour, min, sec, micros },
  } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP) {
  const timestampMicros = columnValue.micros; // bigint
  const timestampString = columnValue.toString();
  const {
    date: { year, month, day },
    time: { hour, min, sec, micros },
  } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.TIME_TZ) {
  const timeTZMicros = columnValue.micros; // bigint
  const timeTZOffset = columnValue.offset;
  const timeTZString = columnValue.toString();
  const {
    time: { hour, min, sec, micros },
    offset,
  } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.TIME) {
  const timeMicros = columnValue.micros; // bigint
  const timeString = columnValue.toString();
  const { hour, min, sec, micros } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.UNION) {
  const unionTag = columnValue.tag;
  const unionValue = columnValue.value;
  const unionValueString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.UUID) {
  const uuidHugeint = columnValue.hugeint; // bigint
  const uuidString = columnValue.toString();
}

// other possible values are: null, boolean, number, bigint, or string
```
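The `DECIMAL` comment above states that the represented number is `value / 10^scale`. The following minimal Python sketch checks that arithmetic outside of Node.js. It is an illustration only and not part of the Node Neo API: `decimal_from_scaled` is a hypothetical helper name, and it uses the standard `decimal` module so that no precision is lost to floating point.

```python
from decimal import Decimal


def decimal_from_scaled(value: int, scale: int) -> Decimal:
    """Return value / 10**scale exactly (hypothetical helper, for illustration)."""
    sign = 1 if value < 0 else 0
    digits = tuple(int(digit) for digit in str(abs(value)))
    # Building the Decimal from (sign, digits, exponent) is exact and does not
    # round to the context precision, even for very wide values.
    return Decimal((sign, digits, -scale))


print(decimal_from_scaled(123456, 2))  # 1234.56
print(decimal_from_scaled(-5, 3))      # -0.005
```

Going through a double instead (as `toDouble()` does) is convenient but may lose digits for wide decimals.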

##### Displaying Timezones {#docs:current:clients:node_neo:overview::displaying-timezones}

Converting a `TIMESTAMP_TZ` value to a string depends on a timezone offset.
By default, this is set to the offset for the local timezone when the Node.js
process is started.

To change it, set the `timezoneOffsetInMinutes`
property of `DuckDBTimestampTZValue`:

```ts
DuckDBTimestampTZValue.timezoneOffsetInMinutes = -8 * 60;
const pst = DuckDBTimestampTZValue.Epoch.toString();
// 1969-12-31 16:00:00-08

DuckDBTimestampTZValue.timezoneOffsetInMinutes = +1 * 60;
const cet = DuckDBTimestampTZValue.Epoch.toString();
// 1970-01-01 01:00:00+01
```

Note that the timezone offset used for this string
conversion is distinct from the `TimeZone` setting of DuckDB.

The following sets this offset to match the `TimeZone` setting of DuckDB:

```ts
const reader = await connection.runAndReadAll(
  `select (timezone(current_timestamp) / 60)::int`
);
DuckDBTimestampTZValue.timezoneOffsetInMinutes =
  reader.getColumns()[0][0];
```
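To see how an offset given in minutes changes the rendered string, here is a minimal Python sketch using only the standard library. It does not call DuckDB. `render_with_offset` is a hypothetical helper, and Python's ISO formatting writes the offset as `-08:00` rather than the `-08` shown above.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render_with_offset(micros_since_epoch: int, offset_minutes: int) -> str:
    """Render an instant at a fixed UTC offset given in minutes."""
    instant = EPOCH_UTC + timedelta(microseconds=micros_since_epoch)
    zone = timezone(timedelta(minutes=offset_minutes))
    return instant.astimezone(zone).isoformat(sep=" ")


print(render_with_offset(0, -8 * 60))  # 1969-12-31 16:00:00-08:00
print(render_with_offset(0, +1 * 60))  # 1970-01-01 01:00:00+01:00
```

The instant itself is the same in both calls. Only the offset used for display differs.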

##### Append To Table {#docs:current:clients:node_neo:overview::append-to-table}

```ts
await connection.run(
  `create or replace table target_table(i integer, v varchar)`
);

const appender = await connection.createAppender('target_table');

appender.appendInteger(42);
appender.appendVarchar('duck');
appender.endRow();

appender.appendInteger(123);
appender.appendVarchar('mallard');
appender.endRow();

appender.flushSync();

appender.appendInteger(17);
appender.appendVarchar('goose');
appender.endRow();

appender.closeSync(); // also flushes
```

##### Append Data Chunk {#docs:current:clients:node_neo:overview::append-data-chunk}

```ts
await connection.run(
  `create or replace table target_table(i integer, v varchar)`
);

const appender = await connection.createAppender('target_table');

const chunk = DuckDBDataChunk.create([INTEGER, VARCHAR]);
chunk.setColumns([
  [42, 123, 17],
  ['duck', 'mallard', 'goose'],
]);
// OR:
// chunk.setRows([
//   [42, 'duck'],
//   [123, 'mallard'],
//   [17, 'goose'],
// ]);

appender.appendDataChunk(chunk);
appender.flushSync();
```

See "Specifying Values" above for how to supply values to the appender.
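`setColumns` and `setRows` in the example above describe the same three rows, once in column-major and once in row-major layout. A minimal Python sketch of that equivalence, using plain lists and no DuckDB API:

```python
columns = [
    [42, 123, 17],
    ["duck", "mallard", "goose"],
]

# Transposing the column-major layout yields the row-major layout.
rows = [list(row) for row in zip(*columns)]
print(rows)  # [[42, 'duck'], [123, 'mallard'], [17, 'goose']]

# Transposing again gets back the original columns.
assert [list(column) for column in zip(*rows)] == columns
```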

##### Extract Statements {#docs:current:clients:node_neo:overview::extract-statements}

```ts
const extractedStatements = await connection.extractStatements(` 
  create or replace table numbers as from range(?);
  from numbers where range < ?;
  drop table numbers;
`);
const parameterValues = [10, 7];
const statementCount = extractedStatements.count;
for (let stmtIndex = 0; stmtIndex < statementCount; stmtIndex++) {
  const prepared = await extractedStatements.prepare(stmtIndex);
  let parameterCount = prepared.parameterCount;
  for (let paramIndex = 1; paramIndex <= parameterCount; paramIndex++) {
    prepared.bindInteger(paramIndex, parameterValues.shift());
  }
  const result = await prepared.run();
  // ...
}
```
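The loop above hands out one shared list of parameter values across the extracted statements, using 1-based parameter indexes. The following Python sketch models only that bookkeeping. It is an illustration under stated assumptions: `assign_parameters` is a hypothetical helper, and counting `?` characters is a naive stand-in for the `parameterCount` reported by a prepared statement.

```python
from collections import deque


def assign_parameters(statements, values):
    """Pair each statement with the 1-based parameters it would consume."""
    remaining = deque(values)
    plan = []
    for sql in statements:
        parameter_count = sql.count("?")  # naive; assumes no '?' inside literals
        bound = {
            index: remaining.popleft()
            for index in range(1, parameter_count + 1)
        }
        plan.append((sql, bound))
    return plan


plan = assign_parameters(
    [
        "create or replace table numbers as from range(?)",
        "from numbers where range < ?",
        "drop table numbers",
    ],
    [10, 7],
)
for sql, bound in plan:
    print(bound, sql)
```

The first statement receives `10`, the second receives `7`, and the third takes no parameters.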

##### Control Evaluation of Tasks {#docs:current:clients:node_neo:overview::control-evaluation-of-tasks}

```ts
import { DuckDBPendingResultState } from '@duckdb/node-api';

async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

const prepared = await connection.prepare('from range(10_000_000)');
const pending = prepared.start();
while (pending.runTask() !== DuckDBPendingResultState.RESULT_READY) {
  console.log('not ready');
  await sleep(1);
}
console.log('ready');
const result = await pending.getResult();
// ...
```
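The pattern above runs one task at a time and yields to the event loop between tasks, so other work can proceed while the query is evaluated. A minimal Python sketch of the same cooperative polling idea follows. `FakePending` is a hypothetical stand-in that becomes ready after a fixed number of tasks and is not a DuckDB class.

```python
import asyncio


class FakePending:
    """Hypothetical pending result that is ready after `tasks_needed` tasks."""

    def __init__(self, tasks_needed: int) -> None:
        self.remaining = tasks_needed

    def run_task(self) -> bool:
        """Run one task and report whether the result is ready."""
        if self.remaining > 0:
            self.remaining -= 1
        return self.remaining == 0


async def drive(pending: FakePending, pause_seconds: float = 0.0) -> int:
    """Run tasks until ready, yielding to the event loop between tasks."""
    not_ready_polls = 0
    while not pending.run_task():
        not_ready_polls += 1
        await asyncio.sleep(pause_seconds)
    return not_ready_polls


print(asyncio.run(drive(FakePending(3))))  # 2
```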

##### Ways to Run SQL {#docs:current:clients:node_neo:overview::ways-to-run-sql}

```ts
// Run to completion but don't yet retrieve any rows.
// Optionally take values to bind to SQL parameters,
// and (optionally) types of those parameters,
// either as an array (for positional parameters),
// or an object keyed by parameter name.
const result = await connection.run(sql);
const result = await connection.run(sql, values);
const result = await connection.run(sql, values, types);

// Run to completion but don't yet retrieve any rows.
// Wrap in a DuckDBDataReader for convenient data retrieval.
const reader = await connection.runAndRead(sql);
const reader = await connection.runAndRead(sql, values);
const reader = await connection.runAndRead(sql, values, types);

// Run to completion, wrap in a reader, and read all rows.
const reader = await connection.runAndReadAll(sql);
const reader = await connection.runAndReadAll(sql, values);
const reader = await connection.runAndReadAll(sql, values, types);

// Run to completion, wrap in a reader, and read at least
// the given number of rows. (Rows are read in chunks, so more than
// the target may be read.)
const reader = await connection.runAndReadUntil(sql, targetRowCount);
const reader =
  await connection.runAndReadUntil(sql, targetRowCount, values);
const reader =
  await connection.runAndReadUntil(sql, targetRowCount, values, types);

// Create a streaming result and don't yet retrieve any rows.
const result = await connection.stream(sql);
const result = await connection.stream(sql, values);
const result = await connection.stream(sql, values, types);

// Create a streaming result and don't yet retrieve any rows.
// Wrap in a DuckDBDataReader for convenient data retrieval.
const reader = await connection.streamAndRead(sql);
const reader = await connection.streamAndRead(sql, values);
const reader = await connection.streamAndRead(sql, values, types);

// Create a streaming result, wrap in a reader, and read all rows.
const reader = await connection.streamAndReadAll(sql);
const reader = await connection.streamAndReadAll(sql, values);
const reader = await connection.streamAndReadAll(sql, values, types);

// Create a streaming result, wrap in a reader, and read at least
// the given number of rows.
const reader = await connection.streamAndReadUntil(sql, targetRowCount);
const reader =
  await connection.streamAndReadUntil(sql, targetRowCount, values);
const reader =
  await connection.streamAndReadUntil(sql, targetRowCount, values, types);

// Prepared Statements

// Prepare a (possibly parameterized) SQL statement to run later.
const prepared = await connection.prepare(sql);

// Bind values to the parameters.
prepared.bind(values);
prepared.bind(values, types);

// Run the prepared statement. These mirror the methods on the connection.
const result = await prepared.run();

const reader = await prepared.runAndRead();
const reader = await prepared.runAndReadAll();
const reader = await prepared.runAndReadUntil(targetRowCount);

const result = await prepared.stream();

const reader = await prepared.streamAndRead();
const reader = await prepared.streamAndReadAll();
const reader = await prepared.streamAndReadUntil(targetRowCount);

// Pending Results

// Create a pending result.
const pending = await connection.start(sql);
const pending = await connection.start(sql, values);
const pending = await connection.start(sql, values, types);

// Create a pending, streaming result.
const pending = await connection.startStream(sql);
const pending = await connection.startStream(sql, values);
const pending = await connection.startStream(sql, values, types);

// Create a pending result from a prepared statement.
const pending = await prepared.start();
const pending = await prepared.startStream();

while (pending.runTask() !== DuckDBPendingResultState.RESULT_READY) {
  // optionally sleep or do other work between tasks
}

// Retrieve the result. If not yet READY, will run until it is.
const result = await pending.getResult();

const reader = await pending.read();
const reader = await pending.readAll();
const reader = await pending.readUntil(targetRowCount);
```
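The "read until" variants above read whole chunks, which is why more rows than the target may come back. A minimal Python sketch of that behavior, with plain lists standing in for chunks (`read_until` is a hypothetical helper, not a DuckDB function):

```python
def read_until(chunks, target_row_count):
    """Consume whole chunks until at least target_row_count rows are read."""
    rows = []
    for chunk in chunks:
        rows.extend(chunk)
        if len(rows) >= target_row_count:
            break
    return rows


chunks = iter([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = read_until(chunks, 4)
print(len(rows))     # 6
print(next(chunks))  # [7, 8, 9]
```

With a target of 4 and chunks of 3 rows, two chunks are consumed and the third is left unread.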

##### Ways to Get Result Data {#docs:current:clients:node_neo:overview::ways-to-get-result-data}

```ts
// From a result

// Asynchronously retrieve data for all rows:
const columns = await result.getColumns();
const columnsJson = await result.getColumnsJson();
const columnsObject = await result.getColumnsObject();
const columnsObjectJson = await result.getColumnsObjectJson();
const rows = await result.getRows();
const rowsJson = await result.getRowsJson();
const rowObjects = await result.getRowObjects();
const rowObjectsJson = await result.getRowObjectsJson();

// From a reader

// First, (asynchronously) read some rows:
await reader.readAll();
// or:
await reader.readUntil(targetRowCount);

// Then, (synchronously) get result data for the rows read:
const columns = reader.getColumns();
const columnsJson = reader.getColumnsJson();
const columnsObject = reader.getColumnsObject();
const columnsObjectJson = reader.getColumnsObjectJson();
const rows = reader.getRows();
const rowsJson = reader.getRowsJson();
const rowObjects = reader.getRowObjects();
const rowObjectsJson = reader.getRowObjectsJson();

// Individual values can also be read directly:
const value = reader.value(columnIndex, rowIndex);

// Using chunks

// If desired, one or more chunks can be fetched from a result:
const chunk = await result.fetchChunk();
const chunks = await result.fetchAllChunks();

// And then data can be retrieved from each chunk:
const columnValues = chunk.getColumnValues(columnIndex);
const columns = chunk.getColumns();
const rowValues = chunk.getRowValues(rowIndex);
const rows = chunk.getRows();

// Or, values can be visited:
chunk.visitColumnValues(columnIndex,
  (value, rowIndex, columnIndex, type) => { /* ... */ }
);
chunk.visitColumns((column, columnIndex, type) => { /* ... */ });
chunk.visitColumnMajor(
  (value, rowIndex, columnIndex, type) => { /* ... */ }
);
chunk.visitRowValues(rowIndex,
  (value, rowIndex, columnIndex, type) => { /* ... */ }
);
chunk.visitRows((row, rowIndex) => { /* ... */ });
chunk.visitRowMajor(
  (value, rowIndex, columnIndex, type) => { /* ... */ }
);

// Or converted:
// The `converter` argument implements `DuckDBValueConverter`,
// which has the single method convertValue(value, type).
const columnValues = chunk.convertColumnValues(columnIndex, converter);
const columns = chunk.convertColumns(converter);
const rowValues = chunk.convertRowValues(rowIndex, converter);
const rows = chunk.convertRows(converter);

// The reader abstracts these low-level chunk manipulations
// and is recommended for most cases.
```
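The getters above return the same data in different shapes. The following Python sketch shows those shapes side by side, assuming the layouts implied by the method names (rows, row objects, columns, and a columns object keyed by column name). The column names and values are made up for illustration.

```python
names = ["i", "v"]
rows = [[42, "duck"], [123, "mallard"]]

# One dictionary per row, keyed by column name.
row_objects = [dict(zip(names, row)) for row in rows]

# One list per column.
columns = [list(column) for column in zip(*rows)]

# One dictionary for the whole result, mapping column name to column values.
columns_object = dict(zip(names, columns))

print(row_objects)     # [{'i': 42, 'v': 'duck'}, {'i': 123, 'v': 'mallard'}]
print(columns)         # [[42, 123], ['duck', 'mallard']]
print(columns_object)  # {'i': [42, 123], 'v': ['duck', 'mallard']}
```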

## ODBC {#clients:odbc}

### ODBC API Overview {#docs:current:clients:odbc:overview}

> **Installation.** To use the DuckDB ODBC client, visit the [ODBC installation page](https://duckdb.org/install/index.html?environment=odbc).
>
> The latest stable version of the DuckDB ODBC client is 1.5.2.0.

ODBC (Open Database Connectivity) is a C-style API that provides access to different flavors of Database Management Systems (DBMSs).
The ODBC API consists of the Driver Manager (DM) and the ODBC drivers.

The Driver Manager is part of the system library, e.g., unixODBC, and manages the communication between user applications and the ODBC drivers.
Typically, applications are linked against the DM, which uses a Data Source Name (DSN) to look up the correct ODBC driver.

The ODBC driver is a DBMS-specific implementation of the ODBC API, which handles all the internals of that DBMS.

The DM maps the ODBC function calls made by a user application to the correct ODBC driver, which performs the specified function and returns the proper values.

#### DuckDB ODBC Driver {#docs:current:clients:odbc:overview::duckdb-odbc-driver}

DuckDB supports ODBC version 3.0 according to the [Core Interface Conformance](https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/core-interface-conformance?view=sql-server-ver15).

The ODBC driver is available for all operating systems. Visit the [installation page](https://duckdb.org/install) for direct links.

### ODBC API on Linux {#docs:current:clients:odbc:linux}

#### Driver Manager {#docs:current:clients:odbc:linux::driver-manager}

A driver manager is required to manage communication between applications and the ODBC driver.
We have tested and support `unixODBC`, a complete ODBC driver manager for Linux.
Users can install it from the command line.

On Debian-based distributions (Ubuntu, Mint, etc.), run:

```batch
sudo apt-get install unixodbc odbcinst
```

On Fedora-based distributions (Amazon Linux, RHEL, CentOS, etc.), run:

```batch
sudo yum install unixODBC
```

#### Setting Up the Driver {#docs:current:clients:odbc:linux::setting-up-the-driver}

1. Download the ODBC Linux asset corresponding to your architecture:

   * [x86_64 (AMD64)](https://github.com/duckdb/duckdb-odbc/releases/download/v1.5.2.0/duckdb_odbc-linux-amd64.zip)
   * [arm64 (AArch64)](https://github.com/duckdb/duckdb-odbc/releases/download/v1.5.2.0/duckdb_odbc-linux-aarch64.zip)

   

2. The package contains the following files:

   * `libduckdb_odbc.so`: the DuckDB driver.
   * `unixodbc_setup.sh`: a setup script to aid the configuration on Linux.

   To extract them, run:

   ```batch
   mkdir duckdb_odbc && unzip duckdb_odbc-linux-amd64.zip -d duckdb_odbc
   ```

3. The `unixodbc_setup.sh` script configures the DuckDB ODBC Driver. It relies on the unixODBC package, which provides commands such as `odbcinst` and `isql` to set up and test ODBC.

   Run the script with either the `-u` or the `-s` option to configure DuckDB ODBC.

   The `-u` option sets up the ODBC init files in the user's home directory.

   ```batch
   ./unixodbc_setup.sh -u
   ```

   The `-s` option changes the system-level files, which are visible to all users, and therefore requires root privileges.

   ```batch
   sudo ./unixodbc_setup.sh -s
   ```

   The `--help` option prints the usage of `unixodbc_setup.sh`.

   ```batch
   ./unixodbc_setup.sh --help
   ```

   ```text
   Usage: ./unixodbc_setup.sh <level> [options]

   Example: ./unixodbc_setup.sh -u -db ~/database_path -D ~/driver_path/libduckdb_odbc.so

   Level:
   -s: System-level, using 'sudo' to configure DuckDB ODBC at the system-level, changing the files: /etc/odbc[inst].ini
   -u: User-level, configuring the DuckDB ODBC at the user-level, changing the files: ~/.odbc[inst].ini.

   Options:
   -db database_path: the DuckDB database file path, the default is ':memory:' if not provided.
   -D driver_path: the driver file path (i.e., the path for libduckdb_odbc.so), the default is using the base script directory
   ```

4. The ODBC setup on Linux is based on the `.odbc.ini` and `.odbcinst.ini` files.

   These files can be placed in the user's home directory (`/home/⟨username⟩`) or in the system `/etc` directory.
   The Driver Manager prioritizes the user configuration files over the system files.

   For the details of the configuration parameters, see the [ODBC configuration page](#docs:current:clients:odbc:configuration).
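The precedence described in step 4 (user configuration before system configuration) can be modeled as a lookup that tries the user file first. The following Python sketch is a simplified model and not unixODBC's actual implementation: `lookup_dsn` is a hypothetical helper, and the file contents and database paths are made-up examples passed in as strings.

```python
from configparser import ConfigParser

USER_ODBC_INI = """\
[DuckDB]
Driver = DuckDB Driver
Database = /tmp/user.duckdb
"""

SYSTEM_ODBC_INI = """\
[DuckDB]
Driver = DuckDB Driver
Database = :memory:

[Shared]
Driver = DuckDB Driver
Database = /srv/shared.duckdb
"""


def lookup_dsn(dsn, user_text, system_text):
    """Return the settings of a DSN, preferring the user-level file."""
    for text in (user_text, system_text):
        parser = ConfigParser(delimiters=("=",), interpolation=None)
        parser.read_string(text)
        if parser.has_section(dsn):
            return dict(parser[dsn])
    return None


print(lookup_dsn("DuckDB", USER_ODBC_INI, SYSTEM_ODBC_INI)["database"])
print(lookup_dsn("Shared", USER_ODBC_INI, SYSTEM_ODBC_INI)["database"])
```

The `DuckDB` DSN resolves to the user-level entry, while `Shared` falls back to the system-level file.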

### ODBC API on Windows {#docs:current:clients:odbc:windows}

#### Setup {#docs:current:clients:odbc:windows::setup}

Using the DuckDB ODBC API on Windows requires the following steps:

1. Microsoft Windows requires an ODBC Driver Manager to manage communication between applications and the ODBC drivers.
   The Driver Manager on Windows is provided in the DLL file `odbccp32.dll`, along with other files and tools.
   For detailed information, see the [Common ODBC Component Files](https://docs.microsoft.com/en-us/previous-versions/windows/desktop/odbc/dn170563(v=vs.85)).

2. DuckDB releases the ODBC driver as an asset. For Windows, download the [Windows ODBC asset (x86_64/AMD64)](https://github.com/duckdb/duckdb-odbc/releases/download/v1.5.2.0/duckdb_odbc-windows-amd64.zip).

3. The archive contains the following artifacts:

   * `duckdb_odbc.dll`: the DuckDB driver compiled for Windows.
   * `duckdb_odbc_setup.dll`: a setup DLL used by the Windows ODBC Data Source Administrator tool.
   * `odbc_install.exe`: an installation script to aid the configuration on Windows.

   Decompress the archive to a directory (e.g., `duckdb_odbc`).

4. The `odbc_install.exe` binary configures the DuckDB ODBC Driver on Windows. It depends on `odbccp32.dll`, which provides functions to configure the ODBC registry entries.

   Inside the permanent directory (e.g., `duckdb_odbc`), double-click `odbc_install.exe`.

   Windows administrator privileges are required. If you are not an administrator, a User Account Control prompt appears.

5. `odbc_install.exe` adds a default DSN configuration into the ODBC registries with a default database `:memory:`.

##### DSN Windows Setup {#docs:current:clients:odbc:windows::dsn-windows-setup}

After the installation, it is possible to change the default DSN configuration or add a new one using the Windows ODBC Data Source Administrator tool `odbcad32.exe`.

It can also be launched from the Windows Start menu:

![](../images/blog/odbc/launch_odbcad.png)


##### Default DuckDB DSN {#docs:current:clients:odbc:windows::default-duckdb-dsn}

The newly installed DSN is visible on the ***System DSN*** tab in the Windows ODBC Data Source Administrator tool:

![Windows ODBC Config Tool](../images/blog/odbc/odbcad32_exe.png)

##### Changing DuckDB DSN {#docs:current:clients:odbc:windows::changing-duckdb-dsn}

When selecting the default DSN (i.e., `DuckDB`) or adding a new configuration, the following setup window will display:

![DuckDB Windows DSN Setup](../images/blog/odbc/duckdb_DSN_setup.png)

This window allows you to set the DSN and the database file path associated with that DSN.

#### More Detailed Windows Setup {#docs:current:clients:odbc:windows::more-detailed-windows-setup}

There are two ways to configure the ODBC driver, either by altering the registry keys as detailed below,
or by connecting with [`SQLDriverConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqldriverconnect-function?view=sql-server-ver16).
A combination of the two is also possible.

Furthermore, the ODBC driver supports all the [configuration options](#docs:current:configuration:overview)
included in DuckDB.

> If a configuration is set in both the connection string passed to `SQLDriverConnect` and in the `odbc.ini` file,
> the one passed to `SQLDriverConnect` will take precedence.

For the details of the configuration parameters, see the [ODBC configuration page](#docs:current:clients:odbc:configuration).
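The precedence rule above (values in the connection string win over stored values) amounts to a dictionary merge. The following Python sketch models it under simplifying assumptions: the connection string is a plain `key=value;key=value` list without quoted or braced values, keys are compared case-insensitively, and both helper names are hypothetical.

```python
def parse_connection_string(connection_string: str) -> dict:
    """Parse 'key=value;key=value' into a dict with lower-cased keys."""
    settings = {}
    for part in connection_string.split(";"):
        if not part.strip():
            continue
        key, _, value = part.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def effective_settings(dsn_settings: dict, connection_string: str) -> dict:
    """Connection-string values override values stored with the DSN."""
    merged = {key.lower(): value for key, value in dsn_settings.items()}
    merged.update(parse_connection_string(connection_string))
    return merged


stored = {"Database": ":memory:", "access_mode": "read_only"}
merged = effective_settings(stored, "DSN=DuckDB;access_mode=read_write")
print(merged["access_mode"])  # read_write
print(merged["database"])     # :memory:
```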

##### Registry Keys {#docs:current:clients:odbc:windows::registry-keys}

The ODBC setup on Windows is based on registry keys (see [Registry Entries for ODBC Components](https://docs.microsoft.com/en-us/sql/odbc/reference/install/registry-entries-for-odbc-components?view=sql-server-ver15)).
The ODBC entries can be placed under the current user registry key (`HKCU`) or the system registry key (`HKLM`).

We have tested and used the system entries based on `HKLM->SOFTWARE->ODBC`.
`odbc_install.exe` modifies this entry, which has two subkeys: `ODBC.INI` and `ODBCINST.INI`.

The `ODBC.INI` is where users usually insert DSN registry entries for the drivers.

For example, the DSN registry for DuckDB would look like this:

![`HKLM->SOFTWARE->ODBC->ODBC.INI->DuckDB`](../images/blog/odbc/odbc_ini-registry-entry.png)

The `ODBCINST.INI` contains one entry for each ODBC driver and other keys predefined for [Windows ODBC configuration](https://docs.microsoft.com/en-us/sql/odbc/reference/install/registry-entries-for-odbc-components?view=sql-server-ver15).

##### Updating the ODBC Driver {#docs:current:clients:odbc:windows::updating-the-odbc-driver}

When a new version of the ODBC driver is released, installing the new version will overwrite the existing one.
However, the installer doesn't always update the version number in the registry.
To ensure the correct version is used,
check that `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\DuckDB Driver` has the most recent version,
and `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\DuckDB\Driver` has the correct path to the new driver.

### ODBC API on macOS {#docs:current:clients:odbc:macos}

1. A driver manager is required to manage communication between applications and the ODBC driver. DuckDB supports `unixODBC`, which is a complete ODBC driver manager for macOS and Linux. Users can install it from the command line via [Homebrew](https://brew.sh/):

   ```batch
   brew install unixodbc
   ```

2. DuckDB releases a universal [ODBC driver for macOS](https://github.com/duckdb/duckdb-odbc/releases/download/v1.5.2.0/duckdb_odbc-osx-universal.zip) (supporting both Intel and Apple Silicon CPUs). To download it, run:

   ```batch
   wget https://github.com/duckdb/duckdb-odbc/releases/download/v1.5.2.0/duckdb_odbc-osx-universal.zip
   ```

   

3. The archive contains the `libduckdb_odbc.dylib` artifact. To extract it to a directory, run:

   ```batch
   mkdir duckdb_odbc && unzip duckdb_odbc-osx-universal.zip -d duckdb_odbc
   ```

4. There are two ways to configure the ODBC driver, either by initializing via the configuration files, or by connecting with [`SQLDriverConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqldriverconnect-function?view=sql-server-ver16).
   A combination of the two is also possible.

   Furthermore, the ODBC driver supports all the [configuration options](#docs:current:configuration:overview) included in DuckDB.

   > If a configuration is set in both the connection string passed to `SQLDriverConnect` and in the `odbc.ini` file,
   > the one passed to `SQLDriverConnect` will take precedence.

   For the details of the configuration parameters, see the [ODBC configuration page](#docs:current:clients:odbc:configuration).

5. After the configuration, validate the installation with an ODBC client. unixODBC provides a command line tool called `isql` for this purpose.

   Pass the DSN defined in `odbc.ini` as a parameter to `isql`:

   ```batch
   isql DuckDB
   ```

   ```text
   +---------------------------------------+
   | Connected!                            |
   |                                       |
   | sql-statement                         |
   | help [tablename]                      |
   | echo [string]                         |
   | quit                                  |
   |                                       |
   +---------------------------------------+
   ```

   ```sql
   SQL> SELECT 42;
   ```

   ```text
   +------------+
   | 42         |
   +------------+
   | 42         |
   +------------+

   SQLRowCount returns -1
   1 rows fetched
   ```

### ODBC Configuration {#docs:current:clients:odbc:configuration}

This page documents the files used for ODBC configuration, [`odbc.ini`](#::odbcini-and-odbcini) and [`odbcinst.ini`](#::odbcinstini-and-odbcinstini).
These are either placed in the home directory as dotfiles (`.odbc.ini` and `.odbcinst.ini`, respectively) or in a system directory.
For platform-specific details, see the pages for [Linux](#docs:current:clients:odbc:linux), [macOS](#docs:current:clients:odbc:macos), and [Windows](#docs:current:clients:odbc:windows).

#### `odbc.ini` and `.odbc.ini` {#docs:current:clients:odbc:configuration::odbcini-and-odbcini}

The `odbc.ini` file contains the DSNs for the drivers, which can have specific knobs.
An example of `odbc.ini` with DuckDB:

```ini
[DuckDB]
Driver = DuckDB Driver
Database = :memory:
access_mode = read_only
```

The lines correspond to the following parameters:

* `[DuckDB]`: The DSN for DuckDB, given between the brackets.
* `Driver`: The driver's name, which also identifies the section holding its configuration in `odbcinst.ini`.
* `Database`: The database name used by DuckDB. This can also be a path to a database file (e.g., a `.db` file) on the system.
* `access_mode`: The mode in which to connect to the database.

#### `odbcinst.ini` and `.odbcinst.ini` {#docs:current:clients:odbc:configuration::odbcinstini-and-odbcinstini}

The `odbcinst.ini` file contains general configurations for the ODBC drivers installed in the system.
A driver section starts with the driver name between brackets, followed by the configuration knobs specific to that driver.

An example of `odbcinst.ini` with DuckDB:

```ini
[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace

[DuckDB Driver]
Driver = /path/to/libduckdb_odbc.dylib
```

The lines correspond to the following parameters:

* `[ODBC]`: The DM configuration section.
* `Trace`: Enables the ODBC trace file using the option `yes`.
* `TraceFile`: The absolute system file path for the ODBC trace file.
* `[DuckDB Driver]`: The section of the DuckDB installed driver.
* `Driver`: The absolute system file path of the DuckDB driver. Change to match your configuration.
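The two files are linked through the driver name: the `Driver` value of a DSN in `odbc.ini` names a section in `odbcinst.ini`, whose own `Driver` value is the path of the library. The following Python sketch follows that chain using the example files from this page. It only illustrates the relationship and is not how a Driver Manager is implemented; `load` and `resolve_driver_path` are hypothetical helper names.

```python
from configparser import ConfigParser

ODBC_INI = """\
[DuckDB]
Driver = DuckDB Driver
Database = :memory:
access_mode = read_only
"""

ODBCINST_INI = """\
[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace

[DuckDB Driver]
Driver = /path/to/libduckdb_odbc.dylib
"""


def load(text: str) -> ConfigParser:
    """Parse INI text, using only '=' as the key/value delimiter."""
    parser = ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return parser


def resolve_driver_path(dsn: str) -> str:
    """Follow DSN -> driver name -> driver library path."""
    driver_name = load(ODBC_INI)[dsn]["Driver"]
    return load(ODBCINST_INI)[driver_name]["Driver"]


print(resolve_driver_path("DuckDB"))  # /path/to/libduckdb_odbc.dylib
```

Restricting the delimiter to `=` keeps values such as `:memory:` intact.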

## Python {#clients:python}

### Python API {#docs:current:clients:python:overview}

> **Installation.** To use the DuckDB Python client, visit the [Python installation page](https://duckdb.org/install/index.html?environment=python).
>
> The latest stable version of the DuckDB Python client is 1.5.2.

#### Installation {#docs:current:clients:python:overview::installation}

The DuckDB Python API can be installed using [pip](https://pip.pypa.io): `pip install duckdb`. It can also be installed using conda: `conda install python-duckdb -c conda-forge`. Please see the [installation page](https://duckdb.org/install) for details.

**Python version:**
DuckDB requires Python 3.9 or newer.

#### Basic API Usage {#docs:current:clients:python:overview::basic-api-usage}

The most straightforward way to run SQL queries using DuckDB is the `duckdb.sql` function.

```python
import duckdb

duckdb.sql("SELECT 42").show()
```

This will run queries using an **in-memory database** that is stored globally inside the Python module. The result of the query is returned as a **Relation**. A relation is a symbolic representation of the query. The query is not executed until the result is fetched or requested to be printed to the screen.

Relations can be referenced in subsequent queries by storing them inside variables, and using them as tables. This way queries can be constructed incrementally.

```python
import duckdb

r1 = duckdb.sql("SELECT 42 AS i")
duckdb.sql("SELECT i * 2 AS k FROM r1").show()
```

#### Data Input {#docs:current:clients:python:overview::data-input}

DuckDB can ingest data from a wide variety of formats – both on-disk and in-memory. See the [data ingestion page](#docs:current:clients:python:data_ingestion) for more information.

```python
import duckdb

duckdb.read_csv("example.csv")                # read a CSV file into a Relation
duckdb.read_parquet("example.parquet")        # read a Parquet file into a Relation
duckdb.read_json("example.json")              # read a JSON file into a Relation

duckdb.sql("SELECT * FROM 'example.csv'")     # directly query a CSV file
duckdb.sql("SELECT * FROM 'example.parquet'") # directly query a Parquet file
duckdb.sql("SELECT * FROM 'example.json'")    # directly query a JSON file
```

##### DataFrames {#docs:current:clients:python:overview::dataframes}

DuckDB can directly query Pandas DataFrames, Polars DataFrames and Arrow tables.
Note that these are read-only, i.e., editing these tables via [`INSERT`](#docs:current:sql:statements:insert) or [`UPDATE` statements](#docs:current:sql:statements:update) is not possible.

###### Pandas {#docs:current:clients:python:overview::pandas}

To directly query a Pandas DataFrame, run:

```python
import duckdb
import pandas as pd

pandas_df = pd.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM pandas_df")
```

```text
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘
```

###### Polars {#docs:current:clients:python:overview::polars}

To directly query a Polars DataFrame, run:

```python
import duckdb
import polars as pl

polars_df = pl.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM polars_df")
```

```text
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘
```

###### PyArrow {#docs:current:clients:python:overview::pyarrow}

To directly query a PyArrow table, run:

```python
import duckdb
import pyarrow as pa

arrow_table = pa.Table.from_pydict({"a": [42]})
duckdb.sql("SELECT * FROM arrow_table")
```

```text
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘
```

#### Result Conversion {#docs:current:clients:python:overview::result-conversion}

DuckDB supports converting query results efficiently to a variety of formats. See the [result conversion page](#docs:current:clients:python:conversion) for more information.

```python
import duckdb

duckdb.sql("SELECT 42").fetchall()   # Python objects
duckdb.sql("SELECT 42").df()         # Pandas DataFrame
duckdb.sql("SELECT 42").pl()         # Polars DataFrame
duckdb.sql("SELECT 42").arrow()      # Arrow Table
duckdb.sql("SELECT 42").fetchnumpy() # NumPy Arrays
```

#### Writing Data to Disk {#docs:current:clients:python:overview::writing-data-to-disk}

DuckDB supports writing Relation objects directly to disk in a variety of formats. The [`COPY` statement](#docs:current:sql:statements:copy) can be used to write data to disk using SQL as an alternative.

```python
import duckdb

duckdb.sql("SELECT 42").write_parquet("out.parquet") # Write to a Parquet file
duckdb.sql("SELECT 42").write_csv("out.csv")         # Write to a CSV file
duckdb.sql("COPY (SELECT 42) TO 'out.parquet'")      # Copy to a Parquet file
```

#### Connection Options {#docs:current:clients:python:overview::connection-options}

Applications can open a new DuckDB connection via the `duckdb.connect()` method.

##### Using an In-Memory Database {#docs:current:clients:python:overview::using-an-in-memory-database}

When using DuckDB through `duckdb.sql()`, it operates on an **in-memory** database, i.e., no tables are persisted on disk.
Invoking the `duckdb.connect()` method without arguments returns a connection, which also uses an in-memory database:

```python
import duckdb

con = duckdb.connect()
con.sql("SELECT 42 AS x").show()
```

##### Persistent Storage {#docs:current:clients:python:overview::persistent-storage}

Calling `duckdb.connect(dbname)` creates a connection to a **persistent** database.
Any data written to that connection will be persisted, and can be reloaded by reconnecting to the same file, both from Python and from other DuckDB clients.

```python
import duckdb

# create a connection to a file called 'file.db'
con = duckdb.connect("file.db")
# create a table and load data into it
con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
# query the table
con.table("test").show()
# explicitly close the connection
con.close()
# Note: connections are also closed implicitly when they go out of scope
```

You can also use a context manager to ensure that the connection is closed:

```python
import duckdb

with duckdb.connect("file.db") as con:
    con.sql("CREATE TABLE test (i INTEGER)")
    con.sql("INSERT INTO test VALUES (42)")
    con.table("test").show()
    # the context manager closes the connection automatically
```

##### Configuration {#docs:current:clients:python:overview::configuration}

The `duckdb.connect()` method accepts a `config` dictionary, where [configuration options](#docs:current:configuration:overview::configuration-reference) can be specified. For example:

```python
import duckdb

con = duckdb.connect(config = {'threads': 1})
```

To specify the [storage version](#docs:current:internals:storage), pass the `storage_compatibility_version` option:

```python
import duckdb

con = duckdb.connect(config = {'storage_compatibility_version': 'latest'})
```

##### Connection Object and Module {#docs:current:clients:python:overview::connection-object-and-module}

The connection object and the `duckdb` module can be used interchangeably – they support the same methods. The only difference is that when using the `duckdb` module a global in-memory database is used.

> If you are developing a package designed for others to use, and use DuckDB in the package, it is recommended that you create connection objects instead of using the methods on the `duckdb` module. That is because the `duckdb` module uses a shared global database – which can cause hard to debug issues if used from within multiple different packages.

##### Using Connections in Parallel Python Programs  {#docs:current:clients:python:overview::using-connections-in-parallel-python-programs-}

###### Thread Safety of `duckdb.sql()` and the Global Connection {#docs:current:clients:python:overview::thread-safety-of-duckdbsql-and-the-global-connection}

`duckdb.sql()` and `duckdb.connect(':default:')` use a shared global in-memory connection. This connection is not thread-safe, and running queries on it from multiple threads can cause issues. To run DuckDB in parallel, each thread must have its own connection:

```python
def good_use():
    con = duckdb.connect()
    # uses new connection
    con.sql("SELECT 1").fetchall()
```

Conversely, the following could cause concurrency issues because they rely on a global connection:

```python
def bad_use():
    con = duckdb.connect(':default:')
    # uses global connection
    return con.sql("SELECT 1").fetchall()
```

Or:

```python
def also_bad():
    # uses global connection
    return duckdb.sql("SELECT 1").fetchall()
```

Avoid using `duckdb.sql()` or sharing a single connection across threads. 

###### About `cursor()`  {#docs:current:clients:python:overview::about-cursor-}

The [`DuckDBPyConnection.cursor()` method](#docs:current:clients:python:reference:index::duckdb.DuckDBPyConnection.cursor) creates another handle on the same connection. It does not open a new connection. Therefore, cursors created from the same connection cannot run queries at the same time.

##### Community Extensions {#docs:current:clients:python:overview::community-extensions}

To load [community extensions](#community_extensions:index), use the `repository="community"` argument with the `install_extension` method.

For example, install and load the `h3` community extension as follows:

```python
import duckdb

con = duckdb.connect()
con.install_extension("h3", repository="community")
con.load_extension("h3")
```

##### Unsigned Extensions {#docs:current:clients:python:overview::unsigned-extensions}

To load [unsigned extensions](#docs:current:extensions:overview::unsigned-extensions), use:

```python
con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
```

> **Warning.** Only load unsigned extensions from sources you trust.
> Avoid loading unsigned extensions over HTTP.
> Consult the [Securing DuckDB page](#docs:current:operations_manual:securing_duckdb:securing_extensions) for guidelines on how to set up DuckDB in a secure manner.

### Data Ingestion {#docs:current:clients:python:data_ingestion}

This page contains examples for data ingestion to Python using DuckDB. First, import the DuckDB package:

```python
import duckdb
```

Then, proceed with any of the following sections.

#### CSV Files {#docs:current:clients:python:data_ingestion::csv-files}

CSV files can be read using the `read_csv` function, called either from within Python or directly from within SQL. By default, the `read_csv` function attempts to auto-detect the CSV settings by sampling from the provided file.

Read from a file using fully auto-detected settings:

```python
duckdb.read_csv("example.csv")
```

Read multiple CSV files from a folder:

```python
duckdb.read_csv("folder/*.csv")
```

Specify options on how the CSV is formatted internally:

```python
duckdb.read_csv("example.csv", header = False, sep = ",")
```

Override types of the first two columns:

```python
duckdb.read_csv("example.csv", dtype = ["int", "varchar"])
```

Directly read a CSV file from within SQL:

```python
duckdb.sql("SELECT * FROM 'example.csv'")
```

Call `read_csv` from within SQL:

```python
duckdb.sql("SELECT * FROM read_csv('example.csv')")
```

See the [CSV Import](#docs:current:data:csv:overview) page for more information.

#### Parquet Files {#docs:current:clients:python:data_ingestion::parquet-files}

Parquet files can be read using the `read_parquet` function, called either from within Python or directly from within SQL.

Read from a single Parquet file:

```python
duckdb.read_parquet("example.parquet")
```

Read multiple Parquet files from a folder:

```python
duckdb.read_parquet("folder/*.parquet")
```

Read a Parquet file over [https](#docs:current:core_extensions:httpfs:overview):

```python
duckdb.read_parquet("https://some.url/some_file.parquet")
```

Read a list of Parquet files:

```python
duckdb.read_parquet(["file1.parquet", "file2.parquet", "file3.parquet"])
```

Directly read a Parquet file from within SQL:

```python
duckdb.sql("SELECT * FROM 'example.parquet'")
```

Call `read_parquet` from within SQL:

```python
duckdb.sql("SELECT * FROM read_parquet('example.parquet')")
```

See the [Parquet Loading](#docs:current:data:parquet:overview) page for more information.

#### JSON Files {#docs:current:clients:python:data_ingestion::json-files}

JSON files can be read using the `read_json` function, called either from within Python or directly from within SQL. By default, the `read_json` function will automatically detect if a file contains newline-delimited JSON or regular JSON, and will detect the schema of the objects stored within the JSON file.

Read from a single JSON file:

```python
duckdb.read_json("example.json")
```

Read multiple JSON files from a folder:

```python
duckdb.read_json("folder/*.json")
```

Directly read a JSON file from within SQL:

```python
duckdb.sql("SELECT * FROM 'example.json'")
```

Call `read_json` from within SQL:

```python
duckdb.sql("SELECT * FROM read_json('example.json')")
```

#### Directly Accessing DataFrames and Arrow Objects {#docs:current:clients:python:data_ingestion::directly-accessing-dataframes-and-arrow-objects}

DuckDB is automatically able to query certain Python variables by referring to their variable name (as if it were a table).
These types include the following: Pandas DataFrame, Polars DataFrame, Polars LazyFrame, NumPy arrays, [relations](#docs:current:clients:python:relational_api) and Arrow objects.

Only variables that are visible to Python code at the location of the `sql()` or `execute()` call can be used in this manner.
Accessing these variables is made possible by [replacement scans](#docs:current:clients:c:replacement_scans). To disable replacement scans entirely, use:

```sql
SET python_enable_replacements = false;
```

DuckDB supports querying multiple types of Apache Arrow objects including [tables](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html), [datasets](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html), [RecordBatchReaders](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.RecordBatchStreamReader.html) and [scanners](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html). See the Python [guides](#docs:current:python:overview) for more examples.

```python
import duckdb
import pandas as pd

test_df = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
print(duckdb.sql("SELECT * FROM test_df").fetchall())
```

```text
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
```

DuckDB also supports “registering” a DataFrame or Arrow object as a virtual table, comparable to a SQL `VIEW`. This is useful when querying a DataFrame/Arrow object that is stored in another way (as a class variable, or a value in a dictionary). Below is a Pandas example that manually registers a DataFrame stored in a dictionary:

```python
import duckdb
import pandas as pd

my_dictionary = {}
my_dictionary["test_df"] = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
duckdb.register("test_df_view", my_dictionary["test_df"])
print(duckdb.sql("SELECT * FROM test_df_view").fetchall())
```

```text
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
```

You can also create a persistent table in DuckDB from the contents of the DataFrame (or the view):

```python
# assumes `con` is an open DuckDB connection and `test_df` is a DataFrame in scope
# create a new table from the contents of a DataFrame
con.execute("CREATE TABLE test_df_table AS SELECT * FROM test_df")
# insert into an existing table from the contents of a DataFrame
con.execute("INSERT INTO test_df_table SELECT * FROM test_df")
```

The precedence of objects with the same name is as follows:

- Objects explicitly registered via `register()`
- Native DuckDB tables and views
- [Replacement scans](#docs:current:clients:c:replacement_scans)

##### Pandas DataFrames – `object` Columns {#docs:current:clients:python:data_ingestion::pandas-dataframes--object-columns}

`pandas.DataFrame` columns of an `object` dtype require some special care, since this stores values of arbitrary type.
To convert these columns to DuckDB, we first go through an analyze phase before converting the values.
In this analyze phase, a sample of the column's rows is analyzed to determine the target type.
By default, the sample size is 1000.
If the type picked during the analyze step is incorrect, the conversion fails with `Invalid Input Error: Failed to cast value`, in which case you will need to increase the sample size.
The sample size can be changed by setting the `pandas_analyze_sample` config option.

```python
# example setting the sample size to 100k
duckdb.execute("SET GLOBAL pandas_analyze_sample = 100_000")
```

### Conversion between DuckDB and Python {#docs:current:clients:python:conversion}

This page documents the rules for converting [Python objects to DuckDB](#::object-conversion-python-object-to-duckdb) and [DuckDB results to Python](#::result-conversion-duckdb-results-to-python).

#### Object Conversion: Python Object to DuckDB {#docs:current:clients:python:conversion::object-conversion-python-object-to-duckdb}

This is a mapping of Python object types to DuckDB [Logical Types](#docs:current:sql:data_types:overview):

* `None` → `NULL`
* `bool` → `BOOLEAN`
* `datetime.timedelta` → `INTERVAL`
* `str` → `VARCHAR`
* `bytearray` → `BLOB`
* `memoryview` → `BLOB`
* `decimal.Decimal` → `DECIMAL` / `DOUBLE`
* `uuid.UUID` → `UUID`

The rest of the conversion rules are as follows.

##### `int` {#docs:current:clients:python:conversion::int}

Since integers can be of arbitrary size in Python, there is not a one-to-one conversion possible for ints.
Instead we perform these casts in order until one succeeds:

* `BIGINT`
* `INTEGER`
* `UBIGINT`
* `UINTEGER`
* `DOUBLE`

When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion.

##### `float` {#docs:current:clients:python:conversion::float}

These casts are tried in order until one succeeds:

* `DOUBLE`
* `FLOAT`

##### `datetime.datetime` {#docs:current:clients:python:conversion::datetimedatetime}

For `datetime`, we check `pandas.isnull` if it's available and return `NULL` if it returns `True`.
We check against `datetime.datetime.min` and `datetime.datetime.max` to convert to `-inf` and `+inf` respectively.

If the `datetime` has tzinfo, we will use `TIMESTAMPTZ`, otherwise it becomes `TIMESTAMP`.

##### `datetime.time` {#docs:current:clients:python:conversion::datetimetime}

If the `time` has tzinfo, we will use `TIMETZ`, otherwise it becomes `TIME`.

##### `datetime.date` {#docs:current:clients:python:conversion::datetimedate}

`date` converts to the `DATE` type.
We check against `datetime.date.min` and `datetime.date.max` to convert to `-inf` and `+inf` respectively.

##### `bytes` {#docs:current:clients:python:conversion::bytes}

`bytes` converts to `BLOB` by default. When it's used to construct a Value object of type `BITSTRING`, it maps to `BITSTRING` instead.

##### `list` {#docs:current:clients:python:conversion::list}

`list` becomes a `LIST` type of the “most permissive” type of its children, for example:

```python
my_list_value = [
    12345,
    "test"
]
```

This becomes `VARCHAR[]` because `12345` can be converted to `VARCHAR`, but `test` cannot be converted to `INTEGER`.

```sql
[12345, test]
```

##### `dict` {#docs:current:clients:python:conversion::dict}

The `dict` object can convert to either `STRUCT(...)` or `MAP(..., ...)` depending on its structure.
If the dict has a structure similar to:

```python
import duckdb

my_map_dict = {
    "key": [
        1, 2, 3
    ],
    "value": [
        "one", "two", "three"
    ]
}

duckdb.values(my_map_dict)
```

Then we'll convert it to a `MAP` of key-value pairs of the two lists zipped together.
The example above becomes a `MAP(INTEGER, VARCHAR)`:

```text
┌─────────────────────────┐
│ {1=one, 2=two, 3=three} │
│  map(integer, varchar)  │
├─────────────────────────┤
│ {1=one, 2=two, 3=three} │
└─────────────────────────┘
```

If the dict is returned by a [function](#docs:current:clients:python:function),
the function will return a `MAP`, so the function's `return_type` has to be specified.
Providing a return type that cannot be converted to `MAP` will raise an error:

```python
import duckdb
duckdb_conn = duckdb.connect()

def get_map() -> dict[str,list[str]|list[int]]:
    return {
        "key": [
            1, 2, 3
        ],
        "value": [
            "one", "two", "three"
        ]
    }

duckdb_conn.create_function("get_map", get_map, return_type=dict[int, str])

duckdb_conn.sql("select get_map()").show()

duckdb_conn.create_function("get_map_error", get_map)

duckdb_conn.sql("select get_map_error()").show()
```

```text
┌─────────────────────────┐
│        get_map()        │
│  map(bigint, varchar)   │
├─────────────────────────┤
│ {1=one, 2=two, 3=three} │
└─────────────────────────┘

ConversionException: Conversion Error: Type VARCHAR can't be cast as UNION(u1 VARCHAR[], u2 BIGINT[]). VARCHAR can't be implicitly cast to any of the union member types: VARCHAR[], BIGINT[]
```

> The names of the fields matter and the two lists need to have the same size.

Otherwise we'll try to convert it to a `STRUCT`.

```python
import duckdb

my_struct_dict = {
    1: "one",
    "2": 2,
    "three": [1, 2, 3],
    False: True
}

duckdb.values(my_struct_dict)
```

This becomes:

```text
┌────────────────────────────────────────────────────────────────────┐
│      {'1': 'one', '2': 2, 'three': [1, 2, 3], 'False': true}       │
│ struct("1" varchar, "2" integer, three integer[], "false" boolean) │
├────────────────────────────────────────────────────────────────────┤
│ {'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}              │
└────────────────────────────────────────────────────────────────────┘
```

If the dict is returned by a [function](#docs:current:clients:python:function),
the function will return a `MAP` due to [automatic conversion](#docs:current:clients:python:types::dictkey_type-value_type).
To return a `STRUCT`, the `return_type` has to be provided:

```python
import duckdb
from duckdb.sqltypes import BOOLEAN, INTEGER, VARCHAR
from duckdb import list_type, struct_type

duckdb_conn = duckdb.connect()

my_struct_dict = {
    1: "one",
    "2": 2,
    "three": [1, 2, 3],
    False: True
}

def get_struct() -> dict[str|int|bool,str|int|list[int]|bool]:
    return my_struct_dict

duckdb_conn.create_function("get_struct_as_map", get_struct)

duckdb_conn.sql("select get_struct_as_map()").show()

duckdb_conn.create_function("get_struct", get_struct, return_type=struct_type({
    1: VARCHAR,
    "2": INTEGER,
    "three": list_type(INTEGER),
    False: BOOLEAN
}))

duckdb_conn.sql("select get_struct()").show()
```

```text
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                         get_struct_as_map()                                          │
│ map(union(u1 varchar, u2 bigint, u3 boolean), union(u1 varchar, u2 bigint, u3 bigint[], u4 boolean)) │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {1=one, 2=2, three=[1, 2, 3], false=true}                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────┐
│                            get_struct()                            │
│ struct("1" varchar, "2" integer, three integer[], "false" boolean) │
├────────────────────────────────────────────────────────────────────┤
│ {'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}              │
└────────────────────────────────────────────────────────────────────┘
```

> Every `key` of the dictionary is converted to a string.

##### `tuple` {#docs:current:clients:python:conversion::tuple}

`tuple` converts to `LIST` by default. When it's used to construct a Value object of type `STRUCT`, it converts to `STRUCT` instead.

##### `numpy.ndarray` and `numpy.datetime64` {#docs:current:clients:python:conversion::numpyndarray-and-numpydatetime64}

`ndarray` and `datetime64` are converted by calling `tolist()` and converting the result of that.

#### Result Conversion: DuckDB Results to Python {#docs:current:clients:python:conversion::result-conversion-duckdb-results-to-python}

DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.

##### NumPy {#docs:current:clients:python:conversion::numpy}

* `fetchnumpy()` fetches the data as a dictionary of NumPy arrays

##### Pandas {#docs:current:clients:python:conversion::pandas}

* `df()` fetches the data as a Pandas DataFrame
* `fetchdf()` is an alias of `df()`
* `fetch_df()` is an alias of `df()`
* `fetch_df_chunk(vector_multiple)` fetches a portion of the results into a DataFrame. The number of rows returned in each chunk is the vector size (2048 by default) * vector_multiple (1 by default).

##### Apache Arrow {#docs:current:clients:python:conversion::apache-arrow}

* `to_arrow_table()` fetches the data as an [Arrow table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html)
* `to_arrow_reader(chunk_size)` returns an [Arrow record batch reader](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.RecordBatchStreamReader.html) with `chunk_size` rows per batch
* `arrow()` returns an [Arrow record batch reader](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.RecordBatchStreamReader.html). We recommend using `to_arrow_reader()` instead.

> **Deprecated.** `fetch_arrow_table()` and `fetch_record_batch()` are deprecated. Use `to_arrow_table()` and `to_arrow_reader()` instead.

##### Polars {#docs:current:clients:python:conversion::polars}

* `pl()` fetches the data as a Polars DataFrame

##### Examples {#docs:current:clients:python:conversion::examples}

Below are some examples using this functionality. See the [Python guides](#docs:current:python:overview) for more examples.

Fetch as Pandas DataFrame:

```python
df = con.execute("SELECT * FROM items").fetchdf()
print(df)
```

```text
       item   value  count
0     jeans    20.0      1
1    hammer    42.2      2
2    laptop  2000.0      1
3  chainsaw   500.0     10
4    iphone   300.0      2
```

Fetch as dictionary of NumPy arrays:

```python
arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)
```

```text
{'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
             mask=[False, False, False, False, False],
       fill_value='?',
            dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
             mask=[False, False, False, False, False],
       fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
             mask=[False, False, False, False, False],
       fill_value=999999,
            dtype=int32)}
```

Fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing:

```python
tbl = con.execute("SELECT * FROM items").to_arrow_table()
print(tbl.to_pandas())
```

```text
       item    value  count
0     jeans    20.00      1
1    hammer    42.20      2
2    laptop  2000.00      1
3  chainsaw   500.00     10
4    iphone   300.00      2
```

### Python DB API {#docs:current:clients:python:dbapi}

The standard DuckDB Python API provides a SQL interface compliant with the [DB-API 2.0 specification described by PEP 249](https://www.python.org/dev/peps/pep-0249/), similar to the [SQLite Python API](https://docs.python.org/3.7/library/sqlite3.html).

#### Connection {#docs:current:clients:python:dbapi::connection}

To use the module, you must first create a `DuckDBPyConnection` object that represents a connection to a database.
This is done through the [`duckdb.connect`](#docs:current:clients:python:reference:index::duckdb.connect) method.

The `config` keyword argument can be used to provide a `dict` that contains key-value pairs referencing [settings](#docs:current:configuration:overview::configuration-reference) understood by DuckDB.

##### In-Memory Connection {#docs:current:clients:python:dbapi::in-memory-connection}

The special value `:memory:` can be used to create an **in-memory database**. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Python process).

###### Named In-memory Connections {#docs:current:clients:python:dbapi::named-in-memory-connections}

The special value `:memory:` can also be postfixed with a name, for example: `:memory:conn3`.
When a name is provided, subsequent `duckdb.connect` calls will create a new connection to the same database, sharing the catalogs (views, tables, macros etc.).

Using `:memory:` without a name will always create a new and separate database instance.

##### Default Connection {#docs:current:clients:python:dbapi::default-connection}

By default, we create an (unnamed) **in-memory database** that lives inside the `duckdb` module.
Every method of `DuckDBPyConnection` is also available on the `duckdb` module, and these module-level methods use this connection.

The special value `:default:` can be used to get this default connection.

##### File-Based Connection {#docs:current:clients:python:dbapi::file-based-connection}

If the `database` is a file path, a connection to a persistent database is established.
If the file does not exist, it will be created (the extension of the file is irrelevant and can be `.db`, `.duckdb` or anything else).

###### `read_only` Connections {#docs:current:clients:python:dbapi::read_only-connections}

If you would like to connect in read-only mode, you can set the `read_only` flag to `True`. If the file does not exist, it is **not** created when connecting in read-only mode.
Read-only mode is required if multiple Python processes want to access the same database file at the same time.

The default connection described above can be accessed in two equivalent ways:

```python
import duckdb

duckdb.execute("CREATE TABLE tbl AS SELECT 42 a")
con = duckdb.connect(":default:")
con.sql("SELECT * FROM tbl")
# or
duckdb.default_connection().sql("SELECT * FROM tbl")
```

```text
┌───────┐
│   a   │
│ int32 │
├───────┤
│    42 │
└───────┘
```

The connection modes described in this section can be summarized as follows:

```python
import duckdb

# to start an in-memory database
con = duckdb.connect(database = ":memory:")
# to use a database file (not shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = False)
# to use a database file (shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = True)
# to explicitly get the default connection
con = duckdb.connect(database = ":default:")
```

If you want to create a second connection to an existing database, you can use the `cursor()` method. This might be useful, for example, to allow parallel threads to run queries independently. A single connection is thread-safe but is locked for the duration of each query, effectively serializing database access in this case.

Connections are closed implicitly when they go out of scope or if they are explicitly closed using `close()`. Once the last connection to a database instance is closed, the database instance is closed as well.

#### Querying {#docs:current:clients:python:dbapi::querying}

SQL queries can be sent to DuckDB using the `execute()` method of connections. Once a query has been executed, results can be retrieved using the `fetchone` and `fetchall` methods on the connection. `fetchall` will retrieve all results and complete the transaction. `fetchone` will retrieve a single row of results each time that it is invoked until no more results are available. The transaction will only close once `fetchone` is called and there are no more results remaining (the return value will be `None`). As an example, in the case of a query only returning a single row, `fetchone` should be called once to retrieve the results and a second time to close the transaction. Below are some short examples:

```python
# create a table
con.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', Decimal('20.00'), 1), ('hammer', Decimal('42.20'), 2)]

# retrieve the items one at a time
con.execute("SELECT * FROM items")
print(con.fetchone())
# ('jeans', Decimal('20.00'), 1)
print(con.fetchone())
# ('hammer', Decimal('42.20'), 2)
print(con.fetchone()) # This closes the transaction. Any subsequent calls to .fetchone will return None
# None
```

The `description` property of the connection object contains the column names as per the standard.

##### Prepared Statements {#docs:current:clients:python:dbapi::prepared-statements}

DuckDB also supports [prepared statements](#docs:current:sql:query_syntax:prepared_statements) in the API with the `execute` and `executemany` methods. The values may be passed as an additional parameter after a query that contains `?` or `$1` (dollar symbol and a number) placeholders. Using the `?` notation adds the values in the same sequence as passed within the Python parameter. Using the `$` notation allows for values to be reused within the SQL statement based on the number and index of the value found within the Python parameter. Values are converted according to the [conversion rules](#docs:current:clients:python:conversion::object-conversion-python-object-to-duckdb).

Here are some examples. First, insert a row using a [prepared statement](#docs:current:sql:query_syntax:prepared_statements):

```python
con.execute("INSERT INTO items VALUES (?, ?, ?)", ["laptop", 2000, 1])
```

Second, insert several rows using a [prepared statement](#docs:current:sql:query_syntax:prepared_statements):

```python
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [["chainsaw", 500, 10], ["iphone", 300, 2]] )
```

Query the database using a [prepared statement](#docs:current:sql:query_syntax:prepared_statements):

```python
con.execute("SELECT item FROM items WHERE value > ?", [400])
print(con.fetchall())
```

```text
[('laptop',), ('chainsaw',)]
```

Query using the `$` notation for a [prepared statement](#docs:current:sql:query_syntax:prepared_statements) and reused values:

```python
con.execute("SELECT $1, $1, $2", ["duck", "goose"])
print(con.fetchall())
```

```text
[('duck', 'duck', 'goose')]
```

> **Warning.** Do *not* use `executemany` to insert large amounts of data into DuckDB. See the [data ingestion page](#docs:current:clients:python:data_ingestion) for better options.

#### Named Parameters {#docs:current:clients:python:dbapi::named-parameters}

Besides the standard unnamed parameters, like `$1`, `$2` etc., it's also possible to supply named parameters, like `$my_parameter`.
When using named parameters, you have to provide a dictionary mapping of `str` to value in the `parameters` argument.
An example use is the following:

```python
import duckdb

res = duckdb.execute("""
    SELECT
        $my_param,
        $other_param,
        $also_param
    """,
    {
        "my_param": 5,
        "other_param": "DuckDB",
        "also_param": [42]
    }
).fetchall()
print(res)
```

```text
[(5, 'DuckDB', [42])]
```

### Relational API {#docs:current:clients:python:relational_api}

The Relational API is an alternative API that can be used to incrementally construct queries. 
The API is centered around `DuckDBPyRelation` nodes. The relations can be seen as symbolic representations of SQL queries. 

#### Lazy Evaluation {#docs:current:clients:python:relational_api::lazy-evaluation}

The relations do not hold any data – and nothing is executed – until [a method that triggers execution](#::output) is called.

For example, we create a relation, which loads 1 billion rows:

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("from range(1_000_000_000)")
```
At this point, `rel` does not hold any data, and nothing has been retrieved from the database.

Calling `rel.show()` or simply printing `rel` in the terminal fetches the first 10K rows.
If the relation has more than 10K rows, the output reports `>9999 rows`, because the total number of rows in the relation is unknown.

Calling an [output](#::output) method retrieves the data and stores it in the specified format:

```python
rel.to_table("example_rel")

# 100% ▕████████████████████████████████████████████████████████████▏ 
```



#### Relation Creation {#docs:current:clients:python:relational_api::relation-creation-}

This section describes how relations are created. The methods are [lazily evaluated](#::lazy-evaluation).

| Name | Description |
|:--|:-------|
| [`from_arrow`](#::from_arrow) | Create a relation object from an Arrow object |
| [`from_csv_auto`](#::from_csv_auto) | Create a relation object from the CSV file in `path_or_buffer` |
| [`from_df`](#::from_df) | Create a relation object from the DataFrame in df |
| [`from_parquet`](#::from_parquet) | Create a relation object from the Parquet files |
| [`from_query`](#::from_query) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
| [`query`](#::query) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
| [`read_csv`](#::read_csv) | Create a relation object from the CSV file in `path_or_buffer` |
| [`read_json`](#::read_json) | Create a relation object from the JSON file in `path_or_buffer` |
| [`read_parquet`](#::read_parquet) | Create a relation object from the Parquet files |
| [`sql`](#::sql) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
| [`table`](#::table) | Create a relation object for the named table |
| [`table_function`](#::table_function) | Create a relation object from the named table function with given parameters |
| [`values`](#::values) | Create a relation object from the passed values |
| [`view`](#::view) | Create a relation object for the named view |

###### `from_arrow` {#docs:current:clients:python:relational_api::from_arrow}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
from_arrow(self: _duckdb.DuckDBPyConnection, arrow_object: object) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from an Arrow object

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **arrow_object** : pyarrow.Table, pyarrow.RecordBatch
                            
	Arrow object to create a relation from

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb
import pyarrow as pa

ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])

duckdb_conn = duckdb.connect()

rel = duckdb_conn.from_arrow(example_table)

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────┬─────────┐
│  id  │  text   │
│ int8 │ varchar │
├──────┼─────────┤
│    1 │ a       │
└──────┴─────────┘
```

----

###### `from_csv_auto` {#docs:current:clients:python:relational_api::from_csv_auto}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
from_csv_auto(self: _duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from the CSV file in `path_or_buffer`

**Aliases**: [`read_csv`](#::read_csv)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **path_or_buffer** : Union[str, StringIO, TextIOBase]
                            
	Path to the CSV file or buffer to read from.
- **header** : Optional[bool], Optional[int]
                            
	Row number(s) to use as the column names, or None if no header.
- **compression** : Optional[str]
                            
	Compression type (e.g., 'gzip', 'bz2').
- **sep** : Optional[str]
                            
	Delimiter to use; defaults to comma.
- **delimiter** : Optional[str]
                            
	Alternative delimiter to use.
- **dtype** : Optional[Dict[str, str]], Optional[List[str]]
                            
	Data types for columns.
- **na_values** : Optional[str], Optional[List[str]]
                            
	Additional strings to recognize as NA/NaN.
- **skiprows** : Optional[int]
                            
	Number of rows to skip at the start.
- **quotechar** : Optional[str]
                            
	Character used to quote fields.
- **escapechar** : Optional[str]
                            
	Character used to escape delimiter or quote characters.
- **encoding** : Optional[str]
                            
	Character encoding of the CSV file (e.g., 'utf-8').
- **parallel** : Optional[bool]
                            
	Enable parallel reading.
- **date_format** : Optional[str]
                            
	Format to parse dates.
- **timestamp_format** : Optional[str]
                            
	Format to parse timestamps.
- **sample_size** : Optional[int]
                            
	Number of rows to sample for schema inference.
- **all_varchar** : Optional[bool]
                            
	Treat all columns as VARCHAR.
- **normalize_names** : Optional[bool]
                            
	Normalize column names to lowercase.
- **null_padding** : Optional[bool]
                            
	Enable null padding for rows with missing columns.
- **names** : Optional[List[str]]
                            
	List of column names to use.
- **lineterminator** : Optional[str]
                            
	Character to break lines on.
- **columns** : Optional[Dict[str, str]]
                            
	Column mapping for schema.
- **auto_type_candidates** : Optional[List[str]]
                            
	List of candidate types to consider during automatic type inference.
- **max_line_size** : Optional[int]
                            
	Maximum line size in bytes.
- **ignore_errors** : Optional[bool]
                            
	Ignore parsing errors.
- **store_rejects** : Optional[bool]
                            
	Store rejected rows.
- **rejects_table** : Optional[str]
                            
	Table name to store rejected rows.
- **rejects_scan** : Optional[str]
                            
	Scan to use for rejects.
- **rejects_limit** : Optional[int]
                            
	Limit number of rejects stored.
- **force_not_null** : Optional[List[str]]
                            
	List of columns to force as NOT NULL.
- **buffer_size** : Optional[int]
                            
	Buffer size in bytes.
- **decimal** : Optional[str]
                            
	Character to recognize as decimal point.
- **allow_quoted_nulls** : Optional[bool]
                            
	Allow quoted NULL values.
- **filename** : Optional[bool], Optional[str]
                            
	Add filename column or specify filename.
- **hive_partitioning** : Optional[bool]
                            
	Enable Hive-style partitioning.
- **union_by_name** : Optional[bool]
                            
	Union files by column name instead of position.
- **hive_types** : Optional[Dict[str, str]]
                            
	Hive types for columns.
- **hive_types_autocast** : Optional[bool]
                            
	Automatically cast Hive types.
- **connection** : DuckDBPyConnection
                            
	DuckDB connection to use.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import csv
import duckdb

duckdb_conn = duckdb.connect()

with open('code_example.csv', 'w', newline='') as csvfile:
    fieldnames = ['id', 'text']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'id': '1', 'text': 'a'})

rel = duckdb_conn.from_csv_auto("code_example.csv")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```

----

###### `from_df` {#docs:current:clients:python:relational_api::from_df}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
from_df(self: _duckdb.DuckDBPyConnection, df: pandas.DataFrame) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from the DataFrame in df

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **df** : pandas.DataFrame
                            
	A pandas DataFrame to be converted into a DuckDB relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb
import pandas as pd

df = pd.DataFrame(data = {'id': [1], "text":["a"]})

duckdb_conn = duckdb.connect()

rel = duckdb_conn.from_df(df)

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```

----

###### `from_parquet` {#docs:current:clients:python:relational_api::from_parquet}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
from_parquet(*args, **kwargs)
Overloaded function.

1. from_parquet(self: _duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

2. from_parquet(self: _duckdb.DuckDBPyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from the Parquet files

**Aliases**: [`read_parquet`](#::read_parquet)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **file_glob** : str
                            
	File path or glob pattern pointing to Parquet files to be read.
- **binary_as_string** : bool, default: False
                            
	Interpret binary columns as strings instead of blobs.
- **file_row_number** : bool, default: False
                            
	Add a column containing the row number within each file.
- **filename** : bool, default: False
                            
	Add a column containing the name of the file each row came from.
- **hive_partitioning** : bool, default: False
                            
	Enable automatic detection of Hive-style partitions in file paths.
- **union_by_name** : bool, default: False
                            
	Union Parquet files by matching column names instead of positions.
- **compression** : object
                            
	Optional compression codec to use when reading the Parquet files.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq

ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])

pq.write_table(example_table, "code_example.parquet")

duckdb_conn = duckdb.connect()

rel = duckdb_conn.from_parquet("code_example.parquet")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────┬─────────┐
│  id  │  text   │
│ int8 │ varchar │
├──────┼─────────┤
│    1 │ a       │
└──────┴─────────┘
```

----

###### `from_query` {#docs:current:clients:python:relational_api::from_query}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
from_query(self: _duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

> **Warning.** Passing `params` to this method is [discouraged](#docs:current:clients:python:known_issues::parameterized-queries-in-relational-api) due to significant performance overhead. Use [`execute()`](#docs:current:clients:python:dbapi::prepared-statements) for parameterized queries instead.

**Aliases**: [`query`](#::query), [`sql`](#::sql)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **query** : object

	The SQL query or subquery to be executed and converted into a relation.
- **alias** : str, default: ''

	Optional alias name to assign to the resulting relation.
- **params** : object

	Optional query parameters. **Discouraged** due to [significant performance overhead](#docs:current:clients:python:known_issues::parameterized-queries-in-relational-api). Use [`execute()`](#docs:current:clients:python:dbapi::prepared-statements) for parameterized queries instead.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.from_query("from range(1,2) tbl(id)")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```

----

###### `query` {#docs:current:clients:python:relational_api::query}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
query(self: _duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

> **Warning.** Passing `params` to this method is [discouraged](#docs:current:clients:python:known_issues::parameterized-queries-in-relational-api) due to significant performance overhead. Use [`execute()`](#docs:current:clients:python:dbapi::prepared-statements) for parameterized queries instead.

**Aliases**: [`from_query`](#::from_query), [`sql`](#::sql)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **query** : object

	The SQL query or subquery to be executed and converted into a relation.
- **alias** : str, default: ''

	Optional alias name to assign to the resulting relation.
- **params** : object

	Optional query parameters. **Discouraged** due to [significant performance overhead](#docs:current:clients:python:known_issues::parameterized-queries-in-relational-api). Use [`execute()`](#docs:current:clients:python:dbapi::prepared-statements) for parameterized queries instead.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.query("from range(1,2) tbl(id)")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```

----

###### `read_csv` {#docs:current:clients:python:relational_api::read_csv}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
read_csv(self: _duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from the CSV file in `path_or_buffer`

**Aliases**: [`from_csv_auto`](#::from_csv_auto)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **path_or_buffer** : Union[str, StringIO, TextIOBase]
                            
	Path to the CSV file or buffer to read from.
- **header** : Optional[bool], Optional[int]
                            
	Row number(s) to use as the column names, or None if no header.
- **compression** : Optional[str]
                            
	Compression type (e.g., 'gzip', 'bz2').
- **sep** : Optional[str]
                            
	Delimiter to use; defaults to comma.
- **delimiter** : Optional[str]
                            
	Alternative delimiter to use.
- **dtype** : Optional[Dict[str, str]], Optional[List[str]]
                            
	Data types for columns.
- **na_values** : Optional[str], Optional[List[str]]
                            
	Additional strings to recognize as NA/NaN.
- **skiprows** : Optional[int]
                            
	Number of rows to skip at the start.
- **quotechar** : Optional[str]
                            
	Character used to quote fields.
- **escapechar** : Optional[str]
                            
	Character used to escape delimiter or quote characters.
- **encoding** : Optional[str]
                            
	Character encoding of the CSV file (e.g., 'utf-8').
- **parallel** : Optional[bool]
                            
	Enable parallel reading.
- **date_format** : Optional[str]
                            
	Format to parse dates.
- **timestamp_format** : Optional[str]
                            
	Format to parse timestamps.
- **sample_size** : Optional[int]
                            
	Number of rows to sample for schema inference.
- **all_varchar** : Optional[bool]
                            
	Treat all columns as VARCHAR.
- **normalize_names** : Optional[bool]
                            
	Normalize column names to lowercase.
- **null_padding** : Optional[bool]
                            
	Enable null padding for rows with missing columns.
- **names** : Optional[List[str]]
                            
	List of column names to use.
- **lineterminator** : Optional[str]
                            
	Character to break lines on.
- **columns** : Optional[Dict[str, str]]
                            
	Column mapping for schema.
- **auto_type_candidates** : Optional[List[str]]
                            
	List of candidate types to consider during automatic type inference.
- **max_line_size** : Optional[int]
                            
	Maximum line size in bytes.
- **ignore_errors** : Optional[bool]
                            
	Ignore parsing errors.
- **store_rejects** : Optional[bool]
                            
	Store rejected rows.
- **rejects_table** : Optional[str]
                            
	Table name to store rejected rows.
- **rejects_scan** : Optional[str]
                            
	Scan to use for rejects.
- **rejects_limit** : Optional[int]
                            
	Limit number of rejects stored.
- **force_not_null** : Optional[List[str]]
                            
	List of columns to force as NOT NULL.
- **buffer_size** : Optional[int]
                            
	Buffer size in bytes.
- **decimal** : Optional[str]
                            
	Character to recognize as decimal point.
- **allow_quoted_nulls** : Optional[bool]
                            
	Allow quoted NULL values.
- **filename** : Optional[bool], Optional[str]
                            
	Add filename column or specify filename.
- **hive_partitioning** : Optional[bool]
                            
	Enable Hive-style partitioning.
- **union_by_name** : Optional[bool]
                            
	Union files by column name instead of position.
- **hive_types** : Optional[Dict[str, str]]
                            
	Hive types for columns.
- **hive_types_autocast** : Optional[bool]
                            
	Automatically cast Hive types.
- **connection** : DuckDBPyConnection
                            
	DuckDB connection to use.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import csv
import duckdb

duckdb_conn = duckdb.connect()

with open('code_example.csv', 'w', newline='') as csvfile:
    fieldnames = ['id', 'text']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'id': '1', 'text': 'a'})

rel = duckdb_conn.read_csv("code_example.csv")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```

----

###### `read_json` {#docs:current:clients:python:relational_api::read_json}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
read_json(self: _duckdb.DuckDBPyConnection, path_or_buffer: object, *, columns: typing.Optional[object] = None, sample_size: typing.Optional[object] = None, maximum_depth: typing.Optional[object] = None, records: typing.Optional[str] = None, format: typing.Optional[str] = None, date_format: typing.Optional[object] = None, timestamp_format: typing.Optional[object] = None, compression: typing.Optional[object] = None, maximum_object_size: typing.Optional[object] = None, ignore_errors: typing.Optional[object] = None, convert_strings_to_integers: typing.Optional[object] = None, field_appearance_threshold: typing.Optional[object] = None, map_inference_threshold: typing.Optional[object] = None, maximum_sample_files: typing.Optional[object] = None, filename: typing.Optional[object] = None, hive_partitioning: typing.Optional[object] = None, union_by_name: typing.Optional[object] = None, hive_types: typing.Optional[object] = None, hive_types_autocast: typing.Optional[object] = None) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from the JSON file in `path_or_buffer`

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **path_or_buffer** : object
                            
	File path or file-like object containing JSON data to be read.
- **columns** : object
                            
	Optional list of column names to project from the JSON data.
- **sample_size** : object
                            
	Number of rows to sample for inferring JSON schema.
- **maximum_depth** : object
                            
	Maximum depth to which JSON objects should be parsed.
- **records** : str
                            
	Format string specifying whether JSON is in records mode.
- **format** : str
                            
	Format of the JSON data (e.g., 'auto', 'newline_delimited').
- **date_format** : object
                            
	Format string for parsing date fields.
- **timestamp_format** : object
                            
	Format string for parsing timestamp fields.
- **compression** : object
                            
	Compression codec used on the JSON data (e.g., 'gzip').
- **maximum_object_size** : object
                            
	Maximum size in bytes for individual JSON objects.
- **ignore_errors** : object
                            
	If True, skip over JSON records with parsing errors.
- **convert_strings_to_integers** : object
                            
	If True, attempt to convert strings to integers where appropriate.
- **field_appearance_threshold** : object
                            
	Threshold for inferring optional fields in nested JSON.
- **map_inference_threshold** : object
                            
	Threshold for inferring maps from JSON object patterns.
- **maximum_sample_files** : object
                            
	Maximum number of files to sample for schema inference.
- **filename** : object
                            
	If True, include a column with the source filename for each row.
- **hive_partitioning** : object
                            
	If True, enable Hive partitioning based on directory structure.
- **union_by_name** : object
                            
	If True, align JSON columns by name instead of position.
- **hive_types** : object
                            
	If True, use Hive types from directory structure for schema.
- **hive_types_autocast** : object
                            
	If True, automatically cast data types to match Hive types.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb
import json

with open("code_example.json", mode="w") as f:
    json.dump([{'id': 1, "text":"a"}], f)
    
duckdb_conn = duckdb.connect()

rel = duckdb_conn.read_json("code_example.json")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```

----

###### `read_parquet` {#docs:current:clients:python:relational_api::read_parquet}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
read_parquet(*args, **kwargs)
Overloaded function.

1. read_parquet(self: _duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

2. read_parquet(self: _duckdb.DuckDBPyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from the Parquet files

**Aliases**: [`from_parquet`](#::from_parquet)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **file_glob** : str
                            
	File path or glob pattern pointing to Parquet files to be read.
- **binary_as_string** : bool, default: False
                            
	Interpret binary columns as strings instead of blobs.
- **file_row_number** : bool, default: False
                            
	Add a column containing the row number within each file.
- **filename** : bool, default: False
                            
	Add a column containing the name of the file each row came from.
- **hive_partitioning** : bool, default: False
                            
	Enable automatic detection of Hive-style partitions in file paths.
- **union_by_name** : bool, default: False
                            
	Union Parquet files by matching column names instead of positions.
- **compression** : object
                            
	Optional compression codec to use when reading the Parquet files.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq

ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])

pq.write_table(example_table, "code_example.parquet")

duckdb_conn = duckdb.connect()

rel = duckdb_conn.read_parquet("code_example.parquet")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────┬─────────┐
│  id  │  text   │
│ int8 │ varchar │
├──────┼─────────┤
│    1 │ a       │
└──────┴─────────┘
```

----

###### `sql` {#docs:current:clients:python:relational_api::sql}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
sql(self: _duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

> **Warning.** Passing `params` to this method is [discouraged](#docs:current:clients:python:known_issues::parameterized-queries-in-relational-api) due to significant performance overhead. Use [`execute()`](#docs:current:clients:python:dbapi::prepared-statements) for parameterized queries instead.

**Aliases**: [`from_query`](#::from_query), [`query`](#::query)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **query** : object

	The SQL query or subquery to be executed and converted into a relation.
- **alias** : str, default: ''

	Optional alias name to assign to the resulting relation.
- **params** : object

	Optional query parameters. **Discouraged** due to [significant performance overhead](#docs:current:clients:python:known_issues::parameterized-queries-in-relational-api). Use [`execute()`](#docs:current:clients:python:dbapi::prepared-statements) for parameterized queries instead.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("from range(1,2) tbl(id)")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```

----

###### `table` {#docs:current:clients:python:relational_api::table}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
table(self: _duckdb.DuckDBPyConnection, table_name: str) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object for the named table

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **table_name** : str
                            
	Name of the table to create a relation from.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

duckdb_conn.sql("create table code_example as select * from range(1,2) tbl(id)")

rel = duckdb_conn.table("code_example")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```

----

###### `table_function` {#docs:current:clients:python:relational_api::table_function}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
table_function(self: _duckdb.DuckDBPyConnection, name: str, parameters: object = None) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from the named table function with given parameters

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **name** : str
                            
	Name of the table function to call.
- **parameters** : object
                            
	Optional parameters to pass to the table function.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

duckdb_conn.sql("""
    create macro get_record_for(x) as table
    select x*range from range(1,2)
""")

rel = duckdb_conn.table_function(name="get_record_for", parameters=[1])

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────────────┐
│ (1 * "range") │
│     int64     │
├───────────────┤
│             1 │
└───────────────┘
```

----

###### `values` {#docs:current:clients:python:relational_api::values}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
values(self: _duckdb.DuckDBPyConnection, *args) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object from the passed values

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.values([1, 'a'])

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┬─────────┐
│ col0  │  col1   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```

----

###### `view` {#docs:current:clients:python:relational_api::view}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
view(self: _duckdb.DuckDBPyConnection, view_name: str) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create a relation object for the named view

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **view_name** : str
                            
	Name of the view to create a relation from.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

duckdb_conn.sql("create view code_example as select * from range(1,2) tbl(id)")

rel = duckdb_conn.view("code_example")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```

#### Relation Definition Details  {#docs:current:clients:python:relational_api::relation-definition-details-}

This section contains the details on how to inspect a relation.

| Name | Description |
|:--|:-------|
| [`alias`](#::alias) | Get the name of the current alias |
| [`columns`](#::columns) | Return a list containing the names of the columns of the relation. |
| [`describe`](#::describe) | Compute basic statistics (e.g., min, max) and whether NULL values exist for each column of the relation. |
| [`description`](#::description) | Return the description of the result |
| [`dtypes`](#::dtypes) | Return a list containing the types of the columns of the relation. |
| [`explain`](#::explain) | Return the query plan of the relation as a string |
| [`query`](#::query-1) | Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object |
| [`set_alias`](#::set_alias) | Rename the relation object to new alias |
| [`shape`](#::shape) | Return a tuple with the number of rows and the number of columns in the relation. |
| [`show`](#::show) | Display a summary of the data |
| [`sql_query`](#::sql_query) | Get the SQL query that is equivalent to the relation |
| [`type`](#::type) | Get the type of the relation. |
| [`types`](#::types) | Return a list containing the types of the columns of the relation. |

###### `alias` {#docs:current:clients:python:relational_api::alias}

####### Description {#docs:current:clients:python:relational_api::description}

Get the name of the current alias

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.alias
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
unnamed_relation_43c808c247431be5
```

----

###### `columns` {#docs:current:clients:python:relational_api::columns}

####### Description {#docs:current:clients:python:relational_api::description}

Return a list containing the names of the columns of the relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.columns
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
['id', 'description', 'value', 'created_timestamp']
```

----

###### `describe` {#docs:current:clients:python:relational_api::describe}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
describe(self: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Compute basic statistics (e.g., min, max) and whether NULL values exist for each column of the relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.describe()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────┬──────────────────────────────────────┬─────────────────┬────────────────────┬────────────────────────────┐
│  aggr   │                  id                  │   description   │       value        │     created_timestamp      │
│ varchar │               varchar                │     varchar     │       double       │          varchar           │
├─────────┼──────────────────────────────────────┼─────────────────┼────────────────────┼────────────────────────────┤
│ count   │ 9                                    │ 9               │                9.0 │ 9                          │
│ mean    │ NULL                                 │ NULL            │                5.0 │ NULL                       │
│ stddev  │ NULL                                 │ NULL            │ 2.7386127875258306 │ NULL                       │
│ min     │ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is even   │                1.0 │ 2025-04-09 15:41:20.642+02 │
│ max     │ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is uneven │                9.0 │ 2025-04-09 15:49:20.642+02 │
│ median  │ NULL                                 │ NULL            │                5.0 │ NULL                       │
└─────────┴──────────────────────────────────────┴─────────────────┴────────────────────┴────────────────────────────┘ 
```

----

###### `description` {#docs:current:clients:python:relational_api::description}

####### Description {#docs:current:clients:python:relational_api::description}

Return the description of the result

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.description
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
[('id', 'UUID', None, None, None, None, None),
 ('description', 'STRING', None, None, None, None, None),
 ('value', 'NUMBER', None, None, None, None, None),
 ('created_timestamp', 'DATETIME', None, None, None, None, None)]  
```

----

###### `dtypes` {#docs:current:clients:python:relational_api::dtypes}

####### Description {#docs:current:clients:python:relational_api::description}

Return a list containing the types of the columns of the relation.

**Aliases**: [`types`](#::types)

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.dtypes
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
[UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE]
```

----

###### `explain` {#docs:current:clients:python:relational_api::explain}

####### Description {#docs:current:clients:python:relational_api::description}

Return the query plan of the relation as a string. Signature: `explain(self: _duckdb.DuckDBPyRelation, type: _duckdb.ExplainType = 'standard') -> str`

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.explain()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             id            │
│        description        │
│           value           │
│     created_timestamp     │
│                           │
│          ~9 Rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           RANGE           │
│    ────────────────────   │
│      Function: RANGE      │
│                           │
│          ~9 Rows          │
└───────────────────────────┘
```

----

###### `query` {#docs:current:clients:python:relational_api::query}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
query(self: _duckdb.DuckDBPyRelation, virtual_table_name: str, sql_query: str) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **virtual_table_name** : str
                            
	The name to assign to the current relation when referenced in the SQL query.
- **sql_query** : str
                            
	The SQL query string that uses the virtual table name to query the relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.query(virtual_table_name="rel_view", sql_query="from rel_view")

duckdb_conn.sql("show rel_view")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────────────────┬──────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│    column_name    │       column_type        │  null   │   key   │ default │  extra  │
│      varchar      │         varchar          │ varchar │ varchar │ varchar │ varchar │
├───────────────────┼──────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ id                │ UUID                     │ YES     │ NULL    │ NULL    │ NULL    │
│ description       │ VARCHAR                  │ YES     │ NULL    │ NULL    │ NULL    │
│ value             │ BIGINT                   │ YES     │ NULL    │ NULL    │ NULL    │
│ created_timestamp │ TIMESTAMP WITH TIME ZONE │ YES     │ NULL    │ NULL    │ NULL    │
└───────────────────┴──────────────────────────┴─────────┴─────────┴─────────┴─────────┘
```

----

###### `set_alias` {#docs:current:clients:python:relational_api::set_alias}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
set_alias(self: _duckdb.DuckDBPyRelation, alias: str) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Rename the relation object to new alias

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **alias** : str
                            
	The alias name to assign to the relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.set_alias('abc').select('abc.id')
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
In the SQL query, the alias will be `abc`
```

----

###### `shape` {#docs:current:clients:python:relational_api::shape}

####### Description {#docs:current:clients:python:relational_api::description}

Return a tuple with the number of rows and the number of columns in the relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.shape
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
(9, 4)
```

----

###### `show` {#docs:current:clients:python:relational_api::show}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
show(self: _duckdb.DuckDBPyRelation, *, max_width: typing.Optional[typing.SupportsInt] = None, max_rows: typing.Optional[typing.SupportsInt] = None, max_col_width: typing.Optional[typing.SupportsInt] = None, null_value: typing.Optional[str] = None, render_mode: object = None) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Display a summary of the data

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **max_width** : int
                            
	Maximum display width for the entire output in characters.
- **max_rows** : int
                            
	Maximum number of rows to display.
- **max_col_width** : int
                            
	Maximum number of characters to display per column.
- **null_value** : str
                            
	String to display in place of NULL values.
- **render_mode** : object
                            
	Render mode for displaying the output.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.show()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 642ea3d7-793d-4867-a759-91c1226c25a0 │ value is uneven │     1 │ 2025-04-09 15:41:20.642+02 │
│ 6817dd31-297c-40a8-8e40-8521f00b2d08 │ value is even   │     2 │ 2025-04-09 15:42:20.642+02 │
│ 45143f9a-e16e-4e59-91b2-3a0800eed6d6 │ value is uneven │     3 │ 2025-04-09 15:43:20.642+02 │
│ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is even   │     4 │ 2025-04-09 15:44:20.642+02 │
│ 111ced5c-9155-418e-b087-c331b814db90 │ value is uneven │     5 │ 2025-04-09 15:45:20.642+02 │
│ 66a870a6-aef0-4085-87d5-5d1b35d21c66 │ value is even   │     6 │ 2025-04-09 15:46:20.642+02 │
│ a7e8e796-bca0-44cd-a269-1d71090fb5cc │ value is uneven │     7 │ 2025-04-09 15:47:20.642+02 │
│ 74908d48-7f2d-4bdd-9c92-1e7920b115b5 │ value is even   │     8 │ 2025-04-09 15:48:20.642+02 │
│ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is uneven │     9 │ 2025-04-09 15:49:20.642+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```

----

###### `sql_query` {#docs:current:clients:python:relational_api::sql_query}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
sql_query(self: _duckdb.DuckDBPyRelation) -> str
```

####### Description {#docs:current:clients:python:relational_api::description}

Get the SQL query that is equivalent to the relation

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.sql_query()
```


####### Result {#docs:current:clients:python:relational_api::result}

```sql
SELECT 
    gen_random_uuid() AS id, 
    concat('value is ', CASE  WHEN ((mod("range", 2) = 0)) THEN ('even') ELSE 'uneven' END) AS description, 
    "range" AS "value", 
    (now() + CAST(concat("range", ' ', 'minutes') AS INTERVAL)) AS created_timestamp 
FROM "range"(1, 10)
```

----

###### `type` {#docs:current:clients:python:relational_api::type}

####### Description {#docs:current:clients:python:relational_api::description}

Get the type of the relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.type
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
QUERY_RELATION
```

----

###### `types` {#docs:current:clients:python:relational_api::types}

####### Description {#docs:current:clients:python:relational_api::description}

Return a list containing the types of the columns of the relation.

**Aliases**: [`dtypes`](#::dtypes)

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.types
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
[UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE]
```

#### Transformation  {#docs:current:clients:python:relational_api::transformation-}

This section contains the methods that can be used to chain queries. The methods are [lazily evaluated](#::lazy-evaluation).

| Name | Description |
|:--|:-------|
| [`aggregate`](#::aggregate) | Compute the aggregate aggr_expr by the optional groups group_expr on the relation |
| [`apply`](#::apply) | Compute the function of a single column or a list of columns by the optional groups on the relation |
| [`cross`](#::cross) | Create cross/cartesian product of two relational objects |
| [`except_`](#::except_) | Create the set except of this relation object with another relation object in other_rel |
| [`filter`](#::filter) | Filter the relation object by the filter in filter_expr |
| [`insert`](#::insert) | Inserts the given values into the relation |
| [`insert_into`](#::insert_into) | Inserts the relation object into an existing table named table_name |
| [`intersect`](#::intersect) | Create the set intersection of this relation object with another relation object in other_rel |
| [`join`](#::join) | Join the relation object with another relation object in other_rel using the join condition expression in join_condition. Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti' |
| [`limit`](#::limit) | Only retrieve the first n rows from this relation object, starting at offset |
| [`map`](#::map) | Calls the passed function on the relation |
| [`order`](#::order) | Reorder the relation object by order_expr |
| [`project`](#::project) | Project the relation object by the projection in project_expr |
| [`select`](#::select) | Project the relation object by the projection in project_expr |
| [`sort`](#::sort) | Reorder the relation object by the provided expressions |
| [`union`](#::union) | Create the set union of this relation object with another relation object in other_rel |
| [`update`](#::update) | Update the given relation with the provided expressions |

###### `aggregate` {#docs:current:clients:python:relational_api::aggregate}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
aggregate(self: _duckdb.DuckDBPyRelation, aggr_expr: object, group_expr: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Compute the aggregate aggr_expr by the optional groups group_expr on the relation

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **aggr_expr** : str, list[Expression]
                            
	The list of columns and aggregation functions.
- **group_expr** : str, default: ''
                            
	The list of columns to be included in `group_by`. If `None`, `group by all` is applied.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel = rel.aggregate('max(value)')
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────┐
│ max("value") │
│    int64     │
├──────────────┤
│            9 │
└──────────────┘
```

----

###### `apply` {#docs:current:clients:python:relational_api::apply}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
apply(self: _duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Compute the function of a single column or a list of columns by the optional groups on the relation

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **function_name** : str
                            
	Name of the function to apply over the relation.
- **function_aggr** : str
                            
	The list of columns to apply the function over.
- **group_expr** : str, default: ''
                            
	Optional SQL expression for grouping.
- **function_parameter** : str, default: ''
                            
	Optional parameters to pass into the function.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.apply(
    function_name="count", 
    function_aggr="id", 
    group_expr="description",
    projected_columns="description"
)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────┐
│   description   │ count(id) │
│     varchar     │   int64   │
├─────────────────┼───────────┤
│ value is uneven │         5 │
│ value is even   │         4 │
└─────────────────┴───────────┘
```

----

###### `cross` {#docs:current:clients:python:relational_api::cross}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
cross(self: _duckdb.DuckDBPyRelation, other_rel: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create cross/cartesian product of two relational objects

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **other_rel** : _duckdb.DuckDBPyRelation
                            
	Another relation to perform a cross product with.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.cross(other_rel=rel.set_alias("other_rel"))
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────────────────┬─────────────────┬───────┬───────────────────────────┬──────────────────────────────────────┬─────────────────┬───────┬───────────────────────────┐
│             id              │   description   │ value │     created_timestamp     │                  id                  │   description   │ value │     created_timestamp     │
│            uuid             │     varchar     │ int64 │ timestamp with time zone  │                 uuid                 │     varchar     │ int64 │ timestamp with time zone  │
├─────────────────────────────┼─────────────────┼───────┼───────────────────────────┼──────────────────────────────────────┼─────────────────┼───────┼───────────────────────────┤
│ cb2b453f-1a06-4f5e-abe1-b…  │ value is uneven │     1 │ 2025-04-10 09:53:29.78+02 │ cb2b453f-1a06-4f5e-abe1-bfd413581bcf │ value is uneven │     1 │ 2025-04-10 09:53:29.78+02 │
...
```

----

###### `except_` {#docs:current:clients:python:relational_api::except_}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
except_(self: _duckdb.DuckDBPyRelation, other_rel: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create the set except of this relation object with another relation object in other_rel

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **other_rel** : _duckdb.DuckDBPyRelation
                            
	The relation to subtract from the current relation (set difference).

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.except_(other_rel=rel.set_alias("other_rel"))
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
The relation query is executed twice, therefore generating different ids and timestamps:
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ f69ed6dd-a7fe-4de2-b6af-1c2418096d69 │ value is uneven │     3 │ 2025-04-10 11:43:05.711+02 │
│ 08ad11dc-a9c2-4aaa-9272-760b27ad1f5d │ value is uneven │     7 │ 2025-04-10 11:47:05.711+02 │
...
```

----

###### `filter` {#docs:current:clients:python:relational_api::filter}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
filter(self: _duckdb.DuckDBPyRelation, filter_expr: object) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Filter the relation object by the filter in filter_expr

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **filter_expr** : str, Expression
                            
	The filter expression to apply over the relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.filter("value = 2")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────┐
│                  id                  │  description  │ value │     created_timestamp     │
│                 uuid                 │    varchar    │ int64 │ timestamp with time zone  │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────┤
│ b0684ab7-fcbf-41c5-8e4a-a51bdde86926 │ value is even │     2 │ 2025-04-10 09:54:29.78+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────┘
```

----

###### `insert` {#docs:current:clients:python:relational_api::insert}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
insert(self: _duckdb.DuckDBPyRelation, values: object) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Inserts the given values into the relation

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **values** : object
                            
	A tuple of values matching the relation column list, to be inserted.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

from datetime import datetime
from uuid import uuid4

duckdb_conn = duckdb.connect()

duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
).to_table("code_example")

rel = duckdb_conn.table("code_example")

rel.insert(
    (
        uuid4(), 
        'value is even',
        10, 
        datetime.now()
    )
)

rel.filter("value = 10")
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────────┐
│                  id                  │  description  │ value │       created_timestamp       │
│                 uuid                 │    varchar    │ int64 │   timestamp with time zone    │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────────┤
│ c6dfab87-fae6-4213-8f76-1b96a8d179f6 │ value is even │    10 │ 2025-04-10 10:02:24.652218+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────────┘
```

----

###### `insert_into` {#docs:current:clients:python:relational_api::insert_into}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
insert_into(self: _duckdb.DuckDBPyRelation, table_name: str) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Inserts the relation object into an existing table named table_name

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **table_name** : str
                            
	The table name to insert the data into. The relation must respect the column order of the table.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

from datetime import datetime
from uuid import uuid4

duckdb_conn = duckdb.connect()

duckdb_conn.sql("""
        select
            gen_random_uuid() as id,
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value,
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
).to_table("code_example")

rel = duckdb_conn.values(
    [
        uuid4(),
        'value is even',
        10,
        datetime.now()
    ]
)

rel.insert_into("code_example")

duckdb_conn.table("code_example").filter("value = 10")
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────────┐
│                  id                  │  description  │ value │       created_timestamp       │
│                 uuid                 │    varchar    │ int64 │   timestamp with time zone    │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────────┤
│ 271c5ddd-c1d5-4638-b5a0-d8c7dc9e8220 │ value is even │    10 │ 2025-04-10 14:29:18.616379+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────────┘
```

----

###### `intersect` {#docs:current:clients:python:relational_api::intersect}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
intersect(self: _duckdb.DuckDBPyRelation, other_rel: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create the set intersection of this relation object with another relation object in other_rel

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **other_rel** : _duckdb.DuckDBPyRelation
                            
	The relation to intersect with the current relation (set intersection).

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.intersect(other_rel=rel.set_alias("other_rel"))
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
The relation query is executed once with `rel` and once with `other_rel`,
therefore generating different ids and timestamps:
┌──────┬─────────────┬───────┬──────────────────────────┐
│  id  │ description │ value │    created_timestamp     │
│ uuid │   varchar   │ int64 │ timestamp with time zone │
├──────┴─────────────┴───────┴──────────────────────────┤
│                        0 rows                         │
└───────────────────────────────────────────────────────┘
```

----

###### `join` {#docs:current:clients:python:relational_api::join}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
join(self: _duckdb.DuckDBPyRelation, other_rel: _duckdb.DuckDBPyRelation, condition: object, how: str = 'inner') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Join the relation object with another relation object in `other_rel` using the join condition expression in `condition`. Supported join types are 'inner', 'left', 'right', 'outer', 'semi' and 'anti'.

Depending on how the `condition` parameter is provided, the JOIN clause generated is:
- `USING`

```python
import duckdb

duckdb_conn = duckdb.connect()

rel1 = duckdb_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
rel2 = duckdb_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")

rel1.join(rel2, condition="id", how="inner").sql_query()
```

with the following SQL:

```sql
SELECT * 
FROM (
        SELECT "range" AS id, 
            concat('dummy 1', "range") AS "text" 
        FROM "range"(1, 10)
    ) AS unnamed_relation_41bc15e744037078 
INNER JOIN (
        SELECT "range" AS id, 
        concat('dummy 2', "range") AS "text" 
        FROM "range"(5, 7)
    ) AS unnamed_relation_307e245965aa2c2b 
USING (id)
```
- `ON`

```python
import duckdb

duckdb_conn = duckdb.connect()

rel1 = duckdb_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
rel2 = duckdb_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")

rel1.join(rel2, condition=f"{rel1.alias}.id = {rel2.alias}.id", how="inner").sql_query()
```

with the following SQL:

```sql
SELECT * 
FROM (
        SELECT "range" AS id, 
            concat('dummy 1', "range") AS "text" 
        FROM "range"(1, 10)
    ) AS unnamed_relation_41bc15e744037078 
INNER JOIN (
        SELECT "range" AS id, 
        concat('dummy 2', "range") AS "text" 
        FROM "range"(5, 7)
    ) AS unnamed_relation_307e245965aa2c2b 
ON ((unnamed_relation_41bc15e744037078.id = unnamed_relation_307e245965aa2c2b.id))
```

> `NATURAL`, `POSITIONAL` and `ASOF` joins are not provided by the relational API.
> `CROSS` joins are provided through the [cross method](#::cross). 


####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **other_rel** : _duckdb.DuckDBPyRelation
                            
	The relation to join with the current relation.
- **condition** : object
                            
	The join condition: either a SQL expression (generating an `ON` clause) or the name of a column present in both relations (generating a `USING` clause).
- **how** : str, default: 'inner'
                            
	The type of join to perform: 'inner', 'left', 'right', 'outer', 'semi' and 'anti'.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel = rel.set_alias("rel").join(
    other_rel=rel.set_alias("other_rel"), 
    condition="rel.id = other_rel.id",
    how="left"
)

rel.count("*")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│            9 │
└──────────────┘
```

----

###### `limit` {#docs:current:clients:python:relational_api::limit}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
limit(self: _duckdb.DuckDBPyRelation, n: typing.SupportsInt, offset: typing.SupportsInt = 0) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Retrieve only the first `n` rows from this relation object, starting at `offset`

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **n** : int
                            
	The maximum number of rows to return.
- **offset** : int, default: 0
                            
	The number of rows to skip before starting to return rows.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.limit(1)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 4135597b-29e7-4cb9-a443-41f3d54f25df │ value is uneven │     1 │ 2025-04-10 10:52:03.678+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```

----

###### `map` {#docs:current:clients:python:relational_api::map}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
map(self: _duckdb.DuckDBPyRelation, map_function: collections.abc.Callable, *, schema: typing.Optional[object] = None) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Calls the passed function on the relation

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **map_function** : Callable
                            
	A Python function that takes a DataFrame and returns a transformed DataFrame.
- **schema** : object, default: None
                            
	Optional schema describing the structure of the output relation.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb
from pandas import DataFrame

def multiply_by_2(df: DataFrame):
    df["id"] = df["id"] * 2
    return df

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as id, 'dummy' as text from range(1,3)")

rel.map(multiply_by_2, schema={"id": int, "text": str})
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     2 │ dummy   │
│     4 │ dummy   │
└───────┴─────────┘
```

----

###### `order` {#docs:current:clients:python:relational_api::order}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
order(self: _duckdb.DuckDBPyRelation, order_expr: str) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Reorder the relation object by order_expr

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **order_expr** : str
                            
	SQL expression defining the ordering of the result rows.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.order("value desc").limit(1, offset=4)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 55899131-e3d3-463c-a215-f65cb8aef3bf │ value is uneven │     5 │ 2025-04-10 10:56:03.678+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```

----

###### `project` {#docs:current:clients:python:relational_api::project}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
project(self: _duckdb.DuckDBPyRelation, *args, groups: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Project the relation object by the projection expressions passed as positional arguments

**Aliases**: [`select`](#::select)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.project("description").limit(1)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┐
│   description   │
│     varchar     │
├─────────────────┤
│ value is uneven │
└─────────────────┘
```

----

###### `select` {#docs:current:clients:python:relational_api::select}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
select(self: _duckdb.DuckDBPyRelation, *args, groups: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Project the relation object by the projection expressions passed as positional arguments

**Aliases**: [`project`](#::project)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.select("description").limit(1)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┐
│   description   │
│     varchar     │
├─────────────────┤
│ value is uneven │
└─────────────────┘
```

----

###### `sort` {#docs:current:clients:python:relational_api::sort}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
sort(self: _duckdb.DuckDBPyRelation, *args) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Reorder the relation object by the provided expressions

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.sort("description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 5e0dfa8c-de4d-4ccd-8cff-450dabb86bde │ value is even   │     6 │ 2025-04-10 16:52:15.605+02 │
│ 95f1ad48-facf-4a84-a971-0a4fecce68c7 │ value is even   │     2 │ 2025-04-10 16:48:15.605+02 │
...
```

----

###### `union` {#docs:current:clients:python:relational_api::union}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
union(self: _duckdb.DuckDBPyRelation, union_rel: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Create the set union of this relation object with another relation object in `union_rel`

> The union is `union all`. In order to retrieve distinct values, apply [distinct](#::distinct).

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **union_rel** : _duckdb.DuckDBPyRelation
                            
	The relation to union with the current relation (set union).

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel = rel.union(union_rel=rel)

rel.count("*")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│           18 │
└──────────────┘
```

----

###### `update` {#docs:current:clients:python:relational_api::update}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
update(self: _duckdb.DuckDBPyRelation, set: object, *, condition: object = None) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Update the given relation with the provided expressions

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **set** : object
                            
	Mapping of columns to new values for the update operation.
- **condition** : object, default: None
                            
	Optional condition to filter which rows to update.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

from duckdb import ColumnExpression

duckdb_conn = duckdb.connect()

duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
).to_table("code_example")

rel = duckdb_conn.table("code_example")

rel.update(set={"description":None}, condition=ColumnExpression("value") == 1)

# the update is executed on the table, but not reflected in the existing relation
# the relation has to be recreated to retrieve the modified data
rel = duckdb_conn.table("code_example")

rel.show()
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 66dcaa14-f4a6-4a55-af3b-7f6aa23ab4ad │ NULL            │     1 │ 2025-04-10 16:54:49.317+02 │
│ c6a18a42-67fb-4c95-827b-c966f2f95b88 │ value is even   │     2 │ 2025-04-10 16:55:49.317+02 │
...
```

#### Functions  {#docs:current:clients:python:relational_api::functions-}

This section contains the functions which can be applied to a relation in order to get a (scalar) result. The functions are [lazily evaluated](#::lazy-evaluation).

| Name | Description |
|:--|:-------|
| [`any_value`](#::any_value) | Returns the first non-null value from a given expression |
| [`arg_max`](#::arg_max) | Finds the row with the maximum value for a value column and returns the value of that row for an argument column |
| [`arg_min`](#::arg_min) | Finds the row with the minimum value for a value column and returns the value of that row for an argument column |
| [`avg`](#::avg) | Computes the average of a given expression |
| [`bit_and`](#::bit_and) | Computes the bitwise AND of all bits present in a given expression |
| [`bit_or`](#::bit_or) | Computes the bitwise OR of all bits present in a given expression |
| [`bit_xor`](#::bit_xor) | Computes the bitwise XOR of all bits present in a given expression |
| [`bitstring_agg`](#::bitstring_agg) | Computes a bitstring with bits set for each distinct value in a given expression |
| [`bool_and`](#::bool_and) | Computes the logical AND of all values present in a given expression |
| [`bool_or`](#::bool_or) | Computes the logical OR of all values present in a given expression |
| [`count`](#::count) | Computes the number of elements present in a given expression |
| [`cume_dist`](#::cume_dist) | Computes the cumulative distribution within the partition |
| [`dense_rank`](#::dense_rank) | Computes the dense rank within the partition |
| [`distinct`](#::distinct) | Retrieve distinct rows from this relation object |
| [`favg`](#::favg) | Computes the average of all values present in a given expression using a more accurate floating point summation (Kahan Sum) |
| [`first`](#::first) | Returns the first value of a given expression |
| [`first_value`](#::first_value) | Computes the first value within the group or partition |
| [`fsum`](#::fsum) | Computes the sum of all values present in a given expression using a more accurate floating point summation (Kahan Sum) |
| [`geomean`](#::geomean) | Computes the geometric mean over all values present in a given expression |
| [`histogram`](#::histogram) | Computes the histogram over all values present in a given expression |
| [`lag`](#::lag) | Computes the lag within the partition |
| [`last`](#::last) | Returns the last value of a given expression |
| [`last_value`](#::last_value) | Computes the last value within the group or partition |
| [`lead`](#::lead) | Computes the lead within the partition |
| [`list`](#::list) | Returns a list containing all values present in a given expression |
| [`max`](#::max) | Returns the maximum value present in a given expression |
| [`mean`](#::mean) | Computes the average of a given expression |
| [`median`](#::median) | Computes the median over all values present in a given expression |
| [`min`](#::min) | Returns the minimum value present in a given expression |
| [`mode`](#::mode) | Computes the mode over all values present in a given expression |
| [`n_tile`](#::n_tile) | Divides the partition as equally as possible into num_buckets |
| [`nth_value`](#::nth_value) | Computes the nth value within the partition |
| [`percent_rank`](#::percent_rank) | Computes the relative rank within the partition |
| [`product`](#::product) | Returns the product of all values present in a given expression |
| [`quantile`](#::quantile) | Computes the exact quantile value for a given expression |
| [`quantile_cont`](#::quantile_cont) | Computes the interpolated quantile value for a given expression |
| [`quantile_disc`](#::quantile_disc) | Computes the exact quantile value for a given expression |
| [`rank`](#::rank) | Computes the rank within the partition |
| [`rank_dense`](#::rank_dense) | Computes the dense rank within the partition |
| [`row_number`](#::row_number) | Computes the row number within the partition |
| [`select_dtypes`](#::select_dtypes) | Select columns from the relation, by filtering based on type(s) |
| [`select_types`](#::select_types) | Select columns from the relation, by filtering based on type(s) |
| [`std`](#::std) | Computes the sample standard deviation for a given expression |
| [`stddev`](#::stddev) | Computes the sample standard deviation for a given expression |
| [`stddev_pop`](#::stddev_pop) | Computes the population standard deviation for a given expression |
| [`stddev_samp`](#::stddev_samp) | Computes the sample standard deviation for a given expression |
| [`string_agg`](#::string_agg) | Concatenates the values present in a given expression with a separator |
| [`sum`](#::sum) | Computes the sum of all values present in a given expression |
| [`unique`](#::unique) | Returns the distinct values in a column. |
| [`value_counts`](#::value_counts) | Computes the number of elements present in a given expression, also projecting the original expression |
| [`var`](#::var) | Computes the sample variance for a given expression |
| [`var_pop`](#::var_pop) | Computes the population variance for a given expression |
| [`var_samp`](#::var_samp) | Computes the sample variance for a given expression |
| [`variance`](#::variance) | Computes the sample variance for a given expression |

###### `any_value` {#docs:current:clients:python:relational_api::any_value}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
any_value(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the first non-null value from a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str
                            
	The column name from which to retrieve any value.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.any_value('id')
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┐
│            any_value(id)             │
│                 uuid                 │
├──────────────────────────────────────┤
│ 642ea3d7-793d-4867-a759-91c1226c25a0 │
└──────────────────────────────────────┘
```

----

###### `arg_max` {#docs:current:clients:python:relational_api::arg_max}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
arg_max(self: _duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Finds the row with the maximum value for a value column and returns the value of that row for an argument column

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **arg_column** : str
                            
	The column name for which to find the argument maximizing the value.
- **value_column** : str
                            
	The column name containing values used to determine the maximum.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.arg_max(arg_column="value", value_column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────────────┐
│   description   │ arg_max("value", "value") │
│     varchar     │           int64           │
├─────────────────┼───────────────────────────┤
│ value is uneven │                         9 │
│ value is even   │                         8 │
└─────────────────┴───────────────────────────┘
```
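
The example above passes `value` as both `arg_column` and `value_column`, so the two roles are hard to tell apart. The following pure-Python sketch is not DuckDB code; the `rows` data and the `arg_max_by_group` helper are hypothetical and only illustrate the semantics when the two columns differ. For each group it picks the row with the largest `value` and returns that row's `label`.

```python
# Illustrative only: mimics arg_max(arg_column="label", value_column="value",
# groups="description") on a small in-memory list of rows.
rows = [
    {"description": "even", "label": "b", "value": 2},
    {"description": "even", "label": "d", "value": 8},
    {"description": "uneven", "label": "a", "value": 1},
    {"description": "uneven", "label": "c", "value": 9},
]


def arg_max_by_group(rows, arg_column, value_column, group_column):
    """Return {group: arg_column value of the row with the largest value_column}."""
    best = {}
    for row in rows:
        key = row[group_column]
        # Keep the row with the largest value seen so far for this group.
        if key not in best or row[value_column] > best[key][value_column]:
            best[key] = row
    return {key: row[arg_column] for key, row in best.items()}


result = arg_max_by_group(rows, "label", "value", "description")
print(result)  # {'even': 'd', 'uneven': 'c'}
```

Replacing the `>` comparison with `<` gives the corresponding sketch for `arg_min`.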

----

###### `arg_min` {#docs:current:clients:python:relational_api::arg_min}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
arg_min(self: _duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Finds the row with the minimum value for a value column and returns the value of that row for an argument column

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **arg_column** : str

	The column whose value is returned from the row that minimizes `value_column`.
- **value_column** : str

	The column whose values are compared to determine the minimum.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.arg_min(arg_column="value", value_column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────────────┐
│   description   │ arg_min("value", "value") │
│     varchar     │           int64           │
├─────────────────┼───────────────────────────┤
│ value is even   │                         2 │
│ value is uneven │                         1 │
└─────────────────┴───────────────────────────┘
```

----

###### `avg` {#docs:current:clients:python:relational_api::avg}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
avg(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the average of a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to calculate the average on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.avg('value')
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────┐
│ avg("value") │
│    double    │
├──────────────┤
│          5.0 │
└──────────────┘
```

----

###### `bit_and` {#docs:current:clients:python:relational_api::bit_and}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
bit_and(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the bitwise AND of all bits present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to perform the bitwise AND aggregation on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel = rel.select("description, value::bit as value_bit")

rel.bit_and("value_bit", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│   description   │                        bit_and(value_bit)                        │
│     varchar     │                               bit                                │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000000001 │
│ value is even   │ 0000000000000000000000000000000000000000000000000000000000000000 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
```

----

###### `bit_or` {#docs:current:clients:python:relational_api::bit_or}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
bit_or(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the bitwise OR of all bits present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to perform the bitwise OR aggregation on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel = rel.select("description, value::bit as value_bit")

rel.bit_or("value_bit", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│   description   │                        bit_or(value_bit)                         │
│     varchar     │                               bit                                │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001111 │
│ value is even   │ 0000000000000000000000000000000000000000000000000000000000001110 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
```

----

###### `bit_xor` {#docs:current:clients:python:relational_api::bit_xor}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
bit_xor(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the bitwise XOR of all bits present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to perform the bitwise XOR aggregation on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel = rel.select("description, value::bit as value_bit")

rel.bit_xor("value_bit", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│   description   │                        bit_xor(value_bit)                        │
│     varchar     │                               bit                                │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is even   │ 0000000000000000000000000000000000000000000000000000000000001000 │
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001001 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
```
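
The results of `bit_and`, `bit_or` and `bit_xor` can be checked by hand. The sketch below is plain Python rather than DuckDB code; it folds the same even and uneven values with the corresponding bitwise operator. The `fold` helper and the variable names are hypothetical, and the sketch works on Python integers instead of DuckDB's 64-bit `bit` values.

```python
from functools import reduce
from operator import and_, or_, xor

# The two groups produced by the example query (values 1 to 9).
even = [2, 4, 6, 8]
uneven = [1, 3, 5, 7, 9]


def fold(op, values):
    """Combine all values with a binary bitwise operator."""
    return reduce(op, values)


bit_and_result = {"even": fold(and_, even), "uneven": fold(and_, uneven)}
bit_or_result = {"even": fold(or_, even), "uneven": fold(or_, uneven)}
bit_xor_result = {"even": fold(xor, even), "uneven": fold(xor, uneven)}

# Show the low four bits, which match the tail of the bitstrings in the tables.
print(format(bit_and_result["uneven"], "04b"))  # 0001
print(format(bit_or_result["even"], "04b"))     # 1110
print(format(bit_xor_result["uneven"], "04b"))  # 1001
```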

----

###### `bitstring_agg` {#docs:current:clients:python:relational_api::bitstring_agg}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
bitstring_agg(self: _duckdb.DuckDBPyRelation, expression: str, min: typing.Optional[object] = None, max: typing.Optional[object] = None, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes a bitstring with bits set for each distinct value in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to aggregate as a bitstring.
- **min** : object, default: None

	Optional lower bound of the value range covered by the bitstring.
- **max** : object, default: None

	Optional upper bound of the value range covered by the bitstring.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.bitstring_agg("value", groups="description", projected_columns="description", min=1, max=9)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬────────────────────────┐
│   description   │ bitstring_agg("value") │
│     varchar     │          bit           │
├─────────────────┼────────────────────────┤
│ value is uneven │ 101010101              │
│ value is even   │ 010101010              │
└─────────────────┴────────────────────────┘
```
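
To make the role of `min` and `max` concrete, here is a minimal pure-Python sketch of the mapping from values to bit positions. It is not DuckDB's implementation: the `bitstring_agg_sketch` helper is hypothetical, and it assumes the leftmost bit corresponds to `min`, which is consistent with the result table above.

```python
def bitstring_agg_sketch(values, min_value, max_value):
    """Build a bitstring of length max - min + 1 with one bit per distinct value.

    Assumption: the leftmost bit corresponds to min_value.
    """
    bits = ["0"] * (max_value - min_value + 1)
    for v in values:
        # Each distinct value sets the bit at its offset from the lower bound.
        bits[v - min_value] = "1"
    return "".join(bits)


uneven_bits = bitstring_agg_sketch([1, 3, 5, 7, 9], 1, 9)
even_bits = bitstring_agg_sketch([2, 4, 6, 8], 1, 9)
print(uneven_bits)  # 101010101
print(even_bits)    # 010101010
```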

----

###### `bool_and` {#docs:current:clients:python:relational_api::bool_and}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
bool_and(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the logical AND of all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to perform the boolean AND aggregation on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel = rel.select("description, mod(value,2)::boolean as uneven")

rel.bool_and("uneven", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────┐
│   description   │ bool_and(uneven) │
│     varchar     │     boolean      │
├─────────────────┼──────────────────┤
│ value is even   │ false            │
│ value is uneven │ true             │
└─────────────────┴──────────────────┘
```

----

###### `bool_or` {#docs:current:clients:python:relational_api::bool_or}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
bool_or(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the logical OR of all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to perform the boolean OR aggregation on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel = rel.select("description, mod(value,2)::boolean as uneven")

rel.bool_or("uneven", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬─────────────────┐
│   description   │ bool_or(uneven) │
│     varchar     │     boolean     │
├─────────────────┼─────────────────┤
│ value is even   │ false           │
│ value is uneven │ true            │
└─────────────────┴─────────────────┘
```

----

###### `count` {#docs:current:clients:python:relational_api::count}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
count(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the number of elements present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to count.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.count("id")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────────┐
│ count(id) │
│   int64   │
├───────────┤
│         9 │
└───────────┘
```

----

###### `cume_dist` {#docs:current:clients:python:relational_api::cume_dist}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
cume_dist(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the cumulative distribution within the partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **window_spec** : str

	Window specification for the window function, provided as `over (partition by ... order by ...)`. This parameter is required.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.cume_dist(window_spec="over (partition by description order by value)", projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬──────────────────────────────────────────────────────────────┐
│   description   │ value │ cume_dist() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                            double                            │
├─────────────────┼───────┼──────────────────────────────────────────────────────────────┤
│ value is uneven │     1 │                                                          0.2 │
│ value is uneven │     3 │                                                          0.4 │
│ value is uneven │     5 │                                                          0.6 │
│ value is uneven │     7 │                                                          0.8 │
│ value is uneven │     9 │                                                          1.0 │
│ value is even   │     2 │                                                         0.25 │
│ value is even   │     4 │                                                          0.5 │
│ value is even   │     6 │                                                         0.75 │
│ value is even   │     8 │                                                          1.0 │
└─────────────────┴───────┴──────────────────────────────────────────────────────────────┘
```
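
The values in the result can be reproduced with the usual definition of cumulative distribution: the number of rows in the partition whose ordering value is less than or equal to the current row's value, divided by the number of rows in the partition. The following pure-Python sketch applies that definition to each partition; the `cume_dist_sketch` helper is hypothetical and is not how DuckDB computes the result internally.

```python
def cume_dist_sketch(values):
    """For each value, return (rows with value <= current) / (rows in partition)."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


uneven_dist = cume_dist_sketch([1, 3, 5, 7, 9])
even_dist = cume_dist_sketch([2, 4, 6, 8])
print(uneven_dist)  # [0.2, 0.4, 0.6, 0.8, 1.0]
print(even_dist)    # [0.25, 0.5, 0.75, 1.0]
```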

----

###### `dense_rank` {#docs:current:clients:python:relational_api::dense_rank}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
dense_rank(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the dense rank within the partition

**Aliases**: [`rank_dense`](#docs:current:clients:python:relational_api::rank_dense)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **window_spec** : str

	Window specification for the window function, provided as `over (partition by ... order by ...)`. This parameter is required.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.dense_rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│   description   │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                             int64                             │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is even   │     2 │                                                             1 │
│ value is even   │     4 │                                                             2 │
│ value is even   │     6 │                                                             3 │
│ value is even   │     8 │                                                             4 │
│ value is uneven │     1 │                                                             1 │
│ value is uneven │     3 │                                                             2 │
│ value is uneven │     5 │                                                             3 │
│ value is uneven │     7 │                                                             4 │
│ value is uneven │     9 │                                                             5 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
```
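
The example data contains no ties, so the dense rank simply counts up within each partition. The pure-Python sketch below shows what happens with ties: equal values share a rank and the next distinct value gets the next consecutive rank, with no gaps. The `dense_rank_sketch` helper and the tied input are hypothetical illustrations, not DuckDB code.

```python
def dense_rank_sketch(values):
    """Rank values in ascending order; ties share a rank and no ranks are skipped."""
    # Map each distinct value to its 1-based position among the sorted distinct values.
    rank_of = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [rank_of[v] for v in values]


no_ties = dense_rank_sketch([2, 4, 6, 8])
with_ties = dense_rank_sketch([10, 20, 20, 30])
print(no_ties)    # [1, 2, 3, 4]
print(with_ties)  # [1, 2, 2, 3]
```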

----

###### `distinct` {#docs:current:clients:python:relational_api::distinct}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
distinct(self: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Retrieve distinct rows from this relation object

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("select range from range(1,4)")

rel = rel.union(union_rel=rel)

rel.distinct().order("range")
```

####### Result {#docs:current:clients:python:relational_api::result}

```text
┌───────┐
│ range │
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘
```

----

###### `favg` {#docs:current:clients:python:relational_api::favg}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
favg(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the average of all values present in a given expression using a more accurate floating point summation (Kahan Sum)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to calculate the average on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.favg("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────┐
│   description   │ favg("value") │
│     varchar     │    double     │
├─────────────────┼───────────────┤
│ value is uneven │           5.0 │
│ value is even   │           5.0 │
└─────────────────┴───────────────┘
```
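
With small integers, as in the example above, `favg` and `avg` return the same result. The benefit of Kahan summation only shows with floating-point inputs whose rounding errors accumulate. The sketch below is a textbook Kahan sum in plain Python, intended only to illustrate the idea; the `kahan_sum` and `naive_sum` helpers are hypothetical, and DuckDB's internal implementation may differ in detail.

```python
def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Sum floats while carrying the rounding error lost at each step."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-apply the error from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


values = [0.1] * 10
print(naive_sum(values))                # 0.9999999999999999
print(kahan_sum(values))                # 1.0
print(kahan_sum(values) / len(values))  # 0.1
```

A compensated average is then the compensated sum divided by the number of values, as in the last line.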

----

###### `first` {#docs:current:clients:python:relational_api::first}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
first(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the first value of a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression from which to retrieve the first value.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.first("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────┐
│   description   │ "first"("value") │
│     varchar     │      int64       │
├─────────────────┼──────────────────┤
│ value is even   │                2 │
│ value is uneven │                1 │
└─────────────────┴──────────────────┘
```

----

###### `first_value` {#docs:current:clients:python:relational_api::first_value}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
first_value(self: _duckdb.DuckDBPyRelation, expression: str, window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the first value within the group or partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression from which to retrieve the first value.
- **window_spec** : str, default: ''

	Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.first_value("value", window_spec="over (partition by description order by value)", projected_columns="description").distinct()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────────────────────────────────────────────────────────┐
│   description   │ first_value("value") OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │                                 int64                                 │
├─────────────────┼───────────────────────────────────────────────────────────────────────┤
│ value is even   │                                                                     2 │
│ value is uneven │                                                                     1 │
└─────────────────┴───────────────────────────────────────────────────────────────────────┘
```

----

###### `fsum` {#docs:current:clients:python:relational_api::fsum}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fsum(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the sum of all values present in a given expression using a more accurate floating point summation (Kahan Sum)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to calculate the sum on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.fsum(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────┐
│   description   │ fsum("value") │
│     varchar     │    double     │
├─────────────────┼───────────────┤
│ value is even   │          20.0 │
│ value is uneven │          25.0 │
└─────────────────┴───────────────┘
```
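
To illustrate why a compensated sum can be more accurate, here is a minimal sketch of Kahan summation in plain Python. It is not DuckDB's implementation: `naive_sum` and `kahan_sum` are hypothetical helper names, and the input values are chosen only to make the rounding error visible.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-apply the error from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


# Each 1e-16 is too small to change 1.0 on its own, so the naive loop drops all of them.
values = [1.0] + [1e-16] * 10
reference = math.fsum(values)  # correctly rounded sum from the standard library

print(naive_sum(values))  # 1.0
print(abs(kahan_sum(values) - reference) < abs(naive_sum(values) - reference))  # True
```

For the small integer inputs of the example above, both approaches give the same result; the difference only shows up when values of very different magnitude are added.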

----

###### `geomean` {#docs:current:clients:python:relational_api::geomean}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
geomean(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the geometric mean over all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to calculate the geometric mean on.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.geomean(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────┐
│   description   │ geomean("value")  │
│     varchar     │      double       │
├─────────────────┼───────────────────┤
│ value is uneven │ 3.936283427035351 │
│ value is even   │ 4.426727678801287 │
└─────────────────┴───────────────────┘
```
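
The values in the result can be cross-checked with a short plain-Python sketch. It uses one common formulation of the geometric mean, `exp(mean(log(x)))`, and assumes strictly positive inputs; the helper name `geomean` here is a local function, not the relational API method.

```python
import math


def geomean(values):
    """Geometric mean computed as exp(mean(log(x))); requires positive inputs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Compare against the values shown in the result table above.
print(math.isclose(geomean([1, 3, 5, 7, 9]), 3.936283427035351))  # True
print(math.isclose(geomean([2, 4, 6, 8]), 4.426727678801287))     # True
```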

----

###### `histogram` {#docs:current:clients:python:relational_api::histogram}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
histogram(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the histogram over all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to calculate the histogram on.
- **groups** : str, default: ''

	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.histogram(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────────────┐
│   description   │    histogram("value")     │
│     varchar     │   map(bigint, ubigint)    │
├─────────────────┼───────────────────────────┤
│ value is uneven │ {1=1, 3=1, 5=1, 7=1, 9=1} │
│ value is even   │ {2=1, 4=1, 6=1, 8=1}      │
└─────────────────┴───────────────────────────┘
```
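
As a rough mental model, the grouped histogram is a per-group count of each distinct value. The following plain-Python sketch rebuilds the same maps with `collections.Counter`; the `rows` list is a hypothetical stand-in for the relation used in the example.

```python
from collections import Counter, defaultdict

# Stand-in for the example relation: (description, value) pairs for 1..9.
rows = [(f"value is {'even' if v % 2 == 0 else 'uneven'}", v) for v in range(1, 10)]

# One Counter per group, mapping each distinct value to its number of occurrences.
histograms = defaultdict(Counter)
for description, value in rows:
    histograms[description][value] += 1

for description, counts in histograms.items():
    print(description, dict(sorted(counts.items())))
```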

----

###### `lag` {#docs:current:clients:python:relational_api::lag}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
lag(self: _duckdb.DuckDBPyRelation, expression: str, window_spec: str, offset: typing.SupportsInt = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the value of the expression from the row `offset` rows before the current row within the partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to apply the lag function on.
- **window_spec** : str
                            
	Window specification for window functions, provided as `over (partition by ... order by ...)`
- **offset** : int, default: 1
                            
	The number of rows to lag behind.
- **default_value** : str, default: 'NULL'
                            
	The default value to return when the lag offset goes out of bounds.
- **ignore_nulls** : bool, default: False
                            
	Whether to ignore NULL values when computing the lag.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.lag(column="description", window_spec="over (order by value)", projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬───────────────────────────────────────────────────┐
│   description   │ value │ lag(description, 1, NULL) OVER (ORDER BY "value") │
│     varchar     │ int64 │                      varchar                      │
├─────────────────┼───────┼───────────────────────────────────────────────────┤
│ value is uneven │     1 │ NULL                                              │
│ value is even   │     2 │ value is uneven                                   │
│ value is uneven │     3 │ value is even                                     │
│ value is even   │     4 │ value is uneven                                   │
│ value is uneven │     5 │ value is even                                     │
│ value is even   │     6 │ value is uneven                                   │
│ value is uneven │     7 │ value is even                                     │
│ value is even   │     8 │ value is uneven                                   │
│ value is uneven │     9 │ value is even                                     │
└─────────────────┴───────┴───────────────────────────────────────────────────┘
```
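
The roles of `offset` and `default_value` can be sketched on an already ordered Python list. This is an illustration only: the local `lag` helper is hypothetical, and `None` stands in for SQL `NULL`.

```python
def lag(values, offset=1, default=None):
    """For each position i, return values[i - offset], or `default` when out of bounds."""
    n = len(values)
    return [values[i - offset] if 0 <= i - offset < n else default for i in range(n)]


ordered = ["uneven", "even", "uneven", "even"]
print(lag(ordered))                           # [None, 'uneven', 'even', 'uneven']
print(lag(ordered, offset=2, default="n/a"))  # ['n/a', 'n/a', 'uneven', 'even']
```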

----

###### `last` {#docs:current:clients:python:relational_api::last}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
last(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the last value of a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name from which to retrieve the last value.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.last(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬─────────────────┐
│   description   │ "last"("value") │
│     varchar     │      int64      │
├─────────────────┼─────────────────┤
│ value is even   │               8 │
│ value is uneven │               9 │
└─────────────────┴─────────────────┘
```

----

###### `last_value` {#docs:current:clients:python:relational_api::last_value}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
last_value(self: _duckdb.DuckDBPyRelation, expression: str, window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the last value within the group or partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name from which to retrieve the last value within the window.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.last_value(column="value", window_spec="over (order by description)", projected_columns="description").distinct()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬─────────────────────────────────────────────────┐
│   description   │ last_value("value") OVER (ORDER BY description) │
│     varchar     │                      int64                      │
├─────────────────┼─────────────────────────────────────────────────┤
│ value is uneven │                                               9 │
│ value is even   │                                               8 │
└─────────────────┴─────────────────────────────────────────────────┘
```
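
The result above follows from the SQL default window frame: with an `ORDER BY` and no explicit frame, the frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which also covers every peer of the current row (rows with the same ordering key). The plain-Python sketch below mimics that rule; `last_value_default_frame` is a hypothetical helper, and the order among peers is taken from the input order, which SQL itself does not guarantee.

```python
# Stand-in for the example relation: (description, value) pairs for 1..9.
rows = [(f"value is {'even' if v % 2 == 0 else 'uneven'}", v) for v in range(1, 10)]

# Sort by the ORDER BY key only; Python's sort is stable, so ties keep input order.
ordered = sorted(rows, key=lambda row: row[0])


def last_value_default_frame(ordered_rows):
    """For each row, return the last value among rows whose key <= the row's key."""
    result = []
    for key, _ in ordered_rows:
        frame = [value for k, value in ordered_rows if k <= key]
        result.append((key, frame[-1]))
    return result


print(sorted(set(last_value_default_frame(ordered))))
# [('value is even', 8), ('value is uneven', 9)]
```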

----

###### `lead` {#docs:current:clients:python:relational_api::lead}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
lead(self: _duckdb.DuckDBPyRelation, expression: str, window_spec: str, offset: typing.SupportsInt = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the value of the expression from the row `offset` rows after the current row within the partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to apply the lead function on.
- **window_spec** : str
                            
	Window specification for window functions, provided as `over (partition by ... order by ...)`
- **offset** : int, default: 1
                            
	The number of rows to lead ahead.
- **default_value** : str, default: 'NULL'
                            
	The default value to return when the lead offset goes out of bounds.
- **ignore_nulls** : bool, default: False
                            
	Whether to ignore NULL values when computing the lead.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.lead(column="description", window_spec="over (order by value)", projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬────────────────────────────────────────────────────┐
│   description   │ value │ lead(description, 1, NULL) OVER (ORDER BY "value") │
│     varchar     │ int64 │                      varchar                       │
├─────────────────┼───────┼────────────────────────────────────────────────────┤
│ value is uneven │     1 │ value is even                                      │
│ value is even   │     2 │ value is uneven                                    │
│ value is uneven │     3 │ value is even                                      │
│ value is even   │     4 │ value is uneven                                    │
│ value is uneven │     5 │ value is even                                      │
│ value is even   │     6 │ value is uneven                                    │
│ value is uneven │     7 │ value is even                                      │
│ value is even   │     8 │ value is uneven                                    │
│ value is uneven │     9 │ NULL                                               │
└─────────────────┴───────┴────────────────────────────────────────────────────┘
```
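
One way to reason about `lead` is as the mirror image of `lag`: looking ahead in a sequence is the same as looking behind in the reversed sequence. The plain-Python sketch below shows this on an ordered list; both helpers are hypothetical local functions, and `None` stands in for SQL `NULL`.

```python
def lead(values, offset=1, default=None):
    """For each position i, return values[i + offset], or `default` when out of bounds."""
    n = len(values)
    return [values[i + offset] if 0 <= i + offset < n else default for i in range(n)]


def lag(values, offset=1, default=None):
    """For each position i, return values[i - offset], or `default` when out of bounds."""
    n = len(values)
    return [values[i - offset] if 0 <= i - offset < n else default for i in range(n)]


values = [1, 2, 3, 4]
print(lead(values))  # [2, 3, 4, None]
# Leading over a sequence equals lagging over the reversed sequence, reversed back.
print(lead(values) == lag(values[::-1])[::-1])  # True
```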

----

###### `list` {#docs:current:clients:python:relational_api::list}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
list(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns a list containing all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to aggregate values into a list.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.list(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬─────────────────┐
│   description   │  list("value")  │
│     varchar     │     int64[]     │
├─────────────────┼─────────────────┤
│ value is even   │ [2, 4, 6, 8]    │
│ value is uneven │ [1, 3, 5, 7, 9] │
└─────────────────┴─────────────────┘
```

----

###### `max` {#docs:current:clients:python:relational_api::max}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
max(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the maximum value present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to calculate the maximum value of.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.max(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────┐
│   description   │ max("value") │
│     varchar     │    int64     │
├─────────────────┼──────────────┤
│ value is even   │            8 │
│ value is uneven │            9 │
└─────────────────┴──────────────┘
```

----

###### `mean` {#docs:current:clients:python:relational_api::mean}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
mean(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the average of a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to calculate the mean value of.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.mean(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────┐
│   description   │ avg("value") │
│     varchar     │    double    │
├─────────────────┼──────────────┤
│ value is even   │          5.0 │
│ value is uneven │          5.0 │
└─────────────────┴──────────────┘
```

----

###### `median` {#docs:current:clients:python:relational_api::median}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
median(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the median over all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to calculate the median value of.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.median(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬─────────────────┐
│   description   │ median("value") │
│     varchar     │     double      │
├─────────────────┼─────────────────┤
│ value is even   │             5.0 │
│ value is uneven │             5.0 │
└─────────────────┴─────────────────┘
```
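
Both groups have a median of 5, but for different reasons: the uneven group has a single middle value, while the even group has four values, so the two middle values are averaged. A quick cross-check with Python's `statistics` module (used here only for illustration) shows the same numbers:

```python
import statistics

# Four values: the median is the average of the two middle values, 4 and 6.
print(statistics.median([2, 4, 6, 8]))     # 5.0
# Five values: the median is the single middle value.
print(statistics.median([1, 3, 5, 7, 9]))  # 5
```

Note that Python returns the integer `5` for the odd-length list, whereas the result above reports both medians as `double`.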

----

###### `min` {#docs:current:clients:python:relational_api::min}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
min(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the minimum value present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to calculate the min value of.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.min(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────┐
│   description   │ min("value") │
│     varchar     │    int64     │
├─────────────────┼──────────────┤
│ value is uneven │            1 │
│ value is even   │            2 │
└─────────────────┴──────────────┘
```

----

###### `mode` {#docs:current:clients:python:relational_api::mode}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
mode(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the mode over all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name to calculate the mode (most frequent value) of.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.mode(column="value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬─────────────────┐
│   description   │ "mode"("value") │
│     varchar     │      int64      │
├─────────────────┼─────────────────┤
│ value is uneven │               1 │
│ value is even   │               2 │
└─────────────────┴─────────────────┘
```

----

###### `n_tile` {#docs:current:clients:python:relational_api::n_tile}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
n_tile(self: _duckdb.DuckDBPyRelation, window_spec: str, num_buckets: typing.SupportsInt, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Divides the partition as equally as possible into `num_buckets` buckets and returns the bucket number of each row

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **window_spec** : str
                            
	Window specification for window functions, provided as `over (partition by ... order by ...)`
- **num_buckets** : int
                            
	The number of buckets to divide the rows into.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.n_tile(window_spec="over (partition by description)", num_buckets=2, projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬──────────────────────────────────────────┐
│   description   │ value │ ntile(2) OVER (PARTITION BY description) │
│     varchar     │ int64 │                  int64                   │
├─────────────────┼───────┼──────────────────────────────────────────┤
│ value is uneven │     1 │                                        1 │
│ value is uneven │     3 │                                        1 │
│ value is uneven │     5 │                                        1 │
│ value is uneven │     7 │                                        2 │
│ value is uneven │     9 │                                        2 │
│ value is even   │     2 │                                        1 │
│ value is even   │     4 │                                        1 │
│ value is even   │     6 │                                        2 │
│ value is even   │     8 │                                        2 │
└─────────────────┴───────┴──────────────────────────────────────────┘
```
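
The bucket sizes in the result (3 and 2 rows for the uneven group, 2 and 2 for the even group) follow the usual `ntile` rule: when the rows do not divide evenly, the leading buckets each receive one extra row. A minimal plain-Python sketch of that rule, with `ntile` as a hypothetical helper name:

```python
def ntile(num_rows, num_buckets):
    """Assign 1-based bucket numbers; the first `num_rows % num_buckets` buckets get one extra row."""
    base, extra = divmod(num_rows, num_buckets)
    buckets = []
    for bucket in range(1, num_buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(5, 2))  # [1, 1, 1, 2, 2]
print(ntile(4, 2))  # [1, 1, 2, 2]
```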

----

###### `nth_value` {#docs:current:clients:python:relational_api::nth_value}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
nth_value(self: _duckdb.DuckDBPyRelation, expression: str, window_spec: str, offset: typing.SupportsInt, ignore_nulls: bool = False, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the nth value within the partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **column** : str
                            
	The column name from which to retrieve the nth value within the window.
- **window_spec** : str
                            
	Window specification for window functions, provided as `over (partition by ... order by ...)`
- **offset** : int
                            
	The position of the value to retrieve within the window (1-based index).
- **ignore_nulls** : bool, default: False
                            
	Whether to ignore NULL values when computing the nth value.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.nth_value(column="value", window_spec="over (partition by description)", projected_columns="description", offset=1)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────────────────────────────────────────┐
│   description   │ nth_value("value", 1) OVER (PARTITION BY description) │
│     varchar     │                         int64                         │
├─────────────────┼───────────────────────────────────────────────────────┤
│ value is even   │                                                     2 │
│ value is even   │                                                     2 │
│ value is even   │                                                     2 │
│ value is even   │                                                     2 │
│ value is uneven │                                                     1 │
│ value is uneven │                                                     1 │
│ value is uneven │                                                     1 │
│ value is uneven │                                                     1 │
│ value is uneven │                                                     1 │
└─────────────────┴───────────────────────────────────────────────────────┘
```
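
To make the roles of `offset` and `ignore_nulls` concrete, here is a minimal plain-Python sketch of the lookup applied to a single window frame. It does not call DuckDB: the helper name `nth_value_in_frame` is hypothetical, `None` stands in for SQL `NULL`, the frame is assumed to be an already-ordered list of values, and raising on an offset below 1 is a choice of this sketch rather than documented DuckDB behavior.

```python
from typing import Optional, Sequence


def nth_value_in_frame(
    frame: Sequence[Optional[int]], offset: int, ignore_nulls: bool = False
) -> Optional[int]:
    """Return the value at the 1-based position `offset` of `frame`, or None."""
    if offset < 1:
        raise ValueError("offset is 1-based and must be at least 1")
    # With ignore_nulls, NULLs (None here) are skipped before counting positions.
    candidates = [v for v in frame if v is not None] if ignore_nulls else list(frame)
    # A frame with fewer than `offset` rows yields NULL (None here).
    return candidates[offset - 1] if offset <= len(candidates) else None


print(nth_value_in_frame([2, 4, 6, 8], 1))                     # 2
print(nth_value_in_frame([None, 4, 6], 1))                     # None
print(nth_value_in_frame([None, 4, 6], 1, ignore_nulls=True))  # 4
```

The first call corresponds to the `value is even` partition in the result above, assuming its rows are visited in ascending order of `value`.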

----

###### `percent_rank` {#docs:current:clients:python:relational_api::percent_rank}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
percent_rank(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the relative rank within the partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **window_spec** : str
                            
	Window specification, provided as `over (partition by ... order by ...)`. Required, as the signature defines no default value.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.percent_rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬─────────────────────────────────────────────────────────────────┐
│   description   │ value │ percent_rank() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                             double                              │
├─────────────────┼───────┼─────────────────────────────────────────────────────────────────┤
│ value is even   │     2 │                                                             0.0 │
│ value is even   │     4 │                                              0.3333333333333333 │
│ value is even   │     6 │                                              0.6666666666666666 │
│ value is even   │     8 │                                                             1.0 │
│ value is uneven │     1 │                                                             0.0 │
│ value is uneven │     3 │                                                            0.25 │
│ value is uneven │     5 │                                                             0.5 │
│ value is uneven │     7 │                                                            0.75 │
│ value is uneven │     9 │                                                             1.0 │
└─────────────────┴───────┴─────────────────────────────────────────────────────────────────┘
```
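
The values in the result above follow the usual SQL definition of the relative rank, `(rank - 1) / (rows in partition - 1)`. The following plain-Python sketch reproduces them without calling DuckDB. The function name `percent_rank` is reused here for illustration only, and returning `0.0` for a single-row partition is an assumption of the sketch.

```python
from typing import List, Sequence


def percent_rank(ordered_values: Sequence[int]) -> List[float]:
    """Relative rank of each row: (rank - 1) / (rows in partition - 1)."""
    n = len(ordered_values)
    result = []
    for value in ordered_values:
        # rank = 1 + number of rows that sort strictly before this one,
        # so tied values share the same rank.
        rank = 1 + sum(1 for other in ordered_values if other < value)
        result.append((rank - 1) / (n - 1) if n > 1 else 0.0)
    return result


print(percent_rank([2, 4, 6, 8]))     # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
print(percent_rank([1, 3, 5, 7, 9]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```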

----

###### `product` {#docs:current:clients:python:relational_api::product}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
product(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the product of all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to calculate the product of.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.product("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────┐
│   description   │ product("value") │
│     varchar     │      double      │
├─────────────────┼──────────────────┤
│ value is uneven │            945.0 │
│ value is even   │            384.0 │
└─────────────────┴──────────────────┘
```
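
As a cross-check of the result above, the grouped product can be sketched in plain Python, without calling DuckDB. The variable names are illustrative, and the `float()` conversion only mirrors the `double` column type shown in the result.

```python
import math
from collections import defaultdict

# Rebuild the (description, value) pairs of the example relation.
rows = [
    ("value is even" if value % 2 == 0 else "value is uneven", value)
    for value in range(1, 10)
]

# Collect the values of each group, as `groups="description"` does.
values_by_group = defaultdict(list)
for description, value in rows:
    values_by_group[description].append(value)

# Multiply the values within each group.
products = {
    description: float(math.prod(values))
    for description, values in values_by_group.items()
}

print(products)  # {'value is uneven': 945.0, 'value is even': 384.0}
```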

----

###### `quantile` {#docs:current:clients:python:relational_api::quantile}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
quantile(self: _duckdb.DuckDBPyRelation, expression: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the exact quantile value for a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to compute the quantile for.
- **q** : object, default: 0.5
                            
	The quantile value to compute (e.g., 0.5 for median).
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.quantile("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────────────────┐
│   description   │ quantile_disc("value", 0.500000) │
│     varchar     │              int64               │
├─────────────────┼──────────────────────────────────┤
│ value is uneven │                                5 │
│ value is even   │                                4 │
└─────────────────┴──────────────────────────────────┘
```

----

###### `quantile_cont` {#docs:current:clients:python:relational_api::quantile_cont}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
quantile_cont(self: _duckdb.DuckDBPyRelation, expression: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the interpolated quantile value for a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to compute the continuous quantile for.
- **q** : object, default: 0.5
                            
	The quantile value to compute (e.g., 0.5 for median).
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.quantile_cont("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────────────────┐
│   description   │ quantile_cont("value", 0.500000) │
│     varchar     │              double              │
├─────────────────┼──────────────────────────────────┤
│ value is even   │                              5.0 │
│ value is uneven │                              5.0 │
└─────────────────┴──────────────────────────────────┘
```
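
Both groups yield `5.0` in the result above even though `5` does not occur among the even values. The plain-Python sketch below shows why, assuming the common linear-interpolation definition of a continuous quantile (position `q * (n - 1)` in the sorted values). It does not call DuckDB, and the function body is an illustration rather than DuckDB's implementation.

```python
import math
from typing import Sequence


def quantile_cont(values: Sequence[float], q: float = 0.5) -> float:
    """Linearly interpolate between the two values surrounding position q * (n - 1)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return float(ordered[lower] + fraction * (ordered[upper] - ordered[lower]))


print(quantile_cont([2, 4, 6, 8]))     # 5.0 (halfway between 4 and 6)
print(quantile_cont([1, 3, 5, 7, 9]))  # 5.0 (the middle element)
```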

----

###### `quantile_disc` {#docs:current:clients:python:relational_api::quantile_disc}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
quantile_disc(self: _duckdb.DuckDBPyRelation, expression: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the exact quantile value for a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to compute the discrete quantile for.
- **q** : object, default: 0.5
                            
	The quantile value to compute (e.g., 0.5 for median).
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.quantile_disc("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────────────────┐
│   description   │ quantile_disc("value", 0.500000) │
│     varchar     │              int64               │
├─────────────────┼──────────────────────────────────┤
│ value is even   │                                4 │
│ value is uneven │                                5 │
└─────────────────┴──────────────────────────────────┘
```
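
A discrete quantile always returns a value that occurs in the input, which is why the even group yields `4` here while `quantile_cont` yields `5.0`. The plain-Python sketch below assumes a common definition: the smallest value whose cumulative share of the rows is at least `q`. It does not call DuckDB and is not taken from its implementation, but it reproduces the result above.

```python
import math
from typing import Sequence


def quantile_disc(values: Sequence[int], q: float = 0.5) -> int:
    """Return the smallest value whose cumulative share of the rows is at least q."""
    ordered = sorted(values)
    # ceil(q * n) rows are needed to reach a share of q; convert to a 0-based index.
    index = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[index]


print(quantile_disc([2, 4, 6, 8]))     # 4
print(quantile_disc([1, 3, 5, 7, 9]))  # 5
```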

----

###### `rank` {#docs:current:clients:python:relational_api::rank}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
rank(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the rank within the partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **window_spec** : str
                            
	Window specification, provided as `over (partition by ... order by ...)`. Required, as the signature defines no default value.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬─────────────────────────────────────────────────────────┐
│   description   │ value │ rank() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                          int64                          │
├─────────────────┼───────┼─────────────────────────────────────────────────────────┤
│ value is uneven │     1 │                                                       1 │
│ value is uneven │     3 │                                                       2 │
│ value is uneven │     5 │                                                       3 │
│ value is uneven │     7 │                                                       4 │
│ value is uneven │     9 │                                                       5 │
│ value is even   │     2 │                                                       1 │
│ value is even   │     4 │                                                       2 │
│ value is even   │     6 │                                                       3 │
│ value is even   │     8 │                                                       4 │
└─────────────────┴───────┴─────────────────────────────────────────────────────────┘
```

----

###### `rank_dense` {#docs:current:clients:python:relational_api::rank_dense}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
rank_dense(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the dense rank within the partition

**Aliases**: [`dense_rank`](#docs:current:clients:python:relational_api::dense_rank)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **window_spec** : str
                            
	Window specification, provided as `over (partition by ... order by ...)`. Required, as the signature defines no default value.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.rank_dense(window_spec="over (partition by description order by value)", projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│   description   │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                             int64                             │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is uneven │     1 │                                                             1 │
│ value is uneven │     3 │                                                             2 │
│ value is uneven │     5 │                                                             3 │
│ value is uneven │     7 │                                                             4 │
│ value is uneven │     9 │                                                             5 │
│ value is even   │     2 │                                                             1 │
│ value is even   │     4 │                                                             2 │
│ value is even   │     6 │                                                             3 │
│ value is even   │     8 │                                                             4 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
```

----

###### `row_number` {#docs:current:clients:python:relational_api::row_number}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
row_number(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the row number within the partition

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **window_spec** : str
                            
	Window specification, provided as `over (partition by ... order by ...)`. Required, as the signature defines no default value.
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.row_number(window_spec="over (partition by description order by value)", projected_columns="description, value")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│   description   │ value │ row_number() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                             int64                             │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is uneven │     1 │                                                             1 │
│ value is uneven │     3 │                                                             2 │
│ value is uneven │     5 │                                                             3 │
│ value is uneven │     7 │                                                             4 │
│ value is uneven │     9 │                                                             5 │
│ value is even   │     2 │                                                             1 │
│ value is even   │     4 │                                                             2 │
│ value is even   │     6 │                                                             3 │
│ value is even   │     8 │                                                             4 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
```
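
Because the example data contains no ties within a partition, `rank`, `rank_dense` and `row_number` return identical numbers above. The plain-Python sketch below uses hypothetical tied values to show where they differ under their standard SQL definitions. It does not call DuckDB, and the helper name `ranking_functions` is made up for illustration.

```python
from typing import Dict, List, Sequence


def ranking_functions(ordered_values: Sequence[int]) -> Dict[str, List[int]]:
    """Compute row_number, rank and dense_rank for one ordered partition."""
    distinct = sorted(set(ordered_values))
    return {
        # row_number: consecutive position; the order among ties is arbitrary.
        "row_number": list(range(1, len(ordered_values) + 1)),
        # rank: 1 + number of rows sorting strictly before; leaves gaps after ties.
        "rank": [1 + sum(1 for o in ordered_values if o < v) for v in ordered_values],
        # dense_rank: 1 + number of distinct values sorting strictly before; no gaps.
        "dense_rank": [1 + distinct.index(v) for v in ordered_values],
    }


print(ranking_functions([10, 20, 20, 30]))
# {'row_number': [1, 2, 3, 4], 'rank': [1, 2, 2, 4], 'dense_rank': [1, 2, 2, 3]}
```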

----

###### `select_dtypes` {#docs:current:clients:python:relational_api::select_dtypes}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
select_dtypes(self: _duckdb.DuckDBPyRelation, types: object) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Select columns from the relation, by filtering based on type(s)

**Aliases**: [`select_types`](#docs:current:clients:python:relational_api::select_types)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **types** : object
                            
	Data type(s) to select columns by. Can be a single type or a collection of types.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.select_dtypes(types=[duckdb.sqltypes.VARCHAR]).distinct()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┐
│   description   │
│     varchar     │
├─────────────────┤
│ value is even   │
│ value is uneven │
└─────────────────┘
```

----

###### `select_types` {#docs:current:clients:python:relational_api::select_types}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
select_types(self: _duckdb.DuckDBPyRelation, types: object) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Select columns from the relation, by filtering based on type(s)

**Aliases**: [`select_dtypes`](#docs:current:clients:python:relational_api::select_dtypes)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **types** : object
                            
	Data type(s) to select columns by. Can be a single type or a collection of types.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.select_types(types=[duckdb.sqltypes.VARCHAR]).distinct()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┐
│   description   │
│     varchar     │
├─────────────────┤
│ value is even   │
│ value is uneven │
└─────────────────┘
```

----

###### `std` {#docs:current:clients:python:relational_api::std}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
std(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the sample standard deviation for a given expression

**Aliases**: [`stddev`](#docs:current:clients:python:relational_api::stddev), [`stddev_samp`](#docs:current:clients:python:relational_api::stddev_samp)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to calculate the standard deviation for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.std("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────┐
│   description   │ stddev_samp("value") │
│     varchar     │        double        │
├─────────────────┼──────────────────────┤
│ value is uneven │   3.1622776601683795 │
│ value is even   │    2.581988897471611 │
└─────────────────┴──────────────────────┘
```

----

###### `stddev` {#docs:current:clients:python:relational_api::stddev}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
stddev(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the sample standard deviation for a given expression

**Aliases**: [`std`](#docs:current:clients:python:relational_api::std), [`stddev_samp`](#docs:current:clients:python:relational_api::stddev_samp)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to calculate the standard deviation for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.stddev("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────┐
│   description   │ stddev_samp("value") │
│     varchar     │        double        │
├─────────────────┼──────────────────────┤
│ value is even   │    2.581988897471611 │
│ value is uneven │   3.1622776601683795 │
└─────────────────┴──────────────────────┘
```

----

###### `stddev_pop` {#docs:current:clients:python:relational_api::stddev_pop}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
stddev_pop(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the population standard deviation for a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to calculate the population standard deviation for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.stddev_pop("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬─────────────────────┐
│   description   │ stddev_pop("value") │
│     varchar     │       double        │
├─────────────────┼─────────────────────┤
│ value is even   │    2.23606797749979 │
│ value is uneven │  2.8284271247461903 │
└─────────────────┴─────────────────────┘
```
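
The population figures above are smaller than the sample figures reported by `std`, `stddev` and `stddev_samp` because the sum of squared deviations is divided by `n` instead of `n - 1`. The plain-Python sketch below illustrates the two divisors. It does not call DuckDB, and the helper `stddev` with its `sample` flag is hypothetical.

```python
import math
from typing import Sequence


def stddev(values: Sequence[float], sample: bool = True) -> float:
    """Standard deviation with divisor n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))


even = [2, 4, 6, 8]
print(stddev(even, sample=True))   # about 2.582, compare stddev_samp
print(stddev(even, sample=False))  # about 2.236, compare stddev_pop
```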

----

###### `stddev_samp` {#docs:current:clients:python:relational_api::stddev_samp}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
stddev_samp(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the sample standard deviation for a given expression

**Aliases**: [`stddev`](#docs:current:clients:python:relational_api::stddev), [`std`](#docs:current:clients:python:relational_api::std)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str

	The column name or expression to calculate the standard deviation for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.stddev_samp("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────┐
│   description   │ stddev_samp("value") │
│     varchar     │        double        │
├─────────────────┼──────────────────────┤
│ value is even   │    2.581988897471611 │
│ value is uneven │   3.1622776601683795 │
└─────────────────┴──────────────────────┘
```

----

###### `string_agg` {#docs:current:clients:python:relational_api::string_agg}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
string_agg(self: _duckdb.DuckDBPyRelation, expression: str, sep: str = ',', groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Concatenates the values present in a given expression with a separator

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str
                            
	The column name to concatenate values from.
- **sep** : str, default: ','
                            
	Separator string to use between concatenated values.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.string_agg("value", sep=",", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────────────┐
│   description   │ string_agg("value", ',') │
│     varchar     │         varchar          │
├─────────────────┼──────────────────────────┤
│ value is even   │ 2,4,6,8                  │
│ value is uneven │ 1,3,5,7,9                │
└─────────────────┴──────────────────────────┘
```

----

###### `sum` {#docs:current:clients:python:relational_api::sum}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
sum(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the sum of all values present in a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str
                            
	The column name to calculate the sum for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.sum("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────┐
│   description   │ sum("value") │
│     varchar     │    int128    │
├─────────────────┼──────────────┤
│ value is even   │           20 │
│ value is uneven │           25 │
└─────────────────┴──────────────┘
```

----

###### `unique` {#docs:current:clients:python:relational_api::unique}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
unique(self: _duckdb.DuckDBPyRelation, unique_aggr: str) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Returns the distinct values in a column.

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **unique_aggr** : str
                            
	The column to get the distinct values for.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.unique(unique_aggr="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┐
│   description   │
│     varchar     │
├─────────────────┤
│ value is even   │
│ value is uneven │
└─────────────────┘
```

----

###### `value_counts` {#docs:current:clients:python:relational_api::value_counts}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
value_counts(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the number of elements present in a given expression, also projecting the original expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str
                            
	The column name to count values from.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.value_counts("description", groups="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬────────────────────┐
│   description   │ count(description) │
│     varchar     │       int64        │
├─────────────────┼────────────────────┤
│ value is uneven │                  5 │
│ value is even   │                  4 │
└─────────────────┴────────────────────┘
```

----

###### `var` {#docs:current:clients:python:relational_api::var}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
var(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the sample variance for a given expression

**Aliases**: [`variance`](#::variance), [`var_samp`](#::var_samp)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str
                            
	The column name to calculate the sample variance for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.var("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────┐
│   description   │ var_samp("value") │
│     varchar     │      double       │
├─────────────────┼───────────────────┤
│ value is even   │ 6.666666666666667 │
│ value is uneven │              10.0 │
└─────────────────┴───────────────────┘
```

----

###### `var_pop` {#docs:current:clients:python:relational_api::var_pop}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
var_pop(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the population variance for a given expression

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str
                            
	The column name to calculate the population variance for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.var_pop("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬──────────────────┐
│   description   │ var_pop("value") │
│     varchar     │      double      │
├─────────────────┼──────────────────┤
│ value is even   │              5.0 │
│ value is uneven │              8.0 │
└─────────────────┴──────────────────┘
```
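
The population variance differs from the sample variance (`var`, `var_samp`, `variance`) only in the divisor: `n` instead of `n - 1`. As a sketch, the figures can be reproduced with Python's standard `statistics` module. The values are hard-coded from the example relation, DuckDB is not called, and the dictionary names are illustrative.

```python
import statistics

# Values produced by range(1, 10) in the example, keyed by description.
groups = {
    "value is even": [2.0, 4.0, 6.0, 8.0],
    "value is uneven": [1.0, 3.0, 5.0, 7.0, 9.0],
}

# Population variance: sum of squared deviations divided by n.
pop_variance = {name: statistics.pvariance(vals) for name, vals in groups.items()}

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = {name: statistics.variance(vals) for name, vals in groups.items()}

print(pop_variance)
print(sample_variance)
```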

----

###### `var_samp` {#docs:current:clients:python:relational_api::var_samp}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
var_samp(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the sample variance for a given expression

**Aliases**: [`variance`](#::variance), [`var`](#::var)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str
                            
	The column name to calculate the sample variance for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.var_samp("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────┐
│   description   │ var_samp("value") │
│     varchar     │      double       │
├─────────────────┼───────────────────┤
│ value is even   │ 6.666666666666667 │
│ value is uneven │              10.0 │
└─────────────────┴───────────────────┘
```

----

###### `variance` {#docs:current:clients:python:relational_api::variance}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
variance(self: _duckdb.DuckDBPyRelation, expression: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Computes the sample variance for a given expression

**Aliases**: [`var`](#::var), [`var_samp`](#::var_samp)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **expression** : str
                            
	The column name to calculate the sample variance for.
- **groups** : str, default: ''
                            
	Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
                            
	Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
                            
	Comma-separated list of columns to include in the result.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.variance("value", groups="description", projected_columns="description")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌─────────────────┬───────────────────┐
│   description   │ var_samp("value") │
│     varchar     │      double       │
├─────────────────┼───────────────────┤
│ value is even   │ 6.666666666666667 │
│ value is uneven │              10.0 │
└─────────────────┴───────────────────┘
```

#### Output  {#docs:current:clients:python:relational_api::output-}

The functions in this section trigger SQL execution and retrieve the data.

| Name | Description |
|:--|:-------|
| [`arrow`](#::arrow) | Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead. |
| [`close`](#::close) | Closes the result |
| [`create`](#::create) | Creates a new table named table_name with the contents of the relation object |
| [`create_view`](#::create_view) | Creates a view named view_name that refers to the relation object |
| [`df`](#::df) | Execute and fetch all rows as a pandas DataFrame |
| [`execute`](#::execute) | Transform the relation into a result set |
| [`fetch_arrow_reader`](#::fetch_arrow_reader) | Execute and return an Arrow Record Batch Reader that yields all rows |
| [`fetch_arrow_table`](#::fetch_arrow_table) | Execute and fetch all rows as an Arrow Table |
| [`fetch_df_chunk`](#::fetch_df_chunk) | Execute and fetch a chunk of the rows |
| [`fetch_record_batch`](#::fetch_record_batch) | Execute and return an Arrow Record Batch Reader that yields all rows |
| [`fetchall`](#::fetchall) | Execute and fetch all rows as a list of tuples |
| [`fetchdf`](#::fetchdf) | Execute and fetch all rows as a pandas DataFrame |
| [`fetchmany`](#::fetchmany) | Execute and fetch the next set of rows as a list of tuples |
| [`fetchnumpy`](#::fetchnumpy) | Execute and fetch all rows as a Python dict mapping each column to one NumPy array |
| [`fetchone`](#::fetchone) | Execute and fetch a single row as a tuple |
| [`pl`](#::pl) | Execute and fetch all rows as a Polars DataFrame |
| [`tf`](#::tf) | Fetch a result as dict of TensorFlow Tensors |
| [`to_arrow_reader`](#::to_arrow_reader) | Execute and return an Arrow Record Batch Reader that yields all rows |
| [`to_arrow_table`](#::to_arrow_table) | Execute and fetch all rows as an Arrow Table |
| [`to_csv`](#::to_csv) | Write the relation object to a CSV file in 'file_name' |
| [`to_df`](#::to_df) | Execute and fetch all rows as a pandas DataFrame |
| [`to_parquet`](#::to_parquet) | Write the relation object to a Parquet file in 'file_name' |
| [`to_table`](#::to_table) | Creates a new table named table_name with the contents of the relation object |
| [`to_view`](#::to_view) | Creates a view named view_name that refers to the relation object |
| [`torch`](#::torch) | Fetch a result as dict of PyTorch Tensors |
| [`write_csv`](#::write_csv) | Write the relation object to a CSV file in 'file_name' |
| [`write_parquet`](#::write_parquet) | Write the relation object to a Parquet file in 'file_name' |

###### `arrow` {#docs:current:clients:python:relational_api::arrow}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
arrow(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader
```

####### Description {#docs:current:clients:python:relational_api::description}

Alias of `to_arrow_reader()`.

> We recommend using [`to_arrow_reader()`](#::to_arrow_reader) instead.

**Aliases**: [`to_arrow_reader`](#::to_arrow_reader)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **batch_size** : int, default: 1000000
                            
	The batch size for fetching the data.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

pa_reader = rel.arrow(batch_size=1)

pa_reader.read_next_batch()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd"]
description: ["value is even"]
value: [2]
created_timestamp: [2025-04-10 09:25:51.259000Z]
```

----

###### `close` {#docs:current:clients:python:relational_api::close}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
close(self: _duckdb.DuckDBPyRelation) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Closes the result

----

###### `create` {#docs:current:clients:python:relational_api::create}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
create(self: _duckdb.DuckDBPyRelation, table_name: str) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Creates a new table named table_name with the contents of the relation object

**Aliases**: [`to_table`](#::to_table)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **table_name** : str
                            
	The name of the table to be created. No table with the same name may already exist.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.create("table_code_example")

duckdb_conn.table("table_code_example").limit(1)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │     1 │ 2025-04-10 11:07:12.614+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```

----

###### `create_view` {#docs:current:clients:python:relational_api::create_view}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
create_view(self: _duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Creates a view named view_name that refers to the relation object

**Aliases**: [`to_view`](#::to_view)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **view_name** : str
                            
	The name of the view to be created.
- **replace** : bool, default: True
                            
	Whether to create the view with `CREATE OR REPLACE`. When set to `False`, no other view named `view_name` may already exist.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.create_view("view_code_example", replace=True)

duckdb_conn.table("view_code_example").limit(1)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │     1 │ 2025-04-10 11:07:12.614+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```

----

###### `df` {#docs:current:clients:python:relational_api::df}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
df(self: _duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch all rows as a pandas DataFrame

**Aliases**: [`fetchdf`](#::fetchdf), [`to_df`](#::to_df)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **date_as_object** : bool, default: False
                            
	Whether date columns should be interpreted as Python date objects.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.df()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
                                     id      description  value                created_timestamp
0  3ac9e0ba-8390-4a02-ad72-33b1caea6354  value is uneven      1 2025-04-10 11:07:12.614000+02:00
1  8b844392-1404-4bbc-b731-120f42c8ca27    value is even      2 2025-04-10 11:08:12.614000+02:00
2  ca5584ca-8e97-4fca-a295-ae3c16c32f5b  value is uneven      3 2025-04-10 11:09:12.614000+02:00
...
```

----

###### `execute` {#docs:current:clients:python:relational_api::execute}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
execute(self: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Transform the relation into a result set

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.execute()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │     1 │ 2025-04-10 11:07:12.614+02 │
│ 8b844392-1404-4bbc-b731-120f42c8ca27 │ value is even   │     2 │ 2025-04-10 11:08:12.614+02 │
│ ca5584ca-8e97-4fca-a295-ae3c16c32f5b │ value is uneven │     3 │ 2025-04-10 11:09:12.614+02 │
```

----

###### `fetch_arrow_reader` {#docs:current:clients:python:relational_api::fetch_arrow_reader}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetch_arrow_reader(self: object, batch_size: typing.SupportsInt = 1000000) -> object
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and return an Arrow Record Batch Reader that yields all rows

> **Deprecated.** `fetch_arrow_reader()` is deprecated. Use [`to_arrow_reader()`](#::to_arrow_reader) instead.

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **batch_size** : int, default: 1000000
                            
	The batch size for fetching the data.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

pa_reader = rel.fetch_arrow_reader(batch_size=1)

pa_reader.read_next_batch()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd"]
description: ["value is even"]
value: [2]
created_timestamp: [2025-04-10 09:25:51.259000Z]
```

----

###### `fetch_arrow_table` {#docs:current:clients:python:relational_api::fetch_arrow_table}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetch_arrow_table(self: object, batch_size: typing.SupportsInt = 1000000) -> object
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch all rows as an Arrow Table

> **Deprecated.** `fetch_arrow_table()` is deprecated. Use [`to_arrow_table()`](#::to_arrow_table) instead.

**Aliases**: [`to_arrow_table`](#::to_arrow_table)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **batch_size** : int, default: 1000000
                            
	The batch size for fetching the data.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.fetch_arrow_table()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["1587b4b0-3023-49fe-82cf-06303ca136ac","e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd","3f8ad67a-290f-4a22-b41b-0173b8e45afa","9a4e37ef-d8bd-46dd-ab01-51cf4973549f","12baa624-ebc9-45ae-b73e-6f4029e31d2d","56d41292-53cc-48be-a1b8-e1f5d6ca5581","1accca18-c950-47c1-9108-aef8afbd5249","56d8db75-72c4-4d40-90d2-a3c840579c37","e19f6201-8646-401c-b019-e37c42c39632"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:24:51.259000Z,2025-04-10 09:25:51.259000Z,2025-04-10 09:26:51.259000Z,2025-04-10 09:27:51.259000Z,2025-04-10 09:28:51.259000Z,2025-04-10 09:29:51.259000Z,2025-04-10 09:30:51.259000Z,2025-04-10 09:31:51.259000Z,2025-04-10 09:32:51.259000Z]]
```

----

###### `fetch_df_chunk` {#docs:current:clients:python:relational_api::fetch_df_chunk}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetch_df_chunk(self: _duckdb.DuckDBPyRelation, vectors_per_chunk: typing.SupportsInt = 1, *, date_as_object: bool = False) -> pandas.DataFrame
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch a chunk of the rows

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **vectors_per_chunk** : int, default: 1
                            
	Number of data chunks to process before converting them to a pandas DataFrame.
- **date_as_object** : bool, default: False
                            
	Whether date columns should be interpreted as Python date objects.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.fetch_df_chunk()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
                                     id      description  value                created_timestamp
0  1587b4b0-3023-49fe-82cf-06303ca136ac  value is uneven      1 2025-04-10 11:24:51.259000+02:00
1  e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd    value is even      2 2025-04-10 11:25:51.259000+02:00
2  3f8ad67a-290f-4a22-b41b-0173b8e45afa  value is uneven      3 2025-04-10 11:26:51.259000+02:00
...
```

----

###### `fetch_record_batch` {#docs:current:clients:python:relational_api::fetch_record_batch}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetch_record_batch(self: object, rows_per_batch: typing.SupportsInt = 1000000) -> object
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and return an Arrow Record Batch Reader that yields all rows

> **Deprecated.** `fetch_record_batch()` is deprecated. Use [`to_arrow_reader()`](#::to_arrow_reader) instead.

**Aliases**: [`record_batch`](#::record_batch)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **rows_per_batch** : int, default: 1000000
                            
	The number of rows per batch.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

pa_reader = rel.fetch_record_batch(rows_per_batch=1)

pa_reader.read_next_batch()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["908cf67c-a086-4b94-9017-2089a83e4a6c"]
description: ["value is uneven"]
value: [1]
created_timestamp: [2025-04-10 09:52:55.249000Z]
```

----

###### `fetchall` {#docs:current:clients:python:relational_api::fetchall}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetchall(self: _duckdb.DuckDBPyRelation) -> list
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch all rows as a list of tuples

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.limit(1).fetchall()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
[(UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'),
  'value is uneven',
  1,
  datetime.datetime(2025, 4, 10, 11, 24, 51, 259000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
```

----

###### `fetchdf` {#docs:current:clients:python:relational_api::fetchdf}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetchdf(self: _duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch all rows as a pandas DataFrame

**Aliases**: [`df`](#::df), [`to_df`](#::to_df)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **date_as_object** : bool, default: False
                            
	Whether date columns should be interpreted as Python date objects.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.fetchdf()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
                                     id      description  value                created_timestamp
0  1587b4b0-3023-49fe-82cf-06303ca136ac  value is uneven      1 2025-04-10 11:24:51.259000+02:00
1  e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd    value is even      2 2025-04-10 11:25:51.259000+02:00
2  3f8ad67a-290f-4a22-b41b-0173b8e45afa  value is uneven      3 2025-04-10 11:26:51.259000+02:00
...
```

----

###### `fetchmany` {#docs:current:clients:python:relational_api::fetchmany}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetchmany(self: _duckdb.DuckDBPyRelation, size: typing.SupportsInt = 1) -> list
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch the next set of rows as a list of tuples


> **Warning.** Executing any operation while retrieving data from an [aggregate](#::aggregate) relation will close the result set.
>```python
>import duckdb
>
>duckdb_conn = duckdb.connect()
>
>rel = duckdb_conn.sql("""
>       select 
>           gen_random_uuid() as id, 
>           concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
>           range as value, 
>           now() + concat(range,' ', 'minutes')::interval as created_timestamp
>       from range(1, 10)
>    """
>)
>
>agg_rel = rel.aggregate("value")
>
>while res := agg_rel.fetchmany(size=1):
>    print(res)
>    rel.show()
>```


####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **size** : int, default: 1
                            
	The number of records to be fetched.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

while res := rel.fetchmany(size=1):
    print(res)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
[(UUID('cf4c5e32-d0aa-4699-a3ee-0092e900f263'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 16, 23, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('cec335ac-24ac-49a3-ae9a-bb35f71fc88d'), 'value is even', 2, datetime.datetime(2025, 4, 30, 16, 24, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('2423295d-9bb0-453c-a385-21bdacba03b6'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 16, 25, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('88806b21-192d-41e7-a293-c789aad636ba'), 'value is even', 4, datetime.datetime(2025, 4, 30, 16, 26, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('05837a28-dacf-4121-88a6-a374aefb8a07'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 16, 27, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('b9c1f7e9-6156-4554-b80e-67d3b5d810bb'), 'value is even', 6, datetime.datetime(2025, 4, 30, 16, 28, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('4709c7fa-d286-4864-bb48-69748b447157'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 16, 29, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('30e48457-b103-4fa5-95cf-1c7f0143335b'), 'value is even', 8, datetime.datetime(2025, 4, 30, 16, 30, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('036b7f4b-bd78-4ffb-a351-964d93f267b7'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 16, 31, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
```

----

###### `fetchnumpy` {#docs:current:clients:python:relational_api::fetchnumpy}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetchnumpy(self: _duckdb.DuckDBPyRelation) -> dict
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch all rows as a Python dict mapping each column to a NumPy array

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.fetchnumpy()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
{'id': array([UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'),
        UUID('e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd'),
        UUID('3f8ad67a-290f-4a22-b41b-0173b8e45afa'),
        UUID('9a4e37ef-d8bd-46dd-ab01-51cf4973549f'),
        UUID('12baa624-ebc9-45ae-b73e-6f4029e31d2d'),
        UUID('56d41292-53cc-48be-a1b8-e1f5d6ca5581'),
        UUID('1accca18-c950-47c1-9108-aef8afbd5249'),
        UUID('56d8db75-72c4-4d40-90d2-a3c840579c37'),
        UUID('e19f6201-8646-401c-b019-e37c42c39632')], dtype=object),
 'description': array(['value is uneven', 'value is even', 'value is uneven',
        'value is even', 'value is uneven', 'value is even',
        'value is uneven', 'value is even', 'value is uneven'],
       dtype=object),
 'value': array([1, 2, 3, 4, 5, 6, 7, 8, 9]),
 'created_timestamp': array(['2025-04-10T09:24:51.259000', '2025-04-10T09:25:51.259000',
        '2025-04-10T09:26:51.259000', '2025-04-10T09:27:51.259000',
        '2025-04-10T09:28:51.259000', '2025-04-10T09:29:51.259000',
        '2025-04-10T09:30:51.259000', '2025-04-10T09:31:51.259000',
        '2025-04-10T09:32:51.259000'], dtype='datetime64[us]')}
```

----

###### `fetchone` {#docs:current:clients:python:relational_api::fetchone}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
fetchone(self: _duckdb.DuckDBPyRelation) -> typing.Optional[tuple]
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch a single row as a tuple


> **Warning.** Executing any operation while retrieving data from an [aggregate](#::aggregate) relation will close the result set.
>```python
>import duckdb
>
>duckdb_conn = duckdb.connect()
>
>rel = duckdb_conn.sql("""
>       select 
>           gen_random_uuid() as id, 
>           concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
>           range as value, 
>           now() + concat(range,' ', 'minutes')::interval as created_timestamp
>       from range(1, 10)
>    """
>)
>
>agg_rel = rel.aggregate("value")
>
>while res := agg_rel.fetchone():
>    print(res)
>    rel.show()
>```


####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

while res := rel.fetchone():
    print(res)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
(UUID('fe036411-f4c7-4f52-9ddd-80cd2bb56613'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 12, 59, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('466c9b43-e9f0-4237-8f26-155f259a5b59'), 'value is even', 2, datetime.datetime(2025, 4, 30, 13, 0, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('5755cf16-a94f-41ef-a16d-21e856d71f9f'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 13, 1, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('05b52c93-bd68-45e1-b02a-a08d682c33d5'), 'value is even', 4, datetime.datetime(2025, 4, 30, 13, 2, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('cf61ef13-2840-4541-900d-f493767d7622'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 13, 3, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('033e7c68-e800-4ee8-9787-6cf50aabc27b'), 'value is even', 6, datetime.datetime(2025, 4, 30, 13, 4, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('8b8d6545-ff54-45d6-b69a-97edb63dfe43'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 13, 5, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('7da79dfe-b29c-462b-a414-9d5e3cc80139'), 'value is even', 8, datetime.datetime(2025, 4, 30, 13, 6, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('f83ffff2-33b9-4f86-9d14-46974b546bab'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 13, 7, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
```

----

###### `pl` {#docs:current:clients:python:relational_api::pl}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
pl(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000, *, lazy: bool = False) -> duckdb::PolarsDataFrame
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch all rows as a Polars DataFrame

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **batch_size** : int, default: 1000000
                            
	The number of records to be fetched per batch.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.pl(batch_size=1)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
shape: (9, 4)
┌─────────────────────────────────┬─────────────────┬───────┬────────────────────────────────┐
│ id                              ┆ description     ┆ value ┆ created_timestamp              │
│ ---                             ┆ ---             ┆ ---   ┆ ---                            │
│ str                             ┆ str             ┆ i64   ┆ datetime[μs, Europe/Amsterdam] │
╞═════════════════════════════════╪═════════════════╪═══════╪════════════════════════════════╡
│ b2f92c3c-9372-49f3-897f-2c86fc… ┆ value is uneven ┆ 1     ┆ 2025-04-10 11:49:51.886 CEST   │
```

----

###### `tf` {#docs:current:clients:python:relational_api::tf}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
tf(self: _duckdb.DuckDBPyRelation) -> dict
```

####### Description {#docs:current:clients:python:relational_api::description}

Fetch the result as a dict of TensorFlow Tensors

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.select("description, value").tf()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
{'description': <tf.Tensor: shape=(9,), dtype=string, numpy=
 array([b'value is uneven', b'value is even', b'value is uneven',
        b'value is even', b'value is uneven', b'value is even',
        b'value is uneven', b'value is even', b'value is uneven'],
       dtype=object)>,
 'value': <tf.Tensor: shape=(9,), dtype=int64, numpy=array([1, 2, 3, 4, 5, 6, 7, 8, 9])>}
```

----

###### `to_arrow_reader` {#docs:current:clients:python:relational_api::to_arrow_reader}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
to_arrow_reader(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and return an Arrow Record Batch Reader that yields all rows

**Aliases**: [`arrow`](#::arrow)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **batch_size** : int, default: 1000000
                            
	The batch size for fetching the data.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

pa_reader = rel.to_arrow_reader(batch_size=1)

pa_reader.read_next_batch()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd"]
description: ["value is even"]
value: [2]
created_timestamp: [2025-04-10 09:25:51.259000Z]
```

----

###### `to_arrow_table` {#docs:current:clients:python:relational_api::to_arrow_table}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
to_arrow_table(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.Table
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch all rows as an Arrow Table

**Aliases**: [`fetch_arrow_table`](#::fetch_arrow_table)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **batch_size** : int, default: 1000000
                            
	The batch size for fetching the data.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.to_arrow_table()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["86b2011d-3818-426f-a41e-7cd5c7321f79","07fa4f89-0bba-4049-9acd-c933332a66d5","f2f1479e-f582-4fe4-b82f-9b753b69634c","529d3c63-5961-4adb-b0a8-8249188fc82a","aa9eea7d-7fac-4dcf-8f32-4a0b5d64f864","4852aa32-03f2-40d3-8006-b8213904775a","c0127203-f2e3-4925-9810-655bc02a3c19","2a1356ba-5707-44d6-a492-abd0a67e5efb","800a1c24-231c-4dae-bd68-627654c8a110"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:54:24.015000Z,2025-04-10 09:55:24.015000Z,2025-04-10 09:56:24.015000Z,2025-04-10 09:57:24.015000Z,2025-04-10 09:58:24.015000Z,2025-04-10 09:59:24.015000Z,2025-04-10 10:00:24.015000Z,2025-04-10 10:01:24.015000Z,2025-04-10 10:02:24.015000Z]]
```

----

###### `to_csv` {#docs:current:clients:python:relational_api::to_csv}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
to_csv(self: _duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Write the relation object to a CSV file in 'file_name'

**Aliases**: [`write_csv`](#::write_csv)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **file_name** : str
                            
	The name of the output CSV file.
- **sep** : str, default: ','
                            
	Field delimiter for the output file.
- **na_rep** : str, default: ''
                            
	Missing data representation.
- **header** : bool, default: True
                            
	Whether to write column headers.
- **quotechar** : str, default: '"'
                            
	Character used to quote fields containing special characters.
- **escapechar** : str, default: None
                            
	Character used to escape the delimiter if quoting is set to QUOTE_NONE.
- **date_format** : str, default: None
                            
	Custom format string for DATE values.
- **timestamp_format** : str, default: None
                            
	Custom format string for TIMESTAMP values.
- **quoting** : int, default: csv.QUOTE_MINIMAL
                            
	Control field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
- **encoding** : str, default: 'utf-8'
                            
	Character encoding for the output file.
- **compression** : str, default: auto
                            
	Compression type (e.g., 'gzip', 'bz2', 'zstd').
- **overwrite** : bool, default: False
                            
	When `True`, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with `partition_by`.
- **per_thread_output** : bool, default: False
                            
	When `True`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
                            
	Write to a temporary file before renaming to final name to avoid partial writes.
- **partition_by** : list[str], default: None
                            
	List of column names to partition output by (creates folder structure).
- **write_partition_columns** : bool, default: False
                            
	Whether or not to write partition columns into files. Only has an effect when used with `partition_by`.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.to_csv("code_example.csv")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
The data is exported to a CSV file, named code_example.csv
```

----

###### `to_df` {#docs:current:clients:python:relational_api::to_df}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
to_df(self: _duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
```

####### Description {#docs:current:clients:python:relational_api::description}

Execute and fetch all rows as a pandas DataFrame

**Aliases**: [`fetchdf`](#::fetchdf), [`df`](#::df)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **date_as_object** : bool, default: False
                            
	Whether date columns should be interpreted as Python date objects.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.to_df()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
                                     id      description  value                created_timestamp
0  e1f79925-60fd-4ee2-ae67-5eff6b0543d1  value is uneven      1 2025-04-10 11:56:04.452000+02:00
1  caa619d4-d79c-4c00-b82e-9319b086b6f8    value is even      2 2025-04-10 11:57:04.452000+02:00
2  64c68032-99b9-4e8f-b4a3-6c522d5419b3  value is uneven      3 2025-04-10 11:58:04.452000+02:00
...
```

----

###### `to_parquet` {#docs:current:clients:python:relational_api::to_parquet}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
to_parquet(self: _duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None, filename_pattern: object = None, file_size_bytes: object = None) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Write the relation object to a Parquet file in 'file_name'

**Aliases**: [`write_parquet`](#::write_parquet)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **file_name** : str
                            
	The name of the output Parquet file.
- **compression** : str, default: 'snappy'
                            
	The compression format to use (`uncompressed`, `snappy`, `gzip`, `zstd`, `brotli`, `lz4`, `lz4_raw`).
- **field_ids** : STRUCT
                            
	The `field_id` for each column. Pass `auto` to attempt to infer them automatically.
- **row_group_size_bytes** : int, default: row_group_size * 1024
                            
	The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued `SET preserve_insertion_order = false;`, otherwise, it is ignored.
- **row_group_size** : int, default: 122880
                            
	The target size, i.e., number of rows, of each row group.
- **overwrite** : bool, default: False
                            
	If True, overwrite the file if it exists.
- **per_thread_output** : bool, default: False
                            
	When `True`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
                            
	Write to a temporary file before renaming to final name to avoid partial writes.
- **partition_by** : list[str], default: None
                            
	List of column names to partition output by (creates folder structure).
- **write_partition_columns** : bool, default: False
                            
	Whether or not to write partition columns into files. Only has an effect when used with `partition_by`.
- **append** : bool, default: False
                            
	When `True`, in the event a filename pattern is generated that already exists, the path will be regenerated to ensure no existing files are overwritten. Only has an effect when used with `partition_by`.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.to_parquet("code_example.parquet")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
The data is exported to a Parquet file, named code_example.parquet
```

----

###### `to_table` {#docs:current:clients:python:relational_api::to_table}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
to_table(self: _duckdb.DuckDBPyRelation, table_name: str) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Creates a new table named table_name with the contents of the relation object

**Aliases**: [`create`](#::create)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **table_name** : str
                            
	The name of the table to be created. A table with the same name must not already exist.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.to_table("table_code_example")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
A table, named table_code_example, is created with the data of the relation
```

----

###### `to_view` {#docs:current:clients:python:relational_api::to_view}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
to_view(self: _duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) -> _duckdb.DuckDBPyRelation
```

####### Description {#docs:current:clients:python:relational_api::description}

Creates a view named view_name that refers to the relation object

**Aliases**: [`create_view`](#::create_view)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **view_name** : str
                            
	The name of the view to be created.
- **replace** : bool, default: True
                            
	Whether the view should be created with `CREATE OR REPLACE`. When set to `False`, a view named `view_name` must not already exist.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.to_view("view_code_example", replace=True)
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
A view, named view_code_example, is created with the query definition of the relation
```

----

###### `torch` {#docs:current:clients:python:relational_api::torch}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
torch(self: _duckdb.DuckDBPyRelation) -> dict
```

####### Description {#docs:current:clients:python:relational_api::description}

Fetch a result as dict of PyTorch Tensors

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.select("value").torch()
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
{'value': tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])}
```

----

###### `write_csv` {#docs:current:clients:python:relational_api::write_csv}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
write_csv(self: _duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Write the relation object to a CSV file in 'file_name'

**Aliases**: [`to_csv`](#::to_csv)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **file_name** : str
                            
	The name of the output CSV file.
- **sep** : str, default: ','
                            
	Field delimiter for the output file.
- **na_rep** : str, default: ''
                            
	Missing data representation.
- **header** : bool, default: True
                            
	Whether to write column headers.
- **quotechar** : str, default: '"'
                            
	Character used to quote fields containing special characters.
- **escapechar** : str, default: None
                            
	Character used to escape the delimiter if quoting is set to QUOTE_NONE.
- **date_format** : str, default: None
                            
	Custom format string for DATE values.
- **timestamp_format** : str, default: None
                            
	Custom format string for TIMESTAMP values.
- **quoting** : int, default: csv.QUOTE_MINIMAL
                            
	Control field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
- **encoding** : str, default: 'utf-8'
                            
	Character encoding for the output file.
- **compression** : str, default: auto
                            
	Compression type (e.g., 'gzip', 'bz2', 'zstd').
- **overwrite** : bool, default: False
                            
	When true, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with `partition_by`.
- **per_thread_output** : bool, default: False
                            
	When `true`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
                            
	Write to a temporary file before renaming to final name to avoid partial writes.
- **partition_by** : list[str], default: None
                            
	List of column names to partition output by (creates folder structure).
- **write_partition_columns** : bool, default: False
                            
	Whether or not to write partition columns into files. Only has an effect when used with `partition_by`.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.write_csv("code_example.csv")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
The data is exported to a CSV file, named code_example.csv
```

----

###### `write_parquet` {#docs:current:clients:python:relational_api::write_parquet}

####### Signature {#docs:current:clients:python:relational_api::signature}

```python
write_parquet(self: _duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None, filename_pattern: object = None, file_size_bytes: object = None) -> None
```

####### Description {#docs:current:clients:python:relational_api::description}

Write the relation object to a Parquet file in 'file_name'

**Aliases**: [`to_parquet`](#::to_parquet)

####### Parameters {#docs:current:clients:python:relational_api::parameters}

- **file_name** : str
                            
	The name of the output Parquet file.
- **compression** : str, default: 'snappy'
                            
	The compression format to use (`uncompressed`, `snappy`, `gzip`, `zstd`, `brotli`, `lz4`, `lz4_raw`).
- **field_ids** : STRUCT
                            
	The field_id for each column. Pass auto to attempt to infer automatically.
- **row_group_size_bytes** : int, default: row_group_size * 1024
                            
	The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued `SET preserve_insertion_order = false;`, otherwise, it is ignored.
- **row_group_size** : int, default: 122880
                            
	The target size, i.e., number of rows, of each row group.
- **overwrite** : bool, default: False
                            
	If True, overwrite the file if it exists.
- **per_thread_output** : bool, default: False
                            
	When `True`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
                            
	Write to a temporary file before renaming to final name to avoid partial writes.
- **partition_by** : list[str], default: None
                            
	List of column names to partition output by (creates folder structure).
- **write_partition_columns** : bool, default: False
                            
	Whether or not to write partition columns into files. Only has an effect when used with `partition_by`.
- **append** : bool, default: False
                            
	When `True`, in the event a filename pattern is generated that already exists, the path will be regenerated to ensure no existing files are overwritten. Only has an effect when used with `partition_by`.

####### Example {#docs:current:clients:python:relational_api::example}

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
        select 
            gen_random_uuid() as id, 
            concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
            range as value, 
            now() + concat(range,' ', 'minutes')::interval as created_timestamp
        from range(1, 10)
    """
)

rel.write_parquet("code_example.parquet")
```


####### Result {#docs:current:clients:python:relational_api::result}

```text
The data is exported to a Parquet file, named code_example.parquet
```

### Python Function API {#docs:current:clients:python:function}

You can create a DuckDB user-defined function (UDF) from a Python function so it can be used in SQL queries.
Like regular [functions](#docs:current:sql:functions:overview), UDFs need a name, a return type, and parameter types.

Here is an example using a Python function that calls a third-party library.

```python
import duckdb
from duckdb.sqltypes import VARCHAR
from faker import Faker

def generate_random_name():
    fake = Faker()
    return fake.name()

duckdb.create_function("random_name", generate_random_name, [], VARCHAR)
res = duckdb.sql("SELECT random_name()").fetchall()
print(res)
```

```text
[('Gerald Ashley',)]
```

#### Creating Functions {#docs:current:clients:python:function::creating-functions}

To register a Python UDF, use the `create_function` method from a DuckDB connection. Here is the syntax:

```python
import duckdb
con = duckdb.connect()
con.create_function(name, function, parameters, return_type)
```

The `create_function` method takes the following parameters:

1. `name`: A string representing the unique name of the UDF within the connection catalog.
2. `function`: The Python function you wish to register as a UDF.
3. `parameters`: Scalar functions can operate on one or more columns. This parameter takes a list of column types used as input.
4. `return_type`: Scalar functions return one element per row. This parameter specifies the return type of the function.
5. `type` (optional): DuckDB supports both native Python types and PyArrow Arrays. By default, `type = 'native'` is assumed, but you can specify `type = 'arrow'` to use PyArrow Arrays. In general, using an Arrow UDF will be much more efficient than native because it will be able to operate in batches.
6. `null_handling` (optional): By default, `NULL` values are automatically handled as `NULL`-in `NULL`-out. Users can specify a desired behavior for `NULL` values by setting `null_handling = 'special'`.
7. `exception_handling` (optional): By default, when an exception is thrown from the Python function, it will be re-thrown in Python. Users can disable this behavior, and instead return `NULL`, by setting this parameter to `'return_null'`.
8. `side_effects` (optional): By default, functions are expected to produce the same result for the same input. If the result of a function is impacted by any type of randomness, `side_effects` must be set to `True`.

To unregister a UDF, you can call the `remove_function` method with the UDF name:

```python
con.remove_function(name)
```

##### Using Partial Functions {#docs:current:clients:python:function::using-partial-functions}

DuckDB UDFs can also be created with [Python partial functions](https://docs.python.org/3/library/functools.html#functools.partial).

The example below defines a custom logger that returns the execution datetime in ISO format, followed by the
argument bound at UDF creation and the input parameter supplied in the function call:

```python
from datetime import datetime
import duckdb
import functools


def get_datetime_iso_format() -> str:
    return datetime.now().isoformat()


def logger_udf(func, arg1: str, arg2: int) -> str:
    return ' '.join([func(), arg1, str(arg2)])
    
    
with duckdb.connect() as con:
    con.sql("select * from range(10) tbl(id)").to_table("example_table")
    
    con.create_function(
        'custom_logger',
        functools.partial(logger_udf, get_datetime_iso_format, 'logging data')
    )
    rel = con.sql("SELECT custom_logger(id) from example_table;")
    rel.show()

    con.create_function(
        'another_custom_logger',
        functools.partial(logger_udf, get_datetime_iso_format, ':')
    )
    rel = con.sql("SELECT another_custom_logger(id) from example_table;")
    rel.show()
```

```text
┌───────────────────────────────────────────┐
│             custom_logger(id)             │
│                  varchar                  │
├───────────────────────────────────────────┤
│ 2025-03-27T12:07:56.811251 logging data 0 │
│ 2025-03-27T12:07:56.811264 logging data 1 │
│ 2025-03-27T12:07:56.811266 logging data 2 │
│ 2025-03-27T12:07:56.811268 logging data 3 │
│ 2025-03-27T12:07:56.811269 logging data 4 │
│ 2025-03-27T12:07:56.811270 logging data 5 │
│ 2025-03-27T12:07:56.811271 logging data 6 │
│ 2025-03-27T12:07:56.811272 logging data 7 │
│ 2025-03-27T12:07:56.811274 logging data 8 │
│ 2025-03-27T12:07:56.811275 logging data 9 │
├───────────────────────────────────────────┤
│                  10 rows                  │
└───────────────────────────────────────────┘

┌────────────────────────────────┐
│   another_custom_logger(id)    │
│            varchar             │
├────────────────────────────────┤
│ 2025-03-27T12:07:56.812106 : 0 │
│ 2025-03-27T12:07:56.812116 : 1 │
│ 2025-03-27T12:07:56.812118 : 2 │
│ 2025-03-27T12:07:56.812119 : 3 │
│ 2025-03-27T12:07:56.812121 : 4 │
│ 2025-03-27T12:07:56.812122 : 5 │
│ 2025-03-27T12:07:56.812123 : 6 │
│ 2025-03-27T12:07:56.812124 : 7 │
│ 2025-03-27T12:07:56.812126 : 8 │
│ 2025-03-27T12:07:56.812127 : 9 │
├────────────────────────────────┤
│            10 rows             │
└────────────────────────────────┘
```

#### Type Annotation {#docs:current:clients:python:function::type-annotation}

When the function has type annotations, it is often possible to leave out all of the optional parameters.
Using `DuckDBPyType` we can implicitly convert many known types to DuckDB's type system.
For example:

```python
import duckdb

def my_function(x: int) -> str:
    return str(x)

duckdb.create_function("my_func", my_function)
print(duckdb.sql("SELECT my_func(42)"))
```

```text
┌─────────────┐
│ my_func(42) │
│   varchar   │
├─────────────┤
│ 42          │
└─────────────┘
```

If only the parameter types can be inferred, you'll need to pass in `None` as `parameters` and specify the `return_type` explicitly.

#### `NULL` Handling {#docs:current:clients:python:function::null-handling}

By default, when a function receives a `NULL` value, the call immediately returns `NULL` as part of the default `NULL` handling.
To change this behavior, explicitly set the `null_handling` parameter to `"special"`.

```python
import duckdb
from duckdb.sqltypes import BIGINT

def dont_intercept_null(x):
    return 5

duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT)
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
```

```text
[(None,)]
```

With `null_handling="special"`:

```python
import duckdb
from duckdb.sqltypes import BIGINT

def dont_intercept_null(x):
    return 5

duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT, null_handling="special")
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
```

```text
[(5,)]
```

> Always use `null_handling="special"` when the function can return NULL.


```python
import duckdb
from duckdb.sqltypes import VARCHAR


def return_str_or_none(x: str) -> str | None:
    if not x:
        return None
    
    return x

duckdb.create_function(
    "return_str_or_none",
    return_str_or_none,
    [VARCHAR],
    VARCHAR,
    null_handling="special"
)
res = duckdb.sql("SELECT return_str_or_none('')").fetchall()
print(res)
```

```text
[(None,)]
```

#### Exception Handling {#docs:current:clients:python:function::exception-handling}

By default, when the Python function raises an exception, DuckDB forwards (re-throws) it.
To disable this behavior and return `NULL` instead, set the `exception_handling` parameter to `"return_null"`.

```python
import duckdb
from duckdb.sqltypes import BIGINT

def will_throw():
    raise ValueError("ERROR")

duckdb.create_function("throws", will_throw, [], BIGINT)
try:
    res = duckdb.sql("SELECT throws()").fetchall()
except duckdb.InvalidInputException as e:
    print(e)

duckdb.create_function("doesnt_throw", will_throw, [], BIGINT, exception_handling="return_null")
res = duckdb.sql("SELECT doesnt_throw()").fetchall()
print(res)
```

```console
Invalid Input Error: Python exception occurred while executing the UDF: ValueError: ERROR

At:
  ...(5): will_throw
  ...(9): <module>
```

```text
[(None,)]
```

#### Side Effects {#docs:current:clients:python:function::side-effects}

By default DuckDB will assume the created function is a *pure* function, meaning it will produce the same output when given the same input.
If your function does not follow that rule, for example when your function makes use of randomness, then you will need to mark this function as having `side_effects`.

For example, this function will produce a new count for every invocation.

```python
def count() -> int:
    old = count.counter
    count.counter += 1
    return old

count.counter = 0
```

If we create this function without marking it as having side effects, the result will be the following:

```python
import duckdb

con = duckdb.connect()
con.create_function("my_counter", count, side_effects=False)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
```

```text
[(0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,)]
```

This is clearly not the desired result. With `side_effects=True`, the result is as expected:

```python
con.remove_function("my_counter")
count.counter = 0
con.create_function("my_counter", count, side_effects=True)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
```

```text
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
```

#### Python Function Types {#docs:current:clients:python:function::python-function-types}

Currently, two function types are supported, `native` (default) and `arrow`.

##### Arrow {#docs:current:clients:python:function::arrow}

If the function is expected to receive Arrow arrays, set the `type` parameter to `'arrow'`.

This tells the system to provide Arrow arrays of up to `STANDARD_VECTOR_SIZE` tuples to the function and to expect an array with the same number of tuples in return.

In general, using an Arrow UDF will be much more efficient than native because it will be able to operate in batches.

```python
import duckdb
import pyarrow as pa
from duckdb.sqltypes import VARCHAR
from pyarrow import compute as pc


def mirror(strings: pa.Array, sep: pa.Array) -> pa.Array:
    assert isinstance(strings, pa.ChunkedArray)
    assert isinstance(sep, pa.ChunkedArray)
    return pc.binary_join_element_wise(strings, pc.ascii_reverse(strings), sep)


duckdb.create_function(
    "mirror",
    mirror,
    [VARCHAR, VARCHAR],
    return_type=VARCHAR,
    type="arrow",
)

duckdb.sql(
    "CREATE OR REPLACE TABLE strings AS SELECT 'hello' AS str UNION ALL SELECT 'world' AS str;"
)
print(duckdb.sql("SELECT mirror(str, '|') FROM strings;").fetchall())
```

```text
[('hello|olleh',), ('world|dlrow',)]
```

##### Native {#docs:current:clients:python:function::native}

When the function type is set to `native`, the function receives a single tuple at a time and is expected to return a single value.
This can be useful to interact with Python libraries that don't operate on Arrow, such as `faker`:

```python
import duckdb

from duckdb.sqltypes import DATE
from faker import Faker

def random_date():
    fake = Faker()
    return fake.date_between()

duckdb.create_function(
    "random_date",
    random_date,
    parameters=[],
    return_type=DATE,
    type="native",
)
res = duckdb.sql("SELECT random_date()").fetchall()
print(res)
```

```text
[(datetime.date(2019, 5, 15),)]
```

### Types API {#docs:current:clients:python:types}

The `DuckDBPyType` class represents a type instance of our [data types](#docs:current:sql:data_types:overview).

#### Converting from Other Types {#docs:current:clients:python:types::converting-from-other-types}

To make the API as easy to use as possible, we have added implicit conversions from existing type objects to a `DuckDBPyType` instance.
This means that wherever a `DuckDBPyType` object is expected, it is also possible to provide any of the options listed below.

##### Python Built-Ins {#docs:current:clients:python:types::python-built-ins}

The table below shows the mapping of Python Built-in types to DuckDB type.



| Built-in types | DuckDB type |
|:---------------|:------------|
| bool           | BOOLEAN     |
| bytearray      | BLOB        |
| bytes          | BLOB        |
| float          | DOUBLE      |
| int            | BIGINT      |
| str            | VARCHAR     |

##### Numpy DTypes {#docs:current:clients:python:types::numpy-dtypes}

The table below shows the mapping of Numpy DType to DuckDB type.



| Type        | DuckDB type |
|:------------|:------------|
| bool        | BOOLEAN     |
| float32     | FLOAT       |
| float64     | DOUBLE      |
| int16       | SMALLINT    |
| int32       | INTEGER     |
| int64       | BIGINT      |
| int8        | TINYINT     |
| uint16      | USMALLINT   |
| uint32      | UINTEGER    |
| uint64      | UBIGINT     |
| uint8       | UTINYINT    |

##### Nested Types {#docs:current:clients:python:types::nested-types}

###### `list[child_type]` {#docs:current:clients:python:types::listchild_type}

`list` type objects map to a `LIST` type of the child type, which can be arbitrarily nested.

```python
import duckdb.sqltypes
from typing import Union

print(duckdb.sqltypes.DuckDBPyType(list[dict[Union[str, int], str]]))
```

```text
MAP(UNION(u1 VARCHAR, u2 BIGINT), VARCHAR)[]
```

###### `dict[key_type, value_type]` {#docs:current:clients:python:types::dictkey_type-value_type}

`dict` type objects map to a `MAP` type of the key type and the value type.

```python
import duckdb.sqltypes

print(duckdb.sqltypes.DuckDBPyType(dict[str, int]))
```

```text
MAP(VARCHAR, BIGINT)
```

###### `{'a': field_one, 'b': field_two, ..., 'n': field_n}` {#docs:current:clients:python:types::a-field_one-b-field_two--n-field_n}

`dict` objects map to a `STRUCT` composed of the keys and values of the dict.

```python
import duckdb.sqltypes

print(duckdb.sqltypes.DuckDBPyType({'a': str, 'b': int}))
```

```text
STRUCT(a VARCHAR, b BIGINT)
```

###### `Union[type_1, ... type_n]` {#docs:current:clients:python:types::uniontype_1--type_n}

`typing.Union` objects map to a `UNION` type of the provided types.

```python
import duckdb.sqltypes
from typing import Union

print(duckdb.sqltypes.DuckDBPyType(Union[int, str, bool, bytearray]))
```

```text
UNION(u1 BIGINT, u2 VARCHAR, u3 BOOLEAN, u4 BLOB)
```

##### Creation Functions {#docs:current:clients:python:types::creation-functions}

For the built-in types, you can use the constants defined in `duckdb.sqltypes`:



| DuckDB type    |
|:---------------|
| BIGINT         |
| BIT            |
| BLOB           |
| BOOLEAN        |
| DATE           |
| DOUBLE         |
| FLOAT          |
| HUGEINT        |
| INTEGER        |
| INTERVAL       |
| SMALLINT       |
| SQLNULL        |
| TIME_TZ        |
| TIME           |
| TIMESTAMP_MS   |
| TIMESTAMP_NS   |
| TIMESTAMP_S    |
| TIMESTAMP_TZ   |
| TIMESTAMP      |
| TINYINT        |
| UBIGINT        |
| UHUGEINT       |
| UINTEGER       |
| USMALLINT      |
| UTINYINT       |
| UUID           |
| VARCHAR        |

For the complex types there are methods available on the `DuckDBPyConnection` object or the `duckdb` module.
Anywhere a `DuckDBPyType` is accepted, we will also accept one of the type objects that can implicitly convert to a `DuckDBPyType`.

###### `list_type` | `array_type` {#docs:current:clients:python:types::list_type--array_type}

Parameters:

* `child_type: DuckDBPyType`

###### `struct_type` | `row_type` {#docs:current:clients:python:types::struct_type--row_type}

Parameters:

* `fields: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]`

###### `map_type` {#docs:current:clients:python:types::map_type}

Parameters:

* `key_type: DuckDBPyType`
* `value_type: DuckDBPyType`

###### `decimal_type` {#docs:current:clients:python:types::decimal_type}

Parameters:

* `width: int`
* `scale: int`

###### `union_type` {#docs:current:clients:python:types::union_type}

Parameters:

* `members: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]`

###### `string_type` {#docs:current:clients:python:types::string_type}

Parameters:

* `collation: Optional[str]`

### Expression API {#docs:current:clients:python:expression}

The `Expression` class represents an instance of an [expression](#docs:current:sql:expressions:overview).

#### Why Would I Use the Expression API? {#docs:current:clients:python:expression::why-would-i-use-the-expression-api}

This API lets you build expressions dynamically instead of having the parser create them from a query string.
Skipping the parser gives you more fine-grained control over the expressions used.

Below is a list of currently supported expressions that can be created through the API.

#### Column Expression {#docs:current:clients:python:expression::column-expression}

This expression references a column by name.

```python
import duckdb
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})
```

Selecting a single column:

```python
col = duckdb.ColumnExpression('a')
duckdb.df(df).select(col).show()
```

```text
┌───────┐
│   a   │
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
│     4 │
└───────┘
```

Selecting multiple columns:

```python
col_list = [
        duckdb.ColumnExpression('a') * 10,
        duckdb.ColumnExpression('b').isnull(),
        duckdb.ColumnExpression('c') + 5
    ]
duckdb.df(df).select(*col_list).show()
```

```text
┌──────────┬─────────────┬─────────┐
│ (a * 10) │ (b IS NULL) │ (c + 5) │
│  int64   │   boolean   │  int64  │
├──────────┼─────────────┼─────────┤
│       10 │ false       │      47 │
│       20 │ true        │      26 │
│       30 │ false       │      18 │
│       40 │ false       │      19 │
└──────────┴─────────────┴─────────┘
```

#### Star Expression {#docs:current:clients:python:expression::star-expression}

This expression selects all columns of the input source.

Optionally it's possible to provide an `exclude` list to filter out columns of the table.
This `exclude` list can contain either strings or Expressions.

```python
import duckdb
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

star = duckdb.StarExpression(exclude = ['b'])
duckdb.df(df).select(star).show()
```

```text
┌───────┬───────┐
│   a   │   c   │
│ int64 │ int64 │
├───────┼───────┤
│     1 │    42 │
│     2 │    21 │
│     3 │    13 │
│     4 │    14 │
└───────┴───────┘
```

#### Constant Expression {#docs:current:clients:python:expression::constant-expression}

This expression contains a single value.

```python
import duckdb
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

const = duckdb.ConstantExpression('hello')
duckdb.df(df).select(const).show()
```

```text
┌─────────┐
│ 'hello' │
│ varchar │
├─────────┤
│ hello   │
│ hello   │
│ hello   │
│ hello   │
└─────────┘
```

#### Case Expression {#docs:current:clients:python:expression::case-expression}

This expression contains a `CASE WHEN (...) THEN (...) ELSE (...) END` expression.
By default, `ELSE` is `NULL`; it can be set using `.otherwise(value = ...)`.
Additional `WHEN (...) THEN (...)` blocks can be added with `.when(condition = ..., value = ...)`.

```python
import duckdb
import pandas as pd
from duckdb import (
    ConstantExpression,
    ColumnExpression,
    CaseExpression
)

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

hello = ConstantExpression('hello')
world = ConstantExpression('world')

case = \
    CaseExpression(condition = ColumnExpression('b') == False, value = world) \
    .otherwise(hello)
duckdb.df(df).select(case).show()
```

```text
┌──────────────────────────────────────────────────────────┐
│ CASE  WHEN ((b = false)) THEN ('world') ELSE 'hello' END │
│                         varchar                          │
├──────────────────────────────────────────────────────────┤
│ hello                                                    │
│ hello                                                    │
│ world                                                    │
│ hello                                                    │
└──────────────────────────────────────────────────────────┘
```

#### Function Expression {#docs:current:clients:python:expression::function-expression}

This expression contains a function call.
It can be constructed by providing the function name and an arbitrary number of expressions as arguments.

```python
import duckdb
import pandas as pd
from duckdb import (
    ConstantExpression,
    ColumnExpression,
    FunctionExpression
)

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

multiply_by_2 = FunctionExpression('multiply', ColumnExpression('a'), ConstantExpression(2))
duckdb.df(df).select(multiply_by_2).show()
```

```text
┌────────────────┐
│ multiply(a, 2) │
│     int64      │
├────────────────┤
│              2 │
│              4 │
│              6 │
│              8 │
└────────────────┘
```

#### SQL Expression {#docs:current:clients:python:expression::sql-expression}

This expression contains any valid SQL expression.

```python
import duckdb
import pandas as pd

from duckdb import SQLExpression

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

duckdb.df(df).filter(
    SQLExpression("b is true")
).select(
    SQLExpression("a").alias("selecting_column_a"),
    SQLExpression("case when a = 1 then 1 else 0 end").alias("selecting_case_expression"),
    SQLExpression("1").alias("constant_numeric_column"),
    SQLExpression("'hello'").alias("constant_text_column")
).aggregate(
    aggr_expr=[
        SQLExpression("SUM(selecting_column_a)").alias("sum_a"), 
        "selecting_case_expression" , 
        "constant_numeric_column", 
        "constant_text_column"
    ],
).show()
```

```text
┌────────┬───────────────────────────┬─────────────────────────┬──────────────────────┐
│ sum_a  │ selecting_case_expression │ constant_numeric_column │ constant_text_column │
│ int128 │           int32           │          int32          │       varchar        │
├────────┼───────────────────────────┼─────────────────────────┼──────────────────────┤
│      4 │                         0 │                       1 │ hello                │
│      1 │                         1 │                       1 │ hello                │
└────────┴───────────────────────────┴─────────────────────────┴──────────────────────┘
```

#### Common Operations {#docs:current:clients:python:expression::common-operations}

The Expression class also contains many operations that can be applied to any Expression type.

| Operation                      | Description                                                                                                                 |
|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------|
| `.alias(name: str)`            | Applies an alias to the expression                                                                                          |
| `.cast(type: DuckDBPyType)`    | Applies a cast to the provided type on the expression                                                                       |
| `.isin(*exprs: Expression)`    | Creates an [`IN` expression](#docs:current:sql:expressions:in::in) against the provided expressions as the list         |
| `.isnotin(*exprs: Expression)` | Creates a [`NOT IN` expression](#docs:current:sql:expressions:in::not-in) against the provided expressions as the list  |
| `.isnotnull()`                 | Checks whether the expression is not `NULL`                                                                                 |
| `.isnull()`                    | Checks whether the expression is `NULL`                                                                                     |

##### Order Operations {#docs:current:clients:python:expression::order-operations}

When expressions are provided to `DuckDBPyRelation.order()`, the following order operations can be applied.

| Operation                      | Description                                                                        |
|--------------------------------|------------------------------------------------------------------------------------|
| `.asc()`                       | Indicates that this expression should be sorted in ascending order                 |
| `.desc()`                      | Indicates that this expression should be sorted in descending order                |
| `.nulls_first()`               | Indicates that the nulls in this expression should precede the non-null values     |
| `.nulls_last()`                | Indicates that the nulls in this expression should come after the non-null values  |

### Spark API {#docs:current:clients:python:spark_api}

The DuckDB Spark API implements the [PySpark API](https://spark.apache.org/docs/3.5.8/api/python/reference/index.html), allowing you to use the familiar Spark API to interact with DuckDB.
All statements are translated to DuckDB's internal plans using our [relational API](#docs:current:clients:python:relational_api) and executed using DuckDB's query engine.

> **Warning.** The DuckDB Spark API is currently experimental and features are still missing. We are very interested in feedback. Please report any functionality that you are missing, either through [Discord](https://discord.duckdb.org) or on [GitHub](https://github.com/duckdb/duckdb/issues).

#### Example {#docs:current:clients:python:spark_api::example}

```python
from duckdb.experimental.spark.sql import SparkSession as session
from duckdb.experimental.spark.sql.functions import lit, col
import pandas as pd

spark = session.builder.getOrCreate()

pandas_df = pd.DataFrame({
    'age': [34, 45, 23, 56],
    'name': ['Joan', 'Peter', 'John', 'Bob']
})

df = spark.createDataFrame(pandas_df)
df = df.withColumn(
    'location', lit('Seattle')
)
res = df.select(
    col('age'),
    col('location')
).collect()

print(res)
```

```text
[
    Row(age=34, location='Seattle'),
    Row(age=45, location='Seattle'),
    Row(age=23, location='Seattle'),
    Row(age=56, location='Seattle')
]
```

#### Contribution Guidelines {#docs:current:clients:python:spark_api::contribution-guidelines}

Contributions to the experimental Spark API are welcome.
When making a contribution, please follow these guidelines:

* Instead of using temporary files, use our `pytest` testing framework.
* When adding new functions, ensure that method signatures comply with those in the [PySpark API](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/index.html).

### Python Client API {#docs:current:clients:python:reference:index}

<div class="bodywrapper">
<div class="body" role="main">

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.BinaryValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">BinaryValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.BinaryValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.BinderException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">BinderException</span></span><a class="headerlink" href="#duckdb.BinderException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.ProgrammingError" title="_duckdb.ProgrammingError"><code class="xref py py-class docutils literal notranslate"><span class="pre">ProgrammingError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.BitValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">BitValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.BitValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.BlobValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">BlobValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.BlobValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.BooleanValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">BooleanValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.BooleanValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.CSVLineTerminator">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">CSVLineTerminator</span></span><a class="headerlink" href="#duckdb.CSVLineTerminator" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<p>Members:</p>
<p>LINE_FEED</p>
<p>CARRIAGE_RETURN_LINE_FEED</p>
<dl class="py property">
<dt class="sig sig-object py">
<span class="sig-name descname"><span class="pre">CSVLineTerminator.name</span> <span class="pre">-&gt;</span> <span class="pre">str</span></span>
</dt>
<dd></dd>
</dl>

</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.CaseExpression">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">CaseExpression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">condition</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.CaseExpression" title="Link to this definition">&#182;</a>
</dt>
<dd></dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.CatalogException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">CatalogException</span></span><a class="headerlink" href="#duckdb.CatalogException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.ProgrammingError" title="_duckdb.ProgrammingError"><code class="xref py py-class docutils literal notranslate"><span class="pre">ProgrammingError</span></code></a></p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.CoalesceOperator">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">CoalesceOperator</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.CoalesceOperator" title="Link to this definition">&#182;</a>
</dt>
<dd></dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.ColumnExpression">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ColumnExpression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.ColumnExpression" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a column reference from the provided column name</p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ConnectionException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ConnectionException</span></span><a class="headerlink" href="#duckdb.ConnectionException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.OperationalError" title="_duckdb.OperationalError"><code class="xref py py-class docutils literal notranslate"><span class="pre">OperationalError</span></code></a></p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.ConstantExpression">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ConstantExpression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.ConstantExpression" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a constant expression from the provided value</p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ConstraintException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ConstraintException</span></span><a class="headerlink" href="#duckdb.ConstraintException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.IntegrityError" title="_duckdb.IntegrityError"><code class="xref py py-class docutils literal notranslate"><span class="pre">IntegrityError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ConversionException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ConversionException</span></span><a class="headerlink" href="#duckdb.ConversionException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DataError" title="_duckdb.DataError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DBAPITypeObject">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DBAPITypeObject</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">types</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span><span class="p"><span class="pre">[</span></span><span class="pre">DuckDBPyType</span><span class="p"><span class="pre">]</span></span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.DBAPITypeObject" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<p>DB API 2.0 type object for categorizing database column types.</p>
<p>This class implements the type objects defined in PEP 249 (DB API 2.0).
It allows checking whether a specific DuckDB type belongs to a broader
category like STRING, NUMBER, DATETIME, etc.</p>
<p>The type object supports equality comparison with DuckDBPyType instances,
returning True if the type belongs to this category.</p>
<dl>
<dt>Args:</dt>
<dd>
<p>types: A list of DuckDBPyType instances that belong to this type category.</p>
</dd>
<dt>Example:</dt>
<dd>
<div class="doctest highlight-default notranslate">
<div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">string_types</span> <span class="o">=</span> <span class="n">DBAPITypeObject</span><span class="p">([</span><span class="n">sqltypes</span><span class="o">.</span><span class="n">VARCHAR</span><span class="p">,</span> <span class="n">sqltypes</span><span class="o">.</span><span class="n">CHAR</span><span class="p">])</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result</span> <span class="o">=</span> <span class="n">sqltypes</span><span class="o">.</span><span class="n">VARCHAR</span> <span class="o">==</span> <span class="n">string_types</span>  <span class="c1"># True</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result</span> <span class="o">=</span> <span class="n">sqltypes</span><span class="o">.</span><span class="n">INTEGER</span> <span class="o">==</span> <span class="n">string_types</span>  <span class="c1"># False</span>
</pre>
</div>
</div>
</dd>
<dt>Note:</dt>
<dd>
<p>This follows the DB API 2.0 specification where type objects are compared
using equality operators rather than isinstance() checks.</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DataError">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DataError</span></span><a class="headerlink" href="#duckdb.DataError" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DatabaseError">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DatabaseError</span></span><a class="headerlink" href="#duckdb.DatabaseError" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Error" title="_duckdb.Error"><code class="xref py py-class docutils literal notranslate"><span class="pre">Error</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DateValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DateValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.DateValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DecimalValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DecimalValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">width</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">scale</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.DecimalValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.DefaultExpression">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DefaultExpression</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.DefaultExpression" title="Link to this definition">&#182;</a>
</dt>
<dd></dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DependencyException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DependencyException</span></span><a class="headerlink" href="#duckdb.DependencyException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DoubleValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DoubleValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.DoubleValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DuckDBPyConnection</span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.append">
<span class="sig-name descname"><span class="pre">append</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">by_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.append" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Append the passed DataFrame to the named table</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.array_type">
<span class="sig-name descname"><span class="pre">array_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.array_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create an array type object of &#8216;type&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.arrow">
<span class="sig-name descname"><span class="pre">arrow</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">rows_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.RecordBatchReader</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.arrow" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.begin">
<span class="sig-name descname"><span class="pre">begin</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.begin" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Start a new transaction</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.checkpoint">
<span class="sig-name descname"><span class="pre">checkpoint</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.checkpoint" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Synchronizes data in the write-ahead log (WAL) to the database data file (no-op for in-memory connections)</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.close">
<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.close" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Close the connection</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.commit">
<span class="sig-name descname"><span class="pre">commit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.commit" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Commit changes performed within a transaction</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.create_function">
<span class="sig-name descname"><span class="pre">create_function</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">function</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">collections.abc.Callable</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">parameters</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">return_type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._func.PythonUDFType</span></span><span class="w"> </span><span class="o"><span 
class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">&lt;PythonUDFType.NATIVE:</span> <span class="pre">0&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">null_handling</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._func.FunctionNullHandling</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">&lt;FunctionNullHandling.DEFAULT:</span> <span class="pre">0&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">exception_handling</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.PythonExceptionHandling" title="_duckdb.PythonExceptionHandling"><span class="pre">_duckdb.PythonExceptionHandling</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">&lt;PythonExceptionHandling.DEFAULT:</span> <span class="pre">0&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">side_effects</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.create_function" title="Link to this 
definition">&#182;</a>
</dt>
<dd>
<p>Create a DuckDB function out of the passed-in Python function so it can be used in queries</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.cursor">
<span class="sig-name descname"><span class="pre">cursor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.cursor" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a duplicate of the current connection</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.decimal_type">
<span class="sig-name descname"><span class="pre">decimal_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">width</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">scale</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.decimal_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a decimal type with &#8216;width&#8217; and &#8216;scale&#8217;</p>
</dd>
</dl>

<dl class="py property">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.description">
<span class="property"><span class="k"><span class="pre">property</span></span><span class="w"> </span></span><span class="sig-name descname"><span class="pre">description</span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.description" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get result set attributes, mainly column names</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.df">
<span class="sig-name descname"><span class="pre">df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as DataFrame following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.disable_profiling">
<span class="sig-name descname"><span class="pre">disable_profiling</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.disable_profiling" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Disable profiling for subsequent queries</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.dtype">
<span class="sig-name descname"><span class="pre">dtype</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">type_str</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.dtype" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a type object by parsing the &#8216;type_str&#8217; string</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.duplicate">
<span class="sig-name descname"><span class="pre">duplicate</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.duplicate" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a duplicate of the current connection</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.enable_profiling">
<span class="sig-name descname"><span class="pre">enable_profiling</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.enable_profiling" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Enable profiling for subsequent queries</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.enum_type">
<span class="sig-name descname"><span class="pre">enum_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">values</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.enum_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create an enum type of underlying &#8216;type&#8217;, consisting of the list of &#8216;values&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.execute">
<span class="sig-name descname"><span class="pre">execute</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">parameters</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.execute" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute the given SQL query, optionally using prepared statements with parameters set</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.executemany">
<span class="sig-name descname"><span class="pre">executemany</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">parameters</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.executemany" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute the given prepared statement multiple times using the list of parameter sets in parameters</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.extract_statements">
<span class="sig-name descname"><span class="pre">extract_statements</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.extract_statements" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Parse the query string and extract the Statement object(s) produced</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetch_arrow_table">
<span class="sig-name descname"><span class="pre">fetch_arrow_table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">rows_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.Table.html#pyarrow.Table" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.Table</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetch_arrow_table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as Arrow table following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetch_df">
<span class="sig-name descname"><span class="pre">fetch_df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetch_df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as DataFrame following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetch_df_chunk">
<span class="sig-name descname"><span class="pre">fetch_df_chunk</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">vectors_per_chunk</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetch_df_chunk" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a chunk of the result as DataFrame following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetch_record_batch">
<span class="sig-name descname"><span class="pre">fetch_record_batch</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">rows_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.RecordBatchReader</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetch_record_batch" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch an Arrow RecordBatchReader following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetchall">
<span class="sig-name descname"><span class="pre">fetchall</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetchall" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch all rows from a result following execute</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetchdf">
<span class="sig-name descname"><span class="pre">fetchdf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetchdf" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as DataFrame following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetchmany">
<span class="sig-name descname"><span class="pre">fetchmany</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetchmany" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch the next set of rows from a result following execute</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetchnumpy">
<span class="sig-name descname"><span class="pre">fetchnumpy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetchnumpy" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as a dictionary of NumPy arrays following execute</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.fetchone">
<span class="sig-name descname"><span class="pre">fetchone</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">tuple</span><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.fetchone" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a single row from a result following execute</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.filesystem_is_registered">
<span class="sig-name descname"><span class="pre">filesystem_is_registered</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">bool</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.filesystem_is_registered" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Check if a filesystem with the provided name is currently registered</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.from_arrow">
<span class="sig-name descname"><span class="pre">from_arrow</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">arrow_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.from_arrow" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from an Arrow object</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.from_csv_auto">
<span class="sig-name descname"><span class="pre">from_csv_auto</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">path_or_buffer</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.from_csv_auto" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the CSV file in &#8216;path_or_buffer&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.from_df">
<span class="sig-name descname"><span class="pre">from_df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.from_df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the DataFrame in df</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.from_parquet">
<span class="sig-name descname"><span class="pre">from_parquet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.DuckDBPyConnection.from_parquet" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Overloaded function.</p>
<ol class="arabic simple">
<li><p>from_parquet(self: _duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the Parquet files in file_glob</p>
<ol class="arabic simple" start="2">
<li><p>from_parquet(self: _duckdb.DuckDBPyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the Parquet files in file_globs</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.from_query">
<span class="sig-name descname"><span class="pre">from_query</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">alias</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.from_query" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.get_profiling_information">
<span class="sig-name descname"><span class="pre">get_profiling_information</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'json'</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">str</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.get_profiling_information" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get profiling information for a query</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.get_table_names">
<span class="sig-name descname"><span class="pre">get_table_names</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">qualified</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">set</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.get_table_names" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Extract the required table names from a query</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.install_extension">
<span class="sig-name descname"><span class="pre">install_extension</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">extension</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">force_install</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repository</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repository_url</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">version</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.install_extension" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Install an extension by name, with an optional version and/or repository to get the extension from</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.interrupt">
<span class="sig-name descname"><span class="pre">interrupt</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.interrupt" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Interrupt pending operations</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.list_filesystems">
<span class="sig-name descname"><span class="pre">list_filesystems</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.list_filesystems" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>List registered filesystems, including built-in ones</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.list_type">
<span class="sig-name descname"><span class="pre">list_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.list_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a list type object of &#8216;type&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.load_extension">
<span class="sig-name descname"><span class="pre">load_extension</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">extension</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.load_extension" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Load an installed extension</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.map_type">
<span class="sig-name descname"><span class="pre">map_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">key</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.map_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a map type object from &#8216;key&#8217; and &#8216;value&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.pl">
<span class="sig-name descname"><span class="pre">pl</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">rows_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">lazy</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">duckdb::PolarsDataFrame</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.pl" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as Polars DataFrame following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.query">
<span class="sig-name descname"><span class="pre">query</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">alias</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.query" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.query_progress">
<span class="sig-name descname"><span class="pre">query_progress</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">float</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.query_progress" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Query progress of pending operation</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.read_csv">
<span class="sig-name descname"><span class="pre">read_csv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">path_or_buffer</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.read_csv" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the CSV file in &#8216;path_or_buffer&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.read_json">
<span class="sig-name descname"><span class="pre">read_json</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">path_or_buffer</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">sample_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">maximum_depth</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span 
class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">records</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">timestamp_format</span></span><span class="p"><span class="pre">:</span></span><span 
class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">compression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">maximum_object_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ignore_errors</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">convert_strings_to_integers</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">field_appearance_threshold</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">map_inference_threshold</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">maximum_sample_files</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span 
class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hive_partitioning</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">union_by_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hive_types</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> 
</span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hive_types_autocast</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.read_json" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the JSON file in &#8216;name&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.read_parquet">
<span class="sig-name descname"><span class="pre">read_parquet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.DuckDBPyConnection.read_parquet" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Overloaded function.</p>
<ol class="arabic simple">
<li><p>read_parquet(self: _duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the Parquet files in file_glob</p>
<ol class="arabic simple" start="2">
<li><p>read_parquet(self: _duckdb.DuckDBPyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the Parquet files in file_globs</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.register">
<span class="sig-name descname"><span class="pre">register</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">view_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">python_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.register" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Register the passed Python object so that it can be queried as a view named view_name</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.register_filesystem">
<span class="sig-name descname"><span class="pre">register_filesystem</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">filesystem</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">fsspec.AbstractFileSystem</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.register_filesystem" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Register an fsspec-compliant filesystem</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.remove_function">
<span class="sig-name descname"><span class="pre">remove_function</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.remove_function" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Remove a previously created function</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.rollback">
<span class="sig-name descname"><span class="pre">rollback</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.rollback" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Roll back changes performed within a transaction</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.row_type">
<span class="sig-name descname"><span class="pre">row_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">fields</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.row_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a struct type object from &#8216;fields&#8217;</p>
</dd>
</dl>

<dl class="py property">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.rowcount">
<span class="property"><span class="k"><span class="pre">property</span></span><span class="w"> </span></span><span class="sig-name descname"><span class="pre">rowcount</span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.rowcount" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get result set row count</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.sql">
<span class="sig-name descname"><span class="pre">sql</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">alias</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.sql" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.sqltype">
<span class="sig-name descname"><span class="pre">sqltype</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">type_str</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.sqltype" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a type object by parsing the &#8216;type_str&#8217; string</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.string_type">
<span class="sig-name descname"><span class="pre">string_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">collation</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.string_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a string type with an optional collation</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.struct_type">
<span class="sig-name descname"><span class="pre">struct_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">fields</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.struct_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a struct type object from &#8216;fields&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.table">
<span class="sig-name descname"><span class="pre">table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object for the named table</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.table_function">
<span class="sig-name descname"><span class="pre">table_function</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">parameters</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.table_function" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the named table function with given parameters</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.tf">
<span class="sig-name descname"><span class="pre">tf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.tf" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as a dict of TensorFlow Tensors following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.to_arrow_reader">
<span class="sig-name descname"><span class="pre">to_arrow_reader</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.RecordBatchReader</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.to_arrow_reader" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch an Arrow RecordBatchReader following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.to_arrow_table">
<span class="sig-name descname"><span class="pre">to_arrow_table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.Table.html#pyarrow.Table" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.Table</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.to_arrow_table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as an Arrow table following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.torch">
<span class="sig-name descname"><span class="pre">torch</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.torch" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as a dict of PyTorch Tensors following execute()</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.type">
<span class="sig-name descname"><span class="pre">type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">type_str</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a type object by parsing the &#8216;type_str&#8217; string</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.union_type">
<span class="sig-name descname"><span class="pre">union_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">members</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.union_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a union type object from &#8216;members&#8217;</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.unregister">
<span class="sig-name descname"><span class="pre">unregister</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">view_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.unregister" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Unregister the view name</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.unregister_filesystem">
<span class="sig-name descname"><span class="pre">unregister_filesystem</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.unregister_filesystem" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Unregister a filesystem</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.values">
<span class="sig-name descname"><span class="pre">values</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.values" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the passed values</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyConnection.view">
<span class="sig-name descname"><span class="pre">view</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="_duckdb.DuckDBPyConnection"><span class="pre">_duckdb.DuckDBPyConnection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">view_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyConnection.view" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object for the named view</p>
</dd>
</dl>

</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">DuckDBPyRelation</span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}">Relational API page</a>.
<br><dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.aggregate">
<span class="sig-name descname"><span class="pre">aggregate</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">aggr_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">group_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.aggregate" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Compute the aggregate expression <code>aggr_expr</code> on the relation, optionally grouped by <code>group_expr</code>.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#aggregate">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py attribute">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.alias">
<span class="sig-name descname"><span class="pre">alias</span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.alias" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get the name of the current alias.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#alias">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.any_value">
<span class="sig-name descname"><span class="pre">any_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.any_value" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Returns the first non-null value from a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#any_value">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.apply">
<span class="sig-name descname"><span class="pre">apply</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">function_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">function_aggr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">group_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">function_parameter</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span 
class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.apply" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Compute a function over a single column or a list of columns of the relation, optionally grouped.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#apply">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.arg_max">
<span class="sig-name descname"><span class="pre">arg_max</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">arg_column</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">value_column</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> 
<span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.arg_max" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Finds the row with the maximum value in the value column and returns that row's value in the argument column.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#arg_max">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.arg_min">
<span class="sig-name descname"><span class="pre">arg_min</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">arg_column</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">value_column</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> 
<span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.arg_min" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Finds the row with the minimum value in the value column and returns that row's value in the argument column.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#arg_min">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.arrow">
<span class="sig-name descname"><span class="pre">arrow</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.RecordBatchReader</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.arrow" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#arrow">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.avg">
<span class="sig-name descname"><span class="pre">avg</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.avg" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the average of a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#avg">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.bit_and">
<span class="sig-name descname"><span class="pre">bit_and</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.bit_and" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the bitwise AND of all bits present in a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#bit_and">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.bit_or">
<span class="sig-name descname"><span class="pre">bit_or</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.bit_or" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the bitwise OR of all bits present in a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#bit_or">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.bit_xor">
<span class="sig-name descname"><span class="pre">bit_xor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.bit_xor" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the bitwise XOR of all bits present in a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#bit_xor">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.bitstring_agg">
<span class="sig-name descname"><span class="pre">bitstring_agg</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">min</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">max</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span 
class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.bitstring_agg" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes a bitstring with bits set for each distinct value in a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#bitstring_agg">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.bool_and">
<span class="sig-name descname"><span class="pre">bool_and</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.bool_and" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the logical AND of all values present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#bool_and">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.bool_or">
<span class="sig-name descname"><span class="pre">bool_or</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.bool_or" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the logical OR of all values present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#bool_or">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.close">
<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.close" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Closes the result</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#close">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py attribute">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.columns">
<span class="sig-name descname"><span class="pre">columns</span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.columns" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return a list containing the names of the columns of the relation.</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#columns">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.count">
<span class="sig-name descname"><span class="pre">count</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.count" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the number of elements present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#count">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.create">
<span class="sig-name descname"><span class="pre">create</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.create" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Creates a new table named table_name with the contents of the relation object</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#create">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.create_view">
<span class="sig-name descname"><span class="pre">create_view</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">view_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">replace</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.create_view" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Creates a view named view_name that refers to the relation object</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#create_view">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.cross">
<span class="sig-name descname"><span class="pre">cross</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">other_rel</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.cross" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create the cross (Cartesian) product of two relation objects</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#cross">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.cume_dist">
<span class="sig-name descname"><span class="pre">cume_dist</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.cume_dist" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the cumulative distribution within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#cume_dist">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.dense_rank">
<span class="sig-name descname"><span class="pre">dense_rank</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.dense_rank" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the dense rank within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#dense_rank">Relational API page</a>.
<br>
</dd>
</dl>
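
To make the two window functions above more concrete, here is a pure-Python sketch of what `dense_rank` and `cume_dist` compute. It illustrates the standard SQL semantics for a single partition ordered ascending and is not DuckDB's implementation; the helper functions and the sample `scores` are assumptions for this example.

```python
from bisect import bisect_right


def dense_rank(values):
    """Rank each value by its position among the distinct sorted values (no gaps)."""
    position = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [position[v] for v in values]


def cume_dist(values):
    """Fraction of rows whose value is less than or equal to the current row's value."""
    ordered = sorted(values)
    return [bisect_right(ordered, v) / len(ordered) for v in values]


scores = [10, 20, 20, 40]
ranks = dense_rank(scores)        # [1, 2, 2, 3]
distribution = cume_dist(scores)  # [0.25, 0.75, 0.75, 1.0]
```

Tied values share a rank and a cumulative distribution value, and the rank after a tie continues without a gap.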

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.describe">
<span class="sig-name descname"><span class="pre">describe</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.describe" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Gives basic statistics (e.g., min, max) and indicates whether NULL values exist for each column of the relation.</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#describe">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py attribute">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.description">
<span class="sig-name descname"><span class="pre">description</span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.description" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return the description of the result</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#description">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.df">
<span class="sig-name descname"><span class="pre">df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch all rows as a pandas DataFrame</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#df">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.distinct">
<span class="sig-name descname"><span class="pre">distinct</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.distinct" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Retrieve distinct rows from this relation object</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#distinct">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py attribute">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.dtypes">
<span class="sig-name descname"><span class="pre">dtypes</span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.dtypes" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return a list containing the types of the columns of the relation.</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#dtypes">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.except_">
<span class="sig-name descname"><span class="pre">except_</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">other_rel</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.except_" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create the set difference (EXCEPT) of this relation object and the relation object in other_rel</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#except_">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.execute">
<span class="sig-name descname"><span class="pre">execute</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.execute" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Transform the relation into a result set</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#execute">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.explain">
<span class="sig-name descname"><span class="pre">explain</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.ExplainType" title="_duckdb.ExplainType"><span class="pre">_duckdb.ExplainType</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'standard'</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">str</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.explain" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return the query plan of the relation as a string</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#explain">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.favg">
<span class="sig-name descname"><span class="pre">favg</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.favg" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the average of all values present in a given expression using a more accurate floating point summation (Kahan Sum)</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#favg">Relational API page</a>.
<br>
</dd>
</dl>
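
The Kahan summation mentioned for `favg` can be sketched in plain Python. This illustrates the general technique only and is not DuckDB's implementation; the helper names and the sample data are assumptions for this example.

```python
import math


def naive_mean(values):
    """Plain left-to-right summation, which can lose small addends."""
    total = 0.0
    for x in values:
        total += x
    return total / len(values)


def kahan_mean(values):
    """Mean based on Kahan (compensated) summation; assumes at least one value."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    count = 0
    for x in values:
        y = x - compensation            # re-apply the previously lost bits
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost in this step
        total = t
        count += 1
    return total / count


# One large value followed by many values too small to register individually.
values = [1.0] + [1e-16] * 1000
reference = math.fsum(values) / len(values)  # accurately rounded reference
naive_error = abs(naive_mean(values) - reference)
kahan_error = abs(kahan_mean(values) - reference)
```

In this example the naive loop never moves away from `1.0`, because each `1e-16` is below half the spacing of floating-point numbers near one, while the compensated version carries the lost bits forward.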

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetch_arrow_reader">
<span class="sig-name descname"><span class="pre">fetch_arrow_reader</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">object</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetch_arrow_reader" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and return an Arrow Record Batch Reader that yields all rows</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetch_arrow_reader">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetch_arrow_table">
<span class="sig-name descname"><span class="pre">fetch_arrow_table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">object</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetch_arrow_table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch all rows as an Arrow Table</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetch_arrow_table">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetch_df_chunk">
<span class="sig-name descname"><span class="pre">fetch_df_chunk</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">vectors_per_chunk</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetch_df_chunk" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch a chunk of the rows</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetch_df_chunk">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetch_record_batch">
<span class="sig-name descname"><span class="pre">fetch_record_batch</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">rows_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">object</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetch_record_batch" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and return an Arrow Record Batch Reader that yields all rows</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetch_record_batch">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetchall">
<span class="sig-name descname"><span class="pre">fetchall</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetchall" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch all rows as a list of tuples</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetchall">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetchdf">
<span class="sig-name descname"><span class="pre">fetchdf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetchdf" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch all rows as a pandas DataFrame</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetchdf">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetchmany">
<span class="sig-name descname"><span class="pre">fetchmany</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetchmany" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch the next set of rows as a list of tuples</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetchmany">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetchnumpy">
<span class="sig-name descname"><span class="pre">fetchnumpy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetchnumpy" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch all rows as a Python dict mapping each column to a NumPy array</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetchnumpy">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fetchone">
<span class="sig-name descname"><span class="pre">fetchone</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">tuple</span><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fetchone" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch a single row as a tuple</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fetchone">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.filter">
<span class="sig-name descname"><span class="pre">filter</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">filter_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.filter" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Filter the relation object by the filter in filter_expr</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#filter">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.first">
<span class="sig-name descname"><span class="pre">first</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.first" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Returns the first value of a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#first">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.first_value">
<span class="sig-name descname"><span class="pre">first_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.first_value" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the first value within the group or partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#first_value">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.fsum">
<span class="sig-name descname"><span class="pre">fsum</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.fsum" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the sum of all values present in a given expression using a more accurate floating point summation (Kahan Sum)</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#fsum">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.geomean">
<span class="sig-name descname"><span class="pre">geomean</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.geomean" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the geometric mean over all values present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#geomean">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.histogram">
<span class="sig-name descname"><span class="pre">histogram</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.histogram" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the histogram over all values present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#histogram">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.insert">
<span class="sig-name descname"><span class="pre">insert</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">values</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.insert" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Inserts the given values into the relation</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#insert">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.insert_into">
<span class="sig-name descname"><span class="pre">insert_into</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.insert_into" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Inserts the relation object into an existing table named table_name</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#insert_into">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.intersect">
<span class="sig-name descname"><span class="pre">intersect</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">other_rel</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.intersect" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create the set intersection of this relation object with another relation object in other_rel</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#intersect">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.join">
<span class="sig-name descname"><span class="pre">join</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">other_rel</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">condition</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">how</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'inner'</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.join" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Join the relation object with another relation object in other_rel using the join condition expression in condition. Supported join types are &#8216;inner&#8217;, &#8216;left&#8217;, &#8216;right&#8217;, &#8216;outer&#8217;, &#8216;semi&#8217; and &#8216;anti&#8217;</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#join">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.lag">
<span class="sig-name descname"><span class="pre">lag</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">offset</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">default_value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'NULL'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ignore_nulls</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.lag" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the lag within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#lag">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.last">
<span class="sig-name descname"><span class="pre">last</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.last" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Returns the last value of a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#last">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.last_value">
<span class="sig-name descname"><span class="pre">last_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.last_value" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the last value within the group or partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#last_value">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.lead">
<span class="sig-name descname"><span class="pre">lead</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">offset</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">default_value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'NULL'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ignore_nulls</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.lead" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the lead within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#lead">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.limit">
<span class="sig-name descname"><span class="pre">limit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">n</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">offset</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">0</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.limit" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Retrieves only the first <code>n</code> rows of this relation object, after skipping <code>offset</code> rows</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#limit">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.list">
<span class="sig-name descname"><span class="pre">list</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.list" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Returns a list containing all values present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#list">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.map">
<span class="sig-name descname"><span class="pre">map</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">map_function</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">collections.abc.Callable</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">schema</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.map" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Calls the passed function on the relation</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#map">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.max">
<span class="sig-name descname"><span class="pre">max</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.max" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Returns the maximum value present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#max">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.mean">
<span class="sig-name descname"><span class="pre">mean</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.mean" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the average of a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#mean">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.median">
<span class="sig-name descname"><span class="pre">median</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.median" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the median over all values present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#median">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.min">
<span class="sig-name descname"><span class="pre">min</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.min" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Returns the minimum value present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#min">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.mode">
<span class="sig-name descname"><span class="pre">mode</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.mode" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the mode over all values present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#mode">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.n_tile">
<span class="sig-name descname"><span class="pre">n_tile</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_buckets</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.n_tile" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Divides the partition as equally as possible into <code>num_buckets</code> buckets</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#n_tile">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.nth_value">
<span class="sig-name descname"><span class="pre">nth_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">offset</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ignore_nulls</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" 
href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.nth_value" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the nth value within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#nth_value">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.order">
<span class="sig-name descname"><span class="pre">order</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">order_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.order" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Reorder the relation object by order_expr</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#order">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.percent_rank">
<span class="sig-name descname"><span class="pre">percent_rank</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.percent_rank" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the relative rank within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#percent_rank">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.pl">
<span class="sig-name descname"><span class="pre">pl</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">lazy</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">duckdb::PolarsDataFrame</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.pl" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute and fetch all rows as a Polars DataFrame</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#pl">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.product">
<span class="sig-name descname"><span class="pre">product</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.product" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Returns the product of all values present in a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#product">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.project">
<span class="sig-name descname"><span class="pre">project</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.project" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Project the relation object by the projection expressions passed as positional arguments</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#project">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.quantile">
<span class="sig-name descname"><span class="pre">quantile</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">q</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">0.5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span 
class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.quantile" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the exact quantile value for a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#quantile">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.quantile_cont">
<span class="sig-name descname"><span class="pre">quantile_cont</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">q</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">0.5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span 
class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.quantile_cont" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the interpolated quantile value for a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#quantile_cont">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.quantile_disc">
<span class="sig-name descname"><span class="pre">quantile_disc</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">q</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">0.5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span 
class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.quantile_disc" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the exact (discrete) quantile value for a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#quantile_disc">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.query">
<span class="sig-name descname"><span class="pre">query</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">virtual_table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">sql_query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.query" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#query">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.rank">
<span class="sig-name descname"><span class="pre">rank</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.rank" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the rank within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#rank">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.rank_dense">
<span class="sig-name descname"><span class="pre">rank_dense</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.rank_dense" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the dense rank within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#rank_dense">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.row_number">
<span class="sig-name descname"><span class="pre">row_number</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.row_number" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the row number within the partition</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#row_number">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.select">
<span class="sig-name descname"><span class="pre">select</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.select" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Project the relation object by the projection expressions passed as positional arguments</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#select">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.select_dtypes">
<span class="sig-name descname"><span class="pre">select_dtypes</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">types</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.select_dtypes" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Select columns from the relation, by filtering based on type(s)</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#select_dtypes">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.select_types">
<span class="sig-name descname"><span class="pre">select_types</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">types</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.select_types" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Select columns from the relation, by filtering based on type(s)</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#select_types">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.set_alias">
<span class="sig-name descname"><span class="pre">set_alias</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">alias</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.set_alias" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Rename the relation object to the new alias</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#set_alias">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py attribute">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.shape">
<span class="sig-name descname"><span class="pre">shape</span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.shape" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Tuple of the number of rows and the number of columns in the relation.</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#shape">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.show">
<span class="sig-name descname"><span class="pre">show</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">max_width</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">SupportsInt</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">max_rows</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">SupportsInt</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">max_col_width</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">SupportsInt</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span 
class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">null_value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">render_mode</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.show" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Display a summary of the data</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#show">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.sort">
<span class="sig-name descname"><span class="pre">sort</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.sort" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Reorder the relation object by the provided expressions</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#sort">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.sql_query">
<span class="sig-name descname"><span class="pre">sql_query</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">str</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.sql_query" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get the SQL query that is equivalent to the relation</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#sql_query">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.std">
<span class="sig-name descname"><span class="pre">std</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.std" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the sample standard deviation for a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#std">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.stddev">
<span class="sig-name descname"><span class="pre">stddev</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.stddev" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the sample standard deviation for a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#stddev">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.stddev_pop">
<span class="sig-name descname"><span class="pre">stddev_pop</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.stddev_pop" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the population standard deviation for a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#stddev_pop">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.stddev_samp">
<span class="sig-name descname"><span class="pre">stddev_samp</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.stddev_samp" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the sample standard deviation for a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#stddev_samp">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.string_agg">
<span class="sig-name descname"><span class="pre">string_agg</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">sep</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">','</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span 
class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.string_agg" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Concatenates the values present in a given expression, joined by a separator.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#string_agg">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.sum">
<span class="sig-name descname"><span class="pre">sum</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.sum" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the sum of all values present in a given expression.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#sum">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.tf">
<span class="sig-name descname"><span class="pre">tf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.tf" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetches the result as a dict of TensorFlow Tensors.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#tf">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.to_arrow_reader">
<span class="sig-name descname"><span class="pre">to_arrow_reader</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.RecordBatchReader</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.to_arrow_reader" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Executes the relation and returns an Arrow Record Batch Reader that yields all rows.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#to_arrow_reader">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.to_arrow_table">
<span class="sig-name descname"><span class="pre">to_arrow_table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.Table.html#pyarrow.Table" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.Table</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.to_arrow_table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Executes the relation and fetches all rows as an Arrow Table.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#to_arrow_table">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.to_csv">
<span class="sig-name descname"><span class="pre">to_csv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">file_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">sep</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">na_rep</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">header</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quotechar</span></span><span 
class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">escapechar</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">timestamp_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quoting</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">encoding</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span 
class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">compression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">overwrite</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">per_thread_output</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">use_tmp_file</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">partition_by</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span 
class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">write_partition_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.to_csv" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Writes the relation object to a CSV file at &#8216;file_name&#8217;.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#to_csv">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.to_df">
<span class="sig-name descname"><span class="pre">to_df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.to_df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Executes the relation and fetches all rows as a pandas DataFrame.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#to_df">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.to_parquet">
<span class="sig-name descname"><span class="pre">to_parquet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">file_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">compression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">field_ids</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">row_group_size_bytes</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">row_group_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">overwrite</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">per_thread_output</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">use_tmp_file</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">partition_by</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">write_partition_columns</span></span><span class="p"><span class="pre">:</span></span><span 
class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">append</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename_pattern</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">file_size_bytes</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.to_parquet" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Writes the relation object to a Parquet file at &#8216;file_name&#8217;.</p>
Detailed examples can be found on the <a href="{% link docs/current/clients/python/relational_api.md %}#to_parquet">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.to_table">
<span class="sig-name descname"><span class="pre">to_table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.to_table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Creates a new table named table_name with the contents of the relation object</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#to_table">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.to_view">
<span class="sig-name descname"><span class="pre">to_view</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">view_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">replace</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.to_view" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Creates a view named view_name that refers to the relation object</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#to_view">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.torch">
<span class="sig-name descname"><span class="pre">torch</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.torch" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as dict of PyTorch Tensors</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#torch">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py attribute">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.type">
<span class="sig-name descname"><span class="pre">type</span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get the type of the relation.</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#type">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py attribute">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.types">
<span class="sig-name descname"><span class="pre">types</span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.types" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return a list containing the types of the columns of the relation.</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#types">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.union">
<span class="sig-name descname"><span class="pre">union</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">union_rel</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.union" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create the set union of this relation object with another relation object in union_rel</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#union">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.unique">
<span class="sig-name descname"><span class="pre">unique</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">unique_aggr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.unique" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Returns the distinct values in a column.</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#unique">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.update">
<span class="sig-name descname"><span class="pre">update</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">set</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">condition</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.update" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Update the given relation with the provided expressions</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#update">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.value_counts">
<span class="sig-name descname"><span class="pre">value_counts</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.value_counts" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the number of elements present in a given expression, also projecting the original expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#value_counts">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.var">
<span class="sig-name descname"><span class="pre">var</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.var" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the sample variance for a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#var">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.var_pop">
<span class="sig-name descname"><span class="pre">var_pop</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.var_pop" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the population variance for a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#var_pop">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.var_samp">
<span class="sig-name descname"><span class="pre">var_samp</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.var_samp" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the sample variance for a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#var_samp">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.variance">
<span class="sig-name descname"><span class="pre">variance</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">window_spec</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">projected_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span 
class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.variance" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Computes the sample variance for a given expression</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#variance">Relational API page</a>.
<br>
</dd>
</dl>
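
The descriptions above distinguish the sample variance (`var`, `var_samp`, `variance`) from the population variance (`var_pop`). The sketch below illustrates that distinction using only Python's standard library rather than DuckDB itself; the sample values are made up for illustration. It uses the standard definitions: the sample variance divides the sum of squared deviations by `n - 1`, while the population variance divides by `n`.

```python
import statistics

# Illustrative data; not taken from the DuckDB documentation.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(values)
mean = sum(values) / n
sum_sq_dev = sum((v - mean) ** 2 for v in values)

# Sample variance: divide by n - 1 (Bessel's correction).
sample_variance = sum_sq_dev / (n - 1)

# Population variance: divide by n.
population_variance = sum_sq_dev / n

# Cross-check against the standard library implementations.
stdlib_sample = statistics.variance(values)
stdlib_population = statistics.pvariance(values)
```

For the same input, the sample variance is always at least as large as the population variance, and the gap shrinks as `n` grows.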

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.write_csv">
<span class="sig-name descname"><span class="pre">write_csv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">file_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">sep</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">na_rep</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">header</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">quotechar</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">escapechar</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">timestamp_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quoting</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">encoding</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span 
class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">compression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">overwrite</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">per_thread_output</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">use_tmp_file</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">partition_by</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span 
class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">write_partition_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.write_csv" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Write the relation object to a CSV file in &#8216;file_name&#8217;</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#write_csv">Relational API page</a>.
<br>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.DuckDBPyRelation.write_parquet">
<span class="sig-name descname"><span class="pre">write_parquet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">file_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">compression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">field_ids</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">row_group_size_bytes</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">row_group_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">overwrite</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">per_thread_output</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">use_tmp_file</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">partition_by</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">write_partition_columns</span></span><span class="p"><span class="pre">:</span></span><span 
class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">append</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename_pattern</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">file_size_bytes</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.DuckDBPyRelation.write_parquet" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Write the relation object to the Parquet file specified by <code>file_name</code>.</p>
Detailed examples can be found at <a href="{% link docs/current/clients/python/relational_api.md %}#write_parquet">Relational API page</a>.
<br>
</dd>
</dl>

</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.Error">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">Error</span></span><a class="headerlink" href="#duckdb.Error" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">Exception</span></code></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ExpectedResultType">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ExpectedResultType</span></span><a class="headerlink" href="#duckdb.ExpectedResultType" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<p>Members:</p>
<p>QUERY_RESULT</p>
<p>CHANGED_ROWS</p>
<p>NOTHING</p>
<dl class="py property">
<dt class="sig sig-object py">
<span class="sig-name descname"><span class="pre">ExpectedResultType.name</span> <span class="pre">-&gt;</span> <span class="pre">str</span></span>
</dt>
<dd></dd>
</dl>

</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ExplainType">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ExplainType</span></span><a class="headerlink" href="#duckdb.ExplainType" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<p>Members:</p>
<p>STANDARD</p>
<p>ANALYZE</p>
<dl class="py property">
<dt class="sig sig-object py">
<span class="sig-name descname"><span class="pre">ExplainType.name</span> <span class="pre">-&gt;</span> <span class="pre">str</span></span>
</dt>
<dd></dd>
</dl>

</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.Expression">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">Expression</span></span><a class="headerlink" href="#duckdb.Expression" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.alias">
<span class="sig-name descname"><span class="pre">alias</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">arg0</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.alias" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a copy of this expression with the given alias.</p>
<dl class="simple">
<dt>Parameters:</dt>
<dd>
<p>name: The alias to use for the expression; this determines how it can be referenced.</p>
</dd>
<dt>Returns:</dt>
<dd>
<p>Expression: self with an alias.</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.asc">
<span class="sig-name descname"><span class="pre">asc</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.asc" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Set the order by modifier to ASCENDING.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.between">
<span class="sig-name descname"><span class="pre">between</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">lower</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">upper</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.between" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a BETWEEN expression: self BETWEEN lower AND upper.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.cast">
<span class="sig-name descname"><span class="pre">cast</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.cast" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a CastExpression that casts self to the given type.</p>
<dl class="simple">
<dt>Parameters:</dt>
<dd>
<p>type: The type to cast to</p>
</dd>
<dt>Returns:</dt>
<dd>
<p>CastExpression: self::type</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.collate">
<span class="sig-name descname"><span class="pre">collate</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">collation</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.collate" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a COLLATE expression that applies the given collation to self.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.desc">
<span class="sig-name descname"><span class="pre">desc</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.desc" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Set the order by modifier to DESCENDING.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.get_name">
<span class="sig-name descname"><span class="pre">get_name</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">str</span></span></span><a class="headerlink" href="#duckdb.Expression.get_name" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return the stringified version of the expression.</p>
<dl class="simple">
<dt>Returns:</dt>
<dd>
<p>str: The string representation.</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.isin">
<span class="sig-name descname"><span class="pre">isin</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.isin" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return an IN expression comparing self to the input arguments.</p>
<dl class="simple">
<dt>Returns:</dt>
<dd>
<p>DuckDBPyExpression: The IN comparison expression.</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.isnotin">
<span class="sig-name descname"><span class="pre">isnotin</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.isnotin" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return a NOT IN expression comparing self to the input arguments.</p>
<dl class="simple">
<dt>Returns:</dt>
<dd>
<p>DuckDBPyExpression: The NOT IN comparison expression.</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.isnotnull">
<span class="sig-name descname"><span class="pre">isnotnull</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.isnotnull" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create an IS NOT NULL expression from self.</p>
<dl class="simple">
<dt>Returns:</dt>
<dd>
<p>DuckDBPyExpression: self IS NOT NULL</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.isnull">
<span class="sig-name descname"><span class="pre">isnull</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.isnull" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create an IS NULL expression from self.</p>
<dl class="simple">
<dt>Returns:</dt>
<dd>
<p>DuckDBPyExpression: self IS NULL</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.nulls_first">
<span class="sig-name descname"><span class="pre">nulls_first</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.nulls_first" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Set the NULL order by modifier to NULLS FIRST.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.nulls_last">
<span class="sig-name descname"><span class="pre">nulls_last</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.nulls_last" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Set the NULL order by modifier to NULLS LAST.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.otherwise">
<span class="sig-name descname"><span class="pre">otherwise</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.otherwise" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Add an ELSE &lt;value&gt; clause to the CaseExpression.</p>
<dl class="simple">
<dt>Parameters:</dt>
<dd>
<p>value: The value to use if none of the WHEN conditions are met.</p>
</dd>
<dt>Returns:</dt>
<dd>
<p>CaseExpression: self with an ELSE clause.</p>
</dd>
</dl>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.show">
<span class="sig-name descname"><span class="pre">show</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.Expression.show" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Print the stringified version of the expression.</p>
</dd>
</dl>

<dl class="py method">
<dt class="sig sig-object py" id="duckdb.Expression.when">
<span class="sig-name descname"><span class="pre">when</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">self</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">condition</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.Expression.when" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Add an additional WHEN &lt;condition&gt; THEN &lt;value&gt; clause to the CaseExpression.</p>
<dl class="simple">
<dt>Parameters:</dt>
<dd>
<p>condition: The condition that must be met.</p>
<p>value: The value to use if the condition is met.</p>
</dd>
<dt>Returns:</dt>
<dd>
<p>CaseExpression: self with an additional WHEN clause.</p>
</dd>
</dl>
</dd>
</dl>

</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.FatalException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">FatalException</span></span><a class="headerlink" href="#duckdb.FatalException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.FloatValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">FloatValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.FloatValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.FunctionExpression">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">FunctionExpression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">function_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.FunctionExpression" title="Link to this definition">&#182;</a>
</dt>
<dd></dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.HTTPException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">HTTPException</span></span><a class="headerlink" href="#duckdb.HTTPException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.IOException" title="_duckdb.IOException"><code class="xref py py-class docutils literal notranslate"><span class="pre">IOException</span></code></a></p>
<p>Raised when an error occurs in the httpfs extension or while downloading an extension.</p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.HugeIntegerValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">HugeIntegerValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.HugeIntegerValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.IOException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">IOException</span></span><a class="headerlink" href="#duckdb.IOException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.OperationalError" title="_duckdb.OperationalError"><code class="xref py py-class docutils literal notranslate"><span class="pre">OperationalError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.IntegerValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">IntegerValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.IntegerValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.IntegrityError">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">IntegrityError</span></span><a class="headerlink" href="#duckdb.IntegrityError" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.InternalError">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">InternalError</span></span><a class="headerlink" href="#duckdb.InternalError" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.InternalException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">InternalException</span></span><a class="headerlink" href="#duckdb.InternalException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.InternalError" title="_duckdb.InternalError"><code class="xref py py-class docutils literal notranslate"><span class="pre">InternalError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.InterruptException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">InterruptException</span></span><a class="headerlink" href="#duckdb.InterruptException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.IntervalValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">IntervalValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.IntervalValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.InvalidInputException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">InvalidInputException</span></span><a class="headerlink" href="#duckdb.InvalidInputException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.ProgrammingError" title="_duckdb.ProgrammingError"><code class="xref py py-class docutils literal notranslate"><span class="pre">ProgrammingError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.InvalidTypeException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">InvalidTypeException</span></span><a class="headerlink" href="#duckdb.InvalidTypeException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.ProgrammingError" title="_duckdb.ProgrammingError"><code class="xref py py-class docutils literal notranslate"><span class="pre">ProgrammingError</span></code></a></p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.LambdaExpression">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">LambdaExpression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">lhs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">rhs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.LambdaExpression" title="Link to this definition">&#182;</a>
</dt>
<dd></dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ListValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ListValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">child_type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">DuckDBPyType</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.ListValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.LongValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">LongValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.LongValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.MapValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">MapValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">key_type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">DuckDBPyType</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">value_type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">DuckDBPyType</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.MapValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.NotImplementedException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">NotImplementedException</span></span><a class="headerlink" href="#duckdb.NotImplementedException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.NotSupportedError" title="_duckdb.NotSupportedError"><code class="xref py py-class docutils literal notranslate"><span class="pre">NotSupportedError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.NotSupportedError">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">NotSupportedError</span></span><a class="headerlink" href="#duckdb.NotSupportedError" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.NullValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">NullValue</span></span><a class="headerlink" href="#duckdb.NullValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.OperationalError">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">OperationalError</span></span><a class="headerlink" href="#duckdb.OperationalError" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.OutOfMemoryException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">OutOfMemoryException</span></span><a class="headerlink" href="#duckdb.OutOfMemoryException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.OperationalError" title="_duckdb.OperationalError"><code class="xref py py-class docutils literal notranslate"><span class="pre">OperationalError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.OutOfRangeException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">OutOfRangeException</span></span><a class="headerlink" href="#duckdb.OutOfRangeException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DataError" title="_duckdb.DataError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ParserException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ParserException</span></span><a class="headerlink" href="#duckdb.ParserException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.ProgrammingError" title="_duckdb.ProgrammingError"><code class="xref py py-class docutils literal notranslate"><span class="pre">ProgrammingError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.PermissionException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">PermissionException</span></span><a class="headerlink" href="#duckdb.PermissionException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ProgrammingError">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ProgrammingError</span></span><a class="headerlink" href="#duckdb.ProgrammingError" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.PythonExceptionHandling">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">PythonExceptionHandling</span></span><a class="headerlink" href="#duckdb.PythonExceptionHandling" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<p>Members:</p>
<p>DEFAULT</p>
<p>RETURN_NULL</p>
<dl class="py property">
<dt class="sig sig-object py">
<span class="sig-name descname"><span class="pre">PythonExceptionHandling.name</span> <span class="pre">-&gt;</span> <span class="pre">str</span></span>
</dt>
<dd></dd>
</dl>

</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.RenderMode">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">RenderMode</span></span><a class="headerlink" href="#duckdb.RenderMode" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<p>Members:</p>
<p>ROWS</p>
<p>COLUMNS</p>
<dl class="py property">
<dt class="sig sig-object py">
<span class="sig-name descname"><span class="pre">RenderMode.name</span> <span class="pre">-&gt;</span> <span class="pre">str</span></span>
</dt>
<dd></dd>
</dl>

</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.SQLExpression">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">SQLExpression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">expression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.Expression" title="_duckdb.Expression"><span class="pre">_duckdb.Expression</span></a></span></span><a class="headerlink" href="#duckdb.SQLExpression" title="Link to this definition">&#182;</a>
</dt>
<dd></dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.SequenceException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">SequenceException</span></span><a class="headerlink" href="#duckdb.SequenceException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DatabaseError" title="_duckdb.DatabaseError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatabaseError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.SerializationException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">SerializationException</span></span><a class="headerlink" href="#duckdb.SerializationException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.OperationalError" title="_duckdb.OperationalError"><code class="xref py py-class docutils literal notranslate"><span class="pre">OperationalError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.ShortValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">ShortValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.ShortValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.StarExpression">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">StarExpression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.StarExpression" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Overloaded function.</p>
<ol class="arabic simple">
<li><p>StarExpression(*, exclude: object = None) -&gt; _duckdb.Expression</p></li>
<li><p>StarExpression() -&gt; _duckdb.Expression</p></li>
</ol>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.Statement">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">Statement</span></span><a class="headerlink" href="#duckdb.Statement" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<dl class="py property">
<dt class="sig sig-object py" id="duckdb.Statement.expected_result_type">
<span class="property"><span class="k"><span class="pre">property</span></span><span class="w"> </span></span><span class="sig-name descname"><span class="pre">expected_result_type</span></span><a class="headerlink" href="#duckdb.Statement.expected_result_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get the expected type of result produced by this statement. The actual type may vary depending on the statement.</p>
</dd>
</dl>

<dl class="py property">
<dt class="sig sig-object py" id="duckdb.Statement.named_parameters">
<span class="property"><span class="k"><span class="pre">property</span></span><span class="w"> </span></span><span class="sig-name descname"><span class="pre">named_parameters</span></span><a class="headerlink" href="#duckdb.Statement.named_parameters" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get the map of named parameters this statement has.</p>
</dd>
</dl>

<dl class="py property">
<dt class="sig sig-object py" id="duckdb.Statement.query">
<span class="property"><span class="k"><span class="pre">property</span></span><span class="w"> </span></span><span class="sig-name descname"><span class="pre">query</span></span><a class="headerlink" href="#duckdb.Statement.query" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get the query equivalent to this statement.</p>
</dd>
</dl>

<dl class="py property">
<dt class="sig sig-object py" id="duckdb.Statement.type">
<span class="property"><span class="k"><span class="pre">property</span></span><span class="w"> </span></span><span class="sig-name descname"><span class="pre">type</span></span><a class="headerlink" href="#duckdb.Statement.type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get the type of the statement.</p>
</dd>
</dl>

</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.StatementType">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">StatementType</span></span><a class="headerlink" href="#duckdb.StatementType" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<p>Members:</p>
<p>INVALID</p>
<p>SELECT</p>
<p>INSERT</p>
<p>UPDATE</p>
<p>CREATE</p>
<p>DELETE</p>
<p>PREPARE</p>
<p>EXECUTE</p>
<p>ALTER</p>
<p>TRANSACTION</p>
<p>COPY</p>
<p>ANALYZE</p>
<p>VARIABLE_SET</p>
<p>CREATE_FUNC</p>
<p>EXPLAIN</p>
<p>DROP</p>
<p>EXPORT</p>
<p>PRAGMA</p>
<p>VACUUM</p>
<p>CALL</p>
<p>SET</p>
<p>LOAD</p>
<p>RELATION</p>
<p>EXTENSION</p>
<p>LOGICAL_PLAN</p>
<p>ATTACH</p>
<p>DETACH</p>
<p>MULTI</p>
<p>COPY_DATABASE</p>
<p>MERGE_INTO</p>
<dl class="py property">
<dt class="sig sig-object py">
<span class="sig-name descname"><span class="pre">StatementType.name</span> <span class="pre">-&gt;</span> <span class="pre">str</span></span>
</dt>
<dd></dd>
</dl>

</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.StringValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">StringValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.StringValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.StructValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">StructValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">children</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">dict</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="pre">DuckDBPyType</span><span class="p"><span class="pre">]</span></span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.StructValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.SyntaxException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">SyntaxException</span></span><a class="headerlink" href="#duckdb.SyntaxException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.ProgrammingError" title="_duckdb.ProgrammingError"><code class="xref py py-class docutils literal notranslate"><span class="pre">ProgrammingError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TimeTimeZoneValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TimeTimeZoneValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.TimeTimeZoneValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TimeValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TimeValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.TimeValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TimestampMillisecondValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TimestampMillisecondValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.TimestampMillisecondValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TimestampNanosecondValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TimestampNanosecondValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.TimestampNanosecondValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TimestampSecondValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TimestampSecondValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.TimestampSecondValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TimestampTimeZoneValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TimestampTimeZoneValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.TimestampTimeZoneValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TimestampValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TimestampValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.TimestampValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TransactionException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TransactionException</span></span><a class="headerlink" href="#duckdb.TransactionException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.OperationalError" title="_duckdb.OperationalError"><code class="xref py py-class docutils literal notranslate"><span class="pre">OperationalError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.TypeMismatchException">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">TypeMismatchException</span></span><a class="headerlink" href="#duckdb.TypeMismatchException" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.DataError" title="_duckdb.DataError"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataError</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.UUIDValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">UUIDValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.UUIDValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.UnionType">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">UnionType</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">members</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">dict</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="pre">DuckDBPyType</span><span class="p"><span class="pre">]</span></span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.UnionType" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.UnsignedBinaryValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">UnsignedBinaryValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.UnsignedBinaryValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.UnsignedHugeIntegerValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">UnsignedHugeIntegerValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.UnsignedHugeIntegerValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.UnsignedIntegerValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">UnsignedIntegerValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.UnsignedIntegerValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.UnsignedLongValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">UnsignedLongValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.UnsignedLongValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.UnsignedShortValue">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">UnsignedShortValue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.UnsignedShortValue" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <a class="reference internal" href="#duckdb.Value" title="duckdb.value.constant.Value"><code class="xref py py-class docutils literal notranslate"><span class="pre">Value</span></code></a></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.Value">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">Value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">DuckDBPyType</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.Value" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.Warning">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">Warning</span></span><a class="headerlink" href="#duckdb.Warning" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">Exception</span></code></p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.aggregate">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">aggregate</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">aggr_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">group_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" 
href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.aggregate" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Compute the aggregate aggr_expr on the relation, optionally grouped by group_expr</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.alias">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">alias</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">alias</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.alias" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Rename the relation object to a new alias</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.append">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">append</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">by_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" 
href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.append" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Append the passed DataFrame to the named table</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.array_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">array_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.array_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create an array type object of &#8216;type&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.arrow">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">arrow</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.arrow" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Overloaded function.</p>
<ol class="arabic simple">
<li><p>arrow(rows_per_batch: typing.SupportsInt = 1000000, *, connection: duckdb.DuckDBPyConnection = None) -&gt; pyarrow.lib.RecordBatchReader</p></li>
</ol>
<p>Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead.</p>
<ol class="arabic simple" start="2">
<li><p>arrow(arrow_object: object, *, connection: duckdb.DuckDBPyConnection = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from an Arrow object</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.begin">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">begin</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.begin" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Start a new transaction</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.checkpoint">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">checkpoint</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.checkpoint" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Synchronizes data in the write-ahead log (WAL) to the database data file (no-op for in-memory connections)</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.close">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.close" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Close the connection</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.commit">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">commit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.commit" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Commit changes performed within a transaction</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.connect">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">connect</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">database</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">':memory:'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">read_only</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">config</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">dict</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.connect" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a DuckDB database instance. Accepts a database file name for reading and writing persistent data, and a read_only flag for when no changes are desired</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.create_function">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">create_function</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">function</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">collections.abc.Callable</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">parameters</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">return_type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._func.PythonUDFType</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">&lt;PythonUDFType.NATIVE:</span> <span class="pre">0&gt;</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">null_handling</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._func.FunctionNullHandling</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">&lt;FunctionNullHandling.DEFAULT:</span> <span class="pre">0&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">exception_handling</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.PythonExceptionHandling" title="_duckdb.PythonExceptionHandling"><span class="pre">_duckdb.PythonExceptionHandling</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">&lt;PythonExceptionHandling.DEFAULT:</span> <span class="pre">0&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">side_effects</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span 
class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.create_function" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a DuckDB function from the passed-in Python function so it can be used in queries</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.cursor">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">cursor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.cursor" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a duplicate of the current connection</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.decimal_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">decimal_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">width</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">scale</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.decimal_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a decimal type with &#8216;width&#8217; and &#8216;scale&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.default_connection">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">default_connection</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.default_connection" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Retrieve the connection currently registered as the default to be used by the module</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.description">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">description</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">list</span><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#duckdb.description" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get result set attributes, mainly column names</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.df">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Overloaded function.</p>
<ol class="arabic simple">
<li><p>df(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) -&gt; pandas.DataFrame</p></li>
</ol>
<p>Fetch a result as a DataFrame following execute()</p>
<ol class="arabic simple" start="2">
<li><p>df(df: pandas.DataFrame, *, connection: duckdb.DuckDBPyConnection = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the DataFrame df</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.disable_profiling">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">disable_profiling</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.disable_profiling" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Disable profiling for the current connection</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.distinct">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">distinct</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.distinct" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Retrieve distinct rows from this relation object</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.dtype">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">dtype</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">type_str</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.dtype" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a type object by parsing the &#8216;type_str&#8217; string</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.duplicate">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">duplicate</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.duplicate" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a duplicate of the current connection</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.enable_profiling">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">enable_profiling</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.enable_profiling" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Enable profiling for the current connection</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.enum_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">enum_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">values</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.enum_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create an enum type of underlying &#8216;type&#8217;, consisting of the list of &#8216;values&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.execute">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">execute</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">parameters</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.execute" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute the given SQL query, optionally using prepared statements with parameters set</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.executemany">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">executemany</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">parameters</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.executemany" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Execute the given prepared statement multiple times using the list of parameter sets in parameters</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.extract_statements">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">extract_statements</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.extract_statements" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Parse the query string and extract the Statement object(s) produced</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetch_arrow_table">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetch_arrow_table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">rows_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.Table.html#pyarrow.Table" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.Table</span></a></span></span><a class="headerlink" href="#duckdb.fetch_arrow_table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as an Arrow table following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetch_df">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetch_df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.fetch_df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as a DataFrame following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetch_df_chunk">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetch_df_chunk</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vectors_per_chunk</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" 
href="#duckdb.fetch_df_chunk" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a chunk of the result as a DataFrame following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetch_record_batch">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetch_record_batch</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">rows_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.RecordBatchReader</span></a></span></span><a class="headerlink" href="#duckdb.fetch_record_batch" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch an Arrow RecordBatchReader following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetchall">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetchall</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.fetchall" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch all rows from a result following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetchdf">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetchdf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_as_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></span><a class="headerlink" href="#duckdb.fetchdf" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as a DataFrame following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetchmany">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetchmany</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.fetchmany" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch the next set of rows from a result following execute</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetchnumpy">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetchnumpy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.fetchnumpy" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as a dict of NumPy arrays, keyed by column name, following execute</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.fetchone">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">fetchone</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">tuple</span><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#duckdb.fetchone" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a single row from a result following execute</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.filesystem_is_registered">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">filesystem_is_registered</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">bool</span></span></span><a class="headerlink" href="#duckdb.filesystem_is_registered" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Check if a filesystem with the provided name is currently registered</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.filter">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">filter</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">filter_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.filter" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Filter the relation object by the filter in filter_expr</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.from_arrow">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">from_arrow</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">arrow_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.from_arrow" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from an Arrow object</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.from_csv_auto">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">from_csv_auto</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">path_or_buffer</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.from_csv_auto" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the CSV file in path_or_buffer</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.from_df">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">from_df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.from_df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the DataFrame in df</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.from_parquet">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">from_parquet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.from_parquet" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Overloaded function.</p>
<ol class="arabic simple">
<li><p>from_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the Parquet files in file_glob</p>
<ol class="arabic simple" start="2">
<li><p>from_parquet(file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the Parquet files in file_globs</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.from_query">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">from_query</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">alias</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" 
title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.from_query" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.get_profiling_information">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">get_profiling_information</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'json'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">str</span></span></span><a class="headerlink" href="#duckdb.get_profiling_information" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get profiling information from a query</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.get_table_names">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">get_table_names</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">qualified</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">set</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#duckdb.get_table_names" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Extract the required table names from a query</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.install_extension">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">install_extension</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">extension</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">force_install</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repository</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repository_url</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">version</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> 
</span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.install_extension" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Install an extension by name, with an optional version and/or repository to get the extension from</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.interrupt">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">interrupt</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.interrupt" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Interrupt pending operations</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.limit">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">limit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">n</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">offset</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" 
href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.limit" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Only retrieve the first n rows from this relation object, starting at offset</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.list_filesystems">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">list_filesystems</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.list_filesystems" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>List registered filesystems, including builtin ones</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.list_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">list_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">type</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.list_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a list type object of &#8216;type&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.load_extension">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">load_extension</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">extension</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.load_extension" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Load an installed extension</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.map_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">map_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">key</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">value</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.map_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a map type object from the &#8216;key&#8217; and &#8216;value&#8217; types</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.order">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">order</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">order_expr</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.order" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Reorder the relation object by order_expr</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.pl">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">pl</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">rows_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">lazy</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">duckdb::PolarsDataFrame</span></span></span><a class="headerlink" href="#duckdb.pl" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as a Polars DataFrame following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.project">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">project</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.project" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Project the relation object by the projection in project_expr</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.query">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">query</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">alias</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" 
title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.query" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.query_df">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">query_df</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">virtual_table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">sql_query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" 
href="#duckdb.query_df" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.query_progress">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">query_progress</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">float</span></span></span><a class="headerlink" href="#duckdb.query_progress" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Return the query progress of the pending operation</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.read_csv">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">read_csv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">path_or_buffer</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.read_csv" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the CSV file in &#8216;path_or_buffer&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.read_json">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">read_json</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">path_or_buffer</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">sample_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">maximum_depth</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span 
class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">records</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">timestamp_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span 
class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">compression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">maximum_object_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ignore_errors</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">convert_strings_to_integers</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span 
class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">field_appearance_threshold</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">map_inference_threshold</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">maximum_sample_files</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span 
class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hive_partitioning</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">union_by_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hive_types</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hive_types_autocast</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span 
class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">object</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.read_json" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the JSON file in &#8216;path_or_buffer&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.read_parquet">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">read_parquet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#duckdb.read_parquet" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Overloaded function.</p>
<ol class="arabic simple">
<li><p>read_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the Parquet files in file_glob</p>
<ol class="arabic simple" start="2">
<li><p>read_parquet(file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -&gt; _duckdb.DuckDBPyRelation</p></li>
</ol>
<p>Create a relation object from the Parquet files in file_globs</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.register">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">register</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">view_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">python_object</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.register" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Register the passed Python object under view_name so that it can be queried as a view</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.register_filesystem">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">register_filesystem</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">filesystem</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">fsspec.AbstractFileSystem</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.register_filesystem" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Register an fsspec-compliant filesystem</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.remove_function">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">remove_function</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.remove_function" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Remove a previously created function</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.rollback">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">rollback</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.rollback" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Roll back changes performed within a transaction</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.row_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">row_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">fields</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.row_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a struct type object from &#8216;fields&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.rowcount">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">rowcount</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">int</span></span></span><a class="headerlink" href="#duckdb.rowcount" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Get result set row count</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.set_default_connection">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">set_default_connection</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.set_default_connection" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Register the provided connection as the default to be used by the module</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.sql">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">sql</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">alias</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" 
title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.sql" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query; otherwise, run the query as-is.</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.sqltype">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">sqltype</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">type_str</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.sqltype" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a type object by parsing the &#8216;type_str&#8217; string</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.string_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">string_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">collation</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">''</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.string_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a string type with an optional collation</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.struct_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">struct_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">fields</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.struct_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a struct type object from &#8216;fields&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.table">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">table_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object for the named table</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.table_function">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">table_function</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">parameters</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.table_function" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the named table function with given parameters</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.tf">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">tf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.tf" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as dict of TensorFlow Tensors following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.to_arrow_reader">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">to_arrow_reader</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.RecordBatchReader</span></a></span></span><a class="headerlink" href="#duckdb.to_arrow_reader" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch an Arrow RecordBatchReader following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.to_arrow_table">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">to_arrow_table</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">SupportsInt</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1000000</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference external" href="https://arrow.apache.org/docs/9.0/python/generated/pyarrow.Table.html#pyarrow.Table" title="(in Apache Arrow v9.0.0)"><span class="pre">pyarrow.lib.Table</span></a></span></span><a class="headerlink" href="#duckdb.to_arrow_table" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as Arrow table following execute()</p>
</dd>
</dl>

<dl class="py class">
<dt class="sig sig-object py" id="duckdb.token_type">
<span class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></span><span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">token_type</span></span><a class="headerlink" href="#duckdb.token_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">pybind11_object</span></code></p>
<p>Members:</p>
<p>identifier</p>
<p>numeric_const</p>
<p>string_const</p>
<p>operator</p>
<p>keyword</p>
<p>comment</p>
<dl class="py property">
<dt class="sig sig-object py">
<span class="sig-name descname"><span class="pre">token_type.name</span> <span class="pre">-&gt;</span> <span class="pre">str</span></span>
</dt>
<dd></dd>
</dl>

</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.tokenize">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">tokenize</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">list</span></span></span><a class="headerlink" href="#duckdb.tokenize" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Tokenize a SQL string, returning a list of (position, type) tuples that can be used, for example, for syntax highlighting.</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.torch">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">torch</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">dict</span></span></span><a class="headerlink" href="#duckdb.torch" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Fetch a result as dict of PyTorch Tensors following execute()</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">type_str</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a type object by parsing the &#8216;type_str&#8217; string</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.union_type">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">union_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">members</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">_duckdb._sqltypes.DuckDBPyType</span></span></span><a class="headerlink" href="#duckdb.union_type" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a union type object from &#8216;members&#8217;</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.unregister">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">unregister</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">view_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span></span><a class="headerlink" href="#duckdb.unregister" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Unregister the view name</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.unregister_filesystem">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">unregister_filesystem</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.unregister_filesystem" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Unregister a filesystem</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.values">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">values</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.values" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object from the passed values</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.version">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">version</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">str</span></span></span><a class="headerlink" href="#duckdb.version" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Human-friendly formatted version string of both the distribution package and the bundled DuckDB engine.</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.view">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">view</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">view_name</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><a class="reference internal" href="#duckdb.DuckDBPyRelation" title="_duckdb.DuckDBPyRelation"><span class="pre">_duckdb.DuckDBPyRelation</span></a></span></span><a class="headerlink" href="#duckdb.view" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Create a relation object for the named view</p>
</dd>
</dl>

<dl class="py function">
<dt class="sig sig-object py" id="duckdb.write_csv">
<span class="sig-prename descclassname"><span class="pre">duckdb.</span></span><span class="sig-name descname"><span class="pre">write_csv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://pandas.pydata.org/pandas-docs/version/3.0/reference/api/pandas.DataFrame.html#pandas.DataFrame" title="(in pandas v3.0)"><span class="pre">pandas.DataFrame</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em>, <em class="sig-param"><span class="keyword-only-separator o"><abbr title="Keyword-only parameters separator (PEP 3102)"><span class="pre">*</span></abbr></span></em>, <em class="sig-param"><span class="n"><span class="pre">sep</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">na_rep</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">header</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span 
class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quotechar</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">escapechar</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">date_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">timestamp_format</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quoting</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">encoding</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">compression</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">overwrite</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">per_thread_output</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">use_tmp_file</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">partition_by</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span 
class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">write_partition_columns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">object</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">connection</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#duckdb.DuckDBPyConnection" title="duckdb.DuckDBPyConnection"><span class="pre">duckdb.DuckDBPyConnection</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#8594;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span><a class="headerlink" href="#duckdb.write_csv" title="Link to this definition">&#182;</a>
</dt>
<dd>
<p>Write the relation object to a CSV file in &#8216;file_name&#8217;</p>
</dd>
</dl>




</div>
</div>
</div>

### Troubleshooting {#docs:current:clients:python:known_issues}

#### Troubleshooting {#docs:current:clients:python:known_issues::troubleshooting}

##### Running `EXPLAIN` Renders Newlines {#docs:current:clients:python:known_issues::running-explain-renders-newlines}

In Python, the output of the [`EXPLAIN` statement](#docs:current:guides:meta:explain) is displayed with its hard line breaks shown as literal `\n` sequences:

```python
In [1]: import duckdb
   ...: duckdb.sql("EXPLAIN SELECT 42 AS x")
```

```text
Out[1]:
┌───────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  explain_key  │                                                   explain_value                                                   │
│    varchar    │                                                      varchar                                                      │
├───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ physical_plan │ ┌───────────────────────────┐\n│         PROJECTION        │\n│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │\n│             x   …  │
└───────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

To work around this, `print` the output of the `explain()` function:

```python
In [2]: print(duckdb.sql("SELECT 42 AS x").explain())
```

```text
Out[2]:
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             x             │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         DUMMY_SCAN        │
└───────────────────────────┘
```
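The difference between the two outputs resembles the difference between a Python string's `repr` and its printed form. The sketch below uses only the standard library and a made-up `plan` string (an assumption for illustration, not DuckDB's actual rendering code) to show why `print` produces real line breaks:

```python
# A hypothetical multi-line plan string; not produced by DuckDB.
plan = "PROJECTION\nx\nDUMMY_SCAN"

# repr() escapes the newline characters, similar to the table cell above.
escaped = repr(plan)
print(escaped)

# print() writes the newline characters as actual line breaks.
print(plan)

# Splitting on newlines recovers one entry per rendered line.
lines = plan.splitlines()
```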

Please also check out the [Jupyter guide](#docs:current:guides:python:jupyter) for tips on using Jupyter with JupySQL.

##### Crashes and Errors on Windows {#docs:current:clients:python:known_issues::crashes-and-errors-on-windows}

On Windows, the Python runtime may crash or raise an error when DuckDB is imported or first used:

```python
import duckdb

duckdb.sql("...")
```

```console
ImportError: DLL load failed while importing duckdb: The specified module could not be found.
```

```console
Windows fatal exception: access violation

Current thread 0x0000311c (most recent call first):
  File "<stdin>", line 1 in <module>
```

```console
Process finished with exit code -1073741819 (0xC0000005)
```

The problem is likely caused by using an outdated Microsoft Visual C++ (MSVC) Redistributable package.
The solution is to install the [latest MSVC Redistributable package](https://learn.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist).
Alternatively, you can instruct `pip` to compile the package from source as follows:

```batch
python3 -m pip install duckdb --no-binary duckdb
```
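The exit code `-1073741819` in the last error message above is the signed 32-bit form of `0xC0000005`, the Windows status code for an access violation. A short check of that arithmetic, using only the standard library, might look like this:

```python
import ctypes

exit_code = -1073741819

# Reinterpret the signed 32-bit exit code as an unsigned value.
unsigned = exit_code & 0xFFFFFFFF
print(hex(unsigned))

# The reverse direction: 0xC0000005 read as a signed 32-bit integer.
signed = ctypes.c_int32(0xC0000005).value
print(signed)
```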

##### Parameterized Queries in Relational API {#docs:current:clients:python:known_issues::parameterized-queries-in-relational-api}

Passing query parameters to the [`sql()`](#docs:current:clients:python:relational_api::sql), [`query()`](#docs:current:clients:python:relational_api::query), or [`from_query()`](#docs:current:clients:python:relational_api::from_query) methods has significant performance overhead.
There is currently no relation type in core that supports prepared statements, so parameterized queries are immediately materialized into an intermediate representation. This causes at least 5x processing overhead and nearly 2x memory usage compared to the non-parameterized path.

Instead, use [`execute()`](#docs:current:clients:python:dbapi::prepared-statements) for the parameterized query, then feed the result into the relational API via a [replacement scan](#docs:current:clients:python:relational_api::sql):

```python
import duckdb

conn = duckdb.connect()

# Use execute() for the parameterized query
df = conn.execute("SELECT * FROM my_table WHERE x = ?", [42]).df()

# Use a replacement scan to continue with the relational API
conn.sql("SELECT * FROM df WHERE y > 0").order("y").show()
```

#### Known Issues {#docs:current:clients:python:known_issues::known-issues}

Unfortunately, some issues are either beyond our control or are elusive and hard to track down.
The list below covers the issues you may need to be aware of, depending on your workflow.

##### NumPy Import Multithreading {#docs:current:clients:python:known_issues::numpy-import-multithreading}

When using multithreading and fetching results either directly as NumPy arrays or indirectly through a pandas DataFrame, it might be necessary to ensure that `numpy.core.multiarray` is imported.
If this module has not been imported from the main thread and a different thread attempts to import it during execution, this causes either a deadlock or a crash.

To avoid this, it's recommended to `import numpy.core.multiarray` before starting up threads.
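A minimal sketch of this ordering is shown below. The helper `preload_numpy_multiarray` and the `worker` function are hypothetical names introduced for illustration, and the worker body is a placeholder rather than real DuckDB code. The helper returns `False` when NumPy is not installed, so the sketch runs either way:

```python
import importlib
import threading
import warnings


def preload_numpy_multiarray() -> bool:
    """Import numpy.core.multiarray from the calling (main) thread.

    Returns False if NumPy is not installed.
    """
    try:
        with warnings.catch_warnings():
            # Some NumPy versions may warn about this import path; silence it here.
            warnings.simplefilter("ignore", DeprecationWarning)
            importlib.import_module("numpy.core.multiarray")
    except ImportError:
        return False
    return True


def worker(results: list, index: int) -> None:
    # Hypothetical placeholder for work that fetches DuckDB results
    # as NumPy arrays or pandas DataFrames.
    results[index] = index * 2


# Preload in the main thread *before* any worker thread starts.
preloaded = preload_numpy_multiarray()

results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The important part is the order: the import happens once in the main thread, and only then are the threads created.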

##### `DESCRIBE` and `SUMMARIZE` Return Empty Tables in Jupyter {#docs:current:clients:python:known_issues::describe-and-summarize-return-empty-tables-in-jupyter}

In Jupyter, the `DESCRIBE` and `SUMMARIZE` statements return an empty table:

```python
%sql
CREATE OR REPLACE TABLE tbl AS (SELECT 42 AS x);
DESCRIBE tbl;
```

To work around this, wrap them into a subquery:

```python
%sql
CREATE OR REPLACE TABLE tbl AS (SELECT 42 AS x);
FROM (DESCRIBE tbl);
```

##### Protobuf Error for JupySQL in IPython {#docs:current:clients:python:known_issues::protobuf-error-for-jupysql-in-ipython}

Loading the JupySQL extension in IPython fails:

```python
In [1]: %load_ext sql
```

```console
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (unknown location)
```

The solution is to fix the `protobuf` package. This may require uninstalling conflicting packages, e.g.:

```python
%pip uninstall tensorflow
%pip install protobuf
```
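Before and after reinstalling, it can help to check whether the module named in the error message can be located at all. The following is a diagnostic sketch using only the standard library; the function name `protobuf_builder_available` is a hypothetical helper, not part of JupySQL or protobuf:

```python
import importlib.util


def protobuf_builder_available() -> bool:
    """Return True if google.protobuf.internal.builder can be located."""
    try:
        spec = importlib.util.find_spec("google.protobuf.internal.builder")
    except (ImportError, ValueError):
        # Raised when a parent package (e.g., google.protobuf) is missing or broken.
        return False
    return spec is not None


print("protobuf builder module found:", protobuf_builder_available())
```

If this prints `False`, the `protobuf` installation is likely missing or shadowed by a conflicting package.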

## R Client {#docs:current:clients:r}

> **Installation.** To use the DuckDB R client, visit the [R installation page](https://duckdb.org/install/index.html?environment=r).

#### Installation {#docs:current:clients:r::installation}

##### `duckdb`: R Client {#docs:current:clients:r::duckdb-r-client}

The DuckDB R client can be installed using the following command:

```r
install.packages("duckdb")
```

Please see the [installation page](https://duckdb.org/install) for details.

##### `duckplyr`: dplyr Client {#docs:current:clients:r::duckplyr-dplyr-client}

DuckDB offers a [dplyr](https://dplyr.tidyverse.org/)-compatible API via the `duckplyr` package. It can be installed using `install.packages("duckplyr")`. For details, see the [`duckplyr` documentation](https://tidyverse.github.io/duckplyr/).

#### Reference Manual {#docs:current:clients:r::reference-manual}

The reference manual for the DuckDB R client is available at [r.duckdb.org](https://r.duckdb.org).

#### Basic Client Usage {#docs:current:clients:r::basic-client-usage}

The standard DuckDB R client implements the [DBI interface](https://cran.r-project.org/package=DBI) for R. If you are not familiar with DBI yet, see the [Using DBI page](https://solutions.rstudio.com/db/r-packages/DBI/) for an introduction.

##### Startup & Shutdown {#docs:current:clients:r::startup--shutdown}

To use DuckDB, you must first create a connection object that represents the database. The connection object takes as a parameter the database file to read from and write to. If the database file does not exist, it will be created (the file extension may be `.db`, `.duckdb`, or anything else). The special value `:memory:` (the default) can be used to create an **in-memory database**. Note that for an in-memory database, no data is persisted to disk (i.e., all data is lost when you exit the R process). To connect to an existing database in read-only mode, set the `read_only` flag to `TRUE`. Read-only mode is required if multiple R processes need to access the same database file at the same time.

```r
library("duckdb")
# to start an in-memory database
con <- dbConnect(duckdb())
# or
con <- dbConnect(duckdb(), dbdir = ":memory:")
# to use a database file (not shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)
# to use a database file (shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = TRUE)
```

Connections are closed implicitly when they go out of scope, or explicitly using `dbDisconnect()`. To shut down the database instance associated with the connection, use `dbDisconnect(con, shutdown = TRUE)`.

##### Querying {#docs:current:clients:r::querying}

DuckDB supports the standard DBI methods to send queries and retrieve result sets. `dbExecute()` is meant for statements where no results are expected, such as `CREATE TABLE` or `UPDATE`, and `dbGetQuery()` is meant for queries that produce results (e.g., `SELECT`). Below is an example.

```r
# create a table
dbExecute(con, "CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
dbExecute(con, "INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again
res <- dbGetQuery(con, "SELECT * FROM items")
print(res)
#     item value count
# 1  jeans  20.0     1
# 2 hammer  42.2     2
```

DuckDB also supports prepared statements in the R client with the `dbExecute()` and `dbGetQuery()` methods. Here is an example:

```r
# prepared statement parameters are given as a list
dbExecute(con, "INSERT INTO items VALUES (?, ?, ?)", list('laptop', 2000, 1))

# if you want to reuse a prepared statement multiple times, use dbSendStatement() and dbBind()
stmt <- dbSendStatement(con, "INSERT INTO items VALUES (?, ?, ?)")
dbBind(stmt, list('iphone', 300, 2))
dbBind(stmt, list('android', 3.5, 1))
dbClearResult(stmt)

# query the database using a prepared statement
res <- dbGetQuery(con, "SELECT item FROM items WHERE value > ?", list(400))
print(res)
#       item
# 1 laptop
```

> **Warning.** Do **not** use prepared statements to insert large amounts of data into DuckDB. See below for better options.

#### Efficient Transfer {#docs:current:clients:r::efficient-transfer}

To write an R data frame into DuckDB, use the standard DBI function `dbWriteTable()`. This creates a table in DuckDB and populates it with the data frame contents. For example:

```r
dbWriteTable(con, "iris_table", iris)
res <- dbGetQuery(con, "SELECT * FROM iris_table LIMIT 1")
print(res)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
```

It is also possible to “register” an R data frame as a virtual table, comparable to a SQL `VIEW`. This *does not actually transfer data* into DuckDB yet. Below is an example:

```r
duckdb_register(con, "iris_view", iris)
res <- dbGetQuery(con, "SELECT * FROM iris_view LIMIT 1")
print(res)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
```

> DuckDB keeps a reference to the R data frame after registration. This prevents the data frame from being garbage-collected. The reference is cleared when the connection is closed, but can also be cleared manually using the `duckdb_unregister()` method.

Also refer to the [data import documentation](#docs:current:data:overview) for more options for efficiently importing data.

#### dbplyr {#docs:current:clients:r::dbplyr}

DuckDB also plays well with the [dbplyr](https://CRAN.R-project.org/package=dbplyr) / [dplyr](https://dplyr.tidyverse.org) packages for programmatic query construction from R. Here is an example:

```r
library("duckdb")
library("dplyr")
con <- dbConnect(duckdb())
duckdb_register(con, "flights", nycflights13::flights)

tbl(con, "flights") |>
  group_by(dest) |>
  summarise(delay = mean(dep_time, na.rm = TRUE)) |>
  collect()
```

When using dbplyr, CSV and Parquet files can be read using the `dplyr::tbl` function.

```r
# Establish a CSV for the sake of this example
write.csv(mtcars, "mtcars.csv")

# Summarize the dataset in DuckDB to avoid reading the entire CSV into R's memory
tbl(con, "mtcars.csv") |>
  group_by(cyl) |>
  summarise(across(disp:wt, .fns = mean)) |>
  collect()
```

```r
# Establish a set of Parquet files
dbExecute(con, "COPY flights TO 'dataset' (FORMAT parquet, PARTITION_BY (year, month))")

# Summarize the dataset in DuckDB to avoid reading 12 Parquet files into R's memory
tbl(con, "read_parquet('dataset/**/*.parquet', hive_partitioning = true)") |>
  filter(month == "3") |>
  summarise(delay = mean(dep_time, na.rm = TRUE)) |>
  collect()
```

#### Memory Limit {#docs:current:clients:r::memory-limit}

You can use the [`memory_limit` configuration option](#docs:current:configuration:pragmas) to limit the memory use of DuckDB, e.g.:

```sql
SET memory_limit = '2GB';
```

Note that this limit is only applied to the memory DuckDB uses and it does not affect the memory use of other R libraries.
Therefore, the total memory used by the R process may be higher than the configured `memory_limit`.

#### Troubleshooting {#docs:current:clients:r::troubleshooting}

##### Warning When Installing on macOS {#docs:current:clients:r::warning-when-installing-on-macos}

On macOS, installing DuckDB may result in a warning `unable to load shared object '.../R_X11.so'`:

```console
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
  dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 0x0006): Library not loaded: /opt/X11/lib/libSM.6.dylib
  Referenced from: <31EADEB5-0A17-3546-9944-9B3747071FE8> /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/modules/R_X11.so
  Reason: tried: '/opt/X11/lib/libSM.6.dylib' (no such file) ...
> ')
```

Note that this is just a warning, so the simplest solution is to ignore it. Alternatively, you can install DuckDB from the [R-universe](https://r-universe.dev/search):

```r
install.packages("duckdb", repos = c("https://duckdb.r-universe.dev", "https://cloud.r-project.org"))
```

You may also install the optional [`xquartz` dependency via Homebrew](https://formulae.brew.sh/cask/xquartz).

## Rust Client {#docs:current:clients:rust}

> **Installation.** To use the DuckDB Rust client, visit the [Rust installation page](https://duckdb.org/install/index.html?environment=rust).

#### Installation {#docs:current:clients:rust::installation}

The DuckDB Rust client can be installed from [crates.io](https://crates.io/crates/duckdb). Please see the documentation on [docs.rs](http://docs.rs/duckdb) for details.

#### Basic API Usage {#docs:current:clients:rust::basic-api-usage}

duckdb-rs is an ergonomic wrapper based on the [DuckDB C API](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb.h). Please refer to the [README](https://github.com/duckdb/duckdb-rs) for details.

##### Startup & Shutdown {#docs:current:clients:rust::startup--shutdown}

To use DuckDB, you must first initialize a `Connection` handle using `Connection::open()`. `Connection::open()` takes as a parameter the database file to read from and write to. If the database file does not exist, it will be created (the file extension may be `.db`, `.duckdb`, or anything else). You can also use `Connection::open_in_memory()` to create an **in-memory database**. Note that for an in-memory database, no data is persisted to disk (i.e., all data is lost when you exit the process).

```rust
use duckdb::{params, Connection, Result};
let conn = Connection::open_in_memory()?;
```

The `Connection` automatically closes the underlying database connection when it goes out of scope (via `Drop`). You can also explicitly close the `Connection` with `conn.close()`. In the typical case there is little difference between the two, but if closing fails, the explicit close gives you the chance to handle the error.

##### Querying {#docs:current:clients:rust::querying}

SQL queries can be sent to DuckDB using the `execute()` method of a connection. Alternatively, you can prepare a statement first and then query it.

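```rust
use duckdb::{params, Connection, Result};

#[derive(Debug)]
struct Person {
    id: i32,
    name: String,
    data: Option<Vec<u8>>,
}

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    // Create the table used below; the sequence supplies default ids
    conn.execute_batch(
        "CREATE SEQUENCE seq;
         CREATE TABLE person (
             id   INTEGER PRIMARY KEY DEFAULT nextval('seq'),
             name VARCHAR NOT NULL,
             data BLOB
         );",
    )?;

    // Example row (illustrative values); the id is assigned by the sequence
    let me = Person { id: 0, name: "Steven".to_string(), data: None };

    // Insert a row with a parameterized statement
    conn.execute(
        "INSERT INTO person (name, data) VALUES (?, ?)",
        params![me.name, me.data],
    )?;

    // Prepare a query and map each result row to a Person
    let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;
    let person_iter = stmt.query_map([], |row| {
        Ok(Person {
            id: row.get(0)?,
            name: row.get(1)?,
            data: row.get(2)?,
        })
    })?;

    for person in person_iter {
        println!("Found person {:?}", person.unwrap());
    }
    Ok(())
}
```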

#### Appender {#docs:current:clients:rust::appender}

The Rust client supports the [DuckDB Appender API](#docs:current:data:appender) for bulk inserts. For example:

```rust
fn insert_rows(conn: &Connection) -> Result<()> {
    let mut app = conn.appender("foo")?;
    app.append_rows([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])?;
    Ok(())
}
```

## Wasm {#clients:wasm}

### DuckDB Wasm {#docs:current:clients:wasm:overview}

> **Installation.** To use the DuckDB Wasm client, visit the [`duckdb-wasm` GitHub repository](https://github.com/duckdb/duckdb-wasm#readme).

DuckDB has been compiled to WebAssembly, so it can run inside any browser on any device.




DuckDB-Wasm offers a layered API. It can be embedded as a [JavaScript + WebAssembly library](https://www.npmjs.com/package/@duckdb/duckdb-wasm), used as a [Web shell](https://www.npmjs.com/package/@duckdb/duckdb-wasm-shell), or [built from source](https://github.com/duckdb/duckdb-wasm) according to your needs.

#### Getting Started with DuckDB-Wasm {#docs:current:clients:wasm:overview::getting-started-with-duckdb-wasm}

A great starting point is to read the [DuckDB-Wasm launch blog post](https://duckdb.org/2021/10/29/duckdb-wasm)!

Another great resource is the [GitHub repository](https://github.com/duckdb/duckdb-wasm).

For details, see the full [DuckDB-Wasm API Documentation](https://shell.duckdb.org/docs/modules/index.html).

#### Limitations {#docs:current:clients:wasm:overview::limitations}

* By default, the WebAssembly client only uses a single thread.
* The WebAssembly client has a limited amount of memory available. [WebAssembly limits the amount of available memory to 4 GB](https://v8.dev/blog/4gb-wasm-memory) and browsers may impose even stricter limits.

### Deploying DuckDB-Wasm {#docs:current:clients:wasm:deploying_duckdb_wasm}

A DuckDB-Wasm deployment needs to access the following components:

* the DuckDB-Wasm main library component, distributed as TypeScript and compiled to JavaScript code
* the DuckDB-Wasm Worker component, compiled to JavaScript code, possibly instantiated multiple times for threaded environments
* the DuckDB-Wasm module, compiled as a WebAssembly file and instantiated by the browser
* any relevant DuckDB-Wasm extension

#### Main Library Component {#docs:current:clients:wasm:deploying_duckdb_wasm::main-library-component}

This is distributed as either TypeScript code or CommonJS JavaScript code in the `npm` duckdb-wasm package. It can be bundled together with a given application, served from a same-origin (sub-)domain and included at runtime, or served from a third-party CDN such as jsDelivr.
It cannot be served as-is and needs some form of transpilation, because it has to know the location of the follow-up files to be functional.
Details depend on your setup; examples can be found at <https://github.com/duckdb/duckdb-wasm/tree/main/examples>.
One example deployment is <https://shell.duckdb.org>, which transpiles the main library component together with the shell code (the first approach). Another is the `bare-browser` example at <https://github.com/duckdb/duckdb-wasm/tree/main/examples/bare-browser>.

#### JS Worker Component {#docs:current:clients:wasm:deploying_duckdb_wasm::js-worker-component}

This is distributed as a JavaScript file in three flavors, `mvp`, `eh`, and `threads`, and needs to be served as-is. The main library component needs to be informed of its actual location.

The three variants target three different `platforms`:

* `mvp` targets the WebAssembly 1.0 spec
* `eh` targets the WebAssembly 1.0 spec with Wasm-level exception handling added, which improves performance
* `threads` targets the WebAssembly spec with exception handling and threading constructs

You can serve all three and feature-detect, or serve a single variant and instruct the duckdb-wasm library which one to use.

#### Wasm Worker Component {#docs:current:clients:wasm:deploying_duckdb_wasm::wasm-worker-component}

As with the JS Worker component, there are three flavors, `mvp`, `eh`, and `threads`, and each one is needed by the corresponding JS component. These WebAssembly modules need to be served as-is at an arbitrary (sub-)domain that is reachable from the main one.

#### DuckDB Extensions {#docs:current:clients:wasm:deploying_duckdb_wasm::duckdb-extensions}

DuckDB extensions for DuckDB-Wasm, as in the native case, are served signed at the default extension endpoint: `https://extensions.duckdb.org`.
If you are deploying duckdb-wasm, consider mirroring the relevant extensions at a different endpoint, which can allow for air-gapped deployments on internal networks.

```sql
SET custom_extension_repository = '⟨https://some.endpoint.org/path/to/repository⟩';
```

This changes the default extension repository from the public `https://extensions.duckdb.org` to the one specified. Note that extensions are still signed, so the best approach is to download and serve the extensions with a structure similar to the original repository. See additional notes at <https://duckdb.org/docs/lts/extensions/extension_distribution#creating-a-custom-repository>.


Community extensions are served at <https://community-extensions.duckdb.org> and are signed with a different key, so they can be disabled with a one-way SQL statement such as:

```sql
SET allow_community_extensions = false;
```

After this, **only** core DuckDB extensions can be loaded. Note that the failure occurs at `LOAD` time, not at `INSTALL` time.

Please review <https://duckdb.org/docs/lts/extensions/extension_distribution> for general information about extensions.


#### Security Considerations {#docs:current:clients:wasm:deploying_duckdb_wasm::security-considerations}

> **Warning.** Deploying DuckDB-Wasm with access to your own data means whoever has access to SQL can access the data that DuckDB-Wasm can access. Also, DuckDB-Wasm in the default setting can access remote endpoints, so it can have a visible effect on the external world even from within the sandbox.

### Instantiation {#docs:current:clients:wasm:instantiation}

DuckDB-Wasm has multiple ways to be instantiated depending on the use case.

#### `cdn(jsdelivr)` {#docs:current:clients:wasm:instantiation::cdnjsdelivr}

```ts
import * as duckdb from '@duckdb/duckdb-wasm';

const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();

// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);

const worker_url = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker}");`], {type: 'text/javascript'})
);

// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(worker_url);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(worker_url);
```

#### `webpack` {#docs:current:clients:wasm:instantiation::webpack}

```ts
import * as duckdb from '@duckdb/duckdb-wasm';
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm';
import duckdb_wasm_next from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
    mvp: {
        mainModule: duckdb_wasm,
        mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js', import.meta.url).toString(),
    },
    eh: {
        mainModule: duckdb_wasm_next,
        mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js', import.meta.url).toString(),
    },
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
```

#### `vite` {#docs:current:clients:wasm:instantiation::vite}

```ts
import * as duckdb from '@duckdb/duckdb-wasm';
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm?url';
import mvp_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js?url';
import duckdb_wasm_eh from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm?url';
import eh_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js?url';

const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
    mvp: {
        mainModule: duckdb_wasm,
        mainWorker: mvp_worker,
    },
    eh: {
        mainModule: duckdb_wasm_eh,
        mainWorker: eh_worker,
    },
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
```

#### Statically Served {#docs:current:clients:wasm:instantiation::statically-served}

It is possible to manually download the files from <https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm/dist/>.

```ts
import * as duckdb from '@duckdb/duckdb-wasm';

const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
    mvp: {
        mainModule: 'change/me/../duckdb-mvp.wasm',
        mainWorker: 'change/me/../duckdb-browser-mvp.worker.js',
    },
    eh: {
        mainModule: 'change/me/../duckdb-eh.wasm',
        mainWorker: 'change/me/../duckdb-browser-eh.worker.js',
    },
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
```

### Data Ingestion {#docs:current:clients:wasm:data_ingestion}

DuckDB-Wasm has multiple ways to import data, depending on the format of the data.

There are two steps to import data into DuckDB.

First, the data file is imported into a local file system using register functions ([registerEmptyFileBuffer](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerEmptyFileBuffer), [registerFileBuffer](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerFileBuffer), [registerFileHandle](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerFileHandle), [registerFileText](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerFileText), [registerFileURL](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerFileURL)).

Then, the data file is imported into DuckDB using insert functions ([insertArrowFromIPCStream](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#insertArrowFromIPCStream), [insertArrowTable](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#insertArrowTable), [insertCSVFromPath](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#insertCSVFromPath), [insertJSONFromPath](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#insertJSONFromPath)) or queried directly in the `FROM` clause of a SQL query (using extensions like Parquet or [Wasm-flavored httpfs](#docs:current:clients:wasm:data_ingestion::httpfs-wasm-flavored)).

[Insert statements](#docs:current:data:insert) can also be used to import data.

#### Data Import {#docs:current:clients:wasm:data_ingestion::data-import}

##### Open & Close Connection {#docs:current:clients:wasm:data_ingestion::open--close-connection}

```ts
// Create a new connection
const c = await db.connect();

// ... import data

// Close the connection to release memory
await c.close();
```

##### Apache Arrow {#docs:current:clients:wasm:data_ingestion::apache-arrow}

```ts
// Data can be inserted from an existing arrow.Table
// More examples: https://arrow.apache.org/docs/js/
import { tableFromArrays } from 'apache-arrow';

// EOS signal according to Arrow IPC streaming format
// See https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
const EOS = new Uint8Array([255, 255, 255, 255, 0, 0, 0, 0]);

const arrowTable = tableFromArrays({
  id: [1, 2, 3],
  name: ['John', 'Jane', 'Jack'],
  age: [20, 21, 22],
});

await c.insertArrowTable(arrowTable, { name: 'arrow_table' });
// Write EOS
await c.insertArrowTable(EOS, { name: 'arrow_table' });

// ..., from a raw Arrow IPC stream
const streamResponse = await fetch(`someapi`);
const streamReader = streamResponse.body.getReader();
const streamInserts = [];
while (true) {
    const { value, done } = await streamReader.read();
    if (done) break;
    streamInserts.push(c.insertArrowFromIPCStream(value, { name: 'streamed' }));
}

// Write EOS
streamInserts.push(c.insertArrowFromIPCStream(EOS, { name: 'streamed' }));

await Promise.all(streamInserts);
```
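The `EOS` constant above is the Arrow IPC end-of-stream marker: a 4-byte continuation indicator (`0xFFFFFFFF`) followed by a 4-byte length of zero. As a minimal sketch, the same eight bytes can be derived instead of hard-coded. The helper name `makeArrowEOS` is hypothetical and is not part of DuckDB-Wasm or Apache Arrow:

```typescript
// Hypothetical helper (not a DuckDB-Wasm or Arrow API):
// builds the Arrow IPC end-of-stream marker byte by byte.
function makeArrowEOS(): Uint8Array {
    const buffer = new ArrayBuffer(8);
    const view = new DataView(buffer);
    // Continuation indicator: 0xFFFFFFFF
    view.setUint32(0, 0xffffffff, true);
    // A metadata length of zero signals the end of the stream
    view.setInt32(4, 0, true);
    return new Uint8Array(buffer);
}

const eos = makeArrowEOS();
console.log(Array.from(eos)); // [255, 255, 255, 255, 0, 0, 0, 0]
```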

##### CSV {#docs:current:clients:wasm:data_ingestion::csv}

```ts
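// The column types below come from Apache Arrow
import * as arrow from 'apache-arrow';

// ..., from CSV files
// (interchangeable: registerFile{Text,Buffer,URL,Handle})
const csvContent = '1|foo\n2|bar\n';
// The registered name must match the path passed to insertCSVFromPath
await db.registerFileText('data.csv', csvContent);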
// ... with typed insert options
await c.insertCSVFromPath('data.csv', {
    schema: 'main',
    name: 'foo',
    detect: false,
    header: false,
    delimiter: '|',
    columns: {
        col1: new arrow.Int32(),
        col2: new arrow.Utf8(),
    },
});
```

##### JSON {#docs:current:clients:wasm:data_ingestion::json}

```ts
// ..., from JSON documents in row-major format
const jsonRowContent = [
    { "col1": 1, "col2": "foo" },
    { "col1": 2, "col2": "bar" },
];
await db.registerFileText(
    'rows.json',
    JSON.stringify(jsonRowContent),
);
await c.insertJSONFromPath('rows.json', { name: 'rows' });

// ... or column-major format
const jsonColContent = {
    "col1": [1, 2],
    "col2": ["foo", "bar"]
};
await db.registerFileText(
    'columns.json',
    JSON.stringify(jsonColContent),
);
await c.insertJSONFromPath('columns.json', { name: 'columns' });

// From API
const streamResponse = await fetch(`someapi/content.json`);
await db.registerFileBuffer('file.json', new Uint8Array(await streamResponse.arrayBuffer()));
await c.insertJSONFromPath('file.json', { name: 'JSONContent' });
```
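The row-major and column-major documents above describe the same table. The following sketch uses only plain TypeScript, with a hypothetical helper named `rowsToColumns`, to convert one layout into the other and make the difference concrete. It does not call any DuckDB-Wasm API:

```typescript
type Row = Record<string, unknown>;
type Columns = Record<string, unknown[]>;

// Hypothetical helper: pivots row-major records into a column-major object.
// Columns are created in the order in which keys are first seen.
function rowsToColumns(rows: Row[]): Columns {
    const columns: Columns = {};
    for (const row of rows) {
        for (const key of Object.keys(row)) {
            const column = columns[key] ?? (columns[key] = []);
            column.push(row[key]);
        }
    }
    return columns;
}

const rows: Row[] = [
    { col1: 1, col2: 'foo' },
    { col1: 2, col2: 'bar' },
];
console.log(JSON.stringify(rowsToColumns(rows)));
// {"col1":[1,2],"col2":["foo","bar"]}
```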

##### Parquet {#docs:current:clients:wasm:data_ingestion::parquet}

```ts
// ..., from Parquet files
// ... Local (letUserPickFile() is a placeholder for your own file picker)
const pickedFile: File = letUserPickFile();
await db.registerFileHandle('local.parquet', pickedFile, duckdb.DuckDBDataProtocol.BROWSER_FILEREADER, true);
// ... Remote
await db.registerFileURL('remote.parquet', 'https://origin/remote.parquet', duckdb.DuckDBDataProtocol.HTTP, false);
// ... Using Fetch
const res = await fetch('https://origin/remote.parquet');
await db.registerFileBuffer('buffer.parquet', new Uint8Array(await res.arrayBuffer()));

// ..., by specifying URLs in the SQL text
await c.query(`
    CREATE TABLE direct AS
        SELECT * FROM 'https://origin/remote.parquet'
`);
// ..., or by executing raw insert statements
await c.query(`
    INSERT INTO existing_table
    VALUES (1, 'foo'), (2, 'bar')`);
```

##### httpfs (Wasm-Flavored) {#docs:current:clients:wasm:data_ingestion::httpfs-wasm-flavored}

```ts
// ..., by specifying URLs in the SQL text
await c.query(`
    CREATE TABLE direct AS
        SELECT * FROM 'https://origin/remote.parquet'
`);
```

> **Tip.** If you encounter a Network Error (`Failed to execute 'send' on 'XMLHttpRequest'`) when you try to query files from S3, configure the CORS headers of the S3 bucket. For example:

```json
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    }
]
```

##### Insert Statement {#docs:current:clients:wasm:data_ingestion::insert-statement}

```ts
// ..., or by executing raw insert statements
await c.query(`
    INSERT INTO existing_table
    VALUES (1, 'foo'), (2, 'bar')`);
```

### Query {#docs:current:clients:wasm:query}

DuckDB-Wasm provides functions for querying data. Queries are run sequentially.

First, a connection needs to be created by calling [connect](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#connect). Then, queries can be run by calling [query](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#query) or [send](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#send).

#### Query Execution {#docs:current:clients:wasm:query::query-execution}

```ts
// Create a new connection
const conn = await db.connect();

// Either materialize the query result
await conn.query<{ v: arrow.Int }>(`
    SELECT * FROM generate_series(1, 100) t(v)
`);
// ..., or fetch the result chunks lazily
for await (const batch of await conn.send<{ v: arrow.Int }>(`
    SELECT * FROM generate_series(1, 100) t(v)
`)) {
    // ...
}

// Close the connection to release memory
await conn.close();
```

#### Prepared Statements {#docs:current:clients:wasm:query::prepared-statements}

```ts
// Create a new connection
const conn = await db.connect();
// Prepare query
const stmt = await conn.prepare(`SELECT v + ? FROM generate_series(0, 10_000) t(v);`);
// ... and run the query with materialized results
await stmt.query(234);
// ... or result chunks
for await (const batch of await stmt.send(234)) {
    // ...
}
// Close the statement to release memory
await stmt.close();
// Closing the connection will release statements as well
await conn.close();
```

#### Arrow Table to JSON {#docs:current:clients:wasm:query::arrow-table-to-json}

```ts
// Create a new connection
const conn = await db.connect();

// Query
const arrowResult = await conn.query<{ v: arrow.Int }>(`
    SELECT * FROM generate_series(1, 100) t(v)
`);

// Convert the Arrow table to JSON
const result = arrowResult.toArray().map((row) => row.toJSON());

// Close the connection to release memory
await conn.close();
```

#### Export Parquet {#docs:current:clients:wasm:query::export-parquet}

```ts
// Create a new connection
const conn = await db.connect();

// Export Parquet (await the COPY so the file is written before it is read back)
await conn.query(`COPY (SELECT * FROM tbl) TO 'result-snappy.parquet' (FORMAT parquet);`);
const parquet_buffer = await db.copyFileToBuffer('result-snappy.parquet');

// Generate a download link
const link = URL.createObjectURL(new Blob([parquet_buffer]));

// Close the connection to release memory
await conn.close();
```

### Extensions {#docs:current:clients:wasm:extensions}

DuckDB-Wasm's (dynamic) extension loading is modeled after the regular DuckDB's extension loading, with a few relevant differences due to the difference in platform.

#### Format {#docs:current:clients:wasm:extensions::format}

Extensions in DuckDB are binaries to be dynamically loaded via `dlopen`. A cryptographic signature is appended to the binary.
An extension in DuckDB-Wasm is a regular Wasm file to be dynamically loaded via Emscripten's `dlopen`. A cryptographic signature is appended to the Wasm file as a WebAssembly custom section called `duckdb_signature`.
This ensures the file remains a valid WebAssembly file.

> Currently, we require this custom section to be the last one, but this can be potentially relaxed in the future.
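To illustrate the layout described above, the following sketch walks the section list of a Wasm binary and checks whether the last section is a custom section named `duckdb_signature`. It only demonstrates the file format; it is not how DuckDB-Wasm verifies signatures. The helper names (`readLeb128`, `listSections`, `signatureIsLast`) are hypothetical, and the two payload bytes in the demo module are placeholders rather than a real signature:

```typescript
// Read an unsigned LEB128 integer (Wasm's size encoding) starting at `pos`.
function readLeb128(bytes: Uint8Array, pos: number): { value: number; next: number } {
    let value = 0;
    let shift = 0;
    while (true) {
        if (pos >= bytes.length) throw new Error('unexpected end of input');
        const byte = bytes[pos++];
        value |= (byte & 0x7f) << shift;
        if ((byte & 0x80) === 0) break;
        shift += 7;
    }
    return { value: value >>> 0, next: pos };
}

interface SectionInfo { id: number; name?: string; }

// Walk all sections; custom sections (id 0) start with a length-prefixed name.
function listSections(wasm: Uint8Array): SectionInfo[] {
    const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
    for (let i = 0; i < magic.length; i++) {
        if (wasm[i] !== magic[i]) throw new Error('not a WebAssembly binary');
    }
    const sections: SectionInfo[] = [];
    let pos = 8; // skip the 4-byte magic and the 4-byte version
    while (pos < wasm.length) {
        const id = wasm[pos];
        const size = readLeb128(wasm, pos + 1);
        const section: SectionInfo = { id };
        if (id === 0) {
            const nameLength = readLeb128(wasm, size.next);
            let name = '';
            for (let i = 0; i < nameLength.value; i++) {
                name += String.fromCharCode(wasm[nameLength.next + i]);
            }
            section.name = name;
        }
        sections.push(section);
        pos = size.next + size.value; // jump to the next section
    }
    return sections;
}

// True if the last section is a custom section named `duckdb_signature`.
function signatureIsLast(wasm: Uint8Array): boolean {
    const sections = listSections(wasm);
    const last = sections[sections.length - 1];
    return last !== undefined && last.id === 0 && last.name === 'duckdb_signature';
}

// Tiny hand-built module: header plus one custom section with a placeholder payload.
const sectionName = 'duckdb_signature'.split('').map((ch) => ch.charCodeAt(0));
const demoModule = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
    0x00, 1 + sectionName.length + 2, sectionName.length, ...sectionName, 0xaa, 0xbb,
]);
console.log(signatureIsLast(demoModule)); // true
```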

#### `INSTALL` and `LOAD` {#docs:current:clients:wasm:extensions::install-and-load}

In native embeddings of DuckDB, `INSTALL` fetches the extension, decompresses it from `gzip`, and stores it on the local disk.
`LOAD` then (optionally) performs signature checks *and* dynamically loads the extension binary into the main DuckDB binary.

In DuckDB-Wasm, `INSTALL` is a no-op because there is no durable cross-session storage. `LOAD` fetches the extension (decompressing it on the fly), performs signature checks *and* dynamically loads it via the Emscripten implementation of `dlopen`.

#### Autoloading {#docs:current:clients:wasm:extensions::autoloading}

[Autoloading](#docs:current:extensions:overview), i.e., the possibility for DuckDB to add extension functionality on-the-fly, is enabled by default in DuckDB-Wasm.

#### List of Officially Available Extensions {#docs:current:clients:wasm:extensions::list-of-officially-available-extensions}

| Extension name                                                          | Description                                                      | Aliases         |
| ----------------------------------------------------------------------- | ---------------------------------------------------------------- | --------------- |
| [autocomplete](#docs:current:core_extensions:autocomplete) | Adds support for autocomplete in the shell                       |                 |
| [excel](#docs:current:core_extensions:excel)               | Adds support for Excel-like format strings                       |                 |
| [fts](#docs:current:core_extensions:full_text_search)      | Adds support for Full-Text Search Indexes                        |                 |
| [icu](#docs:current:core_extensions:icu)                   | Adds support for time zones and collations using the ICU library |                 |
| [inet](#docs:current:core_extensions:inet)                 | Adds support for IP-related data types and functions             |                 |
| [json](#docs:current:data:json:overview)                   | Adds support for JSON operations                                 |                 |
| [parquet](#docs:current:data:parquet:overview)             | Adds support for reading and writing Parquet files               |                 |
| [sqlite](#docs:current:core_extensions:sqlite)             | Adds support for reading SQLite database files                   | sqlite, sqlite3 |
| [sqlsmith](#docs:current:core_extensions:sqlsmith)         |                                                                  |                 |
| [tpcds](#docs:current:core_extensions:tpcds)               | Adds TPC-DS data generation and query support                    |                 |
| [tpch](#docs:current:core_extensions:tpch)                 | Adds TPC-H data generation and query support                     |                 |

WebAssembly is effectively an additional platform, and platform-specific limitations may prevent some extensions from matching their native capabilities or may require them to work differently. Relevant differences for DuckDB-hosted extensions are documented here.

##### HTTPFS {#docs:current:clients:wasm:extensions::httpfs}

The HTTPFS extension is, at the moment, not available in DuckDB-Wasm. HTTPS requests must go through an additional layer, the browser, which introduces both differences and restrictions compared to native DuckDB.

Instead, DuckDB-Wasm has a separate implementation that is interchangeable for most purposes, but does not support all use cases (as it must follow security rules imposed by the browser, such as CORS).
Due to this CORS restriction, any request for data made through this implementation must go to a website that allows (using CORS headers) the website hosting the DuckDB-Wasm instance to access that data.
The [MDN website](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) is a great resource for more information regarding CORS.
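As a simplified sketch of the rule a browser applies, the check below decides whether an `Access-Control-Allow-Origin` response header permits a page from a given origin to read the response. The helper name `allowsOrigin` and the example origins are hypothetical. The sketch assumes requests without credentials and ignores other parts of CORS, such as preflight requests and allowed headers:

```typescript
// Hypothetical helper: does the Access-Control-Allow-Origin header value
// permit a page served from `pageOrigin` to read the response?
// Assumes a request without credentials, where `*` is accepted.
function allowsOrigin(allowOriginHeader: string | null, pageOrigin: string): boolean {
    if (allowOriginHeader === null) return false; // header missing: the browser blocks the read
    const value = allowOriginHeader.trim();
    return value === '*' || value === pageOrigin;
}

const page = 'https://app.example.org';
console.log(allowsOrigin('*', page));                         // true
console.log(allowsOrigin('https://app.example.org', page));   // true
console.log(allowsOrigin('https://other.example.org', page)); // false
console.log(allowsOrigin(null, page));                        // false
```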

#### Extension Signing {#docs:current:clients:wasm:extensions::extension-signing}

As with regular DuckDB extensions, DuckDB-Wasm extensions are by default checked on `LOAD` to verify the signature and confirm the extension has not been tampered with.
Extension signature verification can be disabled via a configuration option.
Signing is a property of the binary itself, so copying a DuckDB extension (say to serve it from a different location) will still keep a valid signature (e.g., for local development).

#### Fetching DuckDB-Wasm Extensions {#docs:current:clients:wasm:extensions::fetching-duckdb-wasm-extensions}

Official DuckDB extensions are served at `extensions.duckdb.org`, and this is also the default value for the `default_extension_repository` option.
When installing extensions, a relevant URL will be built that will look like `extensions.duckdb.org/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.gz`.

DuckDB-Wasm extensions are fetched only on load, and the URL will look like: `extensions.duckdb.org/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm`.

Note that an additional `duckdb-wasm` is added to the folder structure, and the file is served as a `.wasm` file.
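The two URL patterns can be compared side by side with a small sketch. The function names and the `ExtensionTarget` type are hypothetical, and the version hash and platform values are placeholders chosen only for illustration:

```typescript
interface ExtensionTarget {
    versionHash: string; // stands in for $duckdb_version_hash
    platform: string;    // stands in for $duckdb_platform
    name: string;        // stands in for $name
}

// URL pattern used when installing an extension on native platforms
function nativeExtensionUrl(t: ExtensionTarget, repository = 'extensions.duckdb.org'): string {
    return `${repository}/${t.versionHash}/${t.platform}/${t.name}.duckdb_extension.gz`;
}

// URL pattern used by DuckDB-Wasm when loading an extension
function wasmExtensionUrl(t: ExtensionTarget, repository = 'extensions.duckdb.org'): string {
    return `${repository}/duckdb-wasm/${t.versionHash}/${t.platform}/${t.name}.duckdb_extension.wasm`;
}

// Placeholder values, not real version or platform identifiers
const target: ExtensionTarget = { versionHash: 'v0.0.0', platform: 'example_platform', name: 'icu' };
console.log(nativeExtensionUrl(target));
// extensions.duckdb.org/v0.0.0/example_platform/icu.duckdb_extension.gz
console.log(wasmExtensionUrl(target));
// extensions.duckdb.org/duckdb-wasm/v0.0.0/example_platform/icu.duckdb_extension.wasm
```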

DuckDB-Wasm extensions are served pre-compressed using Brotli compression. When fetched from a browser, extensions are transparently decompressed. If you want to fetch a DuckDB-Wasm extension manually, you can use `curl --compressed extensions.duckdb.org/<...>/icu.duckdb_extension.wasm`.

#### Serving Extensions from a Third-Party Repository {#docs:current:clients:wasm:extensions::serving-extensions-from-a-third-party-repository}

As with regular DuckDB, if you use `SET custom_extension_repository = 'https://some.url.com'`, subsequent loads will be attempted at `https://some.url.com/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm`.

Note that GET requests for the extensions need to be [CORS enabled](https://www.w3.org/wiki/CORS_Enabled) for a browser to allow the connection.

#### Tooling {#docs:current:clients:wasm:extensions::tooling}

Both DuckDB-Wasm and its extensions have been compiled using the latest packaged Emscripten toolchain.




## Tertiary Clients {#clients:tertiary_clients}

### Tertiary Clients {#docs:current:clients:tertiary_clients:overview}

The following table lists the **tertiary clients** of DuckDB.
Tertiary clients come without any support guarantees.


| Client API                                                         | Maintainer                                            |
| ------------------------------------------------------------------ | ----------------------------------------------------- |
| [Common Lisp](https://github.com/ak-coram/cl-duckdb)               | [ak-coram](https://github.com/ak-coram)               |
| [Crystal](https://github.com/amauryt/crystal-duckdb)               | [amauryt](https://github.com/amauryt)                 |
| [Dart](#docs:current:clients:tertiary_clients:dart)   | [TigerEye](https://www.tigereye.com/)                 |
| [Elixir](https://github.com/AlexR2D2/duckdbex)                     | [AlexR2D2](https://github.com/AlexR2D2/duckdbex)      |
| [Erlang](https://github.com/mmzeeman/educkdb)                      | [MM Zeeman](https://github.com/mmzeeman)              |
| [Haskell](https://github.com/tritlo/duckdb-haskell)                | [Tritlo](https://github.com/tritlo)                   |
| [Julia](#docs:current:clients:tertiary_clients:julia) | The DuckDB team                                       |
| [Perl](https://metacpan.org/pod/DBD::DuckDB)                       | [Giuseppe Di Terlizzi](https://github.com/giterlizzi) |
| [PHP](#docs:current:clients:tertiary_clients:php)     | [satur-io](https://github.com/satur-io/duckdb-php)    |
| [Pyodide](https://github.com/duckdb/duckdb-pyodide)                | The DuckDB team                                       |
| [Raku](https://raku.land/zef:bduggan/Duckie)                       | [bduggan](https://github.com/bduggan)                 |
| [Ruby](https://suketa.github.io/ruby-duckdb/)                      | [suketa](https://github.com/suketa)                   |
| [Scala](https://www.duck4s.com/docs/index.html)                    | [Salar Rahmanian](https://www.softinio.com)           |
| [Swift](#docs:current:clients:tertiary_clients:swift) | The DuckDB team                                       |
| [Zig](https://github.com/karlseguin/zuckdb.zig)                    | [karlseguin](https://github.com/karlseguin)           |

### Dart Client {#docs:current:clients:tertiary_clients:dart}

> The latest stable version of the DuckDB Dart client is listed on [pub.dev](https://pub.dev/packages/dart_duckdb).

DuckDB.Dart is the native Dart API for DuckDB.

#### Installation {#docs:current:clients:tertiary_clients:dart::installation}

DuckDB.Dart can be installed from [pub.dev](https://pub.dev/packages/dart_duckdb). Please see the [API Reference](https://pub.dev/documentation/dart_duckdb/latest/) for details.

##### Use This Package as a Library {#docs:current:clients:tertiary_clients:dart::use-this-package-as-a-library}

###### Depend on It {#docs:current:clients:tertiary_clients:dart::depend-on-it}

Add the dependency with Flutter:

```batch
flutter pub add dart_duckdb
```

This will add a line like this to your package's `pubspec.yaml` (and run an implicit `flutter pub get`):

```yaml
dependencies:
  dart_duckdb: ^x.y.z
```

Alternatively, your editor might support `flutter pub get`. Check the docs for your editor to learn more.

###### Import It {#docs:current:clients:tertiary_clients:dart::import-it}

Now in your Dart code, you can import it:

```dart
import 'package:dart_duckdb/dart_duckdb.dart';
```

#### Usage Examples {#docs:current:clients:tertiary_clients:dart::usage-examples}

See the example projects in the [`duckdb-dart` repository](https://github.com/TigerEyeLabs/duckdb-dart/):

* [`cli`](https://github.com/TigerEyeLabs/duckdb-dart/tree/main/examples/cli): command-line application
* [`duckdbexplorer`](https://github.com/TigerEyeLabs/duckdb-dart/tree/main/examples/duckdbexplorer): GUI application which builds for desktop operating systems as well as Android and iOS.

Here are some common code snippets for DuckDB.Dart:

##### Querying an In-Memory Database {#docs:current:clients:tertiary_clients:dart::querying-an-in-memory-database}

```dart
import 'package:dart_duckdb/dart_duckdb.dart';

void main() async {
  final db = await duckdb.open(":memory:");
  final connection = await duckdb.connect(db);

  await connection.execute('''
    CREATE TABLE users (id INTEGER, name VARCHAR, age INTEGER);
    INSERT INTO users VALUES (1, 'Alice', 30), (2, 'Bob', 25);
  ''');

  final result = (await connection.query("SELECT * FROM users WHERE age > 28")).fetchAll();

  for (final row in result) {
    print(row);
  }

  connection.dispose();
  db.dispose();
}
```

##### Using Multiple Connections {#docs:current:clients:tertiary_clients:dart::using-multiple-connections}

DuckDB.Dart automatically manages dedicated background isolates per connection, enabling efficient non-blocking I/O for concurrent queries. Each connection handles its own isolate internally, so you can simply create multiple connections for parallel operations:

```dart
import 'package:dart_duckdb/dart_duckdb.dart';

void main() async {
  final db = await duckdb.open(":memory:");

  // Create a table
  final con1 = await duckdb.connect(db);
  await con1.execute('''
    CREATE TABLE users (id INTEGER, name VARCHAR);
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
  ''');

  // Query from multiple connections concurrently
  final con2 = await duckdb.connect(db);
  final con3 = await duckdb.connect(db);

  final future1 = con2.query("SELECT * FROM users WHERE id = 1");
  final future2 = con3.query("SELECT * FROM users WHERE id = 2");

  final result1 = (await future1).fetchAll();
  final result2 = (await future2).fetchAll();

  print(result1);
  print(result2);

  con1.dispose();
  con2.dispose();
  con3.dispose();
  db.dispose();
}
```

#### Web Support {#docs:current:clients:tertiary_clients:dart::web-support}

DuckDB.Dart supports web platforms through DuckDB-Wasm. For Flutter web builds, you need to configure the necessary JavaScript dependencies.

##### Setup for Flutter Web {#docs:current:clients:tertiary_clients:dart::setup-for-flutter-web}

Add the following to `web/index.html` inside the `<head>` section to load DuckDB-Wasm and Apache Arrow:

```html
<script type="importmap">
  {
    "imports": {
      "apache-arrow": "https://cdn.jsdelivr.net/npm/apache-arrow@17.0.0/+esm"
    }
  }
</script>
<script type="module">
  import * as duckdb from "https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.29.1-dev222.0/+esm";
  import * as arrow from "apache-arrow";
  window.duckdbWasmReady = new Promise((resolve) => {
    window.duckdbduckdbWasm = duckdb;
    window.ArrowTable = arrow.Table;
    resolve();
  });
</script>
```

##### Usage in Flutter Web {#docs:current:clients:tertiary_clients:dart::usage-in-flutter-web}

Once configured, you can use DuckDB the same way as on other platforms:

```dart
import 'package:dart_duckdb/dart_duckdb.dart';

void main() async {
  final db = await duckdb.open(":memory:");
  final connection = await duckdb.connect(db);

  await connection.execute('''
    CREATE TABLE data (id INTEGER, value VARCHAR);
    INSERT INTO data VALUES (1, 'hello'), (2, 'world');
  ''');

  final result = (await connection.query("SELECT * FROM data")).fetchAll();
  for (final row in result) {
    print(row);
  }

  connection.dispose();
  db.dispose();
}
```

For more platform-specific details, see the [Building Instructions](https://github.com/TigerEyeLabs/duckdb-dart/blob/main/BUILDING.md).

### Julia Client {#docs:current:clients:tertiary_clients:julia}

The DuckDB Julia package provides a high-performance front-end for DuckDB. Much like SQLite, DuckDB runs in-process within the Julia client, and provides a `DBInterface` front-end.

The package also supports multi-threaded execution. It uses Julia threads/tasks for this purpose. If you wish to run queries in parallel, you must launch Julia with multi-threading support (e.g., by setting the `JULIA_NUM_THREADS` environment variable).

#### Installation {#docs:current:clients:tertiary_clients:julia::installation}

Install DuckDB as follows:

```julia
using Pkg
Pkg.add("DuckDB")
```

Alternatively, enter the package manager using the `]` key, and issue the following command:

```julia
pkg> add DuckDB
```

#### Basics {#docs:current:clients:tertiary_clients:julia::basics}

```julia
using DuckDB

# create a new in-memory database
con = DBInterface.connect(DuckDB.DB, ":memory:")

# create a table
DBInterface.execute(con, "CREATE TABLE integers (i INTEGER)")

# insert data by executing a prepared statement
stmt = DBInterface.prepare(con, "INSERT INTO integers VALUES (?)")
DBInterface.execute(stmt, [42])

# query the database
results = DBInterface.execute(con, "SELECT 42 a")
print(results)
```

Some SQL statements, such as `PIVOT` and `IMPORT DATABASE`, are executed as multiple prepared statements and will error when using `DuckDB.execute()`. Run them with `DuckDB.query()` instead, which always returns a materialized result.

#### Scanning DataFrames {#docs:current:clients:tertiary_clients:julia::scanning-dataframes}

The DuckDB Julia package also provides support for querying Julia DataFrames. Note that the DataFrames are directly read by DuckDB – they are not inserted or copied into the database itself.

If you wish to load data from a DataFrame into a DuckDB table you can run a `CREATE TABLE ... AS` or `INSERT INTO` query.

```julia
using DuckDB
using DataFrames

# create a new in-memory database
con = DBInterface.connect(DuckDB.DB)

# create a DataFrame
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42])

# register it as a view in the database
DuckDB.register_data_frame(con, df, "my_df")

# run a SQL query over the DataFrame
results = DBInterface.execute(con, "SELECT * FROM my_df")
print(results)
```

#### Appender API {#docs:current:clients:tertiary_clients:julia::appender-api}

The DuckDB Julia package also supports the [Appender API](#docs:current:data:appender), which is much faster than using prepared statements or individual `INSERT INTO` statements. Appends are made in row-wise format. For every column, an `append()` call should be made, after which the row should be finished by calling `end_row()`. After all rows have been appended, `close()` should be used to finalize the Appender and free the memory it holds.

```julia
using DuckDB, DataFrames, Dates
db = DuckDB.DB()
# create a table
DBInterface.execute(db,
    "CREATE OR REPLACE TABLE data (id INTEGER PRIMARY KEY, value FLOAT, timestamp TIMESTAMP, date DATE)")
# create data to insert
len = 100
df = DataFrames.DataFrame(
        id = collect(1:len),
        value = rand(len),
        timestamp = Dates.now() + Dates.Second.(1:len),
        date = Dates.today() + Dates.Day.(1:len)
    )
# append data by row
appender = DuckDB.Appender(db, "data")
for i in eachrow(df)
    for j in i
        DuckDB.append(appender, j)
    end
    DuckDB.end_row(appender)
end
# close the appender after all rows
DuckDB.close(appender)
```

#### Concurrency {#docs:current:clients:tertiary_clients:julia::concurrency}

Within a Julia process, tasks are able to concurrently read and write to the database, as long as each task maintains its own connection to the database. In the example below, a single task is spawned to periodically read the database and many tasks are spawned to write to the database using both [`INSERT` statements](#docs:current:sql:statements:insert) as well as the [Appender API](#docs:current:data:appender).

```julia
using Dates, DataFrames, DuckDB
db = DuckDB.DB()
DBInterface.connect(db)
DBInterface.execute(db, "CREATE OR REPLACE TABLE data (date TIMESTAMP, id INTEGER)")

function run_reader(db)
    # create a DuckDB connection specifically for this task
    conn = DBInterface.connect(db)
    while true
        println(DBInterface.execute(conn,
                "SELECT id, count(date) AS count, max(date) AS max_date
                FROM data GROUP BY id ORDER BY id") |> DataFrames.DataFrame)
        Threads.sleep(1)
    end
    DBInterface.close(conn)
end
# spawn one reader task
Threads.@spawn run_reader(db)

function run_inserter(db, id)
    # create a DuckDB connection specifically for this task
    conn = DBInterface.connect(db)
    for i in 1:1000
        Threads.sleep(0.01)
        DuckDB.execute(conn, "INSERT INTO data VALUES (current_timestamp, ?)"; id);
    end
    DBInterface.close(conn)
end
# spawn many insert tasks
for i in 1:100
    Threads.@spawn run_inserter(db, 1)
end

function run_appender(db, id)
    # create an Appender specifically for this task
    appender = DuckDB.Appender(db, "data")
    for i in 1:1000
        Threads.sleep(0.01)
        row = (Dates.now(Dates.UTC), id)
        for j in row
            DuckDB.append(appender, j);
        end
        DuckDB.end_row(appender);
    end
    DuckDB.close(appender);
end
# spawn many appender tasks
for i in 1:100
    Threads.@spawn run_appender(db, 2)
end
```

#### Original Julia Connector {#docs:current:clients:tertiary_clients:julia::original-julia-connector}

Credits to kimmolinna for the [original DuckDB Julia connector](https://github.com/kimmolinna/DuckDB.jl).

### PHP Client {#docs:current:clients:tertiary_clients:php}

> The DuckDB PHP client is a [tertiary client](#docs:current:clients:overview) and is maintained by a third party.

The DuckDB PHP client is a performance-focused client API for PHP. It uses the official C API internally through [FFI](https://www.php.net/manual/en/book.ffi.php), achieving good benchmark results.
This library is more than just a wrapper for the C API: it introduces custom, PHP-friendly methods to simplify working with DuckDB. It is compatible with Linux, Windows and macOS, and requires PHP version 8.3 or higher.

Full documentation is available at [https://duckdb-php.readthedocs.io/](https://duckdb-php.readthedocs.io/).

#### Automatic Install (Recommended for Newcomers) {#docs:current:clients:tertiary_clients:php::automatic-install-recommended-for-newcomers}

```batch
composer require satur.io/duckdb-auto
```

You will need to allow `satur.io/duckdb-auto` to execute code to use this installation method,
check [installation](https://duckdb-php.readthedocs.io/en/latest/installation) for more details.

#### Quick Start {#docs:current:clients:tertiary_clients:php::quick-start}

```php
DuckDB::sql("SELECT 'quack' as my_column")->print();
```

```text
-------------------
| my_column       |
-------------------
| quack           |
-------------------
```

The function we used here, `DuckDB::sql()`, performs the query in a new
in-memory database which is destroyed after retrieving the result.

This is not the most common use case. Let's see how to get a persistent connection.

##### Connection {#docs:current:clients:tertiary_clients:php::connection}

```php
$duckDB = DuckDB::create('duck.db'); // or DuckDB::create() for in-memory database

$duckDB->query('CREATE TABLE test (i INTEGER, b BOOL, f FLOAT);');
$duckDB->query('INSERT INTO test VALUES (3, true, 1.1), (5, true, 1.2), (3, false, 1.1), (3, null, 1.2);');

$duckDB->query('SELECT * FROM test')->print();
```

`DuckDB::create()` opens the specified database, creating it first if it does not exist yet,
and establishes a connection to it.

After that, we can use the `query` method to run statements.

> Notice the difference between the static method `sql` and the non-static method `query`.
> While the first one always creates and destroys a new in-memory database, the second one
> uses a previously established connection and should be the preferred option in most cases.

In addition, the library also provides prepared statements for binding parameters to our query.

##### Prepared Statements {#docs:current:clients:tertiary_clients:php::prepared-statements}

```php
$duckDB = DuckDB::create();

$duckDB->query('CREATE TABLE test (i INTEGER, b BOOL, f FLOAT);');
$duckDB->query('INSERT INTO test VALUES (3, true, 1.1), (5, true, 1.2), (3, false, 1.1), (3, null, 1.2);');

$boolPreparedStatement = $duckDB->preparedStatement('SELECT * FROM test WHERE b = $1');
$boolPreparedStatement->bindParam(1, true);
$result = $boolPreparedStatement->execute();
$result->print();

$intPreparedStatement = $duckDB->preparedStatement('SELECT * FROM test WHERE i = ?');
$intPreparedStatement->bindParam(1, 3);
$result = $intPreparedStatement->execute();
$result->print();
```

##### Appenders {#docs:current:clients:tertiary_clients:php::appenders}

Appenders are the preferred method to load data in DuckDB. See the [Appender page](#docs:current:clients:c:appender)
for more information.

```php
$duckDB = DuckDB::create();
$result = $duckDB->query('CREATE TABLE people (id INTEGER, name VARCHAR);');

$appender = $duckDB->appender('people');

for ($i = 0; $i < 100; ++$i) {
    $appender->append(rand(1, 100000));
    $appender->append('string-'.rand(1, 100));
    $appender->endRow();
}

$appender->flush();
```

##### DuckDB-Powerful {#docs:current:clients:tertiary_clients:php::duckdb-powerful}

DuckDB provides some amazing features. For example,
you can query remote files directly.

Let's use an aggregate function to calculate the average of a column
in a remote Parquet file:

```php
DuckDB::sql(
    'SELECT "Reporting Year", avg("Gas Produced, MCF") as "AVG Gas Produced" 
    FROM "https://github.com/plotly/datasets/raw/refs/heads/master/oil-and-gas.parquet" 
    WHERE "Reporting Year" BETWEEN 1985 AND 1990
    GROUP BY "Reporting Year";'
)->print();
```

```text
--------------------------------------
| Reporting Year   | AVG Gas Produce |
--------------------------------------
| 1985             | 2461.4047344111 |
| 1986             | 6060.8575605681 |
| 1987             | 5047.5813074014 |
| 1988             | 4763.4090541633 |
| 1989             | 4175.2989758837 |
| 1990             | 3706.9404742437 |
--------------------------------------
```

Or summarize a remote CSV:

```php
DuckDB::sql('SUMMARIZE TABLE "https://blobs.duckdb.org/data/Star_Trek-Season_1.csv";')->print();
```

```text
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| column_name      | column_type      | min              | max              | approx_unique    | avg              | std              | q25              | q50              | q75              | count            | null_percentage |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| season_num       | BIGINT           | 1                | 1                | 1                | 1.0              | 0.0              | 1                | 1                | 1                | 30               | 0               |
| episode_num      | BIGINT           | 0                | 29               | 29               | 14.5             | 8.8034084308295  | 7                | 14               | 22               | 30               | 0               |
| aired_date       | DATE             | 1965-02-28       | 1967-04-13       | 35               |                  |                  | 1966-10-20       | 1966-12-22       | 1967-02-16       | 30               | 0               |
| cnt_kirk_hookup  | BIGINT           | 0                | 2                | 3                | 0.3333333333333  | 0.6064784348631  | 0                | 0                | 1                | 30               | 0               |
...
```

#### Requirements {#docs:current:clients:tertiary_clients:php::requirements}

* Linux, macOS or Windows.
* x64 platform.
* PHP >= 8.3.
* ext-ffi.

##### Recommended {#docs:current:clients:tertiary_clients:php::recommended}

* ext-bcmath – Needed for big integers (> PHP_INT_MAX).
* ext-zend-opcache – For better performance.

#### Type Support {#docs:current:clients:tertiary_clients:php::type-support}

From version 1.2.0 on, the library supports all DuckDB data types.



| DuckDB Type              | SQL Type     | PHP Type                             |
|--------------------------|--------------|--------------------------------------|
| DUCKDB_TYPE_BOOLEAN      | BOOLEAN      | bool                                 |
| DUCKDB_TYPE_TINYINT      | TINYINT      | int                                  |
| DUCKDB_TYPE_SMALLINT     | SMALLINT     | int                                  |
| DUCKDB_TYPE_INTEGER      | INTEGER      | int                                  |
| DUCKDB_TYPE_BIGINT       | BIGINT       | int                                  |
| DUCKDB_TYPE_UTINYINT     | UTINYINT     | int                                  |
| DUCKDB_TYPE_USMALLINT    | USMALLINT    | int                                  |
| DUCKDB_TYPE_UINTEGER     | UINTEGER     | int                                  |
| DUCKDB_TYPE_UBIGINT      | UBIGINT      | Saturio\DuckDB\Type\Math\LongInteger |
| DUCKDB_TYPE_FLOAT        | FLOAT        | float                                |
| DUCKDB_TYPE_DOUBLE       | DOUBLE       | float                                |
| DUCKDB_TYPE_TIMESTAMP    | TIMESTAMP    | Saturio\DuckDB\Type\Timestamp        |
| DUCKDB_TYPE_DATE         | DATE         | Saturio\DuckDB\Type\Date             |
| DUCKDB_TYPE_TIME         | TIME         | Saturio\DuckDB\Type\Time             |
| DUCKDB_TYPE_INTERVAL     | INTERVAL     | Saturio\DuckDB\Type\Interval         |
| DUCKDB_TYPE_HUGEINT      | HUGEINT      | Saturio\DuckDB\Type\Math\LongInteger |
| DUCKDB_TYPE_UHUGEINT     | UHUGEINT     | Saturio\DuckDB\Type\Math\LongInteger |
| DUCKDB_TYPE_VARCHAR      | VARCHAR      | string                               |
| DUCKDB_TYPE_BLOB         | BLOB         | Saturio\DuckDB\Type\Blob             |
| DUCKDB_TYPE_TIMESTAMP_S  | TIMESTAMP_S  | Saturio\DuckDB\Type\Timestamp        |
| DUCKDB_TYPE_TIMESTAMP_MS | TIMESTAMP_MS | Saturio\DuckDB\Type\Timestamp        |
| DUCKDB_TYPE_TIMESTAMP_NS | TIMESTAMP_NS | Saturio\DuckDB\Type\Timestamp        |
| DUCKDB_TYPE_UUID         | UUID         | Saturio\DuckDB\Type\UUID             |
| DUCKDB_TYPE_TIME_TZ      | TIMETZ       | Saturio\DuckDB\Type\Time             |
| DUCKDB_TYPE_TIMESTAMP_TZ | TIMESTAMPTZ  | Saturio\DuckDB\Type\Timestamp        |
| DUCKDB_TYPE_DECIMAL      | DECIMAL      | float                                |
| DUCKDB_TYPE_ENUM         | ENUM         | string                               |
| DUCKDB_TYPE_LIST         | LIST         | array                                |
| DUCKDB_TYPE_STRUCT       | STRUCT       | array                                |
| DUCKDB_TYPE_ARRAY        | ARRAY        | array                                |
| DUCKDB_TYPE_MAP          | MAP          | array                                |
| DUCKDB_TYPE_UNION        | UNION        | mixed                                |
| DUCKDB_TYPE_BIT          | BIT          | string                               |
| DUCKDB_TYPE_BIGNUM       | BIGNUM       | string                               |
| DUCKDB_TYPE_SQLNULL      | NULL         | null                                 |

### Swift Client {#docs:current:clients:tertiary_clients:swift}

DuckDB has a Swift client. See the [announcement post](https://duckdb.org/2023/04/21/swift) for details.

#### Instantiating DuckDB {#docs:current:clients:tertiary_clients:swift::instantiating-duckdb}

DuckDB supports both in-memory and persistent databases.
To work with an in-memory database, run:

```swift
let database = try Database(store: .inMemory)
```

To work with a persistent database, run:

```swift
let database = try Database(store: .file(at: "test.db"))
```

Queries can be issued through a database connection.

```swift
let connection = try database.connect()
```

DuckDB supports multiple connections per database.

#### Application Example {#docs:current:clients:tertiary_clients:swift::application-example}

The rest of the page is based on the example of our [announcement post](https://duckdb.org/2023/04/21/swift), which uses raw data from [NASA's Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu) loaded directly into DuckDB.

##### Creating an Application-Specific Type {#docs:current:clients:tertiary_clients:swift::creating-an-application-specific-type}

We first create an application-specific type that we'll use to house our database and connection and through which we'll eventually define our app-specific queries.

```swift
import DuckDB

final class ExoplanetStore {

  let database: Database
  let connection: Connection

  init(database: Database, connection: Connection) {
    self.database = database
    self.connection = connection
  }
}
```

##### Loading a CSV File {#docs:current:clients:tertiary_clients:swift::loading-a-csv-file}

We load the data from [NASA's Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu):

```text
wget "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv" -O downloaded_exoplanets.csv
```

Once we have our CSV downloaded locally, we can use the following SQL command to load it as a new table to DuckDB:

```sql
CREATE TABLE exoplanets AS
    SELECT * FROM read_csv('downloaded_exoplanets.csv');
```

Let's package this up as a new asynchronous factory method on our `ExoplanetStore` type:

```swift
import DuckDB
import Foundation

final class ExoplanetStore {

  // Factory method to create and prepare a new ExoplanetStore
  static func create() async throws -> ExoplanetStore {

    // Create our database and connection as described above
    let database = try Database(store: .inMemory)
    let connection = try database.connect()

    // Download the CSV from the exoplanet archive
    let (csvFileURL, _) = try await URLSession.shared.download(
      from: URL(string: "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv")!)

    // Issue our first query to DuckDB
    try connection.execute("""
      CREATE TABLE exoplanets AS
          SELECT * FROM read_csv('\(csvFileURL.path)');
      """)

    // Create our pre-populated ExoplanetStore instance
    return ExoplanetStore(
      database: database,
      connection: connection
    )
  }

  // Let's make the initializer we defined previously
  // private. This prevents anyone accidentally instantiating
  // the store without having pre-loaded our Exoplanet CSV
  // into the database
  private init(database: Database, connection: Connection) {
  ...
  }
}
```

##### Querying the Database {#docs:current:clients:tertiary_clients:swift::querying-the-database}

The following example queries DuckDB from within Swift via an async function. This means the caller won't be blocked while the query is executing. We'll then cast the result columns to Swift native types using DuckDB's `ResultSet` `cast(to:)` family of methods, before finally wrapping them up in a `DataFrame` from the TabularData framework.

```swift
...

import TabularData

extension ExoplanetStore {

  // Retrieves the number of exoplanets discovered by year
  func groupedByDiscoveryYear() async throws -> DataFrame {

    // Issue the query we described above
    let result = try connection.query("""
      SELECT disc_year, count(disc_year) AS Count
        FROM exoplanets
        GROUP BY disc_year
        ORDER BY disc_year
      """)

    // Cast our DuckDB columns to their native Swift
    // equivalent types
    let discoveryYearColumn = result[0].cast(to: Int.self)
    let countColumn = result[1].cast(to: Int.self)

    // Use our DuckDB columns to instantiate TabularData
    // columns and populate a TabularData DataFrame
    return DataFrame(columns: [
      TabularData.Column(discoveryYearColumn).eraseToAnyColumn(),
      TabularData.Column(countColumn).eraseToAnyColumn(),
    ])
  }
}
```

##### Complete Project {#docs:current:clients:tertiary_clients:swift::complete-project}

For the complete example project, clone the [DuckDB Swift repository](https://github.com/duckdb/duckdb-swift) and open up the runnable app project located in [`Examples/SwiftUI/ExoplanetExplorer.xcodeproj`](https://github.com/duckdb/duckdb-swift/tree/main/Examples/SwiftUI/ExoplanetExplorer.xcodeproj).

# SQL {#sql}

## SQL Introduction {#docs:current:sql:introduction}

This page provides an overview of how to perform simple operations in SQL.
This tutorial is only intended to give you an introduction and is in no way a complete tutorial on SQL.
This tutorial is adapted from the [PostgreSQL tutorial](https://www.postgresql.org/docs/current/tutorial-sql-intro.html).

> DuckDB's SQL dialect closely follows the conventions of the PostgreSQL dialect.
> The few exceptions to this are listed on the [PostgreSQL compatibility page](#docs:current:sql:dialect:postgresql_compatibility).

In the examples that follow, we assume that you have installed the DuckDB Command Line Interface (CLI) shell. See the [installation page](https://duckdb.org/install) for information on how to install the CLI.

> **Tip.** If you are looking for a comprehensive introduction to SQL,
> check the slide decks of the [“Tabular Database Systems” course](#_library:2026-03-19-tabular-database-systems):
>
> * [The Structured Query Language (SQL)](https://github.com/DBatUTuebingen/TaDa/blob/main/slides/TaDa-06.pdf)
> * [More SQL (Subqueries + Embedded SQL)](https://github.com/DBatUTuebingen/TaDa/blob/main/slides/TaDa-07.pdf)
> * [SQL: Grouping + Aggregation and Functional Dependencies](https://github.com/DBatUTuebingen/TaDa/blob/main/slides/TaDa-08.pdf)

#### Concepts {#docs:current:sql:introduction::concepts}

DuckDB is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. A relation is essentially a mathematical term for a table.

Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Tables themselves are stored inside schemas, and a collection of schemas constitutes the entire database that you can access.

#### Creating a New Table {#docs:current:sql:introduction::creating-a-new-table}

You can create a new table by specifying the table name, along with all column names and their types:

```sql
CREATE TABLE weather (
    city    VARCHAR,
    temp_lo INTEGER, -- minimum temperature on a day
    temp_hi INTEGER, -- maximum temperature on a day
    prcp    FLOAT,
    date    DATE
);
```

You can enter this into the shell with the line breaks. The command is not terminated until the semicolon.

White space (i.e., spaces, tabs and newlines) can be used freely in SQL commands. That means you can type the command aligned differently than above, or even all on one line. Two dash characters (`--`) introduce comments. Whatever follows them is ignored up to the end of the line. SQL is case-insensitive about keywords and identifiers. When returning identifiers, [their original cases are preserved](#docs:current:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity).

In the SQL command, we first specify the type of command that we want to perform: `CREATE TABLE`. After that follow the parameters for the command. First, the table name, `weather`, is given. Then the column names and column types follow.

`city VARCHAR` specifies that the table has a column called `city` that is of type `VARCHAR`. `VARCHAR` specifies a data type that can store text of arbitrary length. The temperature fields are stored in an `INTEGER` type, a type that stores integer numbers (i.e., whole numbers without a decimal point). `FLOAT` columns store single precision floating-point numbers (i.e., numbers with a decimal point). `DATE` stores a date (i.e., year, month, day combination). `DATE` only stores the specific day, not a time associated with that day.

DuckDB supports the standard SQL types `INTEGER`, `SMALLINT`, `FLOAT`, `DOUBLE`, `DECIMAL`, `CHAR(n)`, `VARCHAR(n)`, `DATE`, `TIME` and `TIMESTAMP`.

The second example will store cities and their associated geographical location:

```sql
CREATE TABLE cities (
    name VARCHAR,
    lat  DECIMAL,
    lon  DECIMAL
);
```

Finally, if you no longer need a table, or want to recreate it differently, you can remove it using the following command:

```sql
DROP TABLE ⟨tablename⟩;
```
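
As a small sketch of two related forms: `IF EXISTS` lets the statement succeed even when the table is absent, and `CREATE OR REPLACE TABLE` drops and recreates a table in one step. The table name `old_weather` is hypothetical and only used for illustration.

```sql
-- Does nothing (instead of raising an error) if old_weather does not exist.
DROP TABLE IF EXISTS old_weather;

-- Replaces any existing old_weather table with a new, empty definition.
CREATE OR REPLACE TABLE old_weather (
    city VARCHAR,
    date DATE
);
```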

#### Populating a Table with Rows {#docs:current:sql:introduction::populating-a-table-with-rows}

The insert statement is used to populate a table with rows:

```sql
INSERT INTO weather
VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
```

Constants that are not numeric values (e.g., text and dates) must be surrounded by single quotes (`''`), as in the example. Input dates for the date type must be formatted as `'YYYY-MM-DD'`.
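
The quoting rules can be tried out with a standalone `SELECT`, without touching any table. This is a minimal sketch; the string value and the column aliases are made up for illustration.

```sql
-- A single quote inside a string constant is written as two single quotes.
SELECT 'O''Hare' AS airport;

-- A date given as a typed literal and as a string cast to DATE.
SELECT DATE '1994-11-27' AS typed_literal,
       CAST('1994-11-27' AS DATE) AS cast_from_string;
```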

We can insert into the `cities` table in the same manner.

```sql
INSERT INTO cities
VALUES ('San Francisco', -194.0, 53.0);
```

The syntax used so far requires you to remember the order of the columns. An alternative syntax allows you to list the columns explicitly:

```sql
INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
```

You can list the columns in a different order if you wish or even omit some columns, e.g., if the `prcp` is unknown:

```sql
INSERT INTO weather (date, city, temp_hi, temp_lo)
VALUES ('1994-11-29', 'Hayward', 54, 37);
```

> **Tip.** Many developers consider explicitly listing the columns better style than relying on the order implicitly.

Please enter all the commands shown above so you have some data to work with in the following sections.

Alternatively, you can use the `COPY` statement. This is faster for large amounts of data because the `COPY` command is optimized for bulk loading while allowing less flexibility than `INSERT`. An example with [`weather.csv`](https://duckdb.org/data/weather.csv) would be:

```sql
COPY weather
FROM 'weather.csv';
```

The source file must be accessible on the machine running the DuckDB process. There are many other ways of loading data into DuckDB; see the [corresponding documentation section](#docs:current:data:overview) for more information.
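
One such alternative is to combine `INSERT` with the `read_csv` function. The sketch below assumes that `weather.csv` has been downloaded to the current working directory and that its columns appear in the same order as those of the `weather` table; note that running it adds rows beyond the three used in the examples that follow.

```sql
-- read_csv detects the CSV dialect and column types automatically.
INSERT INTO weather
    SELECT * FROM read_csv('weather.csv');
```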

#### Querying a Table {#docs:current:sql:introduction::querying-a-table}

To retrieve data from a table, the table is queried. A SQL `SELECT` statement is used to do this. The statement is divided into a select list (the part that lists the columns to be returned), a table list (the part that lists the tables from which to retrieve the data), and an optional qualification (the part that specifies any restrictions). For example, to retrieve all the rows of the `weather` table, type:

```sql
SELECT *
FROM weather;
```

Here `*` is a shorthand for “all columns”. The same result is produced by:

```sql
SELECT city, temp_lo, temp_hi, prcp, date
FROM weather;
```

The output should be:

|     city      | temp_lo | temp_hi | prcp |    date    |
|---------------|--------:|--------:|-----:|------------|
| San Francisco | 46      | 50      | 0.25 | 1994-11-27 |
| San Francisco | 43      | 57      | 0.0  | 1994-11-29 |
| Hayward       | 37      | 54      | NULL | 1994-11-29 |

You can write expressions, not just simple column references, in the select list. For example, you can do:

```sql
SELECT city, (temp_hi + temp_lo) / 2 AS temp_avg, date
FROM weather;
```

This should give:

|     city      | temp_avg |    date    |
|---------------|---------:|------------|
| San Francisco | 48.0     | 1994-11-27 |
| San Francisco | 50.0     | 1994-11-29 |
| Hayward       | 45.5     | 1994-11-29 |

Notice how the `AS` clause is used to relabel the output column. (The `AS` clause is optional.)
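
The two variations below are a sketch rather than part of the original tutorial. The first omits `AS`; the second uses the `//` operator for integer division, in contrast to `/`, which produced the fractional `45.5` above. The alias `temp_avg_int` is made up for illustration.

```sql
-- AS omitted: temp_avg is still used as the output column name.
SELECT city, (temp_hi + temp_lo) / 2 temp_avg, date
FROM weather;

-- Integer division: the fractional part of the average is dropped.
SELECT city, (temp_hi + temp_lo) // 2 AS temp_avg_int, date
FROM weather;
```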

A query can be “qualified” by adding a `WHERE` clause that specifies which rows are wanted. The `WHERE` clause contains a Boolean (truth value) expression, and only rows for which the Boolean expression is true are returned. The usual Boolean operators (`AND`, `OR` and `NOT`) are allowed in the qualification. For example, the following retrieves the weather of San Francisco on rainy days:

```sql
SELECT *
FROM weather
WHERE city = 'San Francisco'
  AND prcp > 0.0;
```

Result:

|     city      | temp_lo | temp_hi | prcp |    date    |
|---------------|--------:|--------:|-----:|------------|
| San Francisco | 46      | 50      | 0.25 | 1994-11-27 |
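
Rows with a missing value need some care in qualifications. The following sketch, based on the three sample rows, shows that a comparison against `NULL` is never true, so the Hayward row (whose `prcp` is `NULL`) has to be selected with `IS NULL`.

```sql
-- Neither query returns the Hayward row: both comparisons
-- evaluate to NULL (not true) when prcp is NULL.
SELECT city FROM weather WHERE prcp > 0.0;
SELECT city FROM weather WHERE prcp <= 0.0;

-- Test for missing values explicitly.
SELECT city, date
FROM weather
WHERE prcp IS NULL;
```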

You can request that the results of a query be returned in sorted order:

```sql
SELECT *
FROM weather
ORDER BY city;
```

|     city      | temp_lo | temp_hi | prcp |    date    |
|---------------|--------:|--------:|-----:|------------|
| Hayward       | 37      | 54      | NULL | 1994-11-29 |
| San Francisco | 43      | 57      | 0.0  | 1994-11-29 |
| San Francisco | 46      | 50      | 0.25 | 1994-11-27 |

In this example, the sort order isn't fully specified, and so you might get the San Francisco rows in either order. But you'd always get the results shown above if you do:

```sql
SELECT *
FROM weather
ORDER BY city, temp_lo;
```
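
Sort direction and the placement of `NULL` values can also be stated explicitly. A brief sketch, sorting by precipitation from highest to lowest and using `city` as a tie-breaker:

```sql
SELECT *
FROM weather
ORDER BY prcp DESC NULLS LAST, city;
```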

You can request that duplicate rows be removed from the result of a query:

```sql
SELECT DISTINCT city
FROM weather;
```

|     city      |
|---------------|
| San Francisco |
| Hayward       |

Here again, the result row ordering might vary. You can ensure consistent results by using `DISTINCT` and `ORDER BY` together:

```sql
SELECT DISTINCT city
FROM weather
ORDER BY city;
```

#### Joins between Tables {#docs:current:sql:introduction::joins-between-tables}

Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables at one time is called a join query. As an example, say you wish to list all the weather records together with the location of the associated city. To do that, we need to compare the city column of each row of the `weather` table with the name column of all rows in the `cities` table, and select the pairs of rows where these values match.

This would be accomplished by the following query:

```sql
SELECT *
FROM weather, cities
WHERE city = name;
```

|     city      | temp_lo | temp_hi | prcp |    date    |     name      |   lat    |  lon   |
|---------------|--------:|--------:|-----:|------------|---------------|---------:|-------:|
| San Francisco | 46      | 50      | 0.25 | 1994-11-27 | San Francisco | -194.000 | 53.000 |
| San Francisco | 43      | 57      | 0.0  | 1994-11-29 | San Francisco | -194.000 | 53.000 |

Observe two things about the result set:

* There is no result row for the city of Hayward. This is because there is no matching entry in the `cities` table for Hayward, so the join ignores the unmatched rows in the `weather` table. We will see shortly how this can be fixed.
* There are two columns containing the city name. This is correct because the lists of columns from the `weather` and `cities` tables are concatenated. In practice this is undesirable, though, so you will probably want to list the output columns explicitly rather than using `*`:

```sql
SELECT city, temp_lo, temp_hi, prcp, date, lon, lat
FROM weather, cities
WHERE city = name;
```

|     city      | temp_lo | temp_hi | prcp |    date    |  lon   |   lat    |
|---------------|--------:|--------:|-----:|------------|-------:|---------:|
| San Francisco | 46      | 50      | 0.25 | 1994-11-27 | 53.000 | -194.000 |
| San Francisco | 43      | 57      | 0.0  | 1994-11-29 | 53.000 | -194.000 |

Since the columns all had different names, the parser automatically found which table they belong to. If there were duplicate column names in the two tables you'd need to qualify the column names to show which one you meant, as in:

```sql
SELECT weather.city, weather.temp_lo, weather.temp_hi,
       weather.prcp, weather.date, cities.lon, cities.lat
FROM weather, cities
WHERE cities.name = weather.city;
```

It is widely considered good style to qualify all column names in a join query, so that the query won't fail if a duplicate column name is later added to one of the tables.
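
Fully qualified names can become verbose, so table aliases are often used to shorten them. The aliases `w` and `c` in this sketch are arbitrary:

```sql
SELECT w.city, w.temp_lo, w.temp_hi, w.prcp, w.date, c.lon, c.lat
FROM weather AS w, cities AS c
WHERE c.name = w.city;
```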

Join queries of the kind seen thus far can also be written in this alternative form:

```sql
SELECT *
FROM weather
INNER JOIN cities ON weather.city = cities.name;
```

This syntax is not as commonly used as the one above, but we show it here to help you understand the following topics.

Now we will figure out how we can get the Hayward records back in. What we want the query to do is to scan the `weather` table and for each row to find the matching cities row(s). If no matching row is found we want some “empty values” to be substituted for the `cities` table's columns. This kind of query is called an outer join. (The joins we have seen so far are inner joins.) The command looks like this:

```sql
SELECT *
FROM weather
LEFT OUTER JOIN cities ON weather.city = cities.name;
```

|     city      | temp_lo | temp_hi | prcp |    date    |     name      |   lat    |  lon   |
|---------------|--------:|--------:|-----:|------------|---------------|---------:|-------:|
| San Francisco | 46      | 50      | 0.25 | 1994-11-27 | San Francisco | -194.000 | 53.000 |
| San Francisco | 43      | 57      | 0.0  | 1994-11-29 | San Francisco | -194.000 | 53.000 |
| Hayward       | 37      | 54      | NULL | 1994-11-29 | NULL          | NULL     | NULL   |

This query is called a left outer join because the table mentioned on the left of the join operator will have each of its rows in the output at least once, whereas the table on the right will only have those rows output that match some row of the left table. When outputting a left-table row for which there is no right-table match, empty (null) values are substituted for the right-table columns.
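
A common use of this behavior is finding rows that have no match at all. As a sketch, filtering on a `NULL` right-table column keeps only the unmatched `weather` rows; with the sample data, that is the Hayward record.

```sql
SELECT weather.city, weather.date
FROM weather
LEFT OUTER JOIN cities ON weather.city = cities.name
WHERE cities.name IS NULL;
```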

#### Aggregate Functions {#docs:current:sql:introduction::aggregate-functions}

Like most other relational database products, DuckDB supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the `count`, `sum`, `avg` (average), `max` (maximum) and `min` (minimum) over a set of rows.

As an example, we can find the highest low-temperature reading anywhere with:

```sql
SELECT max(temp_lo)
FROM weather;
```

| max(temp_lo) |
|-------------:|
| 46           |

If we wanted to know what city (or cities) that reading occurred in, we might try:

```sql
SELECT city
FROM weather
WHERE temp_lo = max(temp_lo);
```

But this will not work since the aggregate `max` cannot be used in the `WHERE` clause:

```console
Binder Error:
WHERE clause cannot contain aggregates!
```

This restriction exists because the `WHERE` clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.
However, as is often the case, the query can be restated to accomplish the desired result, here by using a subquery:

```sql
SELECT city
FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
```

|     city      |
|---------------|
| San Francisco |

This is OK because the subquery is an independent computation that computes its own aggregate separately from what is happening in the outer query.

Aggregates are also very useful in combination with `GROUP BY` clauses. For example, we can get the maximum low temperature observed in each city with:

```sql
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
```

|     city      | max(temp_lo) |
|---------------|-------------:|
| San Francisco | 46           |
| Hayward       | 37           |
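
Several aggregates can be computed per group in a single query. The following sketch uses made-up column aliases; note that `avg(prcp)` skips `NULL` inputs, so a city whose only `prcp` value is `NULL` gets a `NULL` average.

```sql
SELECT city,
       count(*)     AS readings,
       min(temp_lo) AS lowest_lo,
       max(temp_hi) AS highest_hi,
       avg(prcp)    AS avg_prcp
FROM weather
GROUP BY city
ORDER BY city;
```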

This gives us one output row per city. Each aggregate result is computed over the table rows matching that city. We can filter these grouped rows using `HAVING`:

```sql
SELECT city, max(temp_lo)
FROM weather
GROUP BY city
HAVING max(temp_lo) < 40;
```

|  city   | max(temp_lo) |
|---------|-------------:|
| Hayward | 37           |

This gives us the same results for only the cities that have all `temp_lo` values below 40. Finally, if we only care about cities whose names begin with `S`, we can use the `LIKE` operator:

```sql
SELECT city, max(temp_lo)
FROM weather
WHERE city LIKE 'S%'
GROUP BY city
HAVING max(temp_lo) < 40;
```

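With the three rows inserted earlier, this query returns no rows: San Francisco is the only city whose name begins with `S`, and its maximum `temp_lo` is 46.

More information about the `LIKE` operator can be found in the [pattern matching page](#docs:current:sql:functions:pattern_matching).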

It is important to understand the interaction between aggregates and SQL's `WHERE` and `HAVING` clauses. The fundamental difference between `WHERE` and `HAVING` is this: `WHERE` selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas `HAVING` selects group rows after groups and aggregates are computed. Thus, the `WHERE` clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the `HAVING` clause almost always contains aggregate functions; a condition that uses no aggregates is allowed there, but it usually belongs in `WHERE`.

In the previous example, we can apply the city name restriction in `WHERE`, since it needs no aggregate. This is more efficient than adding the restriction to `HAVING`, because we avoid doing the grouping and aggregate calculations for all rows that fail the `WHERE` check.
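
To make the comparison concrete, the two queries in this sketch return the same rows. The first discards non-matching rows before grouping; the second groups every city and discards the unwanted groups afterwards.

```sql
-- Preferred: filter the input rows before grouping.
SELECT city, max(temp_lo)
FROM weather
WHERE city LIKE 'S%'
GROUP BY city;

-- Same result, but all cities are grouped and aggregated first.
SELECT city, max(temp_lo)
FROM weather
GROUP BY city
HAVING city LIKE 'S%';
```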

#### Updates {#docs:current:sql:introduction::updates}

You can update existing rows using the `UPDATE` command. Suppose you discover the temperature readings are all off by 2 degrees after November 28. You can correct the data as follows:

```sql
UPDATE weather
SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
WHERE date > '1994-11-28';
```

Look at the new state of the data:

```sql
SELECT *
FROM weather;
```

|     city      | temp_lo | temp_hi | prcp |    date    |
|---------------|--------:|--------:|-----:|------------|
| San Francisco | 46      | 50      | 0.25 | 1994-11-27 |
| San Francisco | 41      | 55      | 0.0  | 1994-11-29 |
| Hayward       | 35      | 52      | NULL | 1994-11-29 |

#### Deletions {#docs:current:sql:introduction::deletions}

Rows can be removed from a table using the `DELETE` command. Suppose you are no longer interested in the weather of Hayward. Then you can do the following to delete those rows from the table:

```sql
DELETE FROM weather
WHERE city = 'Hayward';
```

All weather records belonging to Hayward are removed.

```sql
SELECT *
FROM weather;
```

|     city      | temp_lo | temp_hi | prcp |    date    |
|---------------|--------:|--------:|-----:|------------|
| San Francisco | 46      | 50      | 0.25 | 1994-11-27 |
| San Francisco | 41      | 55      | 0.0  | 1994-11-29 |

One should be cautious when issuing statements of the following form:

```sql
DELETE FROM ⟨table_name⟩;
```

> **Warning.** Without a qualification, `DELETE` will remove all rows from the given table, leaving it empty. The system will not request confirmation before doing this.
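
One way to guard against an accidental deletion is to run the statement inside an explicit transaction and inspect the result before deciding whether to keep it. This is a sketch of that workflow, not a required step:

```sql
BEGIN TRANSACTION;

DELETE FROM weather
WHERE city = 'Hayward';

-- Inspect the table; only this transaction sees the deletion so far.
SELECT *
FROM weather;

-- Undo the deletion. Use COMMIT instead to make it permanent.
ROLLBACK;
```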

## Statements {#sql:statements}

### Statements Overview {#docs:current:sql:statements:overview}


### ANALYZE Statement {#docs:current:sql:statements:analyze}

The `ANALYZE` statement recomputes the statistics on DuckDB's tables.

#### Usage {#docs:current:sql:statements:analyze::usage}

The statistics recomputed by the `ANALYZE` statement are only used for [join order optimization](https://blobs.duckdb.org/papers/tom-ebergen-msc-thesis-join-order-optimization-with-almost-no-statistics.pdf). It is therefore recommended to recompute these statistics for improved join orders, especially after performing large updates (inserts and/or deletes).

To recompute the statistics, run:

```sql
ANALYZE;
```

### ALTER TABLE Statement {#docs:current:sql:statements:alter_table}

The `ALTER TABLE` statement changes the schema of an existing table in the catalog.

#### Examples {#docs:current:sql:statements:alter_table::examples}

```sql
CREATE TABLE integers (i INTEGER, j INTEGER);
```

Add a new column named `k` to the table `integers`; it will be filled with the default value `NULL`:

```sql
ALTER TABLE integers
ADD COLUMN k INTEGER;
```

Add a new column named `l` to the table `integers`; it will be filled with the default value `10`:

```sql
ALTER TABLE integers
ADD COLUMN l INTEGER DEFAULT 10;
```

Drop the column `k` from the table `integers`:

```sql
ALTER TABLE integers
DROP k;
```

Change the type of the column `i` to the type `VARCHAR` using a standard cast:

```sql
ALTER TABLE integers
ALTER i TYPE VARCHAR;
```

Change the type of the column `i` to the type `VARCHAR`, using the specified expression to convert the data for each row:

```sql
ALTER TABLE integers
ALTER i SET DATA TYPE VARCHAR USING concat(i, '_', j);
```

Set the default value of a column:

```sql
ALTER TABLE integers
ALTER COLUMN i SET DEFAULT 10;
```

Drop the default value of a column:

```sql
ALTER TABLE integers
ALTER COLUMN i DROP DEFAULT;
```

Make a column not nullable:

```sql
ALTER TABLE integers
ALTER COLUMN i SET NOT NULL;
```

Drop the not-`NULL` constraint:

```sql
ALTER TABLE integers
ALTER COLUMN i DROP NOT NULL;
```

Rename a table:

```sql
ALTER TABLE integers
RENAME TO integers_old;
```

Rename a column of a table:

```sql
ALTER TABLE integers
RENAME i TO ii;
```

Add a primary key to a column of a table:

```sql
ALTER TABLE integers
ADD PRIMARY KEY (i);
```

#### Syntax {#docs:current:sql:statements:alter_table::syntax}



`ALTER TABLE` changes the schema of an existing table.
All the changes made by `ALTER TABLE` fully respect the transactional semantics, i.e., they will not be visible to other transactions until committed, and can be fully reverted through a rollback.
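The following sketch illustrates the rollback behavior described above. The table name `alter_demo` is hypothetical:

```sql
CREATE TABLE alter_demo (i INTEGER);

BEGIN TRANSACTION;

-- The new column is visible inside this transaction only.
ALTER TABLE alter_demo
ADD COLUMN j INTEGER;

DESCRIBE alter_demo;

-- Revert the schema change.
ROLLBACK;

-- The table should again have only the column i.
DESCRIBE alter_demo;
```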

#### `RENAME TABLE` {#docs:current:sql:statements:alter_table::rename-table}

Rename a table:

```sql
ALTER TABLE integers
RENAME TO integers_old;
```

The `RENAME TO` clause renames an entire table, changing its name in the schema. Note that any views that rely on the table are **not** automatically updated.
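Because dependent views are not updated, they need to be redefined after a rename. A minimal sketch, using the hypothetical names `readings` and `readings_view`:

```sql
CREATE TABLE readings (i INTEGER);
CREATE VIEW readings_view AS
    SELECT i FROM readings;

ALTER TABLE readings
RENAME TO readings_old;

-- The view still refers to the old table name, so querying it is
-- expected to fail until the view is redefined.
CREATE OR REPLACE VIEW readings_view AS
    SELECT i FROM readings_old;

SELECT * FROM readings_view;
```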

#### `RENAME COLUMN` {#docs:current:sql:statements:alter_table::rename-column}

To rename a column of a table, use the `RENAME` or `RENAME COLUMN` clauses:

```sql
ALTER TABLE integers
RENAME COLUMN i TO ii;
```

```sql
ALTER TABLE integers
RENAME i TO ii;
```

The `RENAME [COLUMN]` clause renames a single column within a table. Any constraints that rely on this name (e.g., `CHECK` constraints) are automatically updated. However, note that any views that rely on this column name are **not** automatically updated.

#### `ADD COLUMN` {#docs:current:sql:statements:alter_table::add-column}

To add a column to a table, use the `ADD` or `ADD COLUMN` clause.

For example, to add a new column named `k` to the table `integers`, which will be filled with the default value `NULL`:

```sql
ALTER TABLE integers
ADD COLUMN k INTEGER;
```

Or:

```sql
ALTER TABLE integers
ADD k INTEGER;
```

Add a new column named `l` to the table `integers`; it will be filled with the default value `10`:

```sql
ALTER TABLE integers
ADD COLUMN l INTEGER DEFAULT 10;
```

The `ADD [COLUMN]` clause can be used to add a new column of a specified type to a table. The new column will be filled with the specified default value, or `NULL` if none is specified.

#### `DROP COLUMN` {#docs:current:sql:statements:alter_table::drop-column}

To drop a column of a table, use the `DROP` or `DROP COLUMN` clause.

For example, to drop the column `k` from the table `integers`:

```sql
ALTER TABLE integers
DROP COLUMN k;
```

Or:

```sql
ALTER TABLE integers
DROP k;
```

The `DROP [COLUMN]` clause can be used to remove a column from a table. Note that columns can only be removed if they do not have any indexes that rely on them. This includes any indexes created as part of a `PRIMARY KEY` or `UNIQUE` constraint. Columns that are part of multi-column check constraints cannot be dropped either.
If you attempt to drop a column with an index on it, DuckDB will return the following error message:

```console
Dependency Error:
Cannot alter entry "..." because there are entries that depend on it.
```

#### `[SET [DATA]] TYPE` {#docs:current:sql:statements:alter_table::set-data-type}

Change the type of the column `i` to the type `VARCHAR` using a standard cast:

```sql
ALTER TABLE integers
ALTER i TYPE VARCHAR;
```

> Instead of
> `ALTER ⟨column_name⟩ TYPE ⟨type⟩`{:.language-sql .highlight}, you can also use the equivalent
> `ALTER ⟨column_name⟩ SET TYPE ⟨type⟩`{:.language-sql .highlight} and the 
> `ALTER ⟨column_name⟩ SET DATA TYPE ⟨type⟩`{:.language-sql .highlight} clauses.

Change the type of the column `i` to the type `VARCHAR`, using the specified expression to convert the data for each row:

```sql
ALTER TABLE integers
ALTER i SET DATA TYPE VARCHAR USING concat(i, '_', j);
```

The `[SET [DATA]] TYPE` clause changes the type of a column in a table. Any data present in the column is converted according to the provided expression in the `USING` clause, or, if the `USING` clause is absent, cast to the new data type. Note that columns can only have their type changed if they do not have any indexes that rely on them and are not part of any `CHECK` constraints.

##### Handling Structs {#docs:current:sql:statements:alter_table::handling-structs}

There are two options to change the sub-schema of a [`STRUCT`](#docs:current:sql:data_types:struct)-typed column.

###### `ALTER TABLE` with `struct_insert` {#docs:current:sql:statements:alter_table::alter-table-with-struct_insert}

You can use `ALTER TABLE` with the `struct_insert` function.
For example:

```sql
CREATE TABLE tbl (col STRUCT(i INTEGER));
ALTER TABLE tbl
ALTER col TYPE USING struct_insert(col, a := 42, b := NULL::VARCHAR);
```

###### `ALTER TABLE` with `ADD COLUMN` / `DROP COLUMN` / `RENAME COLUMN` {#docs:current:sql:statements:alter_table::alter-table-with-add-column--drop-column--rename-column}

Starting with DuckDB v1.3.0, `ALTER TABLE` supports the
[`ADD COLUMN`, `DROP COLUMN` and `RENAME COLUMN` clauses](#docs:current:sql:data_types:struct::updating-the-schema)
to update the sub-schema of a `STRUCT`.
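
A minimal sketch of this approach follows. The dotted `⟨column⟩.⟨field⟩` notation is an assumption based on the linked `STRUCT` page, and the table `struct_demo` is hypothetical; consult that page for the authoritative syntax:

```sql
CREATE TABLE struct_demo (s STRUCT(i INTEGER, j INTEGER));
INSERT INTO struct_demo VALUES (ROW(1, 1)), (ROW(2, 2));

-- Add a field k to the struct column s (assumed syntax).
ALTER TABLE struct_demo
ADD COLUMN s.k INTEGER;

-- Remove the field i from the struct column s (assumed syntax).
ALTER TABLE struct_demo
DROP COLUMN s.i;

DESCRIBE struct_demo;
```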

#### `SET` / `DROP DEFAULT` {#docs:current:sql:statements:alter_table::set--drop-default}

Set the default value of a column:

```sql
ALTER TABLE integers
ALTER COLUMN i SET DEFAULT 10;
```

Drop the default value of a column:

```sql
ALTER TABLE integers
ALTER COLUMN i DROP DEFAULT;
```

The `SET/DROP DEFAULT` clause modifies the `DEFAULT` value of an existing column. Note that this does not modify any existing data in the column. Dropping the default is equivalent to setting the default value to `NULL`.

> **Warning.** At the moment DuckDB will not allow you to alter a table if there are any dependencies. That means that if you have an index on a column you will first need to drop the index, alter the table, and then recreate the index. Otherwise, you will get a `Dependency Error`.
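
The drop, alter, recreate sequence from the warning above can be sketched as follows. The table `events` and the index `events_id_idx` are hypothetical names:

```sql
CREATE TABLE events (id INTEGER, payload VARCHAR);
CREATE INDEX events_id_idx ON events (id);

-- Drop the index that depends on the table.
DROP INDEX events_id_idx;

-- Alter the table while no index depends on it.
ALTER TABLE events
ALTER id TYPE BIGINT;

-- Recreate the index.
CREATE INDEX events_id_idx ON events (id);
```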

#### `ADD PRIMARY KEY` {#docs:current:sql:statements:alter_table::add-primary-key}

Add a primary key to a column of a table:

```sql
ALTER TABLE integers
ADD PRIMARY KEY (i);
```

Add a primary key to multiple columns of a table:

```sql
ALTER TABLE integers
ADD PRIMARY KEY (i, j);
```

#### `SET` / `RESET` (Table Options) {#docs:current:sql:statements:alter_table::set--reset-table-options}

> This feature was introduced in DuckDB v1.5.

Modify table options after table creation.

Set table options:

```sql
ALTER TABLE my_table
SET ('option_name' = 'value');
```

Reset table options to their defaults:

```sql
ALTER TABLE my_table
RESET ('option_name');
```

The `SET` clause assigns values to table options as key-value pairs. The `RESET` clause removes options or restores them to their default values, depending on the catalog implementation.

Multiple options can be set or reset in a single statement:

```sql
ALTER TABLE my_table
SET ('option1' = 'value1', 'option2' = 'value2');

ALTER TABLE my_table
RESET ('option1', 'option2');
```

#### `ADD` / `DROP CONSTRAINT` {#docs:current:sql:statements:alter_table::add--drop-constraint}

> `ADD CONSTRAINT` and `DROP CONSTRAINT` clauses are not yet supported in DuckDB.

#### Limitations {#docs:current:sql:statements:alter_table::limitations}

`ALTER COLUMN` fails if values of conflicting types have occurred in the table at any point, even if they have been deleted:

```sql
CREATE TABLE tbl (col VARCHAR);

INSERT INTO tbl
VALUES ('asdf'), ('42');

DELETE FROM tbl
WHERE col = 'asdf';

ALTER TABLE tbl
ALTER COLUMN col TYPE INTEGER;
```

```console
Conversion Error:
Could not convert string 'asdf' to INT32
```

Currently, this is expected behavior.
As a workaround, you can create a copy of the table:

```sql
CREATE OR REPLACE TABLE tbl AS FROM tbl;
```
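
Continuing the example above, the workaround is expected to let the type change go through, because the copied table only contains the remaining row:

```sql
-- Rewrite the table so that the deleted value is no longer present.
CREATE OR REPLACE TABLE tbl AS FROM tbl;

-- Retry the type change on the fresh copy.
ALTER TABLE tbl
ALTER COLUMN col TYPE INTEGER;

-- Check the resulting column type.
SELECT col, typeof(col)
FROM tbl;
```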

### ALTER VIEW Statement {#docs:current:sql:statements:alter_view}

The `ALTER VIEW` statement changes the schema of an existing view in the catalog.

#### Examples {#docs:current:sql:statements:alter_view::examples}

Rename a view:

```sql
ALTER VIEW view1 RENAME TO view2;
```

`ALTER VIEW` changes the schema of an existing view. All the changes made by `ALTER VIEW` fully respect the transactional semantics, i.e., they will not be visible to other transactions until committed, and can be fully reverted through a rollback. Note that other views that rely on the view are **not** automatically updated.

### ATTACH and DETACH Statements {#docs:current:sql:statements:attach}

DuckDB allows attaching to and detaching from database files.

#### Examples {#docs:current:sql:statements:attach::examples}

Attach the database `file.db` with the alias inferred from the name (`file`):

```sql
ATTACH 'file.db';
```

Attach the database `file.db` with an explicit alias (`file_db`):

```sql
ATTACH 'file.db' AS file_db;
```

Attach the database `file.db` in read only mode:

```sql
ATTACH 'file.db' (READ_ONLY);
```

Attach the database `file.db` with a block size of 16 kB:

```sql
ATTACH 'file.db' (BLOCK_SIZE 16_384);
```

Attach the database `file.db` with a row group size of 2048 rows:

```sql
ATTACH 'file.db' (ROW_GROUP_SIZE 2048);
```

Attach the database `file.db` with WAL writes disabled for improved performance:

```sql
ATTACH 'file.db' (RECOVERY_MODE no_wal_writes);
```

Attach a SQLite database for reading and writing (see the [`sqlite` extension](#docs:current:core_extensions:sqlite) for more information):

```sql
ATTACH 'sqlite_file.db' AS sqlite_db (TYPE sqlite);
```

Attach the database `file.db` if inferred database alias `file` does not yet exist:

```sql
ATTACH IF NOT EXISTS 'file.db';
```

Attach the database `file.db` if explicit database alias `file_db` does not yet exist:

```sql
ATTACH IF NOT EXISTS 'file.db' AS file_db;
```

Attach the database `file2.db` as alias `file_db` detaching and replacing the existing alias if it exists:

```sql
ATTACH OR REPLACE 'file2.db' AS file_db;
```

Create a table in the attached database with alias `file`:

```sql
CREATE TABLE file.new_table (i INTEGER);
```

Detach the database with alias `file`:

```sql
DETACH file;
```

Show a list of all attached databases:

```sql
SHOW DATABASES;
```

Change the default database that is used to the database `file`:

```sql
USE file;
```

#### `ATTACH` {#docs:current:sql:statements:attach::attach}

The `ATTACH` statement adds a new database file to the catalog that can be read from and written to.
Note that attachment definitions are not persisted between sessions: when a new session is launched, you have to re-attach to all databases.

##### `ATTACH` Syntax {#docs:current:sql:statements:attach::attach-syntax}



`ATTACH` allows DuckDB to operate on multiple database files, and allows for transfer of data between different database files.
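
For instance, data can be copied from one attached database to another with a single statement. This is a minimal sketch; the file names, aliases and the `cities` table are hypothetical:

```sql
ATTACH 'source.db' AS source_db;
ATTACH 'target.db' AS target_db;

-- Create some data in the first database.
CREATE TABLE source_db.cities (name VARCHAR, population INTEGER);
INSERT INTO source_db.cities VALUES ('Amsterdam', 900000);

-- Read from one database file and write to the other.
CREATE TABLE target_db.cities AS
    SELECT * FROM source_db.cities;
```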

`ATTACH` supports HTTP and S3 endpoints. For these, it creates a read-only connection by default.
Therefore, the following two commands are equivalent:

```sql
ATTACH 'https://blobs.duckdb.org/databases/stations.duckdb' AS stations_db;
ATTACH 'https://blobs.duckdb.org/databases/stations.duckdb' AS stations_db (READ_ONLY);
```

Similarly, the following two commands connecting to S3 are equivalent:

```sql
ATTACH 's3://⟨blobs-duckdb⟩/databases/stations.duckdb' AS stations_db;
ATTACH 's3://⟨blobs-duckdb⟩/databases/stations.duckdb' AS stations_db (READ_ONLY);
```

##### Explicit Storage Versions {#docs:current:sql:statements:attach::explicit-storage-versions}

[DuckDB v1.2.0 introduced the `STORAGE_VERSION` option](https://duckdb.org/2025/02/05/announcing-duckdb-120#explicit-storage-versions), which allows explicitly specifying the storage version.
Using this, you can opt-in to newer forwards-incompatible features:

```sql
ATTACH 'file.db' (STORAGE_VERSION 'v1.2.0');
```

This setting specifies the minimum DuckDB version that should be able to read the database file. Database files written with this option cannot be opened by DuckDB versions older than the specified version. They can be read by the specified version and all newer versions of DuckDB.

For more details, see the [“Storage” page](#docs:current:internals:storage::explicit-storage-versions).

##### Database Encryption {#docs:current:sql:statements:attach::database-encryption}

DuckDB supports database encryption. By default, it uses [AES encryption](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) with a key length of 256 bits using the recommended [GCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode) mode. The encryption covers the main database file, the write-ahead-log (WAL) file, and even temporary files. To attach to an encrypted database, use the `ATTACH` statement with an `ENCRYPTION_KEY`.

```sql
ATTACH 'encrypted.db' AS enc_db (ENCRYPTION_KEY 'quack_quack');
```

To encrypt data, DuckDB can use either the built-in `mbedtls` library or the OpenSSL library from the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview). Note that the OpenSSL versions are much faster due to hardware acceleration, so make sure to load the `httpfs` extension for good encryption performance:

```sql
LOAD httpfs;
ATTACH 'encrypted.db' AS enc_db (ENCRYPTION_KEY 'quack_quack'); -- will be faster thanks to httpfs
```

To change the AES mode to [CBC](<https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_block_chaining_(CBC)>) or [CTR](<https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)>), use the `ENCRYPTION_CIPHER` option:

```sql
ATTACH 'encrypted.db' AS enc_db (ENCRYPTION_KEY 'quack_quack', ENCRYPTION_CIPHER 'CBC');
ATTACH 'encrypted.db' AS enc_db (ENCRYPTION_KEY 'quack_quack', ENCRYPTION_CIPHER 'CTR');
```

Database encryption implies using [storage version](#docs:current:sql:statements:attach::explicit-storage-versions) 1.4.0 or later.

> DuckDB's encryption does not yet meet the official [NIST requirements](https://csrc.nist.gov/projects/cryptographic-standards-and-guidelines).
> Please follow issue [`#20162` “Store and verify tag for canary encryption”](https://github.com/duckdb/duckdb/issues/20162) to track our progress towards NIST-compliance.
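
To encrypt an existing unencrypted database, one possible approach is to attach both files and copy the content across with `COPY FROM DATABASE`. This is a sketch only; the file names and aliases are hypothetical:

```sql
ATTACH 'unencrypted.db' AS unencrypted_db;
ATTACH 'encrypted.db' AS encrypted_db (ENCRYPTION_KEY 'quack_quack');

-- Copy the schema and data into the encrypted database file.
COPY FROM DATABASE unencrypted_db TO encrypted_db;

DETACH unencrypted_db;
```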

##### Options {#docs:current:sql:statements:attach::options}

Zero or more options may be provided within parentheses following the `ATTACH` statement. Parameter values can be passed in with or without wrapping in single quotes. Arbitrary expressions may be used for parameter values.

| Name                | Description                                                                                                                 | Type      | Default value |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------- | --------- | ------------- |
| `ACCESS_MODE`       | Access mode of the database (`AUTOMATIC`, `READ_ONLY`, or `READ_WRITE`).                                                     | `VARCHAR` | `automatic`   |
| `COMPRESS`          | Whether the database is compressed. Only applicable for in-memory databases.                                                | `VARCHAR` | `false`       |
| `TYPE`              | The file type (`DUCKDB` or `SQLITE`), or deduced from the input string literal (MySQL, PostgreSQL).                          | `VARCHAR` | `DUCKDB`      |
| `BLOCK_SIZE`        | The block size of a new database file. Must be a power of two and within [16384, 262144]. Cannot be set for existing files. | `UBIGINT` | `262144`      |
| `ROW_GROUP_SIZE`    | The row group size of a new database file.                                                                                  | `UBIGINT` | `122880`      |
| `STORAGE_VERSION`   | The version of the storage used.                                                                                            | `VARCHAR` | `v1.0.0`      |
| `ENCRYPTION_KEY`    | The encryption key used for encrypting the database.                                                                        | `VARCHAR` | -             |
| `ENCRYPTION_CIPHER` | The encryption cipher used for encrypting the database (`CBC`, `CTR` or `GCM`).                                              | `VARCHAR` | -             |
| `RECOVERY_MODE`     | Recovery mode for the database. `no_wal_writes` disables WAL writes, improving performance at the cost of crash recovery.   | `VARCHAR` | -             |

#### `DETACH` {#docs:current:sql:statements:attach::detach}

The `DETACH` statement allows previously attached database files to be closed and detached, releasing any locks held on the database file.

Note that it is not possible to detach from the default database: if you would like to do so, issue the [`USE` statement](#docs:current:sql:statements:use) to change the default database to another one. For example, if you are connected to a persistent database, you may change to an in-memory database by issuing:

```sql
ATTACH ':memory:' AS memory_db;
USE memory_db;
```
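
The same pattern applies to any database that has become the default. The sketch below uses the hypothetical alias `first_db` to show the full sequence:

```sql
ATTACH 'first.db' AS first_db;
USE first_db;

-- first_db is now the default database, so it cannot be detached yet.
-- Switch the default to another database first.
ATTACH ':memory:' AS memory_db;
USE memory_db;

-- Now the database can be detached.
DETACH first_db;
```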

> **Warning.** Closing the connection, e.g., invoking the [`close()` function in Python](#docs:current:clients:python:dbapi::connection), does not release the locks held on the database files as the file handles are held by the main DuckDB instance (in Python's case, the `duckdb` module).

##### `DETACH` Syntax {#docs:current:sql:statements:attach::detach-syntax}



#### Name Qualification {#docs:current:sql:statements:attach::name-qualification}

The fully qualified name of catalog objects contains the _catalog_, the _schema_ and the _name_ of the object. For example:

Attach the database `new_db`:

```sql
ATTACH 'new_db.db';
```

Create the schema `my_schema` in the database `new_db`:

```sql
CREATE SCHEMA new_db.my_schema;
```

Create the table `my_table` in the schema `my_schema`:

```sql
CREATE TABLE new_db.my_schema.my_table (col INTEGER);
```

Refer to the column `col` inside the table `my_table`:

```sql
SELECT new_db.my_schema.my_table.col FROM new_db.my_schema.my_table;
```

Note that often the fully qualified name is not required. When a name is not fully qualified, the system looks for which entries to reference using the _catalog search path_. The default catalog search path includes the system catalog, the temporary catalog and the initially attached database together with the `main` schema.

Also note the rules on [identifiers and database names in particular](#docs:current:sql:dialect:keywords_and_identifiers::database-names).

##### Default Database and Schema {#docs:current:sql:statements:attach::default-database-and-schema}

When a table is created without any qualifications, the table is created in the default schema of the default database. The default database is the database that is launched when the system is created – and the default schema is `main`.

Create the table `my_table` in the default database:

```sql
CREATE TABLE my_table (col INTEGER);
```

##### Changing the Default Database and Schema {#docs:current:sql:statements:attach::changing-the-default-database-and-schema}

The default database and schema can be changed using the `USE` command.

Set the default database schema to `new_db.main`:

```sql
USE new_db;
```

Set the default database schema to `new_db.my_schema`:

```sql
USE new_db.my_schema;
```
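
To check which database and schema are currently the default, you can query the `current_database()` and `current_schema()` functions. A minimal sketch, reusing the `new_db` and `my_schema` names from above:

```sql
ATTACH IF NOT EXISTS 'new_db.db';
CREATE SCHEMA IF NOT EXISTS new_db.my_schema;

USE new_db.my_schema;

-- Report the default database and schema after the USE statement.
SELECT current_database(), current_schema();
```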

##### Resolving Conflicts {#docs:current:sql:statements:attach::resolving-conflicts}

When providing only a single qualification, the system can interpret this as _either_ a catalog _or_ a schema, as long as there are no conflicts. For example:

```sql
ATTACH 'new_db.db';
CREATE SCHEMA my_schema;
```

Creates the table `new_db.main.tbl`:

```sql
CREATE TABLE new_db.tbl (i INTEGER);
```

Creates the table `default_db.my_schema.tbl`:

```sql
CREATE TABLE my_schema.tbl (i INTEGER);
```

If we create a conflict (i.e., we have both a schema and a catalog with the same name) the system requests that a fully qualified path is used instead:

```sql
CREATE SCHEMA new_db;
CREATE TABLE new_db.tbl (i INTEGER);
```

```console
Binder Error:
Ambiguous reference to catalog or schema "new_db" - use a fully qualified path like "memory.new_db"
```
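
Continuing the example, the ambiguity goes away once both the catalog and the schema are spelled out. The sketch assumes that the default database is named `memory`, as in the error message above:

```sql
-- Catalog new_db, schema main.
CREATE TABLE new_db.main.tbl (i INTEGER);

-- Catalog memory, schema new_db.
CREATE TABLE memory.new_db.tbl (i INTEGER);
```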

##### Changing the Catalog Search Path {#docs:current:sql:statements:attach::changing-the-catalog-search-path}

The catalog search path can be adjusted by setting the `search_path` configuration option, which uses a comma-separated list of values that will be on the search path. The following example demonstrates searching in two databases:

```sql
ATTACH ':memory:' AS db1;
ATTACH ':memory:' AS db2;
CREATE TABLE db1.tbl1 (i INTEGER);
CREATE TABLE db2.tbl2 (j INTEGER);
```

Reference the tables using their fully qualified name:

```sql
SELECT * FROM db1.tbl1;
SELECT * FROM db2.tbl2;
```

Or set the search path and reference the tables using their name:

```sql
SET search_path = 'db1,db2';
SELECT * FROM tbl1;
SELECT * FROM tbl2;
```

#### Transactional Semantics {#docs:current:sql:statements:attach::transactional-semantics}

When running queries on multiple databases, the system opens separate transactions per database. The transactions are started _lazily_ by default – when a given database is referenced for the first time in a query, a transaction for that database will be started. `SET immediate_transaction_mode = true` can be toggled to change this behavior to eagerly start transactions in all attached databases instead.

While multiple transactions can be active at a time – the system only supports _writing_ to a single attached database in a single transaction. If you try to write to multiple attached databases in a single transaction the following error will be thrown:

```console
Attempting to write to database "db2" in a transaction that has already modified database "db1" -
a single transaction can only write to a single attached database.
```

The reason for this restriction is that the system does not maintain atomicity for transactions across attached databases. Transactions are only atomic _within_ each database file. By restricting the global transaction to write to only a single database file the atomicity guarantees are maintained.
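
In practice, this means that writes to different attached databases should go into separate transactions, while reads can span databases. A minimal sketch with the hypothetical aliases `db1` and `db2`:

```sql
ATTACH ':memory:' AS db1;
ATTACH ':memory:' AS db2;
CREATE TABLE db1.tbl1 (i INTEGER);
CREATE TABLE db2.tbl2 (j INTEGER);

-- First transaction: write to db1 only; reading from db2 is allowed.
BEGIN TRANSACTION;
INSERT INTO db1.tbl1 VALUES (1);
SELECT * FROM db2.tbl2;
COMMIT;

-- Second transaction: write to db2.
BEGIN TRANSACTION;
INSERT INTO db2.tbl2 VALUES (2);
COMMIT;
```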

### CALL Statement {#docs:current:sql:statements:call}

The `CALL` statement invokes the given [table function](#docs:current:sql:query_syntax:from::table-functions) and returns the results. 

> Thanks to the [`FROM`-first syntax](#docs:current:sql:query_syntax:from::from-first-syntax) and the fact that procedures in DuckDB are implemented as table functions, you can use `FROM` instead of `CALL`.

#### Examples {#docs:current:sql:statements:call::examples}

Invoke the `duckdb_functions` table function:

```sql
CALL duckdb_functions();
```

Invoke the `pragma_table_info` table function:

```sql
CALL pragma_table_info('pg_am');
```

Select only the functions where the name starts with `ST_`:

```sql
SELECT function_name, parameters, parameter_types, return_type
FROM duckdb_functions()
WHERE function_name LIKE 'ST_%';
```
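
As noted above, the `FROM`-first syntax can replace `CALL`. The first two examples can therefore also be written as:

```sql
-- Equivalent to CALL duckdb_functions();
FROM duckdb_functions();

-- Equivalent to CALL pragma_table_info('pg_am');
FROM pragma_table_info('pg_am');
```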

#### Syntax {#docs:current:sql:statements:call::syntax}


### CHECKPOINT Statement {#docs:current:sql:statements:checkpoint}

The `CHECKPOINT` statement synchronizes data in the write-ahead log (WAL) to the database data file.

#### Examples {#docs:current:sql:statements:checkpoint::examples}

Synchronize data in the default database:

```sql
CHECKPOINT;
```

Synchronize data in the specified database:

```sql
CHECKPOINT file_db;
```

Synchronize data and prevent new transactions from starting:

```sql
FORCE CHECKPOINT;
```

> In earlier DuckDB versions, `FORCE CHECKPOINT` aborted any in-progress transactions.
> From v1.4, it waits until it can grab the checkpoint lock.

#### Checkpointing In-Memory Tables {#docs:current:sql:statements:checkpoint::checkpointing-in-memory-tables}

Starting with v1.4.0, in-memory tables support checkpointing. This has two key benefits:

* In-memory tables also support compression. This is disabled by default – you can turn it on using:

  ```sql
  ATTACH ':memory:' AS memory_compressed (COMPRESS);
  USE memory_compressed;
  ```

* Checkpointing triggers vacuuming deleted rows, allowing space to be reclaimed after deletes/truncation.

#### Syntax {#docs:current:sql:statements:checkpoint::syntax}



Checkpoint operations happen automatically based on the WAL size (see [Configuration](#docs:current:configuration:overview)). This
statement is for manual checkpoint actions.
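
The WAL size that triggers an automatic checkpoint can be inspected and adjusted. The sketch below assumes the `checkpoint_threshold` configuration option; the value `64MB` is an arbitrary illustration, not a recommendation:

```sql
-- Show the current threshold for automatic checkpoints.
SELECT current_setting('checkpoint_threshold');

-- Let the WAL grow larger before an automatic checkpoint is triggered.
SET checkpoint_threshold = '64MB';
```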

#### Behavior {#docs:current:sql:statements:checkpoint::behavior}

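The default `CHECKPOINT` command will fail if there are any running transactions. Including `FORCE` prevents new transactions from starting; from v1.4, it waits until it can acquire the checkpoint lock, whereas earlier versions aborted any in-progress transactions.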

Also see the related [`PRAGMA` option](#docs:current:configuration:pragmas::force-checkpoint) for further behavior modification.

##### Reclaiming Space {#docs:current:sql:statements:checkpoint::reclaiming-space}

When performing a checkpoint (automatic or otherwise), the space occupied by deleted rows is partially reclaimed. Note that this does not remove all deleted rows, but rather merges row groups that have a significant amount of deletes together. In the current implementation this requires ~25% of rows to be deleted in adjacent row groups.
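
To observe this, you can compare the output of `PRAGMA database_size` before and after deleting a contiguous range of rows. This is a sketch only: the file `reclaim_demo.db` and the table `big` are hypothetical, and the reported numbers depend on the DuckDB version and on how the rows are laid out in row groups:

```sql
ATTACH 'reclaim_demo.db' AS reclaim_demo;
USE reclaim_demo;

CREATE TABLE big AS
    SELECT range AS i FROM range(10_000_000);

CHECKPOINT;
PRAGMA database_size;

-- Delete a contiguous half of the table so that whole row groups are affected.
DELETE FROM big
WHERE i < 5_000_000;

CHECKPOINT;
PRAGMA database_size;
```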

Before v1.4.0, checkpointing had no effect in in-memory mode, so it did not reclaim space after deletes in in-memory databases.

> **Warning.** The [`VACUUM` statement](#docs:current:sql:statements:vacuum) does _not_ trigger vacuuming deletes and hence does not reclaim space.

### COMMENT ON Statement {#docs:current:sql:statements:comment_on}

The `COMMENT ON` statement allows adding metadata to catalog entries (tables, columns, etc.).
It follows the [PostgreSQL syntax](https://www.postgresql.org/docs/16/sql-comment.html).

#### Examples {#docs:current:sql:statements:comment_on::examples}

Create a comment on a `TABLE`:

```sql
COMMENT ON TABLE test_table IS 'very nice table';
```

Create a comment on a `COLUMN`:

```sql
COMMENT ON COLUMN test_table.test_table_column IS 'very nice column';
```

Create a comment on a `VIEW`:

```sql
COMMENT ON VIEW test_view IS 'very nice view';
```

Create a comment on an `INDEX`:

```sql
COMMENT ON INDEX test_index IS 'very nice index';
```

Create a comment on a `SEQUENCE`:

```sql
COMMENT ON SEQUENCE test_sequence IS 'very nice sequence';
```

Create a comment on a `TYPE`:

```sql
COMMENT ON TYPE test_type IS 'very nice type';
```

Create a comment on a `MACRO`:

```sql
COMMENT ON MACRO test_macro IS 'very nice macro';
```

Create a comment on a `MACRO TABLE`:

```sql
COMMENT ON MACRO TABLE test_table_macro IS 'very nice table macro';
```

To unset a comment, set it to `NULL`, e.g.:

```sql
COMMENT ON TABLE test_table IS NULL;
```

#### Reading Comments {#docs:current:sql:statements:comment_on::reading-comments}

Comments can be read by querying the `comment` column of the respective [metadata functions](#docs:current:sql:meta:duckdb_table_functions):

List comments on `TABLE`s:

```sql
SELECT comment FROM duckdb_tables();
```

List comments on `COLUMN`s:

```sql
SELECT comment FROM duckdb_columns();
```

List comments on `VIEW`s:

```sql
SELECT comment FROM duckdb_views();
```

List comments on `INDEX`es:

```sql
SELECT comment FROM duckdb_indexes();
```

List comments on `SEQUENCE`s:

```sql
SELECT comment FROM duckdb_sequences();
```

List comments on `TYPE`s:

```sql
SELECT comment FROM duckdb_types();
```

List comments on `MACRO`s:

```sql
SELECT comment FROM duckdb_functions();
```

List comments on `MACRO TABLE`s:

```sql
SELECT comment FROM duckdb_functions();
```
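
In practice, it is often useful to select the entry names alongside the comments and to skip entries without one. The following sketch reuses the names from the examples above; the `function_type` values used to separate scalar and table macros are an assumption about the output of `duckdb_functions()`:

```sql
CREATE TABLE test_table (test_table_column INTEGER);
COMMENT ON TABLE test_table IS 'very nice table';
COMMENT ON COLUMN test_table.test_table_column IS 'very nice column';

-- Tables that have a comment.
SELECT table_name, comment
FROM duckdb_tables()
WHERE comment IS NOT NULL;

-- Columns that have a comment.
SELECT table_name, column_name, comment
FROM duckdb_columns()
WHERE comment IS NOT NULL;

-- Macros and table macros that have a comment (assumed function_type values).
SELECT function_name, function_type, comment
FROM duckdb_functions()
WHERE function_type IN ('macro', 'table_macro')
  AND comment IS NOT NULL;
```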

#### Limitations {#docs:current:sql:statements:comment_on::limitations}

The `COMMENT ON` statement currently has the following limitations:

* It is not possible to comment on schemas or databases.
* It is not possible to comment on things that have a dependency (e.g., a table with an index).

#### Syntax {#docs:current:sql:statements:comment_on::syntax}


### COPY Statement {#docs:current:sql:statements:copy}

#### Examples {#docs:current:sql:statements:copy::examples}

Read a CSV file into the `lineitem` table, using auto-detected CSV options:

```sql
COPY lineitem FROM 'lineitem.csv';
```

Read a CSV file into the `lineitem` table, using manually specified CSV options:

```sql
COPY lineitem FROM 'lineitem.csv' (DELIMITER '|');
```

Read a Parquet file into the `lineitem` table:

```sql
COPY lineitem FROM 'lineitem.pq' (FORMAT parquet);
```

Read a JSON file into the `lineitem` table, using auto-detected options:

```sql
COPY lineitem FROM 'lineitem.json' (FORMAT json, AUTO_DETECT true);
```

Read a CSV file into the `lineitem` table, using double quotes:

```sql
COPY lineitem FROM "lineitem.csv";
```

Read a CSV file into the `lineitem` table, omitting quotes:

```sql
COPY lineitem FROM lineitem.csv;
```

Write a table to a CSV file:

```sql
COPY lineitem TO 'lineitem.csv' (FORMAT csv, DELIMITER '|', HEADER);
```

Write a table to a CSV file, using double quotes:

```sql
COPY lineitem TO "lineitem.csv";
```

Write a table to a CSV file, omitting quotes:

```sql
COPY lineitem TO lineitem.csv;
```

Write the result of a query to a Parquet file:

```sql
COPY (SELECT l_orderkey, l_partkey FROM lineitem) TO 'lineitem.parquet' (COMPRESSION zstd);
```

Copy the entire content of database `db1` to database `db2`:

```sql
COPY FROM DATABASE db1 TO db2;
```

Copy only the schema (catalog elements) but not any data:

```sql
COPY FROM DATABASE db1 TO db2 (SCHEMA);
```

#### Overview {#docs:current:sql:statements:copy::overview}

`COPY` moves data between DuckDB and external files. `COPY ... FROM` imports data into DuckDB from an external file. `COPY ... TO` writes data from DuckDB to an external file. The `COPY` command can be used for `CSV`, `PARQUET` and `JSON` files.

#### `COPY ... FROM` {#docs:current:sql:statements:copy::copy--from}

`COPY ... FROM` imports data from an external file into an existing table. The data is appended to whatever data is in the table already. The number of columns in the file must match the number of columns in the target table, and the contents of the columns must be convertible to the column types of the table. If this is not possible, an error will be thrown.

If a list of columns is specified, `COPY` will only copy the data in the specified columns from the file. If there are any columns in the table that are not in the column list, `COPY ... FROM` will insert the default values for those columns.

Copy the contents of a comma-separated file `test.csv` without a header into the table `test`:

```sql
COPY test FROM 'test.csv';
```

Copy the contents of a comma-separated file with a header into the `category` table:

```sql
COPY category FROM 'categories.csv' (HEADER);
```

Copy the contents of `lineitem.tbl` into the `lineitem` table, where the contents are delimited by a pipe character (`|`):

```sql
COPY lineitem FROM 'lineitem.tbl' (DELIMITER '|');
```

Copy the contents of `lineitem.tbl` into the `lineitem` table, where the delimiter, quote character, and presence of a header are automatically detected:

```sql
COPY lineitem FROM 'lineitem.tbl' (AUTO_DETECT true);
```

Read the contents of a comma-separated file `names.csv` into the `name` column of the `category` table. Any other columns of this table are filled with their default value:

```sql
COPY category(name) FROM 'names.csv';
```

Read the contents of a Parquet file `lineitem.parquet` into the `lineitem` table:

```sql
COPY lineitem FROM 'lineitem.parquet' (FORMAT parquet);
```

Read the contents of a newline-delimited JSON file `lineitem.ndjson` into the `lineitem` table:

```sql
COPY lineitem FROM 'lineitem.ndjson' (FORMAT json);
```

Read the contents of a JSON file `lineitem.json` into the `lineitem` table:

```sql
COPY lineitem FROM 'lineitem.json' (FORMAT json, ARRAY true);
```

An expression may be used as the source of a `COPY ... FROM` command if it is placed within parentheses. 

Read the contents of a file whose path is stored in a variable into the `lineitem` table:

```sql
SET VARIABLE source_file = 'lineitem.json';
COPY lineitem FROM (getvariable('source_file'));
```

Read the contents of a file provided as parameter of a prepared statement into the `lineitem` table:

```sql
PREPARE v1 AS COPY lineitem FROM ($1);
EXECUTE v1('lineitem.json');
```

##### Syntax {#docs:current:sql:statements:copy::syntax}



> To ensure compatibility with PostgreSQL, DuckDB accepts `COPY ... FROM` statements that do not fully comply with the railroad diagram shown here. For example, the following is a valid statement:
>
> ```sql
> COPY tbl FROM 'tbl.csv' WITH DELIMITER '|' CSV HEADER;
> ```

#### `COPY ... TO` {#docs:current:sql:statements:copy::copy--to}

`COPY ... TO` exports data from DuckDB to an external CSV, Parquet, JSON or BLOB file. It has mostly the same set of options as `COPY ... FROM`, however, in the case of `COPY ... TO` the options specify how the file should be written to disk. Any file created by `COPY ... TO` can be copied back into the database by using `COPY ... FROM` with a similar set of options.

The `COPY ... TO` statement can be given either a table name or a query. When a table name is specified, the contents of the entire table are written to the resulting file. When a query is specified, the query is executed and its result is written to the resulting file.
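
To illustrate that a file written by `COPY ... TO` can be read back with `COPY ... FROM` using a similar set of options, here is a minimal sketch. The table names `roundtrip_src` and `roundtrip_dst` and the file name `roundtrip.csv` are hypothetical and chosen only for this example:

```sql
-- Hypothetical source table for illustration.
CREATE TABLE roundtrip_src AS SELECT 42 AS a, 'hello' AS b;

-- Export with an explicit delimiter and a header line.
COPY roundtrip_src TO 'roundtrip.csv' (DELIMITER '|', HEADER);

-- COPY ... FROM appends to an existing table, so create the target first.
CREATE TABLE roundtrip_dst (a INTEGER, b VARCHAR);

-- Import using the same delimiter and header settings.
COPY roundtrip_dst FROM 'roundtrip.csv' (DELIMITER '|', HEADER);

SELECT * FROM roundtrip_dst;
```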

Copy the contents of the `lineitem` table to a CSV file with a header:

```sql
COPY lineitem TO 'lineitem.csv';
```

Copy the contents of the `lineitem` table to the file `lineitem.tbl`, where the columns are delimited by a pipe character (`|`), including a header line:

```sql
COPY lineitem TO 'lineitem.tbl' (DELIMITER '|');
```

Use tab separators to create a TSV file without a header:

```sql
COPY lineitem TO 'lineitem.tsv' (DELIMITER '\t', HEADER false);
```

Copy the `l_orderkey` column of the `lineitem` table to the file `orderkey.tbl`:

```sql
COPY lineitem(l_orderkey) TO 'orderkey.tbl' (DELIMITER '|');
```

Copy the result of a query to the file `query.csv`, including a header with column names:

```sql
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.csv' (DELIMITER ',');
```

Copy the result of a query to the Parquet file `query.parquet`:

```sql
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.parquet' (FORMAT parquet);
```

Copy the result of a query to the newline-delimited JSON file `query.ndjson`:

```sql
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.ndjson' (FORMAT json);
```

Copy the result of a query to the JSON file `query.json`:

```sql
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.json' (FORMAT json, ARRAY true);
```

Return the files and their column statistics that were written as part of the `COPY` statement:

```sql
COPY (SELECT l_orderkey, l_comment FROM lineitem) TO 'lineitem_part.parquet' (RETURN_STATS);
```

|       filename        | count  | file_size_bytes | footer_size_bytes |                                                                                   column_statistics                                                                                    | partition_keys |
|-----------------------|-------:|----------------:|------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|
| lineitem_part.parquet | 600572 | 8579141         | 1445              | {'"l_comment"'={column_size_bytes=7642227, max=zzle. slyly, min=' Tiresias above the blit', null_count=0}, '"l_orderkey"'={column_size_bytes=935457, max=600000, min=1, null_count=0}} | NULL           |

Note: for nested columns (e.g., structs) the column statistics are defined for each part. For example, if we have a column `name STRUCT(field1 INTEGER, field2 INTEGER)` the column statistics will have stats for `name.field1` and `name.field2`.

An expression may be used as the target of a `COPY ... TO` command if it is placed within parentheses. 

Copy the result of a query to a file whose path is stored in a variable:

```sql
SET VARIABLE target_file = 'target_file.parquet';
COPY (SELECT 'hello world') TO (getvariable('target_file'));
```

Copy to a file provided as parameter of a prepared statement:

```sql
PREPARE v1 AS COPY (SELECT 42 AS i) TO $1;
EXECUTE v1('file.csv');
```

Expressions may be used for options as well. Copy to a file using a format stored in a variable:

```sql
SET VARIABLE my_format = 'parquet';
COPY (SELECT 42 AS i) TO 'file' (FORMAT getvariable('my_format'));
```

##### `COPY ... TO` Options {#docs:current:sql:statements:copy::copy--to-options}

Zero or more copy options may be provided as a part of the copy operation. The `WITH` specifier is optional, but if any options are specified, the parentheses are required. Parameter values can be passed in with or without wrapping in single quotes. Arbitrary expressions may be used for parameter values.

Any option that is a Boolean can be enabled or disabled in multiple ways. You can write `true`, `ON`, or `1` to enable the option, and `false`, `OFF`, or `0` to disable it. The `BOOLEAN` value can also be omitted, e.g., by only passing `(HEADER)`, in which case `true` is assumed.

With few exceptions, the below options are applicable to all formats written with `COPY`.

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `FORMAT` | Specifies the copy function to use. The default is selected from the file extension (e.g., `.parquet` results in a Parquet file being written/read). If the file extension is unknown `CSV` is selected. Vanilla DuckDB provides `CSV`, `PARQUET` and `JSON` but additional copy functions can be added by [`extensions`](#docs:current:extensions:overview). | `VARCHAR` | `auto` |
| `USE_TMP_FILE` | Whether or not to write to a temporary file first if the original file exists (`target.csv.tmp`). This prevents overwriting an existing file with a broken file in case the writing is cancelled. | `BOOL` | `auto` |
| `OVERWRITE_OR_IGNORE` | Whether or not to allow overwriting files if they already exist. Only has an effect when used with `PARTITION_BY`. | `BOOL` | `false` |
| `OVERWRITE` | When `true`, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with `PARTITION_BY`. | `BOOL` | `false` |
| `APPEND` | When `true`, in the event a filename pattern is generated that already exists, the path will be regenerated to ensure no existing files are overwritten. Only has an effect when used with `PARTITION_BY`. | `BOOL` | `false` |
| `FILENAME_PATTERN` | Set a pattern to use for the filename, can optionally contain `{uuid}` / `{uuidv4}` or `{uuidv7}` to be filled in with a generated [UUID](#docs:current:sql:data_types:numeric::universally-unique-identifiers-uuids) (v4 or v7, respectively), and `{i}`, which is replaced by an incrementing index. Only has an effect when used with `PARTITION_BY`. | `VARCHAR` | `auto` |
| `FILE_EXTENSION` | Set the file extension that should be assigned to the generated file(s). | `VARCHAR` | `auto` |
| `PER_THREAD_OUTPUT` | When `true`, the `COPY` command generates one file per thread, rather than one file in total. This allows for faster parallel writing. | `BOOL` | `false` |
| `FILE_SIZE_BYTES` | If this parameter is set, the `COPY` process creates a directory which will contain the exported files. If a file exceeds the set limit (specified as bytes such as `1000` or in human-readable format such as `1k`), the process creates a new file in the directory. This parameter works in combination with `PER_THREAD_OUTPUT`. Note that the size is used as an approximation, and files can be occasionally slightly over the limit. | `VARCHAR` or `BIGINT` | (empty) |
| `PARTITION_BY` | The columns to partition by using a Hive partitioning scheme, see the [partitioned writes section](#docs:current:data:partitioning:partitioned_writes). | `VARCHAR[]` | (empty) |
| `PRESERVE_ORDER` | Whether or not to [preserve order](#docs:current:sql:dialect:order_preservation) during the copy operation. Defaults to the value of the `preserve_insertion_order` [configuration option](#docs:current:configuration:overview). | `BOOL`| (*) |
| `RETURN_FILES` | Whether or not to include the created filepath(s) (as a `files VARCHAR[]` column) in the query result. | `BOOL` | `false` |
| `RETURN_STATS` | Whether or not to return the files and their column statistics that were written as part of the `COPY` statement. | `BOOL`| `false` |
| `WRITE_PARTITION_COLUMNS` | Whether or not to write partition columns into files. Only has an effect when used with `PARTITION_BY`. | `BOOL` | `false` |
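
As a rough sketch of how several of these options combine, the following writes a small table as Hive-partitioned Parquet files. The table `orders`, its columns, and the output directory `orders` are hypothetical names used only for illustration:

```sql
-- Hypothetical table for illustration.
CREATE TABLE orders AS
    SELECT *
    FROM (VALUES (2023, 1, 10.0), (2023, 2, 20.0), (2024, 1, 30.0))
        t(order_year, order_month, amount);

-- Write one directory per (order_year, order_month) combination under 'orders'.
COPY orders TO 'orders' (FORMAT parquet, PARTITION_BY (order_year, order_month));

-- Write into the same directories again. OVERWRITE_OR_IGNORE allows existing
-- files, and FILENAME_PATTERN uses an incrementing index in the file names.
COPY orders TO 'orders' (
    FORMAT parquet,
    PARTITION_BY (order_year, order_month),
    OVERWRITE_OR_IGNORE,
    FILENAME_PATTERN 'orders_{i}'
);
```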

##### Syntax {#docs:current:sql:statements:copy::syntax}



> To ensure compatibility with PostgreSQL, DuckDB accepts `COPY ... TO` statements that do not fully comply with the railroad diagram shown here. For example, the following is a valid statement:
>
> ```sql
> COPY (SELECT 42 AS x, 84 AS y) TO 'out.csv' WITH DELIMITER '|' CSV HEADER;
> ```

#### `COPY FROM DATABASE ... TO` {#docs:current:sql:statements:copy::copy-from-database--to}

The `COPY FROM DATABASE ... TO` statement copies the entire content from one attached database to another attached database. This includes the schema, including constraints, indexes, sequences, macros and the data itself.

```sql
ATTACH 'db1.db' AS db1;
CREATE TABLE db1.tbl AS SELECT 42 AS x, 3 AS y;
CREATE MACRO db1.two_x_plus_y(x, y) AS 2 * x + y;

ATTACH 'db2.db' AS db2;
COPY FROM DATABASE db1 TO db2;
SELECT db2.two_x_plus_y(x, y) AS z FROM db2.tbl;
```

| z  |
|---:|
| 87 |

To only copy the **schema** of `db1` to `db2` but omit copying the data, add `SCHEMA` to the statement:

```sql
COPY FROM DATABASE db1 TO db2 (SCHEMA);
```
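
One possible use of this statement is persisting an in-memory database to disk. The sketch below assumes the session was started without a database file, so that the default in-memory database is named `memory`. The file name `persisted.db` and the alias `persisted` are hypothetical:

```sql
-- Assumes the current session uses the default in-memory database (named memory).
CREATE TABLE tbl AS SELECT 42 AS x;

-- Attach a file-backed database and copy everything into it.
ATTACH 'persisted.db' AS persisted;
COPY FROM DATABASE memory TO persisted;

-- Detach once the copy has finished.
DETACH persisted;
```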

##### Syntax {#docs:current:sql:statements:copy::syntax}



#### Format-Specific Options {#docs:current:sql:statements:copy::format-specific-options}

##### CSV Options {#docs:current:sql:statements:copy::csv-options}

The below options are applicable when writing CSV files.

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `COMPRESSION` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `file.csv.gz` will use `gzip`, `file.csv.zst` will use `zstd`, and `file.csv` will use `none`). Options are `none`, `gzip`, `zstd`. | `VARCHAR` | `auto` |
| `DATEFORMAT` | Specifies the date format to use when writing dates. See [Date Format](#docs:current:sql:functions:dateformat). | `VARCHAR` | (empty) |
| `DELIM` or `SEP` | The character that is written to separate columns within each row. | `VARCHAR` | `,` |
| `ESCAPE` | The character that should appear before a character that matches the `quote` value. | `VARCHAR` | `"` |
| `FORCE_QUOTE` | The list of columns to always add quotes to, even if not required. | `VARCHAR[]` | `[]` |
| `HEADER` | Whether or not to write a header for the CSV file. | `BOOL` | `true` |
| `NULLSTR` | The string that is written to represent a `NULL` value. | `VARCHAR` | (empty) |
| `PREFIX` | Prefixes the CSV file with a specified string. This option must be used in conjunction with `SUFFIX` and requires `HEADER` to be set to `false`.| `VARCHAR` | (empty) |
| `SUFFIX` | Appends a specified string as a suffix to the CSV file. This option must be used in conjunction with `PREFIX` and requires `HEADER` to be set to `false`.| `VARCHAR` | (empty) |
| `QUOTE` | The quoting character to be used when a data value is quoted. | `VARCHAR` | `"` |
| `TIMESTAMPFORMAT` | Specifies the date format to use when writing timestamps. See [Date Format](#docs:current:sql:functions:dateformat). | `VARCHAR` | (empty) |

##### Parquet Options {#docs:current:sql:statements:copy::parquet-options}

The below options are applicable when writing Parquet files.

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `COMPRESSION` | The compression format to use (`uncompressed`, `snappy`, `gzip`, `zstd`, `brotli`, `lz4`, `lz4_raw`). | `VARCHAR` | `snappy` |
| `COMPRESSION_LEVEL` | Compression level, set between 1 (lowest compression, fastest) and 22 (highest compression, slowest). Only supported for zstd compression. | `BIGINT` | `3` |
| `FIELD_IDS` | The `field_id` for each column. Pass `auto` to attempt to infer automatically. | `STRUCT` | (empty) |
| `ROW_GROUP_SIZE_BYTES` | The target size of each row group. You can pass either a human-readable string, e.g., `2MB`, or an integer, i.e., the number of bytes. This option is only used when you have issued `SET preserve_insertion_order = false;`, otherwise, it is ignored. | `BIGINT` | `row_group_size * 1024` |
| `ROW_GROUP_SIZE` | The target size, i.e., number of rows, of each row group. | `BIGINT` | 122880 |
| `ROW_GROUPS_PER_FILE` | Create a new Parquet file if the current one has a specified number of row groups. If multiple threads are active, the number of row groups in a file may slightly exceed the specified number of row groups to limit the amount of locking – similarly to the behavior of `FILE_SIZE_BYTES`. However, if `per_thread_output` is set, only one thread writes to each file, and it becomes accurate again. | `BIGINT` |  (empty) |
| `PARQUET_VERSION` | The Parquet version to use (`V1`, `V2`). | `VARCHAR` | `V1` |
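
A minimal sketch combining several of the options above is shown next. The file name `numbers.parquet` and the chosen values are arbitrary and serve only as an illustration:

```sql
-- Write one million integers using zstd compression at level 10,
-- targeting row groups of 100,000 rows each.
COPY (SELECT range AS i FROM range(1000000))
    TO 'numbers.parquet'
    (FORMAT parquet, COMPRESSION zstd, COMPRESSION_LEVEL 10, ROW_GROUP_SIZE 100_000);
```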

Some examples of `FIELD_IDS` are as follows.

Assign `field_ids` automatically:

```sql
COPY
    (SELECT 128 AS i)
    TO 'my.parquet'
    (FIELD_IDS 'auto');
```

Sets the `field_id` of column `i` to 42:

```sql
COPY
    (SELECT 128 AS i)
    TO 'my.parquet'
    (FIELD_IDS {i: 42});
```

Sets the `field_id` of column `i` to 42, and column `j` to 43:

```sql
COPY
    (SELECT 128 AS i, 256 AS j)
    TO 'my.parquet'
    (FIELD_IDS {i: 42, j: 43});
```

Sets the `field_id` of column `my_struct` to 42, and column `i` (nested inside `my_struct`) to 43:

```sql
COPY
    (SELECT {i: 128} AS my_struct)
    TO 'my.parquet'
    (FIELD_IDS {my_struct: {__duckdb_field_id: 42, i: 43}});
```

Sets the `field_id` of column `my_list` to 42, and column `element` (default name of list child) to 43:

```sql
COPY
    (SELECT [128, 256] AS my_list)
    TO 'my.parquet'
    (FIELD_IDS {my_list: {__duckdb_field_id: 42, element: 43}});
```

Sets the `field_id` of column `my_map` to 42, and columns `key` and `value` (default names of map children) to 43 and 44:

```sql
COPY
    (SELECT MAP {'key1' : 128, 'key2': 256} my_map)
    TO 'my.parquet'
    (FIELD_IDS {my_map: {__duckdb_field_id: 42, key: 43, value: 44}});
```

##### JSON Options {#docs:current:sql:statements:copy::json-options}

The below options are applicable when writing `JSON` files.

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `ARRAY` | Whether to write a JSON array. If `true`, a JSON array of records is written; if `false`, newline-delimited JSON is written. | `BOOL` | `false` |
| `COMPRESSION` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `file.json.gz` will use `gzip`, `file.json.zst` will use `zstd`, and `file.json` will use `none`). Options are `none`, `gzip`, `zstd`. | `VARCHAR` | `auto` |
| `DATEFORMAT` | Specifies the date format to use when writing dates. See [Date Format](#docs:current:sql:functions:dateformat). | `VARCHAR` | (empty) |
| `TIMESTAMPFORMAT` | Specifies the date format to use when writing timestamps. See [Date Format](#docs:current:sql:functions:dateformat). | `VARCHAR` | (empty) |


Sets the value of column `hello` to `QUACK!` and outputs the results to `quack.json`:

```sql
COPY (SELECT 'QUACK!' AS hello) TO 'quack.json';
--RETURNS: {"hello":"QUACK!"}
```

Sets the value of column `num_list` to `[1,2,3]` and outputs the results to `numbers.json`:

```sql
COPY (SELECT [1, 2, 3] AS num_list) TO 'numbers.json';
--RETURNS: {"num_list":[1,2,3]}
```

Sets the value of column `compression_type` to `gzip_explicit` and outputs the results to `explicit_compression.json` with explicitly specified `gzip` compression:

```sql
COPY (SELECT 'gzip_explicit' AS compression_type) TO 'explicit_compression.json' (FORMAT json, COMPRESSION 'GZIP');
-- RETURNS: {"compression_type":"gzip_explicit"}
```

Writes all rows as a single JSON array of records to `array_true.json`:

```sql
COPY (SELECT 1 AS id, 'Alice' AS name, [1, 2, 3] AS numbers
      UNION ALL
      SELECT 2, 'Bob', [4, 5, 6] AS numbers)
TO 'array_true.json' (FORMAT json, ARRAY true);

-- RETURNS: 
/*
[
	{"id":1,"name":"Alice","numbers":[1,2,3]},
	{"id":2,"name":"Bob","numbers":[4,5,6]}
]
*/
```

Writes each row as a separate record (newline-delimited JSON) to `array_false.json`:

```sql
COPY (SELECT 1 AS id, 'Alice' AS name, [1, 2, 3] AS numbers
      UNION ALL
      SELECT 2, 'Bob', [4, 5, 6] AS numbers)
TO 'array_false.json' (FORMAT json, ARRAY false);

-- RETURNS:
/*
{"id":1,"name":"Alice","numbers":[1,2,3]}
{"id":2,"name":"Bob","numbers":[4,5,6]}
*/
```

##### BLOB Options {#docs:current:sql:statements:copy::blob-options}

The `BLOB` format option allows you to select a single column of a DuckDB table into a `.blob` file.
The column must be cast to the `BLOB` data type. For details on typecasting, see the 
[Casting Operations Matrix](#docs:current:sql:data_types:typecasting::Casting-Operations-Matrix).

The below options are applicable when writing `BLOB` files.

| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `COMPRESSION` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `file.blob.gz` will use `gzip`, `file.blob.zst` will use `zstd`, and `file.blob` will use `none`). Options are `none`, `gzip`, `zstd`. | `VARCHAR` | `auto` |

Type casts the string value `foo` to the `BLOB` data type and outputs the results to `blob_output.blob`:

```sql
COPY (SELECT 'foo'::BLOB) TO 'blob_output.blob' (FORMAT BLOB);
```

Type casts the string value `foo` to the `BLOB` data type and outputs the results to `blob_output_gzip.blob` with `gzip` compression:

```sql
COPY (SELECT 'foo'::BLOB) TO 'blob_output_gzip.blob' (FORMAT BLOB, COMPRESSION 'GZIP');
```

#### Limitations {#docs:current:sql:statements:copy::limitations}

`COPY` does not support copying between tables. To copy between tables, use an [`INSERT` statement](#docs:current:sql:statements:insert):

```sql
INSERT INTO tbl2
    FROM tbl1;
```

### CREATE MACRO Statement {#docs:current:sql:statements:create_macro}

The `CREATE MACRO` statement can create a scalar or table macro (function) in the catalog.

For a scalar macro, `CREATE MACRO` is followed by the name of the macro, and optionally parameters within a set of parentheses. The keyword `AS` is next, followed by the text of the macro. By design, a scalar macro may only return a single value.
For a table macro, the syntax is similar to a scalar macro except `AS` is replaced with `AS TABLE`. A table macro may return a table of arbitrary size and shape.

> If a `MACRO` is temporary, it is only usable within the same database connection and is deleted when the connection is closed.

#### Examples {#docs:current:sql:statements:create_macro::examples}

##### Scalar Macros {#docs:current:sql:statements:create_macro::scalar-macros}

Create a macro that adds two expressions (`a` and `b`):

```sql
CREATE MACRO add(a, b) AS a + b;
```

Create a macro, replacing possible existing definitions:

```sql
CREATE OR REPLACE MACRO add(a, b) AS a + b;
```

Create a macro if it does not already exist, else do nothing:

```sql
CREATE MACRO IF NOT EXISTS add(a, b) AS a + b;
```

Create a macro for a `CASE` expression:

```sql
CREATE MACRO ifelse(a, b, c) AS CASE WHEN a THEN b ELSE c END;
```

Create a macro that does a subquery:

```sql
CREATE MACRO one() AS (SELECT 1);
```

Macros are schema-dependent, and have an alias, `FUNCTION`:

```sql
CREATE FUNCTION main.my_avg(x) AS sum(x) / count(x);
```

Create a macro with a default parameter:

```sql
CREATE MACRO add_default(a, b := 5) AS a + b;
```

Create a macro `arr_append` (with a functionality equivalent to `array_append`):

```sql
CREATE MACRO arr_append(l, e) AS list_concat(l, list_value(e));
```

Create a macro with a typed parameter:

```sql
CREATE MACRO is_maximal(a INTEGER) AS a = 2^31 - 1;
```

##### Table Macros {#docs:current:sql:statements:create_macro::table-macros}

Create a table macro without parameters:

```sql
CREATE MACRO static_table() AS TABLE
    SELECT 'Hello' AS column1, 'World' AS column2;
```

Create a table macro with parameters (that can be of any type):

```sql
CREATE MACRO dynamic_table(col1_value, col2_value) AS TABLE
    SELECT col1_value AS column1, col2_value AS column2;
```

Create a table macro that returns multiple rows. It will be replaced if it already exists, and it is temporary (will be automatically deleted when the connection ends):

```sql
CREATE OR REPLACE TEMP MACRO dynamic_table(col1_value, col2_value) AS TABLE
    SELECT col1_value AS column1, col2_value AS column2
    UNION ALL
    SELECT 'Hello' AS col1_value, 456 AS col2_value;
```

Pass an argument as a list:

```sql
CREATE MACRO get_users(i) AS TABLE
    SELECT * FROM users WHERE uid IN (SELECT unnest(i));
```

An example for how to use the `get_users` table macro is the following:

```sql
CREATE TABLE users AS
    SELECT *
    FROM (VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Carl'), (4, 'Dan'), (5, 'Eve')) t(uid, name);
SELECT * FROM get_users([1, 5]);
```

To define macros on arbitrary tables, use the [`query_table` function](#docs:current:guides:sql_features:query_and_query_table_functions). For example, the following macro computes a column-wise checksum on a table:

```sql
CREATE MACRO checksum(tbl) AS TABLE
    SELECT bit_xor(md5_number(COLUMNS(*)::VARCHAR))
    FROM query_table(tbl);

CREATE TABLE tbl AS SELECT unnest([42, 43]) AS x, 100 AS y;
SELECT * FROM checksum('tbl');
```

#### Overloading {#docs:current:sql:statements:create_macro::overloading}

It is possible to overload a macro based on the types or the number of its parameters; this works for both scalar and table macros.

By providing overloads we can have both `add_x(a, b)` and `add_x(a, b, c)` with different function bodies.

```sql
CREATE MACRO add_x
    (a, b) AS a + b,
    (a, b, c) AS a + b + c;
```

```sql
SELECT
    add_x(21, 42) AS two_args,
    add_x(21, 42, 21) AS three_args;
```

| two_args | three_args |
|----------|------------|
|    63    |     84     |


```sql
CREATE OR REPLACE MACRO is_maximal
    (a TINYINT) AS a = 2^7 - 1,
    (a INT) AS a = 2^31 - 1;
```

```sql
SELECT
    is_maximal(127::TINYINT) AS tiny,
    is_maximal(127) AS regular;
```

|   tiny   |  regular   |
|----------|------------|
|   true   |    false   |


#### Syntax {#docs:current:sql:statements:create_macro::syntax}



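Macros allow you to create shortcuts for combinations of expressions. Every identifier in the macro body must be resolvable, so the following definition fails because `b` is not a parameter: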

```sql
CREATE MACRO add(a) AS a + b;
```

```console
Binder Error:
Referenced column "b" not found in FROM clause!
```

This works:

```sql
CREATE MACRO add(a, b) AS a + b;
```

Usage example:

```sql
SELECT add(1, 2) AS x;
```

| x |
|--:|
| 3 |

However, this fails:

```sql
SELECT add('hello', 3);
```

```console
Binder Error:
Could not choose a best candidate function for the function call "add(STRING_LITERAL, INTEGER_LITERAL)". In order to select one, please add explicit type casts.
	Candidate functions:
	add(DATE, INTEGER) -> DATE
	add(INTEGER, INTEGER) -> INTEGER
```

Macros can have default parameters.

`b` is a default parameter:

```sql
CREATE MACRO add_default(a, b := 5) AS a + b;
```

The following will result in 42:

```sql
SELECT add_default(37);
```
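
The default can also be overridden at call time by passing the parameter by name. The sketch below redefines `add_default` so that it is self-contained:

```sql
-- Same definition as above; b defaults to 5.
CREATE OR REPLACE MACRO add_default(a, b := 5) AS a + b;

-- Override the default by naming the parameter: 37 + 10.
SELECT add_default(37, b := 10) AS x;
```

| x  |
|---:|
| 47 |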

The order of named parameters does not matter:

```sql
CREATE MACRO triple_add(a, b := 5, c := 10) AS a + b + c;
```

```sql
SELECT triple_add(40, c := 1, b := 1) AS x;
```

| x  |
|---:|
| 42 |

When macros are used, they are expanded (i.e., replaced with the original expression), and the parameters within the expanded expression are replaced with the supplied arguments. Step by step:

The `add` macro we defined above is used in a query:

```sql
SELECT add(40, 2) AS x;
```

Internally, `add` is replaced with its definition of `a + b`:

```sql
SELECT a + b AS x;
```

Then, the parameters are replaced by the supplied arguments:

```sql
SELECT 40 + 2 AS x;
```

#### Limitations {#docs:current:sql:statements:create_macro::limitations}

##### Using Subquery Macros {#docs:current:sql:statements:create_macro::using-subquery-macros}

Table macros as well as scalar macros defined using scalar subqueries cannot be used in the arguments of table functions. DuckDB will return the following error:

```console
Binder Error:
Table function cannot contain subqueries
```

##### Overloads {#docs:current:sql:statements:create_macro::overloads}

Overloads for macro functions have to be set at creation. It is not possible to define a macro by the same name twice without first removing the first definition.
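
In practice, adding an overload therefore means redefining the macro with all of its bodies at once. A minimal sketch, using the hypothetical macro name `times_x`:

```sql
-- Initial definition with a single parameter list.
CREATE OR REPLACE MACRO times_x(a) AS a * 2;

-- To add a second overload, replace the macro and list both bodies.
CREATE OR REPLACE MACRO times_x
    (a) AS a * 2,
    (a, x) AS a * x;

SELECT times_x(21) AS one_arg, times_x(21, 3) AS two_args;
```

| one_arg | two_args |
|--------:|---------:|
| 42      | 63       |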

##### Recursive Functions {#docs:current:sql:statements:create_macro::recursive-functions}

Defining recursive functions is not supported.
For example, the following macro – supposed to compute the *n*th number of the Fibonacci sequence – fails:

```sql
CREATE OR REPLACE FUNCTION fibo(n) AS (SELECT 1);
CREATE OR REPLACE FUNCTION fibo(n) AS (
    CASE
        WHEN n <= 1 THEN 1
        ELSE fibo(n - 1)
    END
);
SELECT fibo(3);
```

```console
Binder Error:
Max expression depth limit of 1000 exceeded. Use "SET max_expression_depth TO x" to increase the maximum expression depth.
```

##### Function Chaining on the First Function Does Not Work {#docs:current:sql:statements:create_macro::function-chaining-on-the-first-function-does-not-work}

Macros do not support the dot operator for function chaining on the first function.
To illustrate this, see an example with the `lower` function, which works:

```sql
CREATE OR REPLACE MACRO low(s) AS lower(s);
SELECT low('AA');
```

However, rewriting `lower(s)` to use function chaining does not work:

```sql
CREATE OR REPLACE MACRO low(s) AS s.lower();
SELECT low('AA');
```

```console
Binder Error:
Referenced column "s" not found in FROM clause!
```

##### Viewing the List of Macros and Table Macros {#docs:current:sql:statements:create_macro::viewing-the-list-of-macros-and-table-macros}

You can use the following query to display the list of macros and table macros:

```sql
SELECT schema_name, function_name, function_type, parameters
FROM duckdb_functions()
WHERE function_type IN ('macro', 'table_macro');
```

### CREATE SCHEMA Statement {#docs:current:sql:statements:create_schema}

The `CREATE SCHEMA` statement creates a schema in the catalog. The default schema is `main`.

#### Examples {#docs:current:sql:statements:create_schema::examples}

Create a schema:

```sql
CREATE SCHEMA s1;
```

Create a schema if it does not exist yet:

```sql
CREATE SCHEMA IF NOT EXISTS s2;
```

Create a schema or replace a schema if it exists:

```sql
CREATE OR REPLACE SCHEMA s2;
```

Create tables in the schemas:

```sql
CREATE TABLE s1.t (id INTEGER PRIMARY KEY, other_id INTEGER);
CREATE TABLE s2.t (id INTEGER PRIMARY KEY, j VARCHAR);
```

Compute a join between tables from two schemas:

```sql
SELECT *
FROM s1.t s1t, s2.t s2t
WHERE s1t.other_id = s2t.id;
```

#### Syntax {#docs:current:sql:statements:create_schema::syntax}


### CREATE SECRET Statement {#docs:current:sql:statements:create_secret}

The `CREATE SECRET` statement creates a new secret in the [Secrets Manager](#docs:current:configuration:secrets_manager).

#### Syntax for `CREATE SECRET` {#docs:current:sql:statements:create_secret::syntax-for-create-secret}



> **Warning.** When using the [command line client](#docs:current:clients:cli:overview), the `CREATE SECRET` statements are stored in your DuckDB history as plain text.
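
As a hedged illustration of the general shape of these statements, the sketch below creates, lists, and drops a temporary secret. The secret name `my_secret` and all credential values are placeholders, and the example assumes the `s3` secret type is available (it is provided by the `httpfs` extension):

```sql
-- Placeholder credentials; replace them with real values.
CREATE SECRET my_secret (
    TYPE s3,
    KEY_ID 'my_secret_key',
    SECRET 'my_secret_value',
    REGION 'my_region'
);

-- List the secrets known to the Secrets Manager.
SELECT name, type FROM duckdb_secrets();

-- Remove the secret again.
DROP SECRET my_secret;
```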

#### Syntax for `DROP SECRET` {#docs:current:sql:statements:create_secret::syntax-for-drop-secret}


### CREATE SEQUENCE Statement {#docs:current:sql:statements:create_sequence}

The `CREATE SEQUENCE` statement creates a new sequence number generator.

#### Examples {#docs:current:sql:statements:create_sequence::examples}

Generate an ascending sequence starting from 1:

```sql
CREATE SEQUENCE serial;
```

Generate sequence from a given start number:

```sql
CREATE SEQUENCE serial START 101;
```

Generate odd numbers using `INCREMENT BY`:

```sql
CREATE SEQUENCE serial START WITH 1 INCREMENT BY 2;
```

Generate a descending sequence starting from 99:

```sql
CREATE SEQUENCE serial START WITH 99 INCREMENT BY -1 MAXVALUE 99;
```

By default, cycles are not allowed. For example, given the following sequence:

```sql
CREATE SEQUENCE serial START WITH 1 MAXVALUE 10;
```

Calling `nextval` after the sequence has reached its maximum value results in an error:

```console
Sequence Error:
nextval: reached maximum value of sequence "serial" (10)
```

`CYCLE` allows cycling through the same sequence repeatedly:

```sql
CREATE SEQUENCE serial START WITH 1 MAXVALUE 10 CYCLE;
```
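
To observe the wrap-around, the sketch below draws five values from a short cycling sequence. The sequence name `cycle_demo` is hypothetical. After reaching the maximum value of 3, the sequence should restart at its minimum value, which is 1 by default for an ascending sequence:

```sql
CREATE SEQUENCE cycle_demo START WITH 1 MAXVALUE 3 CYCLE;

-- Draw five values; the fourth call wraps around to the minimum value.
SELECT nextval('cycle_demo') AS v
FROM range(5);
```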

##### Creating and Dropping Sequences {#docs:current:sql:statements:create_sequence::creating-and-dropping-sequences}

Sequences can be created and dropped similarly to other catalog items.

Overwrite an existing sequence:

```sql
CREATE OR REPLACE SEQUENCE serial;
```

Only create sequence if no such sequence exists yet:

```sql
CREATE SEQUENCE IF NOT EXISTS serial;
```

Remove sequence:

```sql
DROP SEQUENCE serial;
```

Remove sequence if exists:

```sql
DROP SEQUENCE IF EXISTS serial;
```

##### Using Sequences for Primary Keys {#docs:current:sql:statements:create_sequence::using-sequences-for-primary-keys}

Sequences can be used as `DEFAULT` values in [`CREATE TABLE` statements](#docs:current:sql:statements:create_table). 

The example below uses a sequence to create an integer [primary key](#docs:current:sql:constraints::primary-key-and-unique-constraint):

```sql
CREATE SEQUENCE id_sequence START 1;
CREATE TABLE tbl (id INTEGER PRIMARY KEY DEFAULT nextval('id_sequence'), s VARCHAR);
INSERT INTO tbl (s) VALUES ('hello'), ('world');
SELECT * FROM tbl;
```

The script results in the following table:

| id |   s   |
|---:|-------|
| 1  | hello |
| 2  | world |

Sequences can also be added using the [`ALTER TABLE` statement](#docs:current:sql:statements:alter_table). The following example adds an `id` column and fills it with values generated by the sequence:

```sql
CREATE TABLE tbl (s VARCHAR);
INSERT INTO tbl VALUES ('hello'), ('world');
CREATE SEQUENCE id_sequence START 1;
ALTER TABLE tbl ADD COLUMN id INTEGER DEFAULT nextval('id_sequence');
SELECT * FROM tbl;
```

This script results in a table with the same values as the previous example, except that `id` is the second column.

##### Selecting the Next Value {#docs:current:sql:statements:create_sequence::selecting-the-next-value}

To select the next number from a sequence, use `nextval`:

```sql
CREATE SEQUENCE serial START 1;
SELECT nextval('serial') AS nextval;
```

| nextval |
|--------:|
| 1       |

Using this sequence in an `INSERT` command:

```sql
INSERT INTO distributors VALUES (nextval('serial'), 'nothing');
```

##### Selecting the Current Value {#docs:current:sql:statements:create_sequence::selecting-the-current-value}

You may also view the current number from the sequence. Note that the `nextval` function must have already been called before calling `currval`, otherwise a Serialization Error (`sequence is not yet defined in this session`) will be thrown.

```sql
CREATE SEQUENCE serial START 1;
SELECT nextval('serial') AS nextval;
SELECT currval('serial') AS currval;
```

| currval |
|--------:|
| 1       |

#### Syntax {#docs:current:sql:statements:create_sequence::syntax}



`CREATE SEQUENCE` creates a new sequence number generator.

If a schema name is given then the sequence is created in the specified schema. Otherwise it is created in the current schema. Temporary sequences exist in a special schema, so a schema name may not be given when creating a temporary sequence. The sequence name must be distinct from the name of any other sequence in the same schema.

After a sequence is created, you use the function `nextval` to operate on the sequence.

#### Parameters {#docs:current:sql:statements:create_sequence::parameters}

| Name | Description |
|:--|:-----|
| `CYCLE` or `NO CYCLE` | The `CYCLE` option allows the sequence to wrap around when the `maxvalue` or `minvalue` has been reached by an ascending or descending sequence, respectively. If the limit is reached, the next number generated will be the `minvalue` or `maxvalue`, respectively. If `NO CYCLE` is specified, any calls to `nextval` after the sequence has reached its maximum value will return an error. If neither `CYCLE` nor `NO CYCLE` is specified, `NO CYCLE` is the default. |
| `increment` | The optional clause `INCREMENT BY increment` specifies which value is added to the current sequence value to create a new value. A positive value will make an ascending sequence, a negative one a descending sequence. The default value is 1. |
| `maxvalue` | The optional clause `MAXVALUE maxvalue` determines the maximum value for the sequence. If this clause is not supplied or `NO MAXVALUE` is specified, then default values will be used. The defaults are 2^63 - 1 and -1 for ascending and descending sequences, respectively. |
| `minvalue` | The optional clause `MINVALUE minvalue` determines the minimum value a sequence can generate. If this clause is not supplied or `NO MINVALUE` is specified, then defaults will be used. The defaults are 1 and -(2^63 - 1) for ascending and descending sequences, respectively. |
| `name` | The name (optionally schema-qualified) of the sequence to be created. |
| `start` | The optional clause `START WITH start` allows the sequence to begin anywhere. The default starting value is `minvalue` for ascending sequences and `maxvalue` for descending ones. |
| `TEMPORARY` or `TEMP` | If specified, the sequence object is created only for this session, and is automatically dropped on session exit. Existing permanent sequences with the same name are not visible (in this session) while the temporary sequence exists, unless they are referenced with schema-qualified names. |

> Sequences are based on `BIGINT` arithmetic, so the range cannot exceed the range of an eight-byte integer (-9223372036854775808 to 9223372036854775807).

#### Limitations {#docs:current:sql:statements:create_sequence::limitations}

Due to limitations in DuckDB's dependency manager, `DROP SEQUENCE` will fail in some corner cases.

```sql
CREATE SEQUENCE id_sequence START 1;

CREATE TABLE tbl (
    id INTEGER DEFAULT nextval('id_sequence'),
    s VARCHAR
);
INSERT INTO tbl(s) VALUES ('default is the next value from id_sequence');

ALTER TABLE tbl ALTER COLUMN id SET DEFAULT NULL;
INSERT INTO tbl(s) VALUES ('default is NULL');

SELECT * FROM tbl;
```

```text
┌───────┬────────────────────────────────────────────┐
│  id   │                     s                      │
│ int32 │                  varchar                   │
├───────┼────────────────────────────────────────────┤
│     1 │ default is the next value from id_sequence │
│  NULL │ default is NULL                            │
└───────┴────────────────────────────────────────────┘
```

Even though the sequence is no longer used, attempting to drop it results in an error:

```sql
DROP SEQUENCE id_sequence;
```

```console
Dependency Error:
Cannot drop entry "id_sequence" because there are entries that depend on it.
table "tbl" depends on index "id_sequence".
Use DROP...CASCADE to drop all dependents.
```

As the error message suggests, you can force dropping by adding `CASCADE`. However, DuckDB currently tracks dependencies at the table level, so attempting to drop with `CASCADE` drops the entire table:

```sql
DROP SEQUENCE id_sequence CASCADE;
SELECT * FROM tbl;
```

```console
Catalog Error:
Table with name tbl does not exist!
```

### CREATE TABLE Statement {#docs:current:sql:statements:create_table}

The `CREATE TABLE` statement creates a table in the catalog.

#### Examples {#docs:current:sql:statements:create_table::examples}

Create a table with two integer columns (`i` and `j`):

```sql
CREATE TABLE t1 (i INTEGER, j INTEGER);
```

Create a table with a primary key:

```sql
CREATE TABLE t1 (id INTEGER PRIMARY KEY, j VARCHAR);
```

Create a table with a composite primary key:

```sql
CREATE TABLE t1 (id INTEGER, j VARCHAR, PRIMARY KEY (id, j));
```

Create a table with various types, constraints, and default values:

```sql
CREATE TABLE t1 (
    i INTEGER NOT NULL DEFAULT 0,
    decimalnr DOUBLE CHECK (decimalnr < 10),
    date DATE UNIQUE,
    time TIMESTAMP
);
```

Create a table with `CREATE TABLE ... AS SELECT` (CTAS):

```sql
CREATE TABLE t1 AS
    SELECT 42 AS i, 84 AS j;
```

Create a table from a CSV file (automatically detecting column names and types):

```sql
CREATE TABLE t1 AS
    SELECT *
    FROM read_csv('path/file.csv');
```

We can use the `FROM`-first syntax to omit `SELECT *`:

```sql
CREATE TABLE t1 AS
    FROM read_csv('path/file.csv');
```

Copy the schema of `t2` to `t1`:

```sql
CREATE TABLE t1 AS
    FROM t2
    LIMIT 0;
```

Note that only the column names and types are copied to `t1`; other pieces of information (indexes, constraints, default values, etc.) are not copied.

#### Temporary Tables {#docs:current:sql:statements:create_table::temporary-tables}

Temporary tables are session-scoped: only the connection that created them can access them, and they are automatically dropped once the connection to DuckDB is closed (similar to PostgreSQL, for example).

They can be created using the `CREATE TEMP TABLE` or the `CREATE TEMPORARY TABLE` statement (see diagram below) and are part of the `temp.main` schema. While discouraged, their names can overlap with the names of the regular database tables. In these cases, temporary tables take priority in name resolution and full qualification is required to refer to a regular table, e.g., `memory.main.t1`.
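
The name resolution described above can be sketched as follows. This is a minimal illustration that assumes an in-memory database (whose catalog is named `memory`); the table name `t1` and the inserted values are made up for the example:

```sql
-- Regular table in the default catalog and schema
CREATE TABLE t1 (i INTEGER);
-- Temporary table with the same name (discouraged, shown only for illustration)
CREATE TEMP TABLE t1 (s VARCHAR);

-- The unqualified name resolves to the temporary table
INSERT INTO t1 VALUES ('stored in the temporary table');
-- Full qualification is needed to reach the regular table
INSERT INTO memory.main.t1 VALUES (42);

-- Query each table through its fully qualified name
SELECT * FROM temp.main.t1;
SELECT * FROM memory.main.t1;
```

When connected to a persistent database, the catalog name in the qualified reference would be that database's name instead of `memory`.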

Temporary tables reside in memory rather than on disk even when connecting to a persistent DuckDB, but if the `temp_directory` [configuration](#docs:current:configuration:overview) is set, data will be spilled to disk if memory becomes constrained.

Create a temporary table from a CSV file (automatically detecting column names and types):

```sql
CREATE TEMP TABLE t1 AS
    SELECT *
    FROM read_csv('path/file.csv');
```

Allow temporary tables to off-load excess memory to disk:

```sql
SET temp_directory = '/path/to/directory/';
```

#### `CREATE OR REPLACE` {#docs:current:sql:statements:create_table::create-or-replace}

The `CREATE OR REPLACE` syntax allows a new table to be created or for an existing table to be overwritten by the new table. This is shorthand for dropping the existing table and then creating the new one.

Create a table with two integer columns (`i` and `j`) even if `t1` already exists:

```sql
CREATE OR REPLACE TABLE t1 (i INTEGER, j INTEGER);
```

#### `IF NOT EXISTS` {#docs:current:sql:statements:create_table::if-not-exists}

The `IF NOT EXISTS` syntax will only proceed with the creation of the table if it does not already exist. If the table already exists, no action will be taken and the existing table will remain in the database.

Create a table with two integer columns (`i` and `j`) only if `t1` does not exist yet:

```sql
CREATE TABLE IF NOT EXISTS t1 (i INTEGER, j INTEGER);
```

#### `CREATE TABLE ... AS SELECT` (CTAS) {#docs:current:sql:statements:create_table::create-table--as-select-ctas}

DuckDB supports the `CREATE TABLE ... AS SELECT` syntax, also known as “CTAS”:

```sql
CREATE TABLE nums AS
    SELECT i
    FROM range(0, 3) t(i);
```

This syntax can be used in combination with the [CSV reader](#docs:current:data:csv:overview), the shorthand to read directly from CSV files without specifying a function, the [`FROM`-first syntax](#docs:current:sql:query_syntax:from), and the [HTTP(S) support](#docs:current:core_extensions:httpfs:https), yielding concise SQL commands such as the following:

```sql
CREATE TABLE flights AS
    FROM 'https://duckdb.org/data/flights.csv';
```

The CTAS construct also works with the `OR REPLACE` modifier, yielding `CREATE OR REPLACE TABLE ... AS` statements:

```sql
CREATE OR REPLACE TABLE flights AS
    FROM 'https://duckdb.org/data/flights.csv';
```

##### Copying the Schema {#docs:current:sql:statements:create_table::copying-the-schema}

You can create a copy of the table's schema (column names and types only) as follows:

```sql
CREATE TABLE t1 AS
    FROM t2
    WITH NO DATA;
```

Or:

```sql
CREATE TABLE t1 AS
    FROM t2
    LIMIT 0;
```

It is not possible to create tables using CTAS statements with constraints (primary keys, check constraints, etc.).
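
One possible workaround, sketched here under the assumption that the source table `t2` has the columns `id` and `j` (both hypothetical), is to declare the target table with its constraints first and then populate it with `INSERT INTO ... SELECT`:

```sql
-- Source table with illustrative contents
CREATE TABLE t2 (id INTEGER, j VARCHAR);
INSERT INTO t2 VALUES (1, 'a'), (2, 'b');

-- Declare the constraints up front instead of using CTAS
CREATE TABLE t1 (id INTEGER PRIMARY KEY, j VARCHAR);

-- Copy the data; the primary key is checked during the insert
INSERT INTO t1
    SELECT id, j FROM t2;
```

This takes two statements instead of one, but the constraints are enforced on the copied rows.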

#### Check Constraints {#docs:current:sql:statements:create_table::check-constraints}

A `CHECK` constraint is an expression that must be satisfied by the values of every row in the table.

```sql
CREATE TABLE t1 (
    id INTEGER PRIMARY KEY,
    percentage INTEGER CHECK (0 <= percentage AND percentage <= 100)
);
INSERT INTO t1 VALUES (1, 5);
INSERT INTO t1 VALUES (2, -1);
```

```console
Constraint Error:
CHECK constraint failed: t1
```

```sql
INSERT INTO t1 VALUES (3, 101);
```

```console
Constraint Error:
CHECK constraint failed: t1
```

```sql
CREATE TABLE t2 (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER CHECK (x < y));
INSERT INTO t2 VALUES (1, 5, 10);
INSERT INTO t2 VALUES (2, 5, 3);
```

```console
Constraint Error:
CHECK constraint failed: t2
```

`CHECK` constraints can also be added as named table constraints using the `CONSTRAINT` clause:

```sql
CREATE TABLE t3 (
    id INTEGER PRIMARY KEY,
    x INTEGER,
    y INTEGER,
    CONSTRAINT x_smaller_than_y CHECK (x < y)
);
INSERT INTO t3 VALUES (1, 5, 10);
INSERT INTO t3 VALUES (2, 5, 3);
```

```console
Constraint Error:
CHECK constraint failed: t3
```

#### Foreign Key Constraints {#docs:current:sql:statements:create_table::foreign-key-constraints}

A `FOREIGN KEY` is a column (or set of columns) that references another table's primary key. Foreign keys check referential integrity, i.e., the referred primary key must exist in the other table upon insertion.

```sql
CREATE TABLE t1 (id INTEGER PRIMARY KEY, j VARCHAR);
CREATE TABLE t2 (
    id INTEGER PRIMARY KEY,
    t1_id INTEGER,
    FOREIGN KEY (t1_id) REFERENCES t1 (id)
);
```

Example:

```sql
INSERT INTO t1 VALUES (1, 'a');
INSERT INTO t2 VALUES (1, 1);
INSERT INTO t2 VALUES (2, 2);
```

```console
Constraint Error:
Violates foreign key constraint because key "id: 2" does not exist in the referenced table
```

Foreign keys can be defined on composite primary keys:

```sql
CREATE TABLE t3 (id INTEGER, j VARCHAR, PRIMARY KEY (id, j));
CREATE TABLE t4 (
    id INTEGER PRIMARY KEY, t3_id INTEGER, t3_j VARCHAR,
    FOREIGN KEY (t3_id, t3_j) REFERENCES t3(id, j)
);
```

Example:

```sql
INSERT INTO t3 VALUES (1, 'a');
INSERT INTO t4 VALUES (1, 1, 'a');
INSERT INTO t4 VALUES (2, 1, 'b');
```

```console
Constraint Error:
Violates foreign key constraint because key "id: 1, j: b" does not exist in the referenced table
```

Foreign keys can also be defined on unique columns:

```sql
CREATE TABLE t5 (id INTEGER UNIQUE, j VARCHAR);
CREATE TABLE t6 (
    id INTEGER PRIMARY KEY,
    t5_id INTEGER,
    FOREIGN KEY (t5_id) REFERENCES t5(id)
);
```

##### Limitations {#docs:current:sql:statements:create_table::limitations}

Foreign keys have the following limitations.

Foreign keys with cascading deletes (`FOREIGN KEY ... REFERENCES ... ON DELETE CASCADE`) are not supported.

Inserting into tables with self-referencing foreign keys is currently not supported and will result in the following error:

```console
Constraint Error:
Violates foreign key constraint because key "..." does not exist in the referenced table.
```

#### Generated Columns {#docs:current:sql:statements:create_table::generated-columns}

The `[type] [GENERATED ALWAYS] AS (expr) [VIRTUAL|STORED]` syntax will create a generated column. The data in this kind of column is generated from its expression, which can reference other (regular or generated) columns of the table. Since they are produced by calculations, these columns cannot be inserted into directly.

DuckDB can infer the type of the generated column based on the expression's return type. This allows you to leave out the type when declaring a generated column. It is possible to explicitly set a type, but insertions into the referenced columns might fail if the type cannot be cast to the type of the generated column.

Generated columns come in two varieties: `VIRTUAL` and `STORED`.
The data of virtual generated columns is not stored on disk; instead, it is computed from the expression every time the column is referenced (through a `SELECT` statement).

The data of stored generated columns is stored on disk and is recomputed every time the data of their dependencies changes (through an `INSERT` / `UPDATE` / `DROP` statement).

Currently, only the `VIRTUAL` kind is supported, and it is also the default option if the last field is left blank.

The simplest syntax for a generated column derives the type from the expression and defaults to the `VIRTUAL` variant:

```sql
CREATE TABLE t1 (x FLOAT, two_x AS (2 * x));
```

Fully specifying the same generated column for completeness:

```sql
CREATE TABLE t1 (x FLOAT, two_x FLOAT GENERATED ALWAYS AS (2 * x) VIRTUAL);
```
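
As a brief usage sketch (the inserted values are illustrative), values are written only to the regular column and the generated column is read like any other column:

```sql
CREATE TABLE t1 (x FLOAT, two_x AS (2 * x));

-- Only the regular column is listed as an insert target;
-- the generated column cannot be inserted into directly
INSERT INTO t1 (x) VALUES (1.5), (4);

-- two_x is computed from x when the column is referenced
SELECT x, two_x FROM t1;
```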

#### Syntax {#docs:current:sql:statements:create_table::syntax}


### CREATE VIEW Statement {#docs:current:sql:statements:create_view}

The `CREATE VIEW` statement defines a new view in the catalog.

#### Examples {#docs:current:sql:statements:create_view::examples}

Create a simple view:

```sql
CREATE VIEW view1 AS SELECT * FROM tbl;
```

Create a view or replace it if a view with that name already exists:

```sql
CREATE OR REPLACE VIEW view1 AS SELECT 42;
```

Create a view and replace the column names:

```sql
CREATE VIEW view1(a) AS SELECT 42;
```

The SQL query behind an existing view can be read using the [`duckdb_views()` function](#docs:current:sql:meta:duckdb_table_functions::duckdb_views) like this:

```sql
SELECT sql FROM duckdb_views() WHERE view_name = 'view1';
```

#### Syntax {#docs:current:sql:statements:create_view::syntax}



`CREATE VIEW` defines a view of a query. The view is not physically materialized. Instead, the query is run every time the view is referenced in a query.

`CREATE OR REPLACE VIEW` is similar, but if a view of the same name already exists, it is replaced.

If a schema name is given then the view is created in the specified schema. Otherwise, it is created in the current schema. Temporary views exist in a special schema, so a schema name cannot be given when creating a temporary view. The name of the view must be distinct from the name of any other view or table in the same schema.
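
Because a view is not materialized, it reflects changes made to the underlying table after the view was created. A minimal sketch, using illustrative table contents:

```sql
CREATE TABLE tbl (i INTEGER);
INSERT INTO tbl VALUES (1);
CREATE VIEW view1 AS SELECT i FROM tbl;

-- A row added after the view was created...
INSERT INTO tbl VALUES (2);

-- ...is visible through the view, because the view's query runs at this point
SELECT * FROM view1;
```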

### CREATE TYPE Statement {#docs:current:sql:statements:create_type}

The `CREATE TYPE` statement defines a new type in the catalog.

#### Examples {#docs:current:sql:statements:create_type::examples}

Create a simple `ENUM` type:

```sql
CREATE TYPE mood AS ENUM ('happy', 'sad', 'curious');
```

Create a simple `STRUCT` type:

```sql
CREATE TYPE many_things AS STRUCT(k INTEGER, l VARCHAR);
```

Create a simple `UNION` type:

```sql
CREATE TYPE one_thing AS UNION(number INTEGER, string VARCHAR);
```

Create a type alias:

```sql
CREATE TYPE x_index AS INTEGER;
```

#### Syntax {#docs:current:sql:statements:create_type::syntax}



The `CREATE TYPE` clause defines a new data type available to this DuckDB instance.
These new types can then be inspected using the [`duckdb_types()` function](#docs:current:sql:meta:duckdb_table_functions::duckdb_types).
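
A short sketch of using and inspecting a user-defined type; the `person` table and its contents are hypothetical and serve only as an illustration:

```sql
CREATE TYPE mood AS ENUM ('happy', 'sad', 'curious');

-- Use the new type in a column definition like a built-in type
CREATE TABLE person (name VARCHAR, current_mood mood);
INSERT INTO person VALUES ('Pedro', 'happy');

-- Look up the type in the catalog
SELECT *
FROM duckdb_types()
WHERE type_name = 'mood';
```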

#### Limitations {#docs:current:sql:statements:create_type::limitations}

* Extending types to support custom operators (such as the PostgreSQL `&&` operator) is not possible via plain SQL.
  Instead, it requires adding additional C++ code. To do this, create an [extension](#docs:current:extensions:overview).

* The `CREATE TYPE` clause does not support the `OR REPLACE` modifier.

### DELETE Statement {#docs:current:sql:statements:delete}

The `DELETE` statement removes rows from the table identified by the table-name.
If the `WHERE` clause is not present, all records in the table are deleted.
If a `WHERE` clause is supplied, then only those rows for which the `WHERE` clause results in true are deleted. Rows for which the expression is false or `NULL` are retained.

#### Examples {#docs:current:sql:statements:delete::examples}

Remove the rows matching the condition `i = 2` from the table `tbl`:

```sql
DELETE FROM tbl WHERE i = 2;
```

Delete all rows in the table `tbl`:

```sql
DELETE FROM tbl;
```

##### `USING` Clause {#docs:current:sql:statements:delete::using-clause}

The `USING` clause allows deleting based on the content of other tables or subqueries.
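
For instance, the following sketch deletes rows from one table based on matching rows in another. The `orders` and `cancellations` tables and their contents are hypothetical:

```sql
CREATE TABLE orders (id INTEGER, customer VARCHAR);
CREATE TABLE cancellations (order_id INTEGER);
INSERT INTO orders VALUES (1, 'Kat'), (2, 'Sam'), (3, 'Kat');
INSERT INTO cancellations VALUES (2);

-- Delete the orders that have a matching row in cancellations
DELETE FROM orders
    USING cancellations
    WHERE orders.id = cancellations.order_id;

-- Inspect the remaining orders
SELECT * FROM orders;
```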

##### `RETURNING` Clause {#docs:current:sql:statements:delete::returning-clause}

The `RETURNING` clause allows returning the deleted values. It uses the same syntax as the `SELECT` clause except the `DISTINCT` modifier is not supported.

```sql
CREATE TABLE employees (name VARCHAR, age INTEGER);
INSERT INTO employees VALUES ('Kat', 32);
DELETE FROM employees RETURNING name, 2025 - age AS approx_birthyear;
```

| name | approx_birthyear |
|------|-----------------:|
| Kat  | 1993             |

#### Syntax {#docs:current:sql:statements:delete::syntax}



#### The `TRUNCATE` Statement {#docs:current:sql:statements:delete::the-truncate-statement}

The `TRUNCATE` statement removes all rows from a table, acting as an alias for `DELETE FROM` without a `WHERE` clause:

```sql
TRUNCATE tbl;
```

#### Limitations on Reclaiming Memory and Disk Space {#docs:current:sql:statements:delete::limitations-on-reclaiming-memory-and-disk-space}

Running `DELETE` does not mean space is reclaimed. In general, rows are only marked as deleted. DuckDB reclaims space upon [performing a `CHECKPOINT`](#docs:current:sql:statements:checkpoint). [`VACUUM`](#docs:current:sql:statements:vacuum) currently does not reclaim space.
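
A minimal sketch of this workflow, assuming an illustrative table `tbl` filled with generated numbers. How much space is actually reclaimed may depend on the DuckDB version and on how the deleted rows are distributed:

```sql
CREATE TABLE tbl AS
    SELECT i
    FROM range(1_000_000) t(i);

-- The rows are marked as deleted
DELETE FROM tbl WHERE i % 2 = 0;

-- Checkpointing gives DuckDB the opportunity to reclaim space
CHECKPOINT;

-- Inspect the block usage of the database
PRAGMA database_size;
```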

### DESCRIBE Statement {#docs:current:sql:statements:describe}

The `DESCRIBE` statement shows the schema of a table, view or query.

#### Usage {#docs:current:sql:statements:describe::usage}

```sql
DESCRIBE tbl;
```

To describe a query, prepend `DESCRIBE` to a query.

```sql
DESCRIBE SELECT * FROM tbl;
```

#### Alias {#docs:current:sql:statements:describe::alias}

The `SHOW` statement is an alias for `DESCRIBE`.
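
For example, given an illustrative table `tbl`, the two statements below are expected to return the same column listing:

```sql
CREATE TABLE tbl (i INTEGER, s VARCHAR);

DESCRIBE tbl;
SHOW tbl;
```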

#### See Also {#docs:current:sql:statements:describe::see-also}

For more examples, see the [guide on `DESCRIBE`](#docs:current:guides:meta:describe).

### DROP Statement {#docs:current:sql:statements:drop}

The `DROP` statement removes a catalog entry added previously with the `CREATE` command.

#### Examples {#docs:current:sql:statements:drop::examples}

Delete the table with the name `tbl`:

```sql
DROP TABLE tbl;
```

Drop the view with the name `view1`; do not throw an error if the view does not exist:

```sql
DROP VIEW IF EXISTS view1;
```

Drop function `fn`:

```sql
DROP FUNCTION fn;
```

Drop index `idx`:

```sql
DROP INDEX idx;
```

Drop schema `sch`:

```sql
DROP SCHEMA sch;
```

Drop sequence `seq`:

```sql
DROP SEQUENCE seq;
```

Drop macro `mcr`:

```sql
DROP MACRO mcr;
```

Drop macro table `mt`:

```sql
DROP MACRO TABLE mt; -- the `TABLE` is optional since v1.4.0
```

Drop type `typ`:

```sql
DROP TYPE typ;
```

#### Syntax {#docs:current:sql:statements:drop::syntax}



#### Dependencies of Dropped Objects {#docs:current:sql:statements:drop::dependencies-of-dropped-objects}

DuckDB performs limited dependency tracking for some object types.
By default or if the `RESTRICT` clause is provided, the entry will not be dropped if there are any other objects that depend on it.
If the `CASCADE` clause is provided then all the objects that are dependent on the object will be dropped as well.

```sql
CREATE SCHEMA myschema;
CREATE TABLE myschema.t1 (i INTEGER);
DROP SCHEMA myschema;
```

```console
Dependency Error:
Cannot drop entry "myschema" because there are entries that depend on it.
table "t1" depends on schema "myschema".
Use DROP...CASCADE to drop all dependents.
```

The `CASCADE` modifier drops both `myschema` and `myschema.t1`:

```sql
CREATE SCHEMA myschema;
CREATE TABLE myschema.t1 (i INTEGER);
DROP SCHEMA myschema CASCADE;
```

The following dependencies are tracked and thus will raise an error if the user tries to drop the depending object without the `CASCADE` modifier.

| Depending object type | Dependent object type |
|--|--|
| `SCHEMA` | `FUNCTION` |
| `SCHEMA` | `INDEX` |
| `SCHEMA` | `MACRO TABLE` |
| `SCHEMA` | `MACRO` |
| `SCHEMA` | `SCHEMA` |
| `SCHEMA` | `SEQUENCE` |
| `SCHEMA` | `TABLE` |
| `SCHEMA` | `TYPE` |
| `SCHEMA` | `VIEW` |
| `TABLE`  | `INDEX` |

#### Limitations {#docs:current:sql:statements:drop::limitations}

##### Dependencies on Views {#docs:current:sql:statements:drop::dependencies-on-views}

Currently, dependencies are not tracked for views. For example, if a view is created that references a table and the table is dropped, then the view will be in an invalid state:

```sql
CREATE TABLE tbl (i INTEGER);
CREATE VIEW view1 AS
    SELECT i FROM tbl;
DROP TABLE tbl RESTRICT;
SELECT * FROM view1;
```

This returns the following error message:

```console
Catalog Error:
Table with name tbl does not exist!
```

#### Limitations on Reclaiming Disk Space {#docs:current:sql:statements:drop::limitations-on-reclaiming-disk-space}

Running `DROP TABLE` should free the memory used by the table, but not always disk space.
Even if disk space does not decrease, the free blocks will be marked as `free`.
For example, if we have a 2 GB file and we drop a 1 GB table, the file might still be 2 GB, but it should have 1 GB of free blocks in it.
To check this, use the following `PRAGMA` and check the number of `free_blocks` in the output:

```sql
PRAGMA database_size;
```

For instructions on reclaiming space after dropping a table, refer to the [“Reclaiming space” page](#docs:current:operations_manual:footprint_of_duckdb:reclaiming_space).

### EXPORT and IMPORT DATABASE Statements {#docs:current:sql:statements:export}

The `EXPORT DATABASE` command allows you to export the contents of the database to a specific directory. The `IMPORT DATABASE` command allows you to then read the contents again.

#### Examples {#docs:current:sql:statements:export::examples}

Export the database to the target directory 'target_directory' as CSV files:

```sql
EXPORT DATABASE 'target_directory';
```

Export to directory 'target_directory', using the given options for the CSV serialization:

```sql
EXPORT DATABASE 'target_directory' (FORMAT csv, DELIMITER '|');
```

Export to directory 'target_directory', tables serialized as Parquet:

```sql
EXPORT DATABASE 'target_directory' (FORMAT parquet);
```

Export to directory 'target_directory', tables serialized as Parquet, compressed with Zstd, with a `ROW_GROUP_SIZE` of 100,000:

```sql
EXPORT DATABASE 'target_directory' (
    FORMAT parquet,
    COMPRESSION zstd,
    ROW_GROUP_SIZE 100_000
);
```

Reload the database from the exported directory:

```sql
IMPORT DATABASE 'source_directory';
```

Alternatively, use a `PRAGMA`:

```sql
PRAGMA import_database('source_directory');
```

For details regarding the writing of Parquet files, see the [Parquet Files page in the Data Import section](#docs:current:data:parquet:overview::writing-to-parquet-files) and the [`COPY` Statement page](#docs:current:sql:statements:copy).

#### `EXPORT DATABASE` {#docs:current:sql:statements:export::export-database}

The `EXPORT DATABASE` command exports the full contents of the database – including schema information, tables, views and sequences – to a specific directory that can then be loaded again. The created directory will be structured as follows:

```text
target_directory/schema.sql
target_directory/load.sql
target_directory/t_1.csv
...
target_directory/t_n.csv
```

The `schema.sql` file contains the schema statements that are found in the database. It contains any `CREATE SCHEMA`, `CREATE TABLE`, `CREATE VIEW` and `CREATE SEQUENCE` commands that are necessary to re-construct the database.

The `load.sql` file contains a set of `COPY` statements that can be used to read the data from the CSV files again. The file contains a single `COPY` statement for every table found in the schema.

##### Syntax {#docs:current:sql:statements:export::syntax}



#### `IMPORT DATABASE` {#docs:current:sql:statements:export::import-database}

The database can be reloaded by using the `IMPORT DATABASE` command again, or manually by running `schema.sql` followed by `load.sql` to re-load the data.

##### Syntax {#docs:current:sql:statements:export::syntax}


### INSERT Statement {#docs:current:sql:statements:insert}

The `INSERT` statement inserts new data into a table.

#### Examples {#docs:current:sql:statements:insert::examples}

Insert the values 1, 2, 3 into `tbl`:

```sql
INSERT INTO tbl
    VALUES (1), (2), (3);
```

Insert the result of a query into a table:

```sql
INSERT INTO tbl
    SELECT * FROM other_tbl;
```

Insert values into the `i` column, inserting the default value into other columns:

```sql
INSERT INTO tbl (i)
    VALUES (1), (2), (3);
```

Explicitly insert the default value into a column:

```sql
INSERT INTO tbl (i)
    VALUES (1), (DEFAULT), (3);
```

Assuming `tbl` has a primary key/unique constraint, do nothing on conflict:

```sql
INSERT OR IGNORE INTO tbl (i)
    VALUES (1);
```

Or update the table with the new values instead:

```sql
INSERT OR REPLACE INTO tbl (i)
    VALUES (1);
```

#### Syntax {#docs:current:sql:statements:insert::syntax}



`INSERT INTO` inserts new rows into a table. One can insert one or more rows specified by value expressions, or zero or more rows resulting from a query.

#### Insert Column Order {#docs:current:sql:statements:insert::insert-column-order}

It's possible to provide an optional insert column order, which can be either `BY POSITION` (the default) or `BY NAME`.
Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or `NULL` if there is none.

If the expression for any column is not of the correct data type, automatic type conversion will be attempted.

##### `INSERT INTO ... [BY POSITION]` {#docs:current:sql:statements:insert::insert-into--by-position}

The order that values are inserted into the columns of the table is determined by the order that the columns were declared in.
That is, the values supplied by the `VALUES` clause or query are associated with the column list left-to-right.
This is the default option, which can be explicitly specified using the `BY POSITION` option.
For example:

```sql
CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl
    VALUES (5, 42);
```

Specifying `BY POSITION` is optional and is equivalent to the default behavior:

```sql
INSERT INTO tbl
    BY POSITION
    VALUES (5, 42);
```

To use a different order, column names can be provided as part of the target, for example:

```sql
CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl (b, a)
    VALUES (5, 42);
```

Adding `BY POSITION` results in the same behavior:

```sql
INSERT INTO tbl
    BY POSITION (b, a)
    VALUES (5, 42);
```

This will insert `5` into `b` and `42` into `a`.

##### `INSERT INTO ... BY NAME` {#docs:current:sql:statements:insert::insert-into--by-name}

Using the `BY NAME` modifier, the names of the column list of the `SELECT` statement are matched against the column names of the table to determine the order that values should be inserted into the table. This allows inserting even in cases when the order of the columns in the table differs from the order of the values in the `SELECT` statement or certain columns are missing.

For example:

```sql
CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl BY NAME (SELECT 42 AS b, 32 AS a);
INSERT INTO tbl BY NAME (SELECT 22 AS b);
SELECT * FROM tbl;
```

|  a   | b  |
|-----:|---:|
| 32   | 42 |
| NULL | 22 |

It's important to note that when using `INSERT INTO ... BY NAME`, the column names specified in the `SELECT` statement must match the column names in the table. If a column name is misspelled or does not exist in the table, an error will occur. Columns that are missing from the `SELECT` statement will be filled with the default value.

#### `ON CONFLICT` Clause {#docs:current:sql:statements:insert::on-conflict-clause}

An `ON CONFLICT` clause can be used to perform a certain action on conflicts that arise from `UNIQUE` or `PRIMARY KEY` constraints.
The following example shows such a conflict:

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
    VALUES (1, 42);
INSERT INTO tbl
    VALUES (1, 84);
```

This raises an error:

```console
Constraint Error:
Duplicate key "i: 1" violates primary key constraint.
```

The table will contain the row that was first inserted:

```sql
SELECT * FROM tbl;
```

| i | j  |
|--:|---:|
| 1 | 42 |

These error messages can be avoided by explicitly handling conflicts.
DuckDB supports two such clauses: [`ON CONFLICT DO NOTHING`](#docs:current:sql:statements:insert::do-nothing-clause) and [`ON CONFLICT DO UPDATE SET ...`](#docs:current:sql:statements:insert::do-update-clause-upsert).

##### `DO NOTHING` Clause {#docs:current:sql:statements:insert::do-nothing-clause}

The `DO NOTHING` clause causes the error(s) to be ignored, and the values are not inserted or updated.
For example:

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
    VALUES (1, 42);
INSERT INTO tbl
    VALUES (1, 84)
    ON CONFLICT DO NOTHING;
```

These statements finish successfully and leave the table with the row `<i: 1, j: 42>`.

###### `INSERT OR IGNORE INTO` {#docs:current:sql:statements:insert::insert-or-ignore-into}

The `INSERT OR IGNORE INTO ...` statement is a shorter syntax alternative to `INSERT INTO ... ON CONFLICT DO NOTHING`.
For example, the following statements are equivalent:

```sql
INSERT OR IGNORE INTO tbl
    VALUES (1, 84);
INSERT INTO tbl
    VALUES (1, 84) ON CONFLICT DO NOTHING;
```

##### `DO UPDATE` Clause (Upsert) {#docs:current:sql:statements:insert::do-update-clause-upsert}

The `DO UPDATE` clause causes the `INSERT` to turn into an `UPDATE` on the conflicting row(s) instead.
The `SET` expressions that follow determine how these rows are updated. The expressions can use the special virtual table `EXCLUDED`, which contains the conflicting values for the row.
Optionally you can provide an additional `WHERE` clause that can exclude certain rows from the update.
The conflicts that don't meet this condition are ignored instead.

Because we need a way to refer to both the **to-be-inserted** tuple and the **existing** tuple, we introduce the special `EXCLUDED` qualifier.
When the `EXCLUDED` qualifier is provided, the reference refers to the **to-be-inserted** tuple, otherwise, it refers to the **existing** tuple.
This special qualifier can be used within the `WHERE` clauses and `SET` expressions of the `ON CONFLICT` clause.

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl VALUES (1, 42);
INSERT INTO tbl VALUES (1, 52), (1, 62) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
```
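
The optional `WHERE` clause can be sketched as follows. This is an illustration under the semantics described above (an unqualified column refers to the existing tuple, an `EXCLUDED`-qualified one to the to-be-inserted tuple); the condition and values are made up for the example:

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl VALUES (1, 42);

-- EXCLUDED.j is the to-be-inserted value, the unqualified j is the existing one.
-- Here the condition does not hold (10 > 42 is false), so the conflict
-- is expected to be ignored and the row left unchanged.
INSERT INTO tbl VALUES (1, 10)
    ON CONFLICT DO UPDATE SET j = EXCLUDED.j WHERE EXCLUDED.j > j;

-- Here the condition holds (84 > 42), so the existing row is expected to be updated.
INSERT INTO tbl VALUES (1, 84)
    ON CONFLICT DO UPDATE SET j = EXCLUDED.j WHERE EXCLUDED.j > j;

SELECT * FROM tbl;
```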

###### Examples {#docs:current:sql:statements:insert::examples}

An example using `DO UPDATE` is the following:

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
    VALUES (1, 42);
INSERT INTO tbl
    VALUES (1, 84)
    ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
SELECT * FROM tbl;
```

| i | j  |
|--:|---:|
| 1 | 84 |

Rearranging columns and using `BY NAME` is also possible:

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
    VALUES (1, 42);
INSERT INTO tbl (j, i)
    VALUES (168, 1)
    ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl
    BY NAME (SELECT 1 AS i, 336 AS j)
    ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
SELECT * FROM tbl;
```

| i |  j  |
|--:|----:|
| 1 | 336 |

###### `INSERT OR REPLACE INTO` {#docs:current:sql:statements:insert::insert-or-replace-into}

The `INSERT OR REPLACE INTO ...` statement is a shorter syntax alternative to `INSERT INTO ... ON CONFLICT DO UPDATE SET c1 = EXCLUDED.c1, c2 = EXCLUDED.c2, ...`.
That is, it updates every column of the **existing** row to the new values of the **to-be-inserted** row.
For example, given the following input table:

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
    VALUES (1, 42);
```

These statements are equivalent:

```sql
INSERT OR REPLACE INTO tbl
    VALUES (1, 84);
INSERT INTO tbl
    VALUES (1, 84)
    ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl (j, i)
    VALUES (84, 1)
    ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl BY NAME
    (SELECT 84 AS j, 1 AS i)
    ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
```

###### Limitations {#docs:current:sql:statements:insert::limitations}

When the `ON CONFLICT ... DO UPDATE` clause is used and a conflict occurs, DuckDB internally assigns `NULL` values to the row's columns that are unaffected by the conflict, then re-assigns their values. If any of these columns has a `NOT NULL` constraint, this triggers a `NOT NULL constraint failed` error. For example:

```sql
CREATE TABLE t1 (id INTEGER PRIMARY KEY, val1 DOUBLE, val2 DOUBLE NOT NULL);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, val1 DOUBLE);
INSERT INTO t1
    VALUES (1, 2, 3);
INSERT INTO t2
    VALUES (1, 5);

INSERT INTO t1 BY NAME (SELECT id, val1 FROM t2)
    ON CONFLICT DO UPDATE
    SET val1 = EXCLUDED.val1;
```

This fails with the following error:

```console
Constraint Error:
NOT NULL constraint failed: t1.val2
```

###### Composite Primary Key {#docs:current:sql:statements:insert::composite-primary-key}

When multiple columns need to be part of the uniqueness constraint, use a single `PRIMARY KEY` clause including all relevant columns:

```sql
CREATE TABLE t1 (id1 INTEGER, id2 INTEGER, val1 DOUBLE, PRIMARY KEY (id1, id2));
INSERT OR REPLACE INTO t1
    VALUES (1, 2, 3);
INSERT OR REPLACE INTO t1
    VALUES (1, 2, 4);
```
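The same composite key can also be named explicitly as a conflict target. The sketch below assumes a hypothetical `readings` table with the same layout as `t1` and lists both key columns in the `ON CONFLICT` clause:

```sql
-- Hypothetical table mirroring the layout of t1 above.
CREATE TABLE readings (id1 INTEGER, id2 INTEGER, val1 DOUBLE, PRIMARY KEY (id1, id2));
INSERT INTO readings
    VALUES (1, 2, 3);
-- Both key columns together form the conflict target.
INSERT INTO readings
    VALUES (1, 2, 4)
    ON CONFLICT (id1, id2) DO UPDATE SET val1 = EXCLUDED.val1;
SELECT * FROM readings;
```

After these statements, the table is expected to contain a single row for the key `(1, 2)` with `val1` set to `4`.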

##### Defining a Conflict Target {#docs:current:sql:statements:insert::defining-a-conflict-target}

A conflict target may be provided as `ON CONFLICT (conflict_target)`. This is a group of columns that an index or uniqueness/key constraint is defined on. If the conflict target is omitted, the `PRIMARY KEY` constraint(s) on the table are targeted.

Specifying a conflict target is optional unless a [`DO UPDATE`](#docs:current:sql:statements:insert::do-update-clause-upsert) clause is used and the table has multiple unique/primary key constraints.

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER UNIQUE, k INTEGER);
INSERT INTO tbl
    VALUES (1, 20, 300);
SELECT * FROM tbl;
```

| i | j  |  k  |
|--:|---:|----:|
| 1 | 20 | 300 |

```sql
INSERT INTO tbl
    VALUES (1, 40, 700)
    ON CONFLICT (i) DO UPDATE SET k = 2 * EXCLUDED.k;
```

| i | j  |  k   |
|--:|---:|-----:|
| 1 | 20 | 1400 |

```sql
INSERT INTO tbl
    VALUES (1, 20, 900)
    ON CONFLICT (j) DO UPDATE SET k = 5 * EXCLUDED.k;
```

| i | j  |  k   |
|--:|---:|-----:|
| 1 | 20 | 4500 |

When a conflict target is provided, you can further filter the update with a `WHERE` clause. Conflicts that do not meet the condition are ignored.

```sql
INSERT INTO tbl
    VALUES (1, 40, 700)
    ON CONFLICT (i) DO UPDATE SET k = 2 * EXCLUDED.k WHERE k < 100;
```

#### `RETURNING` Clause {#docs:current:sql:statements:insert::returning-clause}

The `RETURNING` clause may be used to return the contents of the rows that were inserted. This can be useful if some columns are calculated upon insert. For example, if the table contains an automatically incrementing primary key, then the `RETURNING` clause will include the automatically created primary key. This is also useful in the case of generated columns.

Some or all columns can be explicitly chosen to be returned and they may optionally be renamed using aliases. Arbitrary non-aggregating expressions may also be returned instead of simply returning a column. All columns can be returned using the `*` expression, and columns or expressions can be returned in addition to all columns returned by the `*`.

For example:

```sql
CREATE TABLE t1 (i INTEGER);
INSERT INTO t1
    SELECT 42
    RETURNING *;
```

| i  |
|---:|
| 42 |

A more complex example that includes an expression in the `RETURNING` clause:

```sql
CREATE TABLE t2 (i INTEGER, j INTEGER);
INSERT INTO t2
    SELECT 2 AS i, 3 AS j
    RETURNING *, i * j AS i_times_j;
```

| i | j | i_times_j |
|--:|--:|----------:|
| 2 | 3 | 6         |

The next example shows a situation where the `RETURNING` clause is more helpful. First, a table is created with a primary key column. Then a sequence is created to allow for that primary key to be incremented as new rows are inserted. When we insert into the table, we do not already know the values generated by the sequence, so it is valuable to return them. For additional information, see the [`CREATE SEQUENCE` page](#docs:current:sql:statements:create_sequence).

```sql
CREATE TABLE t3 (i INTEGER PRIMARY KEY, j INTEGER);
CREATE SEQUENCE 't3_key';
INSERT INTO t3
    SELECT nextval('t3_key') AS i, 42 AS j
    UNION ALL
    SELECT nextval('t3_key') AS i, 43 AS j
    RETURNING *;
```

| i | j  |
|--:|---:|
| 1 | 42 |
| 2 | 43 |
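The column selection and aliasing described earlier can be combined with a sequence-backed default. The following is a sketch with a hypothetical `orders` table and `orders_seq` sequence: only the generated key and one derived expression are returned, both under aliases.

```sql
-- Hypothetical sequence and table used only for this illustration.
CREATE SEQUENCE orders_seq;
CREATE TABLE orders (
    id INTEGER DEFAULT nextval('orders_seq') PRIMARY KEY,
    amount INTEGER
);
-- Return the generated key and a computed expression, each with an alias.
INSERT INTO orders (amount)
    VALUES (10), (20)
    RETURNING id AS new_id, amount * 2 AS doubled_amount;
```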

### LOAD / INSTALL Statements {#docs:current:sql:statements:load_and_install}

#### `INSTALL` {#docs:current:sql:statements:load_and_install::install}

The `INSTALL` statement downloads an extension so it can be loaded into a DuckDB session.

##### Examples {#docs:current:sql:statements:load_and_install::examples}

Install the [`httpfs`](#docs:current:core_extensions:httpfs:overview) extension:

```sql
INSTALL httpfs;
```

Install the [`h3` community extension](#community_extensions:extensions:h3):

```sql
INSTALL h3 FROM community;
```

##### Syntax {#docs:current:sql:statements:load_and_install::syntax}



#### `LOAD` {#docs:current:sql:statements:load_and_install::load}

The `LOAD` statement loads an installed DuckDB extension into the current session.

##### Examples {#docs:current:sql:statements:load_and_install::examples}

Load the [`httpfs`](#docs:current:core_extensions:httpfs:overview) extension:

```sql
LOAD httpfs;
```

Load the [`spatial`](#docs:current:core_extensions:spatial:overview) extension:

```sql
LOAD spatial;
```
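To check whether an extension has been installed and loaded, one option is to query the `duckdb_extensions()` table function after running both statements. This is a sketch that assumes the `httpfs` extension can be downloaded in your environment:

```sql
INSTALL httpfs;
LOAD httpfs;
-- Inspect the installation and load state of the extension.
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'httpfs';
```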

##### Syntax {#docs:current:sql:statements:load_and_install::syntax}


### MERGE INTO Statement {#docs:current:sql:statements:merge_into}

The `MERGE INTO` statement is an alternative to `INSERT INTO ... ON CONFLICT` that doesn't need a primary key since it allows for a custom match condition. This is a useful alternative for upserting use cases (`INSERT` + `UPDATE`) when the destination table does not have a primary key constraint.

#### Examples {#docs:current:sql:statements:merge_into::examples}

First, let's create a simple table.

```sql
CREATE TABLE people (id INTEGER, name VARCHAR, salary FLOAT);
INSERT INTO people VALUES (1, 'John', 92_000.0), (2, 'Anna', 100_000.0);
```

The simplest upsert uses a whole row in the `USING` clause.
This way, if there is a match,
the row can be updated to the new row without further instructions
(`WHEN MATCHED THEN UPDATE`), and when there is no match,
the row can be trivially inserted into the table
(`WHEN NOT MATCHED THEN INSERT`).

```sql
MERGE INTO people
    USING (
        SELECT
            unnest([3, 1]) AS id,
            unnest(['Sarah', 'John']) AS name,
            unnest([95_000.0, 105_000.0]) AS salary
    ) AS upserts
    ON (upserts.id = people.id)
    WHEN MATCHED THEN UPDATE
    WHEN NOT MATCHED THEN INSERT;

FROM people
ORDER BY id;
```

| id | name  |  salary  |
|---:|-------|---------:|
| 1  | John  | 105000.0 |
| 2  | Anna  | 100000.0 |
| 3  | Sarah | 95000.0  |


In the previous example we are updating the whole row if `id` matches. However, it is also a common pattern to receive a _change set_ with some keys and the changed value. This is a good use for `SET`. If the match condition uses a column that has the same name in the source and destination, the keyword `USING` can be used in the match condition.

```sql
MERGE INTO people
    USING (
        SELECT
            1 AS id, 
            98_000.0 AS salary
    ) AS salary_updates
    USING (id)
    WHEN MATCHED THEN UPDATE SET salary = salary_updates.salary;

FROM people
ORDER BY id;
```

| id | name  |  salary  |
|---:|-------|---------:|
| 1  | John  | 98000.0  |
| 2  | Anna  | 100000.0 |
| 3  | Sarah | 95000.0  |

Another common pattern is to receive a _delete set_ of rows, which may only contain ids of rows to be deleted.

```sql
MERGE INTO people
    USING (
        SELECT
            1 AS id, 
    ) AS deletes
    USING (id)
    WHEN MATCHED THEN DELETE;

FROM people
ORDER BY id;
```

| id | name  |  salary  |
|---:|-------|---------:|
| 2  | Anna  | 100000.0 |
| 3  | Sarah | 95000.0  |

`MERGE INTO` also supports more complex conditions. For example, for a given _delete set_, we can decide to only remove rows whose `salary` is greater than or equal to a certain amount.

```sql
MERGE INTO people
    USING (
        SELECT
            unnest([3, 2]) AS id, 
    ) AS deletes
    USING (id)
    WHEN MATCHED AND people.salary >= 100_000.0 THEN DELETE;

FROM people
ORDER BY id;
```

| id | name  | salary  |
|---:|-------|--------:|
| 3  | Sarah | 95000.0 |

If needed, DuckDB also supports multiple `UPDATE` and `DELETE` conditions. The `RETURNING` clause can be used to indicate which rows were affected by the `MERGE` statement.

```sql
-- Let's get John back in!
INSERT INTO people VALUES (1, 'John', 105_000.0);

MERGE INTO people
    USING (
        SELECT
            unnest([3, 1]) AS id,
            unnest([89_000.0, 70_000.0]) AS salary
    ) AS upserts
    USING (id)
    WHEN MATCHED AND people.salary < 100_000.0 THEN UPDATE SET salary = upserts.salary
    -- Second update or delete condition
    WHEN MATCHED AND people.salary > 100_000.0 THEN DELETE
    WHEN NOT MATCHED THEN INSERT BY NAME
    RETURNING merge_action, *;
```

| merge_action | id | name  |  salary  |
|--------------|---:|-------|---------:|
| UPDATE       | 3  | Sarah | 89000.0  |
| DELETE       | 1  | John  | 105000.0 |

In some cases, you may want to perform a different action for target rows that have no match in the source. For example, if we expect that data not present in the source should not be present in the target either:

```sql
CREATE TABLE target AS
    SELECT unnest([1,2]) AS id;

MERGE INTO target
    USING (SELECT 1 AS id) source
    USING (id)
    WHEN MATCHED THEN UPDATE
    WHEN NOT MATCHED BY SOURCE THEN DELETE
    RETURNING merge_action, *;
```

| merge_action | id |
|--------------|---:|
| UPDATE       | 1  |
| DELETE       | 2  |

It is also possible to specify `WHEN NOT MATCHED BY TARGET`. However, this behaves the same as `WHEN NOT MATCHED`, since conditions are evaluated against the target by default.
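As a sketch of the `BY TARGET` spelling, the following statements use a hypothetical `inventory` table and an `incoming` source (neither is part of the original examples). The source row with `id = 3` has no counterpart in the target, so it is expected to be inserted:

```sql
-- Hypothetical target table used only for this illustration.
CREATE TABLE inventory AS
    SELECT unnest([1, 2]) AS id;

MERGE INTO inventory
    USING (SELECT unnest([2, 3]) AS id) AS incoming
    USING (id)
    -- Spelled out form of WHEN NOT MATCHED.
    WHEN NOT MATCHED BY TARGET THEN INSERT
    RETURNING merge_action, *;
```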

#### Syntax {#docs:current:sql:statements:merge_into::syntax}


### PIVOT Statement {#docs:current:sql:statements:pivot}

The `PIVOT` statement allows distinct values within a column to be separated into their own columns.
The values within those new columns are calculated using an aggregate function on the subset of rows that match each distinct value.

DuckDB implements both the SQL Standard `PIVOT` syntax and a simplified `PIVOT` syntax that automatically detects the columns to create while pivoting.
`PIVOT_WIDER` may also be used in place of the `PIVOT` keyword.

For details on how the `PIVOT` statement is implemented, see the [Pivot Internals page](#docs:current:internals:pivot::pivot).

> The [`UNPIVOT` statement](#docs:current:sql:statements:unpivot) is the inverse of the `PIVOT` statement.

#### Simplified `PIVOT` Syntax {#docs:current:sql:statements:pivot::simplified-pivot-syntax}

The full syntax diagram is below, but the simplified `PIVOT` syntax can be summarized using spreadsheet pivot table naming conventions as:

```sql
PIVOT ⟨dataset⟩
ON ⟨columns⟩
USING ⟨values⟩
GROUP BY ⟨rows⟩
ORDER BY ⟨columns_with_order_directions⟩
LIMIT ⟨number_of_rows⟩;
```

The `ON`, `USING`, and `GROUP BY` clauses are each optional, but they may not all be omitted.

##### Example Data {#docs:current:sql:statements:pivot::example-data}

All examples use the dataset produced by the queries below:

```sql
CREATE TABLE cities (
    country VARCHAR, name VARCHAR, year INTEGER, population INTEGER
);
INSERT INTO cities VALUES
    ('NL', 'Amsterdam', 2000, 1005),
    ('NL', 'Amsterdam', 2010, 1065),
    ('NL', 'Amsterdam', 2020, 1158),
    ('US', 'Seattle', 2000, 564),
    ('US', 'Seattle', 2010, 608),
    ('US', 'Seattle', 2020, 738),
    ('US', 'New York City', 2000, 8015),
    ('US', 'New York City', 2010, 8175),
    ('US', 'New York City', 2020, 8772);
```

```sql
SELECT *
FROM cities;
```

| country |     name      | year | population |
|---------|---------------|-----:|-----------:|
| NL      | Amsterdam     | 2000 | 1005       |
| NL      | Amsterdam     | 2010 | 1065       |
| NL      | Amsterdam     | 2020 | 1158       |
| US      | Seattle       | 2000 | 564        |
| US      | Seattle       | 2010 | 608        |
| US      | Seattle       | 2020 | 738        |
| US      | New York City | 2000 | 8015       |
| US      | New York City | 2010 | 8175       |
| US      | New York City | 2020 | 8772       |

##### `PIVOT ON` and `USING` {#docs:current:sql:statements:pivot::pivot-on-and-using}

Use the `PIVOT` statement below to create a separate column for each year and calculate the total population in each.
The `ON` clause specifies which column(s) to split into separate columns.
It is equivalent to the columns parameter in a spreadsheet pivot table.

The `USING` clause determines how to aggregate the values that are split into separate columns.
This is equivalent to the values parameter in a spreadsheet pivot table.
If the `USING` clause is not included, it defaults to `count(*)`.

```sql
PIVOT cities
ON year
USING sum(population);
```

| country |     name      | 2000 | 2010 | 2020 |
|---------|---------------|-----:|-----:|-----:|
| NL      | Amsterdam     | 1005 | 1065 | 1158 |
| US      | Seattle       | 564  | 608  | 738  |
| US      | New York City | 8015 | 8175 | 8772 |

In the above example, the `sum` aggregate is always operating on a single value.
If we only want to change the orientation of how the data is displayed without aggregating, use the `first` aggregate function.
In this example, we are pivoting numeric values, but the `first` function works very well for pivoting out a text column.
(This is something that is difficult to do in a spreadsheet pivot table, but easy in DuckDB!)

This query produces a result that is identical to the one above:

```sql
PIVOT cities
ON year
USING first(population);
```

> **Note.** The SQL syntax permits [`FILTER` clauses](#docs:current:sql:query_syntax:filter) with aggregate functions in the `USING` clause.
> In DuckDB, the `PIVOT` statement currently does not support these and they are silently ignored.
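To illustrate the `count(*)` default mentioned above, the following sketch omits the `USING` clause. A `GROUP BY` clause is added so that the remaining columns (`name` and `population`) are not retained as grouping columns:

```sql
-- Without USING, each pivoted column holds a row count.
PIVOT cities
ON year
GROUP BY country;
```

With the example data, each year column should contain the number of cities per country, i.e., `1` for `NL` and `2` for `US`.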

##### `PIVOT ON`, `USING`, and `GROUP BY` {#docs:current:sql:statements:pivot::pivot-on-using-and-group-by}

By default, the `PIVOT` statement retains all columns not specified in the `ON` or `USING` clauses.
To include only certain columns and further aggregate, specify columns in the `GROUP BY` clause.
This is equivalent to the rows parameter of a spreadsheet pivot table.

In the below example, the `name` column is no longer included in the output, and the data is aggregated up to the `country` level.

```sql
PIVOT cities
ON year
USING sum(population)
GROUP BY country;
```

| country | 2000 | 2010 | 2020 |
|---------|-----:|-----:|-----:|
| NL      | 1005 | 1065 | 1158 |
| US      | 8579 | 8783 | 9510 |
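The `ORDER BY` and `LIMIT` clauses listed in the syntax summary can be appended to the same statement. As a sketch, the following query sorts the pivoted result by `country` in descending order and keeps only the first row:

```sql
PIVOT cities
ON year
USING sum(population)
GROUP BY country
-- Sort the pivoted rows, then keep only the first one.
ORDER BY country DESC
LIMIT 1;
```

With the example data, this should return only the `US` row.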

##### `IN` Filter for `ON` Clause {#docs:current:sql:statements:pivot::in-filter-for-on-clause}

To only create a separate column for specific values within a column in the `ON` clause, use an optional `IN` expression.
Let's say for example that we wanted to forget about the year 2020 for no particular reason...

```sql
PIVOT cities
ON year IN (2000, 2010)
USING sum(population)
GROUP BY country;
```

| country | 2000 | 2010 |
|---------|-----:|-----:|
| NL      | 1005 | 1065 |
| US      | 8579 | 8783 |

##### Multiple Expressions per Clause {#docs:current:sql:statements:pivot::multiple-expressions-per-clause}

Multiple columns can be specified in the `ON` and `GROUP BY` clauses, and multiple aggregate expressions can be included in the `USING` clause.

###### Multiple `ON` Columns and `ON` Expressions {#docs:current:sql:statements:pivot::multiple-on-columns-and-on-expressions}

Multiple columns can be pivoted out into their own columns.
DuckDB will find the distinct values in each `ON` clause column and create one new column for all combinations of those values (a Cartesian product).

In the below example, all combinations of unique countries and unique cities receive their own column.
Some combinations may not be present in the underlying data, so those columns are populated with `NULL` values.

```sql
PIVOT cities
ON country, name
USING sum(population);
```

| year | NL_Amsterdam | NL_New York City | NL_Seattle | US_Amsterdam | US_New York City | US_Seattle |
|-----:|-------------:|------------------|------------|--------------|-----------------:|-----------:|
| 2000 | 1005         | NULL             | NULL       | NULL         | 8015             | 564        |
| 2010 | 1065         | NULL             | NULL       | NULL         | 8175             | 608        |
| 2020 | 1158         | NULL             | NULL       | NULL         | 8772             | 738        |

To pivot only the combinations of values that are present in the underlying data, use an expression in the `ON` clause.
Multiple expressions and/or columns may be provided.

Here, `country` and `name` are concatenated together and the resulting concatenations each receive their own column.
Any arbitrary non-aggregating expression may be used.
In this case, concatenating with an underscore is used to imitate the naming convention the `PIVOT` clause uses when multiple `ON` columns are provided (like in the prior example).

```sql
PIVOT cities
ON country || '_' || name
USING sum(population);
```

| year | NL_Amsterdam | US_New York City | US_Seattle |
|-----:|-------------:|-----------------:|-----------:|
| 2000 | 1005         | 8015             | 564        |
| 2010 | 1065         | 8175             | 608        |
| 2020 | 1158         | 8772             | 738        |

###### Multiple `USING` Expressions {#docs:current:sql:statements:pivot::multiple-using-expressions}

An alias may also be included for each expression in the `USING` clause.
It will be appended to the generated column names after an underscore (`_`).
This makes the column naming convention much cleaner when multiple expressions are included in the `USING` clause.

In this example, both the `sum` and `max` of the population column are calculated for each year and are split into separate columns.

```sql
PIVOT cities
ON year
USING sum(population) AS total, max(population) AS max
GROUP BY country;
```

| country | 2000_total | 2000_max | 2010_total | 2010_max | 2020_total | 2020_max |
|---------|-----------:|---------:|-----------:|---------:|-----------:|---------:|
| US      | 8579       | 8015     | 8783       | 8175     | 9510       | 8772     |
| NL      | 1005       | 1005     | 1065       | 1065     | 1158       | 1158     |

###### Multiple `GROUP BY` Columns {#docs:current:sql:statements:pivot::multiple-group-by-columns}

Multiple `GROUP BY` columns may also be provided.
Note that column names must be used rather than column positions (1, 2, etc.), and that expressions are not supported in the `GROUP BY` clause.

```sql
PIVOT cities
ON year
USING sum(population)
GROUP BY country, name;
```

| country |     name      | 2000 | 2010 | 2020 |
|---------|---------------|-----:|-----:|-----:|
| NL      | Amsterdam     | 1005 | 1065 | 1158 |
| US      | Seattle       | 564  | 608  | 738  |
| US      | New York City | 8015 | 8175 | 8772 |

##### Using `PIVOT` within a `SELECT` Statement {#docs:current:sql:statements:pivot::using-pivot-within-a-select-statement}

The `PIVOT` statement may be included within a `SELECT` statement as a CTE ([a Common Table Expression, or `WITH` clause](#docs:current:sql:query_syntax:with)), or a subquery.
This allows for a `PIVOT` to be used alongside other SQL logic, as well as for multiple `PIVOT`s to be used in one query.

No `SELECT` is needed within the CTE; the `PIVOT` keyword can be thought of as taking its place.

```sql
WITH pivot_alias AS (
    PIVOT cities
    ON year
    USING sum(population)
    GROUP BY country
)
SELECT * FROM pivot_alias;
```

A `PIVOT` may be used in a subquery and must be wrapped in parentheses.
Note that this behavior is different than the SQL Standard Pivot, as illustrated in subsequent examples.

```sql
SELECT *
FROM (
    PIVOT cities
    ON year
    USING sum(population)
    GROUP BY country
) pivot_alias;
```
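As a sketch of combining a `PIVOT` with other SQL logic, the query below computes the population change between two pivoted years. The `IN` filter is used so that the generated column names are known up front, and the names must be double-quoted because they start with a digit. The `growth` alias is introduced here for illustration only:

```sql
WITH pivot_alias AS (
    PIVOT cities
    ON year IN (2000, 2020)
    USING sum(population)
    GROUP BY country
)
-- Pivoted columns are referenced like regular columns.
SELECT country, "2020" - "2000" AS growth
FROM pivot_alias
ORDER BY country;
```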

##### Multiple `PIVOT` Statements {#docs:current:sql:statements:pivot::multiple-pivot-statements}

Each `PIVOT` can be treated as if it were a `SELECT` node, so they can be joined together or manipulated in other ways.

For example, if two `PIVOT` statements share the same `GROUP BY` expression, they can be joined together using the columns in the `GROUP BY` clause into a wider pivot.

```sql
SELECT *
FROM (PIVOT cities ON year USING sum(population) GROUP BY country) year_pivot
JOIN (PIVOT cities ON name USING sum(population) GROUP BY country) name_pivot
USING (country);
```

| country | 2000 | 2010 | 2020 | Amsterdam | New York City | Seattle |
|---------|-----:|-----:|-----:|----------:|--------------:|--------:|
| NL      | 1005 | 1065 | 1158 | 3228      | NULL          | NULL    |
| US      | 8579 | 8783 | 9510 | NULL      | 24962         | 1910    |

#### Simplified `PIVOT` Full Syntax Diagram {#docs:current:sql:statements:pivot::simplified-pivot-full-syntax-diagram}

Below is the full syntax diagram of the `PIVOT` statement.



#### SQL Standard `PIVOT` Syntax {#docs:current:sql:statements:pivot::sql-standard-pivot-syntax}

The full syntax diagram is below, but the SQL Standard `PIVOT` syntax can be summarized as:

```sql
SELECT *
FROM ⟨dataset⟩
PIVOT (
    ⟨values⟩
    FOR
        ⟨column_1⟩ IN (⟨in_list⟩)
        ⟨column_2⟩ IN (⟨in_list⟩)
        ...
    GROUP BY ⟨rows⟩
);
```

Unlike the simplified syntax, the `IN` clause must be specified for each column to be pivoted.
If you are interested in dynamic pivoting, the simplified syntax is recommended.

Note that no commas separate the expressions in the `FOR` clause, but that `value` and `GROUP BY` expressions must be comma-separated!

#### Examples {#docs:current:sql:statements:pivot::examples}

This example uses a single value expression, a single column expression, and a single row expression:

```sql
SELECT *
FROM cities
PIVOT (
    sum(population)
    FOR
        year IN (2000, 2010, 2020)
    GROUP BY country
);
```

| country | 2000 | 2010 | 2020 |
|---------|-----:|-----:|-----:|
| NL      | 1005 | 1065 | 1158 |
| US      | 8579 | 8783 | 9510 |

This example is somewhat contrived, but serves as an example of using multiple value expressions and multiple columns in the `FOR` clause.

```sql
SELECT *
FROM cities
PIVOT (
    sum(population) AS total,
    count(population) AS count
    FOR
        year IN (2000, 2010)
        country IN ('NL', 'US')
);
```

|     name      | 2000_NL_total | 2000_NL_count | 2000_US_total | 2000_US_count | 2010_NL_total | 2010_NL_count | 2010_US_total | 2010_US_count |
|--|-:|-:|-:|-:|-:|-:|-:|-:|
| Amsterdam     | 1005          | 1             | NULL          | 0             | 1065          | 1             | NULL          | 0             |
| Seattle       | NULL          | 0             | 564           | 1             | NULL          | 0             | 608           | 1             |
| New York City | NULL          | 0             | 8015          | 1             | NULL          | 0             | 8175          | 1             |

##### SQL Standard `PIVOT` Full Syntax Diagram {#docs:current:sql:statements:pivot::sql-standard-pivot-full-syntax-diagram}

Below is the full syntax diagram of the SQL Standard version of the `PIVOT` statement.



#### Limitations {#docs:current:sql:statements:pivot::limitations}

`PIVOT` currently only accepts aggregate functions in the `USING` clause; other expressions are not allowed.
For example, the following query attempts to get the population as the number of people instead of thousands of people (i.e., instead of 564, get 564000):

```sql
PIVOT cities
ON year
USING sum(population) * 1000;
```

However, it fails with the following error:

```console
Catalog Error:
* is not an aggregate function
```

To work around this limitation, perform the `PIVOT` with the aggregation only, then use the [`COLUMNS` expression](#docs:current:sql:expressions:star::columns-expression):

```sql
SELECT country, name, 1000 * COLUMNS(* EXCLUDE (country, name))
FROM (
    PIVOT cities
    ON year
    USING sum(population)
);
```

### Profiling Queries {#docs:current:sql:statements:profiling}

DuckDB supports profiling queries via the `EXPLAIN` and `EXPLAIN ANALYZE` statements.

#### `EXPLAIN` {#docs:current:sql:statements:profiling::explain}

To see the query plan of a query without executing it, run:

```sql
EXPLAIN ⟨query⟩;
```

The output of `EXPLAIN` contains the estimated cardinalities for each operator.

#### `EXPLAIN ANALYZE` {#docs:current:sql:statements:profiling::explain-analyze}

To profile a query, run:

```sql
EXPLAIN ANALYZE ⟨query⟩;
```

The `EXPLAIN ANALYZE` statement runs the query, and shows the actual cardinalities for each operator,
as well as the cumulative wall-clock time spent in each operator.
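As a concrete sketch, the statements below create a hypothetical `measurements` table and run both forms on the same aggregate query. The exact plan and timings depend on the DuckDB version and the machine, so no output is shown:

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE measurements AS
    SELECT i AS id, i % 10 AS bucket
    FROM range(100000) t(i);

-- Show the plan with estimated cardinalities, without running the query.
EXPLAIN SELECT bucket, count(*) FROM measurements GROUP BY bucket;

-- Run the query and show actual cardinalities and timings per operator.
EXPLAIN ANALYZE SELECT bucket, count(*) FROM measurements GROUP BY bucket;
```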

### SELECT Statement {#docs:current:sql:statements:select}

The `SELECT` statement retrieves rows from the database.

#### Examples {#docs:current:sql:statements:select::examples}

Select all columns from the table `tbl`:

```sql
SELECT * FROM tbl;
```

Select the column `j` from the rows of `tbl` where `i` equals `3`:

```sql
SELECT j FROM tbl WHERE i = 3;
```

Perform an aggregate grouped by the column `i`:

```sql
SELECT i, sum(j) FROM tbl GROUP BY i;
```

Select only the 3 rows of `tbl` with the largest values of `i`:

```sql
SELECT * FROM tbl ORDER BY i DESC LIMIT 3;
```

Join two tables together using the `USING` clause:

```sql
SELECT * FROM t1 JOIN t2 USING (a, b);
```

Use column indexes to select the first and third column from the table `tbl`:

```sql
SELECT #1, #3 FROM tbl;
```

Select all unique cities from the addresses table:

```sql
SELECT DISTINCT city FROM addresses;
```

Return a `STRUCT` by using a row variable:

```sql
SELECT d
FROM (SELECT 1 AS a, 2 AS b) d;
```

#### Syntax {#docs:current:sql:statements:select::syntax}

The `SELECT` statement retrieves rows from the database. The canonical order of a `SELECT` statement is as follows, with less common clauses being indented:

```sql
SELECT ⟨select_list⟩
FROM ⟨tables⟩
    USING SAMPLE ⟨sample_expression⟩
WHERE ⟨condition⟩
GROUP BY ⟨groups⟩
HAVING ⟨group_filter⟩
    WINDOW ⟨window_expression⟩
    QUALIFY ⟨qualify_filter⟩
ORDER BY ⟨order_expression⟩
LIMIT ⟨n⟩;
```

Optionally, the `SELECT` statement can be prefixed with a [`WITH` clause](#docs:current:sql:query_syntax:with).

As the `SELECT` statement is so complex, we have split up the syntax diagrams into several parts. The full syntax diagram can be found at the bottom of the page.
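To make the canonical order concrete, the following sketch uses a hypothetical `sales` table and combines the most common clauses in one query:

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE sales (region VARCHAR, amount INTEGER);
INSERT INTO sales VALUES
    ('north', 10), ('north', 30), ('south', 5), ('south', 7), ('west', 50);

SELECT region, sum(amount) AS total
FROM sales
WHERE amount > 5           -- filter rows before grouping
GROUP BY region
HAVING sum(amount) > 20    -- filter groups after aggregation
ORDER BY total DESC
LIMIT 2;
```

| region | total |
|--------|------:|
| west   | 50    |
| north  | 40    |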

##### `SELECT` Clause {#docs:current:sql:statements:select::select-clause}



The [`SELECT` clause](#docs:current:sql:query_syntax:select) specifies the list of columns that will be returned by the query. While it appears first in the statement, *logically* the expressions here are executed only at the end. The `SELECT` clause can contain arbitrary expressions that transform the output, as well as aggregates and window functions. The `DISTINCT` keyword ensures that only unique tuples are returned.

> Column names are case-insensitive. See the [Rules for Case Sensitivity](#docs:current:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity) for more details.

##### `FROM` Clause {#docs:current:sql:statements:select::from-clause}



The [`FROM` clause](#docs:current:sql:query_syntax:from) specifies the *source* of the data on which the remainder of the query should operate. Logically, the `FROM` clause is where the query starts execution. The `FROM` clause can contain a single table, a combination of multiple tables that are joined together, or another `SELECT` query inside a subquery node.

##### `SAMPLE` Clause {#docs:current:sql:statements:select::sample-clause}



The [`SAMPLE` clause](#docs:current:sql:query_syntax:sample) allows you to run the query on a sample from the base table. This can significantly speed up processing of queries, at the expense of accuracy in the result. Samples can also be used to quickly see a snapshot of the data when exploring a dataset. The `SAMPLE` clause is applied right after anything in the `FROM` clause (i.e., after any joins, but before the `WHERE` clause or any aggregates). See the [Samples](#docs:current:sql:samples) page for more information.
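A minimal sketch of sampling, using a generated range of 1,000 rows instead of a real table. The sample can be requested as a percentage or as a fixed number of rows, and the rows returned vary between runs:

```sql
-- Sample roughly 10% of the rows.
SELECT * FROM range(1000) t(i) USING SAMPLE 10%;

-- Sample exactly 10 rows.
SELECT * FROM range(1000) t(i) USING SAMPLE 10 ROWS;
```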

##### `WHERE` Clause {#docs:current:sql:statements:select::where-clause}



The [`WHERE` clause](#docs:current:sql:query_syntax:where) specifies any filters to apply to the data. This allows you to select only a subset of the data in which you are interested. Logically the `WHERE` clause is applied immediately after the `FROM` clause.

##### `GROUP BY` and `HAVING` Clauses {#docs:current:sql:statements:select::group-by-and-having-clauses}



The [`GROUP BY` clause](#docs:current:sql:query_syntax:groupby) specifies which grouping columns should be used to perform any aggregations in the `SELECT` clause. If the `GROUP BY` clause is specified, the query is always an aggregate query, even if no aggregations are present in the `SELECT` clause.
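For instance, the following sketch groups an inline `VALUES` list (aliased as `sales` for illustration) without using any aggregate function. Because the query is still an aggregate query, it returns one row per distinct `region` rather than one row per input row:

```sql
SELECT region
FROM (VALUES ('north', 10), ('north', 30), ('south', 7)) sales(region, amount)
GROUP BY region
ORDER BY region;
```

| region |
|--------|
| north  |
| south  |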

##### `WINDOW` Clause {#docs:current:sql:statements:select::window-clause}



The [`WINDOW` clause](#docs:current:sql:query_syntax:window) allows you to specify named windows that can be used within window functions. These are useful when you have multiple window functions, as they allow you to avoid repeating the same window clause.
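As a sketch, the query below defines a named window `w` once and reuses it in two window functions. The inline `sales` data and the aliases `running_total` and `rn` are illustrative only:

```sql
SELECT
    region,
    amount,
    sum(amount) OVER w AS running_total,
    row_number() OVER w AS rn
FROM (VALUES ('north', 10), ('north', 30), ('south', 7)) sales(region, amount)
-- The window definition is written once and shared by both functions.
WINDOW w AS (PARTITION BY region ORDER BY amount);
```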

##### `QUALIFY` Clause {#docs:current:sql:statements:select::qualify-clause}



The [`QUALIFY` clause](#docs:current:sql:query_syntax:qualify) is used to filter the results of [window functions](#docs:current:sql:functions:window_functions).
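A common use is to keep only the top row per group. The following sketch, again using illustrative inline data, keeps the row with the largest `amount` in each `region`:

```sql
SELECT region, amount
FROM (VALUES ('north', 10), ('north', 30), ('south', 7)) sales(region, amount)
-- Keep only the first-ranked row of each partition.
QUALIFY row_number() OVER (PARTITION BY region ORDER BY amount DESC) = 1
ORDER BY region;
```

| region | amount |
|--------|-------:|
| north  | 30     |
| south  | 7      |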

##### `ORDER BY`, `LIMIT` and `OFFSET` Clauses {#docs:current:sql:statements:select::order-by-limit-and-offset-clauses}



[`ORDER BY`](#docs:current:sql:query_syntax:orderby), [`LIMIT` and `OFFSET`](#docs:current:sql:query_syntax:limit) are output modifiers.
Logically they are applied at the very end of the query.
The `ORDER BY` clause sorts the rows on the sorting criteria in either ascending or descending order.
The `LIMIT` clause restricts the number of rows fetched, while the `OFFSET` clause indicates at which position to start reading the values.
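For example, the following sketch sorts the numbers 0 to 9, skips the first two rows, and returns the next three:

```sql
SELECT i
FROM range(10) t(i)
ORDER BY i
LIMIT 3
OFFSET 2;
```

| i |
|--:|
| 2 |
| 3 |
| 4 |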

##### `VALUES` List {#docs:current:sql:statements:select::values-list}



[A `VALUES` list](#docs:current:sql:query_syntax:values) is a set of values that is supplied instead of a `SELECT` statement.
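As a sketch, a `VALUES` list can be run on its own or used as a table in the `FROM` clause. The alias `t` and the column names `id` and `label` are chosen here for illustration:

```sql
-- A standalone VALUES list.
VALUES (1, 'a'), (2, 'b');

-- The same list used as a table with named columns.
SELECT id, label
FROM (VALUES (1, 'a'), (2, 'b')) t(id, label);
```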

##### Row IDs {#docs:current:sql:statements:select::row-ids}

For each table, the [`rowid` pseudocolumn](https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm) returns the row identifiers based on the physical storage.

```sql
CREATE TABLE t (id INTEGER, content VARCHAR);
INSERT INTO t VALUES (42, 'hello'), (43, 'world');
SELECT rowid, id, content FROM t;
```

| rowid | id | content |
|------:|---:|---------|
| 0     | 42 | hello   |
| 1     | 43 | world   |

In the current storage, these identifiers are contiguous unsigned integers (0, 1, ...) if no rows were deleted.
Deletions introduce gaps in the rowids which may be reclaimed later:

```sql
CREATE OR REPLACE TABLE t AS (FROM range(10) r(i));
DELETE FROM t WHERE i % 2 = 0;
SELECT rowid FROM t;
```

| rowid |
|------:|
| 1     |
| 3     |
| 5     |
| 7     |
| 9     |

The `rowid` values are stable within a transaction.

> **Best practice.** It is strongly advised to avoid using rowids as identifiers.

> If there is a user-defined column named `rowid`, it shadows the `rowid` pseudocolumn.
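One possible alternative to relying on `rowid` is an explicit key column populated from a sequence. The sketch below assumes hypothetical names (`demo_id_seq`, `demo_items`) and is only one of several ways to generate stable identifiers.

```sql
-- Hypothetical sequence and table, used only for illustration
CREATE SEQUENCE demo_id_seq;
CREATE TABLE demo_items (
    id BIGINT DEFAULT nextval('demo_id_seq') PRIMARY KEY,
    content VARCHAR
);

-- The id column is filled from the sequence and does not depend on physical storage
INSERT INTO demo_items (content) VALUES ('hello'), ('world');
SELECT id, content FROM demo_items;
```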

##### Common Table Expressions {#docs:current:sql:statements:select::common-table-expressions}



#### Full Syntax Diagram {#docs:current:sql:statements:select::full-syntax-diagram}

Below is the full syntax diagram of the `SELECT` statement:


### SET and RESET Statements {#docs:current:sql:statements:set}

The `SET` statement modifies the provided DuckDB [configuration option](#docs:current:configuration:overview) at the specified scope.

#### Examples {#docs:current:sql:statements:set::examples}

Update the `memory_limit` configuration value:

```sql
SET memory_limit = '10GB';
```

Configure the system to use `1` thread:

```sql
SET threads = 1;
```

Or use the `TO` keyword:

```sql
SET threads TO 1;
```

Change configuration option to default value:

```sql
RESET threads;
```

Retrieve configuration value:

```sql
SELECT current_setting('threads');
```

Set the default collation for the session:

```sql
SET SESSION default_collation = 'nocase';
```

##### Set a Global Variable {#docs:current:sql:statements:set::set-a-global-variable}

Set the default sort order globally:

```sql
SET GLOBAL sort_order = 'desc';
```

Set the default number of threads globally:

```sql
SET GLOBAL threads = 4;
```

#### Syntax {#docs:current:sql:statements:set::syntax}



`SET` updates a DuckDB configuration option to the provided value.

#### `RESET` {#docs:current:sql:statements:set::reset}



The `RESET` statement changes the given DuckDB configuration option to the default value.

#### Scopes {#docs:current:sql:statements:set::scopes}

Configuration options can have different scopes:

* `GLOBAL`: Configuration value is used (or reset) across the entire DuckDB instance.
* `SESSION`: Configuration value is used (or reset) only for the current session attached to a DuckDB instance.
* `LOCAL`: Not yet implemented.

When not specified, the default scope for the configuration option is used. For most options this is `GLOBAL`.
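As a sketch of how to check and use scopes, the queries below inspect the scope reported by the `duckdb_settings()` table function for two options and then set and reset one of them at session scope. The choice of options is illustrative.

```sql
-- Inspect the scope of selected configuration options
SELECT name, value, scope
FROM duckdb_settings()
WHERE name IN ('threads', 'default_collation');

-- Set an option for the current session only, then restore its default
SET SESSION default_collation = 'nocase';
RESET SESSION default_collation;
```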

#### Configuration {#docs:current:sql:statements:set::configuration}

See the [Configuration](#docs:current:configuration:overview) page for the full list of configuration options.

### SET VARIABLE and RESET VARIABLE Statements {#docs:current:sql:statements:set_variable}

DuckDB supports the definition of SQL-level variables using the `SET VARIABLE` and `RESET VARIABLE` statements.

#### Variable Scopes {#docs:current:sql:statements:set_variable::variable-scopes}

DuckDB supports two levels of variable scopes:

| Scope | Description |
|---|---|
| `SESSION` | Variables with a `SESSION` scope are local to you and only affect the current session. | 
| `GLOBAL` | Variables with a `GLOBAL` scope are specific [configuration option variables](https://duckdb.org/docs/lts/configuration/overview.html#global-configuration-options) that affect the entire DuckDB instance and all sessions. For example, see [Set a Global Variable](#docs:current:sql:statements:set::set-a-global-variable). |

#### `SET VARIABLE` {#docs:current:sql:statements:set_variable::set-variable}

The `SET VARIABLE` statement assigns a value to a variable, which can be accessed using the `getvariable` call:

```sql
SET VARIABLE my_var = 30;
SELECT 20 + getvariable('my_var') AS total;
```

| total |
|------:|
| 50    |

If `SET VARIABLE` is invoked on an existing variable, it will overwrite its value:

```sql
SET VARIABLE my_var = 30;
SET VARIABLE my_var = 100;
SELECT 20 + getvariable('my_var') AS total;
```

| total |
|------:|
| 120   |

Variables can have different types:

```sql
SET VARIABLE my_date = DATE '2018-07-13';
SET VARIABLE my_string = 'Hello world';
SET VARIABLE my_map = MAP {'k1': 10, 'k2': 20};
```

Variables can also be assigned to results of queries:

```sql
-- write some CSV files
COPY (SELECT 42 AS a) TO 'test1.csv';
COPY (SELECT 84 AS a) TO 'test2.csv';

-- add a list of CSV files to a table
CREATE TABLE csv_files (file VARCHAR);
INSERT INTO csv_files VALUES ('test1.csv'), ('test2.csv');

-- initialize a variable with the list of csv files
SET VARIABLE list_of_files = (SELECT list(file) FROM csv_files);

-- read the CSV files
SELECT * FROM read_csv(getvariable('list_of_files'), filename := True);
```

| a    | filename    |
|-----:|------------:|
| 42   | test1.csv   |
| 84   | test2.csv   |

If a variable is not set, the `getvariable` function returns `NULL`:

```sql
SELECT getvariable('undefined_var') AS result;
```

| result |
|--------|
| NULL   |
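Because an unset variable yields `NULL`, a fallback value can be supplied with `coalesce`. This is a minimal sketch; the fallback value `10` is an arbitrary assumption.

```sql
-- Falls back to 10 when the variable has not been set
SELECT coalesce(getvariable('undefined_var'), 10) AS result;
```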

The `getvariable` function can also be used in a [`COLUMNS` expression](#docs:current:sql:expressions:star::columns-expression):

```sql
SET VARIABLE column_to_exclude = 'col1';
CREATE TABLE tbl AS SELECT 12 AS col0, 34 AS col1, 56 AS col2;
SELECT COLUMNS(c -> c != getvariable('column_to_exclude')) FROM tbl;
```

| col0 | col2 |
|-----:|-----:|
| 12   | 56   |

##### Syntax {#docs:current:sql:statements:set_variable::syntax}



#### `RESET VARIABLE` {#docs:current:sql:statements:set_variable::reset-variable}

The `RESET VARIABLE` statement unsets a variable.

```sql
SET VARIABLE my_var = 30;
RESET VARIABLE my_var;
SELECT getvariable('my_var') AS my_var;
```

| my_var |
|--------|
| NULL   |

##### Syntax {#docs:current:sql:statements:set_variable::syntax}


### SHOW, SHOW DATABASES, and SHOW SCHEMAS Statements {#docs:current:sql:statements:show}

#### `SHOW` Statement {#docs:current:sql:statements:show::show-statement}

The `SHOW` statement is an alias for [`DESCRIBE`](#docs:current:sql:statements:describe).
It shows the schema of a table, view or query.
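For illustration, the two statements below are expected to return the same column listing for a hypothetical table `show_demo`:

```sql
-- Hypothetical table, used only for illustration
CREATE TABLE show_demo (id INTEGER, label VARCHAR);

-- Both statements list the column names and types of the table
SHOW show_demo;
DESCRIBE show_demo;
```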

#### `SHOW DATABASES` Statement {#docs:current:sql:statements:show::show-databases-statement}

The `SHOW DATABASES` statement shows a list of all attached databases:

```sql
ATTACH 'my.duckdb' AS my_database;
SHOW DATABASES;
```

| database_name |
|---------------|
| memory        |
| my_database   |

```sql
DETACH my_database;
SHOW DATABASES;
```

| database_name |
|---------------|
| memory        |

#### `SHOW SCHEMAS` Statement {#docs:current:sql:statements:show::show-schemas-statement}

> This statement was introduced in DuckDB v1.5.

The `SHOW SCHEMAS` statement shows a list of all schemas across non-internal databases:

```sql
SHOW SCHEMAS;
```

| database_name | schema_name        | current |
|---------------|--------------------|---------|
| memory        | main               | true    |
| memory        | pg_catalog         | false   |
| memory        | information_schema | false   |

The `current` column indicates which schema is the default schema (set via the [`USE` statement](#docs:current:sql:statements:use)).

### SUMMARIZE Statement {#docs:current:sql:statements:summarize}

The `SUMMARIZE` statement returns summary statistics for a table, view or a query.

#### Usage {#docs:current:sql:statements:summarize::usage}

```sql
SUMMARIZE tbl;
```

To summarize a query, prepend `SUMMARIZE` to a query.

```sql
SUMMARIZE SELECT * FROM tbl;
```

#### See Also {#docs:current:sql:statements:summarize::see-also}

For more examples, see the [guide on `SUMMARIZE`](#docs:current:guides:meta:summarize).

### Transaction Management {#docs:current:sql:statements:transactions}

DuckDB supports [ACID database transactions](https://en.wikipedia.org/wiki/Database_transaction).
Transactions provide isolation, i.e., changes made by a transaction are not visible from concurrent transactions until it is committed.
A transaction can also be aborted, which discards any changes it made so far.

#### Statements {#docs:current:sql:statements:transactions::statements}

DuckDB provides the following statements for transaction management.

##### Starting a Transaction {#docs:current:sql:statements:transactions::starting-a-transaction}

To start a transaction, run:

```sql
BEGIN TRANSACTION;
```

##### Committing a Transaction {#docs:current:sql:statements:transactions::committing-a-transaction}

You can commit a transaction to make its changes visible to other transactions and to write them to persistent storage (if using DuckDB in persistent mode).
To commit a transaction, run:

```sql
COMMIT;
```

If you are not in an active transaction, the `COMMIT` statement will fail.

##### Rolling Back a Transaction {#docs:current:sql:statements:transactions::rolling-back-a-transaction}

You can abort a transaction.
This operation, also known as rolling back, will discard any changes the transaction made to the database.
To abort a transaction, run:

```sql
ROLLBACK;
```

You can also use the abort command, which has an identical behavior:

```sql
ABORT;
```

If you are not in an active transaction, the `ROLLBACK` and `ABORT` statements will fail.

#### Multi-Statement Transactions {#docs:current:sql:statements:transactions::multi-statement-transactions}

When multiple SQL statements are submitted together (e.g., separated by semicolons), they are executed within a single implicit transaction. If any statement fails, all preceding statements in the batch are rolled back. This also applies to `PRAGMA` commands that decompose into multiple internal operations, such as `COPY FROM DATABASE`.

#### Isolation Level {#docs:current:sql:statements:transactions::isolation-level}

DuckDB's concurrency model guarantees snapshot isolation. Transactions that violate this isolation level are aborted.

In terms of [PostgreSQL's transaction isolation levels](https://www.postgresql.org/docs/current/transaction-iso.html), DuckDB guarantees *repeatable reads*.

#### Example {#docs:current:sql:statements:transactions::example}

We illustrate the use of transactions through a simple example.

```sql
CREATE TABLE person (name VARCHAR, age BIGINT);

BEGIN TRANSACTION;
INSERT INTO person VALUES ('Ada', 52);
COMMIT;

BEGIN TRANSACTION;
DELETE FROM person WHERE name = 'Ada';
INSERT INTO person VALUES ('Bruce', 39);
ROLLBACK;

SELECT * FROM person;
```

The first transaction (inserting “Ada”) was committed but the second (deleting “Ada” and inserting “Bruce”) was aborted.
Therefore, the resulting table will only contain `<'Ada', 52>`.

### UNPIVOT Statement {#docs:current:sql:statements:unpivot}

The `UNPIVOT` statement allows multiple columns to be stacked into fewer columns.
In the basic case, multiple columns are stacked into two columns: a `NAME` column (which contains the name of the source column) and a `VALUE` column (which contains the value from the source column).

DuckDB implements both the SQL Standard `UNPIVOT` syntax and a simplified `UNPIVOT` syntax.
Both can utilize a [`COLUMNS` expression](#docs:current:sql:expressions:star::columns) to automatically detect the columns to unpivot.
`PIVOT_LONGER` may also be used in place of the `UNPIVOT` keyword.

For details on how the `UNPIVOT` statement is implemented, see the [Pivot Internals page](#docs:current:internals:pivot::unpivot).

> The [`PIVOT` statement](#docs:current:sql:statements:pivot) is the inverse of the `UNPIVOT` statement.

#### Simplified `UNPIVOT` Syntax {#docs:current:sql:statements:unpivot::simplified-unpivot-syntax}

The full syntax diagram is below, but the simplified `UNPIVOT` syntax can be summarized using spreadsheet pivot table naming conventions as:

```sql
UNPIVOT ⟨dataset⟩
ON ⟨column(s)⟩
INTO
    NAME ⟨name_column_name⟩
    VALUE ⟨value_column_name(s)⟩
ORDER BY ⟨column(s)_with_order_direction(s)⟩
LIMIT ⟨number_of_rows⟩;
```

##### Example Data {#docs:current:sql:statements:unpivot::example-data}

All examples use the dataset produced by the queries below:

```sql
CREATE OR REPLACE TABLE monthly_sales
    (empid INTEGER, dept TEXT, Jan INTEGER, Feb INTEGER, Mar INTEGER, Apr INTEGER, May INTEGER, Jun INTEGER);
INSERT INTO monthly_sales VALUES
    (1, 'electronics', 1, 2, 3, 4, 5, 6),
    (2, 'clothes', 10, 20, 30, 40, 50, 60),
    (3, 'cars', 100, 200, 300, 400, 500, 600);
```

```sql
FROM monthly_sales;
```

| empid |    dept     | Jan | Feb | Mar | Apr | May | Jun |
|------:|-------------|----:|----:|----:|----:|----:|----:|
| 1     | electronics | 1   | 2   | 3   | 4   | 5   | 6   |
| 2     | clothes     | 10  | 20  | 30  | 40  | 50  | 60  |
| 3     | cars        | 100 | 200 | 300 | 400 | 500 | 600 |



##### `UNPIVOT` Manually {#docs:current:sql:statements:unpivot::unpivot-manually}

The most typical `UNPIVOT` transformation takes already pivoted data and re-stacks it into two columns: one for the name and one for the value.
In this case, all months will be stacked into a `month` column and a `sales` column.

```sql
UNPIVOT monthly_sales
ON jan, feb, mar, apr, may, jun
INTO
    NAME month
    VALUE sales;
```

| empid |    dept     | month | sales |
|------:|-------------|-------|------:|
| 1     | electronics | Jan   | 1     |
| 1     | electronics | Feb   | 2     |
| 1     | electronics | Mar   | 3     |
| 1     | electronics | Apr   | 4     |
| 1     | electronics | May   | 5     |
| 1     | electronics | Jun   | 6     |
| 2     | clothes     | Jan   | 10    |
| 2     | clothes     | Feb   | 20    |
| 2     | clothes     | Mar   | 30    |
| 2     | clothes     | Apr   | 40    |
| 2     | clothes     | May   | 50    |
| 2     | clothes     | Jun   | 60    |
| 3     | cars        | Jan   | 100   |
| 3     | cars        | Feb   | 200   |
| 3     | cars        | Mar   | 300   |
| 3     | cars        | Apr   | 400   |
| 3     | cars        | May   | 500   |
| 3     | cars        | Jun   | 600   |

##### `UNPIVOT` Dynamically Using `COLUMNS` Expression {#docs:current:sql:statements:unpivot::unpivot-dynamically-using-columns-expression}

In many cases, the number of columns to unpivot is not known ahead of time.
In the case of this dataset, the query above would have to change each time a new month is added.
The [`COLUMNS` expression](#docs:current:sql:expressions:star::columns-expression) can be used to select all columns that are not `empid` or `dept`.
This enables dynamic unpivoting that will work regardless of how many months are added.
The query below returns identical results to the one above.

```sql
UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept))
INTO
    NAME month
    VALUE sales;
```

| empid |    dept     | month | sales |
|------:|-------------|-------|------:|
| 1     | electronics | Jan   | 1     |
| 1     | electronics | Feb   | 2     |
| 1     | electronics | Mar   | 3     |
| 1     | electronics | Apr   | 4     |
| 1     | electronics | May   | 5     |
| 1     | electronics | Jun   | 6     |
| 2     | clothes     | Jan   | 10    |
| 2     | clothes     | Feb   | 20    |
| 2     | clothes     | Mar   | 30    |
| 2     | clothes     | Apr   | 40    |
| 2     | clothes     | May   | 50    |
| 2     | clothes     | Jun   | 60    |
| 3     | cars        | Jan   | 100   |
| 3     | cars        | Feb   | 200   |
| 3     | cars        | Mar   | 300   |
| 3     | cars        | Apr   | 400   |
| 3     | cars        | May   | 500   |
| 3     | cars        | Jun   | 600   |

##### `UNPIVOT` into Multiple Value Columns {#docs:current:sql:statements:unpivot::unpivot-into-multiple-value-columns}

The `UNPIVOT` statement has additional flexibility: more than 2 destination columns are supported.
This can be useful when the goal is to reduce the extent to which a dataset is pivoted, but not completely stack all pivoted columns.
To demonstrate this, the query below will generate a dataset with a separate column for the number of each month within the quarter (month 1, 2, or 3), and a separate row for each quarter.
Since there are fewer quarters than months, this does make the dataset longer, but not as long as the above.

To accomplish this, multiple sets of columns are included in the `ON` clause.
The `q1` and `q2` aliases are optional.
The number of columns in each set of columns in the `ON` clause must match the number of columns in the `VALUE` clause.

```sql
UNPIVOT monthly_sales
    ON (jan, feb, mar) AS q1, (apr, may, jun) AS q2
    INTO
        NAME quarter
        VALUE month_1_sales, month_2_sales, month_3_sales;
```

| empid |    dept     | quarter | month_1_sales | month_2_sales | month_3_sales |
|------:|-------------|---------|--------------:|--------------:|--------------:|
| 1     | electronics | q1      | 1             | 2             | 3             |
| 1     | electronics | q2      | 4             | 5             | 6             |
| 2     | clothes     | q1      | 10            | 20            | 30            |
| 2     | clothes     | q2      | 40            | 50            | 60            |
| 3     | cars        | q1      | 100           | 200           | 300           |
| 3     | cars        | q2      | 400           | 500           | 600           |

##### Using `UNPIVOT` within a `SELECT` Statement {#docs:current:sql:statements:unpivot::using-unpivot-within-a-select-statement}

The `UNPIVOT` statement may be included within a `SELECT` statement as a CTE ([a Common Table Expression, or WITH clause](#docs:current:sql:query_syntax:with)), or a subquery.
This allows for an `UNPIVOT` to be used alongside other SQL logic, as well as for multiple `UNPIVOT`s to be used in one query.

No `SELECT` is needed within the CTE: the `UNPIVOT` keyword can be thought of as taking its place.

```sql
WITH unpivot_alias AS (
    UNPIVOT monthly_sales
    ON COLUMNS(* EXCLUDE (empid, dept))
    INTO
        NAME month
        VALUE sales
)
SELECT * FROM unpivot_alias;
```

An `UNPIVOT` may also be used in a subquery, in which case it must be wrapped in parentheses.
Note that this behavior differs from the SQL Standard `UNPIVOT` syntax, as illustrated in subsequent examples.

```sql
SELECT *
FROM (
    UNPIVOT monthly_sales
    ON COLUMNS(* EXCLUDE (empid, dept))
    INTO
        NAME month
        VALUE sales
) unpivot_alias;
```

##### Expressions within `UNPIVOT` Statements {#docs:current:sql:statements:unpivot::expressions-within-unpivot-statements}

DuckDB allows expressions within the `UNPIVOT` statements, provided that they only involve a single column. These can be used to perform computations as well as [explicit casts](#docs:current:sql:data_types:typecasting::explicit-casting). For example:

```sql
UNPIVOT
    (SELECT 42 AS col1, 'woot' AS col2)
    ON
        (col1 * 2)::VARCHAR,
        col2;
```

| name | value |
|------|-------|
| col1 | 84    |
| col2 | woot  |

##### Simplified `UNPIVOT` Full Syntax Diagram {#docs:current:sql:statements:unpivot::simplified-unpivot-full-syntax-diagram}

Below is the full syntax diagram of the `UNPIVOT` statement.



#### SQL Standard `UNPIVOT` Syntax {#docs:current:sql:statements:unpivot::sql-standard-unpivot-syntax}

The full syntax diagram is below, but the SQL Standard `UNPIVOT` syntax can be summarized as:

```sql
FROM [dataset]
UNPIVOT [INCLUDE NULLS] (
    [value-column-name(s)]
    FOR [name-column-name] IN [column(s)]
);
```

Note that only one column can be included in the `name-column-name` expression.
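The optional `INCLUDE NULLS` modifier in the summary above controls whether rows with `NULL` values are kept. The sketch below uses a hypothetical table `unpivot_nulls_demo` and assumes that, without the modifier, `NULL` values are omitted from the output.

```sql
-- Hypothetical table with a NULL in column b
CREATE OR REPLACE TABLE unpivot_nulls_demo (id INTEGER, a INTEGER, b INTEGER);
INSERT INTO unpivot_nulls_demo VALUES (1, 10, NULL);

-- Without the modifier: the NULL value from column b is expected to be omitted
FROM unpivot_nulls_demo UNPIVOT (
    val
    FOR col IN (a, b)
);

-- With INCLUDE NULLS: a row is produced for column b as well
FROM unpivot_nulls_demo UNPIVOT INCLUDE NULLS (
    val
    FOR col IN (a, b)
);
```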

##### SQL Standard `UNPIVOT` Manually {#docs:current:sql:statements:unpivot::sql-standard-unpivot-manually}

To complete the basic `UNPIVOT` operation using the SQL standard syntax, only a few additions are needed.

```sql
FROM monthly_sales UNPIVOT (
    sales
    FOR month IN (jan, feb, mar, apr, may, jun)
);
```

| empid |    dept     | month | sales |
|------:|-------------|-------|------:|
| 1     | electronics | Jan   | 1     |
| 1     | electronics | Feb   | 2     |
| 1     | electronics | Mar   | 3     |
| 1     | electronics | Apr   | 4     |
| 1     | electronics | May   | 5     |
| 1     | electronics | Jun   | 6     |
| 2     | clothes     | Jan   | 10    |
| 2     | clothes     | Feb   | 20    |
| 2     | clothes     | Mar   | 30    |
| 2     | clothes     | Apr   | 40    |
| 2     | clothes     | May   | 50    |
| 2     | clothes     | Jun   | 60    |
| 3     | cars        | Jan   | 100   |
| 3     | cars        | Feb   | 200   |
| 3     | cars        | Mar   | 300   |
| 3     | cars        | Apr   | 400   |
| 3     | cars        | May   | 500   |
| 3     | cars        | Jun   | 600   |

##### SQL Standard `UNPIVOT` Dynamically Using the `COLUMNS` Expression {#docs:current:sql:statements:unpivot::sql-standard-unpivot-dynamically-using-the-columns-expression}

The [`COLUMNS` expression](#docs:current:sql:expressions:star::columns) can be used to determine the `IN` list of columns dynamically.
This will continue to work even if additional `month` columns are added to the dataset.
It produces the same result as the query above.

```sql
FROM monthly_sales UNPIVOT (
    sales
    FOR month IN (COLUMNS(* EXCLUDE (empid, dept)))
);
```

##### SQL Standard `UNPIVOT` into Multiple Value Columns {#docs:current:sql:statements:unpivot::sql-standard-unpivot-into-multiple-value-columns}

The `UNPIVOT` statement has additional flexibility: more than 2 destination columns are supported.
This can be useful when the goal is to reduce the extent to which a dataset is pivoted, but not completely stack all pivoted columns.
To demonstrate this, the query below will generate a dataset with a separate column for the number of each month within the quarter (month 1, 2, or 3), and a separate row for each quarter.
Since there are fewer quarters than months, this does make the dataset longer, but not as long as the above.

To accomplish this, multiple columns are included in the `value-column-name` portion of the `UNPIVOT` statement.
Multiple sets of columns are included in the `IN` clause.
The `q1` and `q2` aliases are optional.
The number of columns in each set of columns in the `IN` clause must match the number of columns in the `value-column-name` portion.

```sql
FROM monthly_sales
UNPIVOT (
    (month_1_sales, month_2_sales, month_3_sales)
    FOR quarter IN (
        (jan, feb, mar) AS q1,
        (apr, may, jun) AS q2
    )
);
```

| empid |    dept     | quarter | month_1_sales | month_2_sales | month_3_sales |
|------:|-------------|---------|--------------:|--------------:|--------------:|
| 1     | electronics | q1      | 1             | 2             | 3             |
| 1     | electronics | q2      | 4             | 5             | 6             |
| 2     | clothes     | q1      | 10            | 20            | 30            |
| 2     | clothes     | q2      | 40            | 50            | 60            |
| 3     | cars        | q1      | 100           | 200           | 300           |
| 3     | cars        | q2      | 400           | 500           | 600           |

##### SQL Standard `UNPIVOT` Full Syntax Diagram {#docs:current:sql:statements:unpivot::sql-standard-unpivot-full-syntax-diagram}

Below is the full syntax diagram of the SQL Standard version of the `UNPIVOT` statement.


### UPDATE Statement {#docs:current:sql:statements:update}

The `UPDATE` statement modifies the values of rows in a table.

#### Examples {#docs:current:sql:statements:update::examples}

For every row where `i` is `NULL`, set the value to 0 instead:

```sql
UPDATE tbl
SET i = 0
WHERE i IS NULL;
```

Set all values of `i` to 1 and all values of `j` to 2:

```sql
UPDATE tbl
SET i = 1, j = 2;
```

#### Syntax {#docs:current:sql:statements:update::syntax}



`UPDATE` changes the values of the specified columns in all rows that satisfy the condition. Only the columns to be modified need be mentioned in the `SET` clause; columns not explicitly modified retain their previous values.

#### Update from Other Table {#docs:current:sql:statements:update::update-from-other-table}

A table can be updated based upon values from another table. This can be done by specifying a table in a `FROM` clause, or using a sub-select statement. Both approaches have the benefit of completing the `UPDATE` operation in bulk for increased performance.

```sql
CREATE OR REPLACE TABLE original AS
    SELECT 1 AS key, 'original value' AS value
    UNION ALL
    SELECT 2 AS key, 'original value 2' AS value;

CREATE OR REPLACE TABLE new AS
    SELECT 1 AS key, 'new value' AS value
    UNION ALL
    SELECT 2 AS key, 'new value 2' AS value;

SELECT *
FROM original;
```

| key |      value       |
|-----|------------------|
| 1   | original value   |
| 2   | original value 2 |

```sql
UPDATE original
    SET value = new.value
    FROM new
    WHERE original.key = new.key;
```

Or:

```sql
UPDATE original
    SET value = (
        SELECT
            new.value
        FROM new
        WHERE original.key = new.key
    );
```

```sql
SELECT *
FROM original;
```

| key |    value    |
|-----|-------------|
| 1   | new value   |
| 2   | new value 2 |

#### Update from Same Table {#docs:current:sql:statements:update::update-from-same-table}

The only difference between this case and the above is that a different table alias must be specified on both the target table and the source table.
In this example `AS true_original` and `AS new` are both required.

```sql
UPDATE original AS true_original
    SET value = (
        SELECT
            new.value || ' a change!' AS value
        FROM original AS new
        WHERE true_original.key = new.key
    );
```

#### Update Using Joins {#docs:current:sql:statements:update::update-using-joins}

To select the rows to update, `UPDATE` statements can use the `FROM` clause and express joins via the `WHERE` clause. For example:

```sql
CREATE TABLE city (name VARCHAR, revenue BIGINT, country_code VARCHAR);
CREATE TABLE country (code VARCHAR, name VARCHAR);
INSERT INTO city VALUES ('Paris', 700, 'FR'), ('Lyon', 200, 'FR'), ('Brussels', 400, 'BE');
INSERT INTO country VALUES ('FR', 'France'), ('BE', 'Belgium');
```

To increase the revenue of all cities in France, join the `city` and the `country` tables, and filter on the latter:

```sql
UPDATE city
SET revenue = revenue + 100
FROM country
WHERE city.country_code = country.code
  AND country.name = 'France';
```

```sql
SELECT *
FROM city;
```

|   name   | revenue | country_code |
|----------|--------:|--------------|
| Paris    | 800     | FR           |
| Lyon     | 300     | FR           |
| Brussels | 400     | BE           |

#### Upsert (Insert or Update) {#docs:current:sql:statements:update::upsert-insert-or-update}

See the [Insert documentation](#docs:current:sql:statements:insert::on-conflict-clause) for details.

### USE Statement {#docs:current:sql:statements:use}

The `USE` statement selects a database and optional schema, or just a schema to use as the default.

#### Examples {#docs:current:sql:statements:use::examples}

```sql
-- Sets the 'memory' database as the default. Uses the 'main' schema implicitly, or errors
-- if it does not exist.
USE memory;
-- Sets the 'duck' database and its 'main' schema as the default
USE duck.main;
-- Sets the 'main' schema of the currently selected database as the default, in this case 'duck.main'
USE main;
```

#### Syntax {#docs:current:sql:statements:use::syntax}



The `USE` statement sets a default database, schema or database/schema combination to use for
future operations. For instance, tables created without providing a fully qualified
table name will be created in the default database.
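To illustrate the effect on unqualified names, the sketch below attaches a hypothetical in-memory database `use_demo`, makes it the default, and creates a table without qualifying its name. All names are assumptions made for this example.

```sql
-- Attach a second in-memory database and make it the default
ATTACH ':memory:' AS use_demo;
USE use_demo;

-- The unqualified table name resolves to use_demo.main.use_demo_tbl
CREATE TABLE use_demo_tbl (i INTEGER);

-- Check the current defaults and where the table was created
SELECT current_database() AS db_name, current_schema() AS schema_name;
SELECT database_name, schema_name, table_name
FROM duckdb_tables()
WHERE table_name = 'use_demo_tbl';
```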

### VACUUM Statement {#docs:current:sql:statements:vacuum}

The `VACUUM` statement only has basic support in DuckDB and is mostly provided for PostgreSQL compatibility.

Some variants, such as calling it on a given column, recompute the distinct statistics (the number of distinct entities) if they have become stale due to updates.

> **Warning.** The behavior of `VACUUM` is not consistent with PostgreSQL semantics and it is likely going to change in the future.

#### Examples {#docs:current:sql:statements:vacuum::examples}

No-op:

```sql
VACUUM;
```

No-op:

```sql
VACUUM ANALYZE;
```

Calling `VACUUM` on a given table-column pair rebuilds statistics for the table and column:

```sql
VACUUM my_table(my_column);
```

Rebuild statistics for the table and column:

```sql
VACUUM ANALYZE my_table(my_column);
```

The following operation is not supported:

```sql
VACUUM FULL;
```

```console
Not implemented Error:
Full vacuum option
```

#### Vacuum with Indexes {#docs:current:sql:statements:vacuum::vacuum-with-indexes}

> **Warning.** This feature is experimental.

By default, `VACUUM` skips tables that have ART indexes. The `vacuum_rebuild_indexes` setting enables vacuum to compact row groups on tables with indexes by rebuilding the indexes afterward. The setting specifies a row count threshold: tables exceeding the threshold are skipped. Set to `0` to disable (the default).

```sql
SET vacuum_rebuild_indexes = 1000000;
```

#### Reclaiming Space {#docs:current:sql:statements:vacuum::reclaiming-space}

The `VACUUM` statement does not reclaim space.
For instructions on reclaiming space, refer to the [“Reclaiming space” page](#docs:current:operations_manual:footprint_of_duckdb:reclaiming_space).

#### Syntax {#docs:current:sql:statements:vacuum::syntax}


## Query Syntax {#sql:query_syntax}

### SELECT Clause {#docs:current:sql:query_syntax:select}

The `SELECT` clause specifies the list of columns that will be returned by the query. While it appears first in the query, *logically* the expressions here are executed only at the end. The `SELECT` clause can contain arbitrary expressions that transform the output, as well as aggregates and window functions.

#### Examples {#docs:current:sql:query_syntax:select::examples}

Select all columns from the table called `tbl`:

```sql
SELECT * FROM tbl;
```

Perform arithmetic on the columns in a table, and provide an alias:

```sql
SELECT col1 + col2 AS res, sqrt(col1) AS root FROM tbl;
```

Use prefix aliases:

```sql
SELECT
    res: col1 + col2,
    root: sqrt(col1)
FROM tbl;
```

Select all unique cities from the `addresses` table:

```sql
SELECT DISTINCT city FROM addresses;
```

Return the total number of rows in the `addresses` table:

```sql
SELECT count(*) FROM addresses;
```

Select all columns except the city column from the `addresses` table:

```sql
SELECT * EXCLUDE (city) FROM addresses;
```

Select all columns from the `addresses` table, but replace `city` with `lower(city)`:

```sql
SELECT * REPLACE (lower(city) AS city) FROM addresses;
```

Select all columns matching the given regular expression from the table:

```sql
SELECT COLUMNS('number\d+') FROM addresses;
```

Compute a function on all given columns of a table:

```sql
SELECT min(COLUMNS(*)) FROM addresses;
```

To select columns with spaces or special characters, use double quotes (`"`):

```sql
SELECT "Some Column Name" FROM tbl;
```

#### Syntax {#docs:current:sql:query_syntax:select::syntax}



#### `SELECT` List {#docs:current:sql:query_syntax:select::select-list}

The `SELECT` clause contains a list of expressions that specify the result of a query. The select list can refer to any columns in the `FROM` clause and combine them using expressions. Because the output of a SQL query is a table, every expression in the `SELECT` clause also has a name. Expressions can be named explicitly using the `AS` clause (e.g., `expr AS name`). If the user does not provide a name, the system names the expression automatically.
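One way to see which names the system assigns is to run `DESCRIBE` on a query. In the sketch below, the first expression has no alias and receives an automatically generated name, while the second is named explicitly. The expressions themselves are arbitrary.

```sql
-- Lists the column names and types of the query result
DESCRIBE SELECT 1 + 2, 1 + 2 AS three;
```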

> Column names are case-insensitive. See the [Rules for Case Sensitivity](#docs:current:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity) for more details.

##### Star Expressions {#docs:current:sql:query_syntax:select::star-expressions}

Select all columns from the table called `tbl`:

```sql
SELECT *
FROM tbl;
```

Select all columns matching the given regular expression from the table:

```sql
SELECT COLUMNS('number\d+')
FROM addresses;
```

The [star expression](#docs:current:sql:expressions:star) is a special expression that expands to *multiple expressions* based on the contents of the `FROM` clause. In the simplest case, `*` expands to **all** expressions in the `FROM` clause. Columns can also be selected using regular expressions or lambda functions. See the [star expression page](#docs:current:sql:expressions:star) for more details.

##### `DISTINCT` Clause {#docs:current:sql:query_syntax:select::distinct-clause}

Select all unique cities from the addresses table:

```sql
SELECT DISTINCT city
FROM addresses;
```

The `DISTINCT` clause can be used to return **only** the unique rows in the result – so that any duplicate rows are filtered out.

> Queries starting with `SELECT DISTINCT` run deduplication, which is an expensive operation. Therefore, only use `DISTINCT` if necessary.

##### `DISTINCT ON` Clause {#docs:current:sql:query_syntax:select::distinct-on-clause}

Select only the highest population city for each country:

```sql
SELECT DISTINCT ON(country) city, population
FROM cities
ORDER BY population DESC;
```

The `DISTINCT ON` clause returns only one row per unique value in the set of expressions as defined in the `ON` clause. If an `ORDER BY` clause is present, the row that is returned is the first row that is encountered as per the `ORDER BY` criteria. If an `ORDER BY` clause is not present, the first row that is encountered is not defined and can be any row in the table.

> When querying large datasets, using `DISTINCT` on all columns can be expensive. Therefore, consider using `DISTINCT ON` on a column (or a set of columns) which guarantees a sufficient degree of uniqueness for your results. For example, using `DISTINCT ON` on the key column(s) of a table guarantees full uniqueness.
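As a minimal sketch of the behavior described above, the following uses a hypothetical `cities_demo` table (the table name, city names and population numbers are made up for illustration). The second query shows one possible alternative formulation using a window function with `QUALIFY`, which should return the same rows for this data:

```sql
-- Hypothetical demo data; names and numbers are illustrative only.
CREATE TABLE cities_demo (country VARCHAR, city VARCHAR, population INTEGER);
INSERT INTO cities_demo VALUES
    ('A', 'a1', 300), ('A', 'a2', 100),
    ('B', 'b1', 50),  ('B', 'b2', 200);

-- One row per country: the first row according to the ORDER BY,
-- i.e., the city with the largest population.
SELECT DISTINCT ON (country) country, city, population
FROM cities_demo
ORDER BY population DESC;

-- Alternative formulation: number the rows within each country
-- and keep only the first one.
SELECT country, city, population
FROM cities_demo
QUALIFY row_number() OVER (PARTITION BY country ORDER BY population DESC) = 1
ORDER BY population DESC;
```

For this data, both queries keep `a1` for country `A` and `b2` for country `B`.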

##### Aggregates {#docs:current:sql:query_syntax:select::aggregates}

Return the total number of rows in the addresses table:

```sql
SELECT count(*)
FROM addresses;
```

Return the total number of rows in the addresses table grouped by city:

```sql
SELECT city, count(*)
FROM addresses
GROUP BY city;
```

[Aggregate functions](#docs:current:sql:functions:aggregates) are special functions that *combine* multiple rows into a single value. When aggregate functions are present in the `SELECT` clause, the query is turned into an aggregate query. In an aggregate query, **all** expressions must either be part of an aggregate function, or part of a group (as specified by the [`GROUP BY clause`](#docs:current:sql:query_syntax:groupby)).

##### Window Functions {#docs:current:sql:query_syntax:select::window-functions}

Generate a `row_number` column containing incremental identifiers for each row:

```sql
SELECT row_number() OVER ()
FROM sales;
```

Compute the difference between the current amount, and the previous amount, by order of time:

```sql
SELECT amount - lag(amount) OVER (ORDER BY time)
FROM sales;
```

[Window functions](#docs:current:sql:functions:window_functions) are special functions that allow the computation of values relative to *other rows* in a result. Window functions are marked by the `OVER` clause which contains the *window specification*. The window specification defines the frame or context in which the window function is computed. See the [window functions page](#docs:current:sql:functions:window_functions) for more information.
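To illustrate how the window specification sets the context of the computation, here is a small sketch that computes a running total per group. The inline table `sales_demo` and its columns `region`, `ts` and `amount` are assumptions made for this example and are not part of the `sales` table used above:

```sql
-- Hypothetical inline data: (region, ts, amount).
SELECT
    region,
    ts,
    amount,
    -- PARTITION BY restarts the computation for each region,
    -- ORDER BY makes the sum accumulate in order of ts.
    sum(amount) OVER (PARTITION BY region ORDER BY ts) AS running_total
FROM (VALUES
    ('north', 1, 10),
    ('north', 2, 20),
    ('south', 1, 5)
) AS sales_demo(region, ts, amount)
ORDER BY region, ts;
```

Here the running total for `north` grows from 10 to 30, while `south` starts again at 5 because it forms its own partition.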

##### `unnest` Function {#docs:current:sql:query_syntax:select::unnest-function}

Unnest an array by one level:

```sql
SELECT unnest([1, 2, 3]);
```

Unnest a struct by one level:

```sql
SELECT unnest({'a': 42, 'b': 84});
```

The [`unnest`](#docs:current:sql:query_syntax:unnest) function is a special function that can be used together with [arrays](#docs:current:sql:data_types:array), [lists](#docs:current:sql:data_types:list), or [structs](#docs:current:sql:data_types:struct). The unnest function strips one level of nesting from the type. For example, `INTEGER[]` is transformed into `INTEGER`. `STRUCT(a INTEGER, b INTEGER)` is transformed into `a INTEGER, b INTEGER`. The unnest function can be used to transform nested types into regular scalar types, which makes them easier to operate on.

### FROM and JOIN Clauses {#docs:current:sql:query_syntax:from}

The `FROM` clause specifies the *source* of the data on which the remainder of the query should operate. Logically, the `FROM` clause is where the query starts execution. The `FROM` clause can contain a single table, a combination of multiple tables that are joined together using `JOIN` clauses, or another `SELECT` query inside a subquery node. DuckDB also has an optional `FROM`-first syntax, which allows you to write queries without a `SELECT` clause.

#### Examples {#docs:current:sql:query_syntax:from::examples}

Select all columns from the table called `tbl`:

```sql
SELECT *
FROM tbl;
```

Select all columns from the table using the `FROM`-first syntax:

```sql
FROM tbl
SELECT *;
```

Select all columns using the `FROM`-first syntax and omitting the `SELECT` clause:

```sql
FROM tbl;
```

Select all columns from the table called `tbl` through an alias `tn`:

```sql
SELECT tn.*
FROM tbl tn;
```

Use a prefix alias:

```sql
SELECT tn.*
FROM tn: tbl;
```

Select all columns from the table `tbl` in the schema `schema_name`:

```sql
SELECT *
FROM schema_name.tbl;
```

Select the column `i` from the table function `range`, where the first column of the range function is renamed to `i`:

```sql
SELECT t.i
FROM range(100) AS t(i);
```

Select all columns from the CSV file called `test.csv`:

```sql
SELECT *
FROM 'test.csv';
```

Select all columns from a subquery:

```sql
SELECT *
FROM (SELECT * FROM tbl);
```

Select the entire row of the table as a struct:

```sql
SELECT t
FROM t;
```

Select the entire row of the subquery as a struct (i.e., a single column):

```sql
SELECT t
FROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t;
```

Join two tables together:

```sql
SELECT *
FROM tbl
JOIN other_table
  ON tbl.key = other_table.key;
```

Select a 10% sample from a table:

```sql
SELECT *
FROM tbl
TABLESAMPLE 10%;
```

Select a sample of 10 rows from a table:

```sql
SELECT *
FROM tbl
TABLESAMPLE 10 ROWS;
```

Use the `FROM`-first syntax with `WHERE` clause and aggregation:

```sql
FROM range(100) AS t(i)
SELECT sum(t.i)
WHERE i % 2 = 0;
```

##### Table Functions {#docs:current:sql:query_syntax:from::table-functions}

Some functions in DuckDB return entire tables rather than individual values. These functions are accordingly called _table functions_ and can be used with a `FROM` clause like regular table references. 
Examples include [`read_csv`](#docs:lts:data:csv:overview::csv-functions), [`read_parquet`](#docs:lts:data:parquet:overview::read_parquet-function), [`range`](#docs:current:sql:functions:list::rangestart-stop-step), [`generate_series`](#docs:current:sql:functions:list::generate_seriesstart-stop-step), [`repeat`](#docs:current:sql:functions:utility::repeat_rowvarargs-num_rows), [`unnest`](#docs:current:sql:query_syntax:unnest), and [`glob`](#docs:lts:sql:functions:utility::globsearch_path) (note that some of the examples here can be used as both scalar and table functions). 

For example,

```sql
SELECT *
FROM 'test.csv';
```

is implicitly translated to a call of the `read_csv` table function:


```sql
SELECT *
FROM read_csv('test.csv');
```

All table functions support a `WITH ORDINALITY` suffix, which extends the returned table by an integer column `ordinality` that enumerates the generated rows starting at `1`.

```sql
SELECT * 
FROM read_csv('test.csv') WITH ORDINALITY;
```

Note that the same result could be achieved using the [`row_number` window function](#docs:current:sql:functions:window_functions::row_numberorder-by-ordering).
In the presence of [joins](#docs:current:sql:query_syntax:from::joins), however, `WITH ORDINALITY` allows enumerating one side of the join instead of the final result set, without having to resort to subqueries.

#### Joins {#docs:current:sql:query_syntax:from::joins}

Joins are a fundamental relational operation used to connect two tables or relations horizontally.
The relations are referred to as the _left_ and _right_ sides of the join
based on how they are written in the join clause.
Each result row has the columns from both relations.

A join uses a rule to match pairs of rows from each relation.
Often this is a predicate, but there are other implied rules that may be specified.

##### Outer Joins {#docs:current:sql:query_syntax:from::outer-joins}

Rows that do not have any matches can still be returned if an `OUTER` join is specified.
Outer joins can be one of:

* `LEFT` (All rows from the left relation appear at least once)
* `RIGHT` (All rows from the right relation appear at least once)
* `FULL` (All rows from both relations appear at least once)

A join that is not `OUTER` is `INNER` (only rows that get paired are returned).

When an unpaired row is returned, the attributes from the other table are set to `NULL`.
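The following sketch shows the difference between a `LEFT` and a `FULL` outer join on two small inline relations. The aliases `l` and `r` and their column names are hypothetical and chosen only for this example:

```sql
-- LEFT JOIN: every row of l appears; id 1 has no partner in r,
-- so r's column w is NULL for that row.
SELECT l.id, l.v, r.w
FROM (VALUES (1, 'a'), (2, 'b')) AS l(id, v)
LEFT JOIN (VALUES (2, 'x'), (3, 'y')) AS r(id, w)
       ON l.id = r.id
ORDER BY l.id;

-- FULL JOIN: rows from both sides appear; id 3 exists only in r,
-- so l's columns are NULL for that row.
SELECT l.id AS left_id, l.v, r.id AS right_id, r.w
FROM (VALUES (1, 'a'), (2, 'b')) AS l(id, v)
FULL JOIN (VALUES (2, 'x'), (3, 'y')) AS r(id, w)
       ON l.id = r.id;
```

The first query returns two rows (ids 1 and 2), the second returns three rows: the matched pair for id 2, plus one unpaired row from each side.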

##### Cross Product Joins (Cartesian Product) {#docs:current:sql:query_syntax:from::cross-product-joins-cartesian-product}

The simplest type of join is a `CROSS JOIN`.
There are no conditions for this type of join,
and it just returns all the possible pairs.

Return all pairs of rows:

```sql
SELECT a.*, b.*
FROM a
CROSS JOIN b;
```

This is equivalent to omitting the `JOIN` clause:

```sql
SELECT a.*, b.*
FROM a, b;
```

##### Conditional Joins {#docs:current:sql:query_syntax:from::conditional-joins}

Most joins are specified by a predicate that connects
attributes from one side to attributes from the other side.
The conditions can be explicitly specified using an `ON` clause
with the join (clearer) or implied by the `WHERE` clause (old-fashioned).

We use the `l_regions` and the `l_nations` tables from the TPC-H schema:

```sql
CREATE TABLE l_regions (
    r_regionkey INTEGER NOT NULL PRIMARY KEY,
    r_name      CHAR(25) NOT NULL,
    r_comment   VARCHAR(152)
);

CREATE TABLE l_nations (
    n_nationkey INTEGER NOT NULL PRIMARY KEY,
    n_name      CHAR(25) NOT NULL,
    n_regionkey INTEGER NOT NULL,
    n_comment   VARCHAR(152),
    FOREIGN KEY (n_regionkey) REFERENCES l_regions(r_regionkey)
);
```

Return the regions for the nations:

```sql
SELECT n.*, r.*
FROM l_nations n
JOIN l_regions r ON (n_regionkey = r_regionkey);
```
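For comparison, the same join can be sketched with the condition implied by the `WHERE` clause, which is the older style mentioned above. This assumes the `l_nations` and `l_regions` tables defined in this section:

```sql
-- Old-fashioned form: list both tables in FROM
-- and put the join condition into WHERE.
SELECT n.*, r.*
FROM l_nations n, l_regions r
WHERE n.n_regionkey = r.r_regionkey;
```

The `ON` form keeps the join condition next to the join it belongs to, which tends to be easier to read once a query also has ordinary filter conditions in its `WHERE` clause.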

If the column names are the same and are required to be equal,
then the simpler `USING` syntax can be used:

```sql
CREATE TABLE l_regions (regionkey INTEGER NOT NULL PRIMARY KEY,
                        name      CHAR(25) NOT NULL,
                        comment   VARCHAR(152));

CREATE TABLE l_nations (nationkey INTEGER NOT NULL PRIMARY KEY,
                        name      CHAR(25) NOT NULL,
                        regionkey INTEGER NOT NULL,
                        comment   VARCHAR(152),
                        FOREIGN KEY (regionkey) REFERENCES l_regions(regionkey));
```

Return the regions for the nations:

```sql
SELECT n.*, r.*
FROM l_nations n
JOIN l_regions r USING (regionkey);
```

The expressions do not have to be equalities; any predicate can be used.

Return the pairs of jobs where one ran longer but cost less:

```sql
SELECT s1.t_id, s2.t_id
FROM west s1, west s2
WHERE s1.time > s2.time
  AND s1.cost < s2.cost;
```

##### Natural Joins {#docs:current:sql:query_syntax:from::natural-joins}

Natural joins join two tables based on attributes that share the same name.

For example, consider the following tables with cities, airport codes and airport names. Note that both tables are intentionally incomplete, i.e., each contains rows that do not have a matching row in the other table.

```sql
CREATE TABLE city_airport (city_name VARCHAR, iata VARCHAR);
CREATE TABLE airport_names (iata VARCHAR, airport_name VARCHAR);
INSERT INTO city_airport VALUES
    ('Amsterdam', 'AMS'),
    ('Rotterdam', 'RTM'),
    ('Eindhoven', 'EIN'),
    ('Groningen', 'GRQ');
INSERT INTO airport_names VALUES
    ('AMS', 'Amsterdam Airport Schiphol'),
    ('RTM', 'Rotterdam The Hague Airport'),
    ('MST', 'Maastricht Aachen Airport');
```

To join the tables on their shared [`IATA`](https://en.wikipedia.org/wiki/IATA_airport_code) attributes, run:

```sql
SELECT *
FROM city_airport
NATURAL JOIN airport_names;
```

This produces the following result:

| city_name | iata |        airport_name         |
|-----------|------|-----------------------------|
| Amsterdam | AMS  | Amsterdam Airport Schiphol  |
| Rotterdam | RTM  | Rotterdam The Hague Airport |

Note that only rows where the same `iata` attribute was present in both tables were included in the result.

We can also express this query using the vanilla `JOIN` clause with the `USING` keyword:

```sql
SELECT *
FROM city_airport
JOIN airport_names
USING (iata);
```

##### Semi and Anti Joins {#docs:current:sql:query_syntax:from::semi-and-anti-joins}

Semi joins return rows from the left table that have at least one match in the right table.
Anti joins return rows from the left table that have _no_ matches in the right table.
When using a semi or anti join the result will never have more rows than the left hand side table.
Semi joins provide the same logic as the [`IN` operator](#docs:current:sql:expressions:in).
Anti joins provide the same logic as the `NOT IN` operator, except anti joins ignore `NULL` values from the right table.

###### Semi Join Example {#docs:current:sql:query_syntax:from::semi-join-example}

Return a list of city–airport code pairs from the `city_airport` table where the airport name **is available** in the `airport_names` table:

```sql
SELECT *
FROM city_airport
SEMI JOIN airport_names
    USING (iata);
```

| city_name | iata |
|-----------|------|
| Amsterdam | AMS  |
| Rotterdam | RTM  |

This query is equivalent to:

```sql
SELECT *
FROM city_airport
WHERE iata IN (SELECT iata FROM airport_names);
```

###### Anti Join Example {#docs:current:sql:query_syntax:from::anti-join-example}

Return a list of city–airport code pairs from the `city_airport` table where the airport name **is not available** in the `airport_names` table:

```sql
SELECT *
FROM city_airport
ANTI JOIN airport_names
    USING (iata);
```

| city_name | iata |
|-----------|------|
| Eindhoven | EIN  |
| Groningen | GRQ  |

This query is equivalent to:

```sql
SELECT *
FROM city_airport
WHERE iata NOT IN (SELECT iata FROM airport_names WHERE iata IS NOT NULL);
```
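The `IS NOT NULL` filter in the equivalent query matters because of how `NOT IN` treats `NULL` values. The following sketch uses small inline relations (the aliases `l`, `r` and the columns `x`, `y` are hypothetical) to contrast the two behaviors:

```sql
-- NOT IN with a NULL on the right side:
-- 1 NOT IN (2, NULL) evaluates to NULL, 2 NOT IN (2, NULL) to false,
-- so no rows pass the filter.
SELECT x
FROM (VALUES (1), (2)) AS l(x)
WHERE x NOT IN (SELECT y FROM (VALUES (2), (NULL)) AS r(y));

-- ANTI JOIN ignores the NULL on the right side:
-- the row with x = 1 has no match and is returned.
SELECT x
FROM (VALUES (1), (2)) AS l(x)
ANTI JOIN (VALUES (2), (NULL)) AS r(y)
       ON l.x = r.y;
```

The first query returns no rows, while the second returns the single row `1`.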

##### Lateral Joins {#docs:current:sql:query_syntax:from::lateral-joins}

The `LATERAL` keyword allows subqueries in the `FROM` clause to refer to previous subqueries. This feature is also known as a _lateral join_.

```sql
SELECT *
FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);
```



| i | j |
|--:|--:|
| 0 | 1 |
| 2 | 3 |
| 1 | 2 |

Lateral joins are a generalization of correlated subqueries, as they can return multiple values per input value rather than only a single value.

```sql
SELECT *
FROM
    generate_series(0, 1) t(i),
    LATERAL (SELECT i + 10 UNION ALL SELECT i + 100) t2(j);
```



| i |  j  |
|--:|----:|
| 0 | 10  |
| 1 | 11  |
| 0 | 100 |
| 1 | 101 |

It may be helpful to think about `LATERAL` as a loop where we iterate through the rows of the first subquery and use it as input to the second (`LATERAL`) subquery.
In the examples above, we iterate through table `t` and refer to its column `i` from the definition of table `t2`. The rows of `t2` form column `j` in the result.

It is possible to refer to multiple attributes from the `LATERAL` subquery. Using the table from the first example:

```sql
CREATE TABLE t1 AS
    SELECT *
    FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);

SELECT *
    FROM t1, LATERAL (SELECT i + j) t2(k)
    ORDER BY ALL;
```



| i | j | k |
|--:|--:|--:|
| 0 | 1 | 1 |
| 1 | 2 | 3 |
| 2 | 3 | 5 |

> DuckDB detects when `LATERAL` joins should be used, making the use of the `LATERAL` keyword optional.
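As a sketch of what this means in practice, the first example of this section can be written without the keyword. The `ORDER BY` clause is added here only to make the row order deterministic:

```sql
-- Same query as the first LATERAL example, with the keyword omitted.
-- The subquery still refers to column i of the preceding relation t.
SELECT *
FROM range(3) t(i), (SELECT i + 1) t2(j)
ORDER BY i;
```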

##### Positional Joins {#docs:current:sql:query_syntax:from::positional-joins}

When working with data frames or other embedded tables of the same size,
the rows may have a natural correspondence based on their physical order.
In scripting languages, this is easily expressed using a loop:

```cpp
for (i = 0; i < n; i++) {
    f(t1.a[i], t2.b[i]);
}
```

It is difficult to express this in standard SQL because
relational tables are not ordered, but imported tables such as [data frames](#docs:current:clients:python:data_ingestion::pandas-dataframes-–-object-columns)
or disk files (like [CSVs](#docs:current:data:csv:overview) or [Parquet files](#docs:current:data:parquet:overview)) do have a natural ordering.

Connecting them using this ordering is called a _positional join_:

```sql
CREATE TABLE t1 (x INTEGER);
CREATE TABLE t2 (s VARCHAR);

INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES ('a'), ('b');

SELECT *
FROM t1
POSITIONAL JOIN t2;
```



| x |  s   |
|--:|------|
| 1 | a    |
| 2 | b    |
| 3 | NULL |

Positional joins are always `FULL OUTER` joins, i.e., the resulting table has the length of the longer input table and the missing entries are filled with `NULL` values.

##### As-Of Joins {#docs:current:sql:query_syntax:from::as-of-joins}

A common operation when working with temporal or similarly-ordered data
is to find the nearest (first) event in a reference table (such as prices).
This is called an _as-of join_.

Attach prices to stock trades:

```sql
SELECT t.*, p.price
FROM trades t
ASOF JOIN prices p
       ON t.symbol = p.symbol AND t.when >= p.when;
```

The `ASOF` join requires at least one inequality condition on the ordering field.
The inequality can be any inequality condition (`>=`, `>`, `<=`, `<`)
on any data type, but the most common form is `>=` on a temporal type.
Any other conditions must be equalities (or `NOT DISTINCT`).
This means that the left/right order of the tables is significant.

`ASOF` joins each left side row with at most one right side row.
It can be specified as an `OUTER` join to find unpaired rows
(e.g., trades without prices or prices which have no trades).

Attach prices or NULLs to stock trades:

```sql
SELECT *
FROM trades t
ASOF LEFT JOIN prices p
            ON t.symbol = p.symbol
           AND t.when >= p.when;
```
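To make the matching rule concrete, here is a self-contained sketch with tiny hypothetical tables. The table names `prices_demo` and `trades_demo`, the column name `ts` (used instead of `when` to avoid quoting a keyword) and all values are assumptions made for this example:

```sql
-- Hypothetical demo data.
CREATE TABLE prices_demo (symbol VARCHAR, ts TIMESTAMP, price DECIMAL(10, 2));
CREATE TABLE trades_demo (symbol VARCHAR, ts TIMESTAMP, shares INTEGER);

INSERT INTO prices_demo VALUES
    ('X', '2024-01-01 10:00:00', 10.00),
    ('X', '2024-01-01 10:05:00', 11.00);
INSERT INTO trades_demo VALUES
    ('X', '2024-01-01 09:59:00', 5),  -- before any price exists
    ('X', '2024-01-01 10:03:00', 7),  -- between the two prices
    ('X', '2024-01-01 10:05:00', 9);  -- exactly at the second price

-- For each trade, pick the most recent price at or before the trade time.
SELECT t.symbol, t.ts, t.shares, p.price
FROM trades_demo t
ASOF LEFT JOIN prices_demo p
            ON t.symbol = p.symbol
           AND t.ts >= p.ts
ORDER BY t.ts;
```

With this data, the 09:59 trade has no earlier price and gets `NULL`, the 10:03 trade is paired with the 10:00 price, and the 10:05 trade is paired with the 10:05 price. With a plain `ASOF JOIN` instead of `ASOF LEFT JOIN`, the unpaired 09:59 trade would be dropped from the result.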

`ASOF` joins can also specify join conditions on matching column names with the `USING` syntax,
but the *last* attribute in the list must be the inequality,
which will be greater than or equal to (`>=`):

```sql
SELECT *
FROM trades t
ASOF JOIN prices p USING (symbol, "when");
```

This query returns `symbol`, `trades.when` and `price`, but not `prices.when`.

If you combine `USING` with a `SELECT *` like this,
the query will return the left side (probe) column values for the matches,
not the right side (build) column values.
To get the `prices` times in the example, you will need to list the columns explicitly:

```sql
SELECT t.symbol, t.when AS trade_when, p.when AS price_when, price
FROM trades t
ASOF LEFT JOIN prices p USING (symbol, "when");
```

##### Self-Joins {#docs:current:sql:query_syntax:from::self-joins}

DuckDB allows self-joins for all types of joins.
Note that the tables need to be aliased: using the same table name twice without aliases results in an error:

```sql
CREATE TABLE t (x INTEGER);
SELECT * FROM t JOIN t USING(x);
```

```console
Binder Error:
Duplicate alias "t" in query!
```

Adding the aliases allows the query to run successfully:

```sql
SELECT * FROM t AS t1 JOIN t AS t2 USING(x);
```

##### Shorthands in the `JOIN` Clause {#docs:current:sql:query_syntax:from::shorthands-in-the-join-clause}

You can specify column names in the `JOIN` clause:

```sql
CREATE TABLE t1 (x INTEGER);
CREATE TABLE t2 (y INTEGER);
INSERT INTO t1 VALUES (1), (2), (4);
INSERT INTO t2 VALUES (2), (3);
SELECT * FROM t1 NATURAL JOIN t2 t2(x);
```

| x |
|--:|
| 2 |

You can also use the `VALUES` clause in the `JOIN` clause:

```sql
SELECT * FROM t1 NATURAL JOIN (VALUES (2), (4)) _(x);
```

| x |
|--:|
| 2 |
| 4 |

#### `FROM`-First Syntax {#docs:current:sql:query_syntax:from::from-first-syntax}

DuckDB's SQL supports the `FROM`-first syntax, i.e., it allows putting the `FROM` clause before the `SELECT` clause or completely omitting the `SELECT` clause. We use the following example to demonstrate it:

```sql
CREATE TABLE tbl AS
    SELECT *
    FROM (VALUES ('a'), ('b')) t1(s), range(1, 3) t2(i);
```

##### `FROM`-First Syntax with a `SELECT` Clause {#docs:current:sql:query_syntax:from::from-first-syntax-with-a-select-clause}

The following statement demonstrates the use of the `FROM`-first syntax:

```sql
FROM tbl
SELECT i, s;
```

This is equivalent to:

```sql
SELECT i, s
FROM tbl;
```



| i | s |
|--:|---|
| 1 | a |
| 2 | a |
| 1 | b |
| 2 | b |

##### `FROM`-First Syntax without a `SELECT` Clause {#docs:current:sql:query_syntax:from::from-first-syntax-without-a-select-clause}

The following statement demonstrates the use of the optional `SELECT` clause:

```sql
FROM tbl;
```

This is equivalent to:

```sql
SELECT *
FROM tbl;
```



| s | i |
|---|--:|
| a | 1 |
| a | 2 |
| b | 1 |
| b | 2 |

#### Syntax {#docs:current:sql:query_syntax:from::syntax}


### WHERE Clause {#docs:current:sql:query_syntax:where}

The `WHERE` clause specifies any filters to apply to the data. This allows you to select only a subset of the data in which you are interested. Logically the `WHERE` clause is applied immediately after the `FROM` clause.

#### Examples {#docs:current:sql:query_syntax:where::examples}

Select all rows where the `id` is equal to 3:

```sql
SELECT *
FROM tbl
WHERE id = 3;
```

Select all rows that match the given **case-sensitive** `LIKE` expression:

```sql
SELECT *
FROM tbl
WHERE name LIKE '%mark%';
```

Select all rows that match the given **case-insensitive** expression formulated with the `ILIKE` operator:

```sql
SELECT *
FROM tbl
WHERE name ILIKE '%mark%';
```

Select all rows that match the given composite expression:

```sql
SELECT *
FROM tbl
WHERE id = 3 OR id = 7;
```

#### Syntax {#docs:current:sql:query_syntax:where::syntax}


### GROUP BY Clause {#docs:current:sql:query_syntax:groupby}

The `GROUP BY` clause specifies which grouping columns should be used to perform any aggregations in the `SELECT` clause.
If the `GROUP BY` clause is specified, the query is always an aggregate query, even if no aggregations are present in the `SELECT` clause.

When a `GROUP BY` clause is specified, all tuples that have matching data in the grouping columns (i.e., all tuples that belong to the same group) will be combined.
The values of the grouping columns themselves are unchanged, and any other columns can be combined using an [aggregate function](#docs:current:sql:functions:aggregates) (such as `count`, `sum`, `avg`, etc.).

#### `GROUP BY ALL` {#docs:current:sql:query_syntax:groupby::group-by-all}

Use `GROUP BY ALL` to `GROUP BY` all columns in the `SELECT` statement that are not wrapped in aggregate functions.
This simplifies the syntax by allowing the columns list to be maintained in a single location, and prevents bugs by keeping the `SELECT` granularity aligned to the `GROUP BY` granularity (e.g., it prevents duplication).
See examples below and additional examples in the [“Friendlier SQL with DuckDB” blog post](https://duckdb.org/2022/05/04/friendlier-sql#group-by-all).

#### Multiple Dimensions {#docs:current:sql:query_syntax:groupby::multiple-dimensions}

Normally, the `GROUP BY` clause groups along a single dimension.
Using the [`GROUPING SETS`, `CUBE` or `ROLLUP` clauses](#docs:current:sql:query_syntax:grouping_sets) it is possible to group along multiple dimensions.
See the [`GROUPING SETS`](#docs:current:sql:query_syntax:grouping_sets) page for more information.

#### Examples {#docs:current:sql:query_syntax:groupby::examples}

Count the number of entries in the `addresses` table that belong to each different city:

```sql
SELECT city, count(*)
FROM addresses
GROUP BY city;
```

Compute the average income per city per street_name:

```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY city, street_name;
```

##### `GROUP BY ALL` Examples {#docs:current:sql:query_syntax:groupby::group-by-all-examples}

Group by city and street_name to remove any duplicate values:

```sql
SELECT city, street_name
FROM addresses
GROUP BY ALL;
```

Compute the average income per city per street_name. Since income is wrapped in an aggregate function, do not include it in the `GROUP BY`:

```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY ALL;
-- equivalent to: GROUP BY city, street_name
```

#### Syntax {#docs:current:sql:query_syntax:groupby::syntax}


### GROUPING SETS {#docs:current:sql:query_syntax:grouping_sets}

`GROUPING SETS`, `ROLLUP` and `CUBE` can be used in the `GROUP BY` clause to perform a grouping over multiple dimensions within the same query.
Note that this syntax is not compatible with [`GROUP BY ALL`](#docs:current:sql:query_syntax:groupby::group-by-all).

#### Examples {#docs:current:sql:query_syntax:grouping_sets::examples}

Compute the average income along the provided four different dimensions:

```sql
-- the syntax () denotes the empty set (i.e., computing an ungrouped aggregate)
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY GROUPING SETS ((city, street_name), (city), (street_name), ());
```

Compute the average income along the same dimensions:

```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY CUBE (city, street_name);
```

Compute the average income along the dimensions `(city, street_name)`, `(city)` and `()`:

```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY ROLLUP (city, street_name);
```

#### Description {#docs:current:sql:query_syntax:grouping_sets::description}

`GROUPING SETS` performs the same aggregate across different `GROUP BY` clauses in a single query.

```sql
CREATE TABLE students (course VARCHAR, type VARCHAR);
INSERT INTO students (course, type)
VALUES
    ('CS', 'Bachelor'), ('CS', 'Bachelor'), ('CS', 'PhD'), ('Math', 'Masters'),
    ('CS', NULL), ('CS', NULL), ('Math', NULL);
```

```sql
SELECT course, type, count(*)
FROM students
GROUP BY GROUPING SETS ((course, type), course, type, ());
```

| course |   type   | count_star() |
|--------|----------|-------------:|
| Math   | NULL     | 1            |
| NULL   | NULL     | 7            |
| CS     | PhD      | 1            |
| CS     | Bachelor | 2            |
| Math   | Masters  | 1            |
| CS     | NULL     | 2            |
| Math   | NULL     | 2            |
| CS     | NULL     | 5            |
| NULL   | NULL     | 3            |
| NULL   | Masters  | 1            |
| NULL   | Bachelor | 2            |
| NULL   | PhD      | 1            |

In the above query, we group across four different sets: `(course, type)`, `course`, `type` and `()` (the empty group). In each result row, a column that is not part of the grouping set that produced the row is set to `NULL`, i.e., the above query is equivalent to the following statement that combines four queries using `UNION ALL`:

```sql
-- Group by course, type:
SELECT course, type, count(*)
FROM students
GROUP BY course, type
UNION ALL
-- Group by type:
SELECT NULL AS course, type, count(*)
FROM students
GROUP BY type
UNION ALL
-- Group by course:
SELECT course, NULL AS type, count(*)
FROM students
GROUP BY course
UNION ALL
-- Group by nothing:
SELECT NULL AS course, NULL AS type, count(*)
FROM students;
```

`CUBE` and `ROLLUP` are syntactic sugar to easily produce commonly used grouping sets.

The `ROLLUP` clause will produce all “sub-groups” of a grouping set, e.g., `ROLLUP (country, city, zip)` produces the grouping sets `(country, city, zip), (country, city), (country), ()`. This can be useful for producing different levels of detail of a group by clause. This produces `n+1` grouping sets, where `n` is the number of terms in the `ROLLUP` clause.

`CUBE` produces grouping sets for all combinations of the inputs, e.g., `CUBE (country, city, zip)` will produce `(country, city, zip), (country, city), (country, zip), (city, zip), (country), (city), (zip), ()`. This produces `2^n` grouping sets.
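As a sketch of these expansions, the following pairs of queries on the `students` table from above should produce the same groups. Each `ROLLUP` or `CUBE` query is followed by its explicit `GROUPING SETS` form:

```sql
-- ROLLUP over two terms expands to n + 1 = 3 grouping sets.
SELECT course, type, count(*)
FROM students
GROUP BY ROLLUP (course, type);

SELECT course, type, count(*)
FROM students
GROUP BY GROUPING SETS ((course, type), (course), ());

-- CUBE over two terms expands to 2^n = 4 grouping sets.
SELECT course, type, count(*)
FROM students
GROUP BY CUBE (course, type);

SELECT course, type, count(*)
FROM students
GROUP BY GROUPING SETS ((course, type), (course), (type), ());
```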

#### Identifying Grouping Sets with `GROUPING_ID()` {#docs:current:sql:query_syntax:grouping_sets::identifying-grouping-sets-with-grouping_id}

The super-aggregate rows generated by `GROUPING SETS`, `ROLLUP` and `CUBE` can often be identified by the `NULL`-values returned for the respective columns in the grouping. However, if the columns used in the grouping can themselves contain actual `NULL`-values, it can be challenging to tell whether a value in the result set is a “real” `NULL`-value coming from the data itself or a `NULL`-value generated by the grouping construct. The `GROUPING_ID()` function (also available as `GROUPING()`) is designed to identify which groups generated the super-aggregate rows in the result.

`GROUPING_ID()` is an aggregate function that takes the column expressions that make up the grouping(s). It returns a `BIGINT` value. The return value is `0` for the rows that are not super-aggregate rows. But for the super-aggregate rows, it returns an integer value that identifies the combination of expressions that make up the group for which the super-aggregate is generated. At this point, an example might help. Consider the following query:

```sql
WITH days AS (
    SELECT
        year("generate_series")    AS y,
        quarter("generate_series") AS q,
        month("generate_series")   AS m
    FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id()"
FROM days
GROUP BY GROUPING SETS (
    (y, q, m),
    (y, q),
    (y),
    ()
)
ORDER BY y, q, m;
```

These are the results:

|  y   |  q   |  m   | grouping_id() |
|-----:|-----:|-----:|--------------:|
| 2023 | 1    | 1    | 0             |
| 2023 | 1    | 2    | 0             |
| 2023 | 1    | 3    | 0             |
| 2023 | 1    | NULL | 1             |
| 2023 | 2    | 4    | 0             |
| 2023 | 2    | 5    | 0             |
| 2023 | 2    | 6    | 0             |
| 2023 | 2    | NULL | 1             |
| 2023 | 3    | 7    | 0             |
| 2023 | 3    | 8    | 0             |
| 2023 | 3    | 9    | 0             |
| 2023 | 3    | NULL | 1             |
| 2023 | 4    | 10   | 0             |
| 2023 | 4    | 11   | 0             |
| 2023 | 4    | 12   | 0             |
| 2023 | 4    | NULL | 1             |
| 2023 | NULL | NULL | 3             |
| NULL | NULL | NULL | 7             |

In this example, the lowest level of grouping is at the month level, defined by the grouping set `(y, q, m)`. Result rows corresponding to that level are simply aggregate rows and the `GROUPING_ID(y, q, m)` function returns `0` for those. The grouping set `(y, q)` results in super-aggregate rows over the month level, leaving a `NULL`-value for the `m` column, and for which `GROUPING_ID(y, q, m)` returns `1`. The grouping set `(y)` results in super-aggregate rows over the quarter level, leaving `NULL`-values for the `m` and `q` columns, for which `GROUPING_ID(y, q, m)` returns `3`. Finally, the `()` grouping set results in one super-aggregate row for the entire result set, leaving `NULL`-values for `y`, `q` and `m`, and for which `GROUPING_ID(y, q, m)` returns `7`.

To understand the relationship between the return value and the grouping set, you can think of `GROUPING_ID(y, q, m)` as writing to a bitfield, where the lowest-order (rightmost) bit corresponds to the last expression passed to `GROUPING_ID()`, the next bit to the second-to-last expression, and so on. This may become clearer by casting `GROUPING_ID()` to `BIT`:

```sql
WITH days AS (
    SELECT
        year("generate_series")    AS y,
        quarter("generate_series") AS q,
        month("generate_series")   AS m
    FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT
    y, q, m,
    GROUPING_ID(y, q, m) AS "grouping_id(y, q, m)",
    right(GROUPING_ID(y, q, m)::BIT::VARCHAR, 3) AS "y_q_m_bits"
FROM days
GROUP BY GROUPING SETS (
    (y, q, m),
    (y, q),
    (y),
    ()
)
ORDER BY y, q, m;
```

Which returns these results:

|  y   |  q   |  m   | grouping_id(y, q, m) | y_q_m_bits |
|-----:|-----:|-----:|---------------------:|------------|
| 2023 | 1    | 1    | 0                    | 000        |
| 2023 | 1    | 2    | 0                    | 000        |
| 2023 | 1    | 3    | 0                    | 000        |
| 2023 | 1    | NULL | 1                    | 001        |
| 2023 | 2    | 4    | 0                    | 000        |
| 2023 | 2    | 5    | 0                    | 000        |
| 2023 | 2    | 6    | 0                    | 000        |
| 2023 | 2    | NULL | 1                    | 001        |
| 2023 | 3    | 7    | 0                    | 000        |
| 2023 | 3    | 8    | 0                    | 000        |
| 2023 | 3    | 9    | 0                    | 000        |
| 2023 | 3    | NULL | 1                    | 001        |
| 2023 | 4    | 10   | 0                    | 000        |
| 2023 | 4    | 11   | 0                    | 000        |
| 2023 | 4    | 12   | 0                    | 000        |
| 2023 | 4    | NULL | 1                    | 001        |
| 2023 | NULL | NULL | 3                    | 011        |
| NULL | NULL | NULL | 7                    | 111        |

Note that the number of expressions passed to `GROUPING_ID()` and the order in which they are passed are independent of the actual group definitions appearing in the `GROUPING SETS` clause (or the groups implied by `ROLLUP` and `CUBE`). As long as the expressions passed to `GROUPING_ID()` appear somewhere in the `GROUPING SETS` clause, `GROUPING_ID()` will set a bit corresponding to the position of the expression whenever that expression is rolled up to a super-aggregate.
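
The following sketch illustrates this independence. The inline `sales` data and the column aliases are assumptions made up for illustration. The same `ROLLUP` is queried with the arguments in both orders and with only a subset of the grouping expressions:

```sql
WITH sales(y, q) AS (
    -- Illustrative data only
    VALUES (2023, 1), (2023, 2), (2024, 1)
)
SELECT
    y, q,
    count(*)          AS cnt,
    GROUPING_ID(y, q) AS gid_y_q, -- q is the lowest bit
    GROUPING_ID(q, y) AS gid_q_y, -- y is the lowest bit
    GROUPING_ID(q)    AS gid_q    -- only tracks whether q was rolled up
FROM sales
GROUP BY ROLLUP (y, q)
ORDER BY y, q;
```

Following the bitfield rule described above, rows where only `q` is rolled up should show `1` for `gid_y_q` but `2` for `gid_q_y`, while the grand-total row shows `3` for both and `1` for `gid_q`.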

#### Syntax {#docs:current:sql:query_syntax:grouping_sets::syntax}


### HAVING Clause {#docs:current:sql:query_syntax:having}

The `HAVING` clause can be used after the `GROUP BY` clause to provide filter criteria *after* the grouping has been completed. In terms of syntax the `HAVING` clause is identical to the `WHERE` clause, but while the `WHERE` clause occurs before the grouping, the `HAVING` clause occurs after the grouping.

#### Examples {#docs:current:sql:query_syntax:having::examples}

Count the number of entries in the `addresses` table that belong to each different `city`, filtering out cities with a count below 50:

```sql
SELECT city, count(*)
FROM addresses
GROUP BY city
HAVING count(*) >= 50;
```

Compute the average income per `city` and `street_name`, keeping only the groups whose average `income` is greater than twice their median `income`:

```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY city, street_name
HAVING avg(income) > 2 * median(income);
```
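
To make the difference between `WHERE` and `HAVING` concrete, here is a minimal sketch that uses both in one query. The inline `addresses` data and the aliases `n` and `avg_income` are illustrative assumptions, not part of the examples above:

```sql
WITH addresses(city, income) AS (
    -- Illustrative data only
    VALUES ('DuckTown', 100), ('DuckTown', 300),
           ('Goose City', 50), ('Goose City', NULL)
)
SELECT city, count(*) AS n, avg(income) AS avg_income
FROM addresses
WHERE income IS NOT NULL   -- filters individual rows before grouping
GROUP BY city
HAVING count(*) >= 2;      -- filters whole groups after grouping
```

Only `DuckTown` remains: the `WHERE` clause first drops the row with a `NULL` income, which leaves `Goose City` with a single row, and the `HAVING` clause then discards that group.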

#### Syntax {#docs:current:sql:query_syntax:having::syntax}


### ORDER BY Clause {#docs:current:sql:query_syntax:orderby}

`ORDER BY` is an output modifier. Logically it is applied near the very end of the query (just prior to [`LIMIT`](#docs:current:sql:query_syntax:limit) or [`OFFSET`](#docs:current:sql:query_syntax:limit), if present).
The `ORDER BY` clause sorts the rows on the sorting criteria in either ascending or descending order.
In addition, every order clause can specify whether `NULL` values should be moved to the beginning or to the end.

The `ORDER BY` clause may contain one or more expressions, separated by commas.
An error will be thrown if no expressions are included, since the `ORDER BY` clause should be removed in that situation.
The expressions may begin with either an arbitrary scalar expression (which could be a column name), a column position number (where the indexing starts from 1), or the keyword `ALL`.
Each expression can optionally be followed by an order modifier (`ASC` or `DESC`, default is `ASC`), and/or a `NULL` order modifier (`NULLS FIRST` or `NULLS LAST`, default is `NULLS LAST`).
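
As a small sketch of these options, the query below sorts by a column position with explicit modifiers and then by a column name. The inline table `t` and its values are assumptions for illustration:

```sql
WITH t(name, score) AS (
    -- Illustrative data only
    VALUES ('a', 2), ('b', NULL), ('c', 1)
)
SELECT name, score
FROM t
-- Position 2 refers to the second output column (score)
ORDER BY 2 DESC NULLS FIRST, name;
```

This returns the row with the `NULL` score first, followed by the scores `2` and `1`. Without `NULLS FIRST`, and with the default settings, the `NULL` row would come last.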

#### `ORDER BY ALL` {#docs:current:sql:query_syntax:orderby::order-by-all}

The `ALL` keyword indicates that the output should be sorted by every column in order from left to right.
The direction of this sort may be modified using either `ORDER BY ALL ASC` or `ORDER BY ALL DESC` and/or `NULLS FIRST` or `NULLS LAST`.
Note that `ALL` may not be used in combination with other expressions in the `ORDER BY` clause – it must be by itself.
See examples below.

#### `NULL` Order Modifier {#docs:current:sql:query_syntax:orderby::null-order-modifier}

By default, DuckDB sorts `ASC` and `NULLS LAST`, i.e., the values are sorted in ascending order and `NULL` values are placed last.
This is identical to the default sort order of PostgreSQL.
The default sort order can be changed with the following configuration options.

Use the `default_null_order` option to change the default `NULL` sorting order to either `NULLS_FIRST`, `NULLS_LAST`, `NULLS_FIRST_ON_ASC_LAST_ON_DESC` or `NULLS_LAST_ON_ASC_FIRST_ON_DESC`:

```sql
SET default_null_order = 'NULLS_FIRST';
```

Use the `default_order` option to change the direction of the default sorting order to either `DESC` or `ASC`:

```sql
SET default_order = 'DESC';
```

#### Collations {#docs:current:sql:query_syntax:orderby::collations}

Text is sorted using the binary comparison collation by default, which means values are sorted on their binary UTF-8 values.
While this works well for ASCII text (e.g., for English language data), the sorting order can be incorrect for other languages.
For this purpose, DuckDB provides collations.
For more information on collations, see the [Collation page](#docs:current:sql:expressions:collations).
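
The effect can be sketched with the `NOCASE` collation, which ignores case when comparing strings. The inline values below are illustrative assumptions:

```sql
-- Binary ordering: uppercase letters sort before lowercase ones
SELECT s
FROM (VALUES ('banana'), ('Apple'), ('cherry'), ('Date')) t(s)
ORDER BY s;
-- Apple, Date, banana, cherry

-- Case-insensitive ordering
SELECT s
FROM (VALUES ('banana'), ('Apple'), ('cherry'), ('Date')) t(s)
ORDER BY s COLLATE NOCASE;
-- Apple, banana, cherry, Date
```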

#### Examples {#docs:current:sql:query_syntax:orderby::examples}

All examples use this example table:

```sql
CREATE OR REPLACE TABLE addresses AS
    SELECT '123 Quack Blvd' AS address, 'DuckTown' AS city, '11111' AS zip
    UNION ALL
    SELECT '111 Duck Duck Goose Ln', 'DuckTown', '11111'
    UNION ALL
    SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111'
    UNION ALL
    SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111-0001';
```

Select the addresses, ordered by city name using the default `NULL` order and default order:

```sql
SELECT *
FROM addresses
ORDER BY city;
```

Select the addresses, ordered by city name in descending order with nulls at the end:

```sql
SELECT *
FROM addresses
ORDER BY city DESC NULLS LAST;
```

Order by city and then by zip code, both using the default orderings:

```sql
SELECT *
FROM addresses
ORDER BY city, zip;
```

Order by city using German collation rules:

```sql
SELECT *
FROM addresses
ORDER BY city COLLATE DE;
```

##### `ORDER BY ALL` Examples {#docs:current:sql:query_syntax:orderby::order-by-all-examples}

Order from left to right (by address, then by city, then by zip) in ascending order:

```sql
SELECT *
FROM addresses
ORDER BY ALL;
```

|        address         |   city    |    zip     |
|------------------------|-----------|------------|
| 111 Duck Duck Goose Ln | Duck Town | 11111      |
| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 |
| 111 Duck Duck Goose Ln | DuckTown  | 11111      |
| 123 Quack Blvd         | DuckTown  | 11111      |

Order from left to right (by address, then by city, then by zip) in descending order:

```sql
SELECT *
FROM addresses
ORDER BY ALL DESC;
```

|        address         |   city    |    zip     |
|------------------------|-----------|------------|
| 123 Quack Blvd         | DuckTown  | 11111      |
| 111 Duck Duck Goose Ln | DuckTown  | 11111      |
| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 |
| 111 Duck Duck Goose Ln | Duck Town | 11111      |

#### Syntax {#docs:current:sql:query_syntax:orderby::syntax}


### LIMIT and OFFSET Clauses {#docs:current:sql:query_syntax:limit}

`LIMIT` is an output modifier. Logically it is applied at the very end of the query. The `LIMIT` clause restricts the number of rows fetched. The `OFFSET` clause indicates at which position to start reading the rows, i.e., the first `OFFSET` rows are skipped.

Note that while `LIMIT` can be used without an `ORDER BY` clause, the results might not be deterministic without the `ORDER BY` clause. This can still be useful, however, for example when you want to inspect a quick snapshot of the data.
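
When reproducible results matter, for example when paging through a result, combining `LIMIT` and `OFFSET` with `ORDER BY` makes the selected rows well-defined. A minimal sketch, using a generated table (the `range(10)` input and the alias `i` are illustrative assumptions):

```sql
SELECT i
FROM range(10) t(i)
ORDER BY i
LIMIT 3
OFFSET 3;
-- Returns 3, 4, 5
```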

#### Examples {#docs:current:sql:query_syntax:limit::examples}

Select the first 5 rows from the addresses table:

```sql
SELECT *
FROM addresses
LIMIT 5;
```

Select 5 rows from the addresses table, skipping the first 5 rows:

```sql
SELECT *
FROM addresses
LIMIT 5
OFFSET 5;
```

Select the top 5 cities with the highest population:

```sql
SELECT city, count(*) AS population
FROM addresses
GROUP BY city
ORDER BY population DESC
LIMIT 5;
```

Select 10% of the rows from the addresses table:

```sql
SELECT *
FROM addresses
LIMIT 10%;
```

#### Syntax {#docs:current:sql:query_syntax:limit::syntax}


### SAMPLE Clause {#docs:current:sql:query_syntax:sample}

The `SAMPLE` clause allows you to run the query on a sample from the base table. This can significantly speed up processing of queries, at the expense of accuracy in the result. Samples can also be used to quickly see a snapshot of the data when exploring a dataset. The sample clause is applied right after anything in the `FROM` clause (i.e., after any joins, but before the `WHERE` clause or any aggregates). See the [`SAMPLE`](#docs:current:sql:samples) page for more information.
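
As a rough illustration of that ordering, the sketch below samples a generated table and aggregates afterwards. The `range(1000)` input and the aliases are assumptions for illustration:

```sql
-- Sample 10 rows first, then aggregate over the sample
SELECT count(*) AS sampled_rows
FROM range(1000) t(i)
USING SAMPLE 10 ROWS;
```

Because the sample is taken before the aggregate is computed, `sampled_rows` reflects the sample size (10) rather than the size of the input (1000).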

#### Examples {#docs:current:sql:query_syntax:sample::examples}

Select a sample of 1% of the addresses table using default (system) sampling:

```sql
SELECT *
FROM addresses
USING SAMPLE 1%;
```

Select a sample of 1% of the addresses table using bernoulli sampling:

```sql
SELECT *
FROM addresses
USING SAMPLE 1% (bernoulli);
```

Select a sample of 10 rows from the subquery:

```sql
SELECT *
FROM (SELECT * FROM addresses)
USING SAMPLE 10 ROWS;
```

#### Syntax {#docs:current:sql:query_syntax:sample::syntax}


### Unnesting {#docs:current:sql:query_syntax:unnest}

#### Examples {#docs:current:sql:query_syntax:unnest::examples}

Unnest a list, generating 3 rows (1, 2, 3):

```sql
SELECT unnest([1, 2, 3]);
```

Unnesting a struct, generating two columns (a, b):

```sql
SELECT unnest({'a': 42, 'b': 84});
```

Recursive unnest of a list of structs:

```sql
SELECT unnest([{'a': 42, 'b': 84}, {'a': 100, 'b': NULL}], recursive := true);
```

Limit depth of recursive unnest using `max_depth`:

```sql
SELECT unnest([[[1, 2], [3, 4]], [[5, 6], [7, 8, 9], []], [[10, 11]]], max_depth := 2);
```

The `unnest` special function is used to unnest lists or structs by one level. The function can be used as a regular scalar function, but only in the `SELECT` clause. Invoking `unnest` with the `recursive` parameter will unnest lists and structs of multiple levels. The depth of unnesting can be limited using the `max_depth` parameter (which assumes `recursive` unnesting by default).

##### Unnesting Lists {#docs:current:sql:query_syntax:unnest::unnesting-lists}

Unnest a list, generating 3 rows (1, 2, 3):

```sql
SELECT unnest([1, 2, 3]);
```

Unnest a list, generating 3 rows ((1, 10), (2, 10), (3, 10)):

```sql
SELECT unnest([1, 2, 3]), 10;
```

Unnest two lists of different sizes, generating 3 rows ((1, 10), (2, 11), (3, NULL)):

```sql
SELECT unnest([1, 2, 3]), unnest([10, 11]);
```

Unnest a list column from a subquery:

```sql
SELECT unnest(l) + 10 FROM (VALUES ([1, 2, 3]), ([4, 5])) tbl(l);
```

Unnesting an empty list yields an empty result:

```sql
SELECT unnest([]);
```

Unnesting `NULL` also yields an empty result:

```sql
SELECT unnest(NULL);
```

Using `unnest` on a list emits one row per list entry. Regular scalar expressions in the same `SELECT` clause are repeated for every emitted row. When multiple lists are unnested in the same `SELECT` clause, the lists are unnested side-by-side. If one list is longer than the other, the shorter list is padded with `NULL` values.

Empty and `NULL` lists both unnest to zero rows.

##### Unnesting Structs {#docs:current:sql:query_syntax:unnest::unnesting-structs}

Unnesting a struct, generating two columns (a, b):

```sql
SELECT unnest({'a': 42, 'b': 84});
```

Unnesting a struct that contains another struct, generating two columns (a, b), where `b` remains a struct:

```sql
SELECT unnest({'a': 42, 'b': {'x': 84}});
```

`unnest` on a struct will emit one column per entry in the struct.

##### Recursive Unnest {#docs:current:sql:query_syntax:unnest::recursive-unnest}

Unnesting a list of lists recursively, generating 5 rows (1, 2, 3, 4, 5):

```sql
SELECT unnest([[1, 2, 3], [4, 5]], recursive := true);
```

Unnesting a list of structs recursively, generating two rows of two columns (a, b):

```sql
SELECT unnest([{'a': 42, 'b': 84}, {'a': 100, 'b': NULL}], recursive := true);
```

Unnesting a struct recursively, generating two columns (a, b), where the list in `a` is not unnested:

```sql
SELECT unnest({'a': [1, 2, 3], 'b': 88}, recursive := true);
```

Calling `unnest` with the `recursive` setting will fully unnest lists, followed by fully unnesting structs. This can be useful to fully flatten columns that contain lists within lists, or lists of structs. Note that lists *within* structs are not unnested.

##### Setting the Maximum Depth of Unnesting {#docs:current:sql:query_syntax:unnest::setting-the-maximum-depth-of-unnesting}

The `max_depth` parameter allows limiting the maximum depth of recursive unnesting (which is assumed by default and does not have to be specified separately).
For example, unnesting to `max_depth` of 2 yields the following:

```sql
SELECT unnest([[[1, 2], [3, 4]], [[5, 6], [7, 8, 9], []], [[10, 11]]], max_depth := 2) AS x;
```

|     x     |
|-----------|
| [1, 2]    |
| [3, 4]    |
| [5, 6]    |
| [7, 8, 9] |
| []        |
| [10, 11]  |

Meanwhile, unnesting to `max_depth` of 3 results in:

```sql
SELECT unnest([[[1, 2], [3, 4]], [[5, 6], [7, 8, 9], []], [[10, 11]]], max_depth := 3) AS x;
```

| x  |
|---:|
| 1  |
| 2  |
| 3  |
| 4  |
| 5  |
| 6  |
| 7  |
| 8  |
| 9  |
| 10 |
| 11 |

##### Keeping Track of List Entry Positions {#docs:current:sql:query_syntax:unnest::keeping-track-of-list-entry-positions}

To keep track of each entry's position within the original list, `unnest` may be combined with [`generate_subscripts`](#docs:current:sql:functions:list::generate_subscripts):

```sql
SELECT unnest(l) AS x, generate_subscripts(l, 1) AS index
FROM (VALUES ([1, 2, 3]), ([4, 5])) tbl(l);
```

| x | index |
|--:|------:|
| 1 | 1     |
| 2 | 2     |
| 3 | 3     |
| 4 | 1     |
| 5 | 2     |

##### Keep Column Names When Recursively Unnesting {#docs:current:sql:query_syntax:unnest::keep-column-names-when-recursively-unnesting}

The `keep_parent_names` parameter can be used to retain the parent column names when recursively unnesting a named struct. For example, running the following query with `keep_parent_names` enabled:

```sql
SELECT unnest([{'a': 0, 'b': {'bb': {'bbb': 1}}}], recursive := true, keep_parent_names := true);
```

yields the following result:
 
|  a  | b.bb.bbb |
|----:|---------:|
|  0  |    1     |

In this case, the field names are preserved, showing the path to the innermost value. This is particularly useful when working with complex nested data structures, as it retains the structure and naming of the original data. The parameter can also be combined with the `max_depth` parameter for finer control over how far nested structures are flattened.

### WITH Clause {#docs:current:sql:query_syntax:with}

The `WITH` clause allows you to specify common table expressions (CTEs).
Regular (non-recursive) common table expressions are essentially views that are limited in scope to a particular query.
CTEs can reference each other and can be nested. [Recursive CTEs](#docs:current:sql:query_syntax:with::recursive-ctes) can reference themselves.

#### Basic CTE Examples {#docs:current:sql:query_syntax:with::basic-cte-examples}

Create a CTE called `cte` and use it in the main query:

```sql
WITH cte AS (SELECT 42 AS x)
SELECT * FROM cte;
```

| x  |
|---:|
| 42 |

Create two CTEs `cte1` and `cte2`, where the second CTE references the first CTE:

```sql
WITH
    cte1 AS (SELECT 42 AS i),
    cte2 AS (SELECT i * 100 AS x FROM cte1)
SELECT * FROM cte2;
```

|  x   |
|-----:|
| 4200 |

You can specify column names for CTEs:

```sql
WITH cte(j) AS (SELECT 42 AS i)
FROM cte;
```

#### CTE Materialization {#docs:current:sql:query_syntax:with::cte-materialization}

DuckDB handles CTEs as _materialized_ by default, meaning that the CTE is evaluated once and the result is stored in a temporary table. However, under certain conditions, DuckDB can _inline_ the CTE into the main query, which means that the CTE is not materialized and its definition is duplicated in each place it is referenced. Inlining is done using the following heuristics:

* The CTE is not referenced more than once.
* The CTE does not contain a `VOLATILE` function.
* The CTE is using `AS NOT MATERIALIZED` and does not use `AS MATERIALIZED`.
* The CTE does not perform a grouped aggregation.

Materialization can be explicitly activated by defining the CTE using `AS MATERIALIZED` and disabled by using `AS NOT MATERIALIZED`. Note that inlining is not always possible, even if the heuristics are met. For example, if the CTE contains a `read_csv` function, it cannot be inlined.

Take the following query for example, which invokes the same CTE three times:

```sql
WITH t(x) AS (⟨complex_query⟩)
SELECT *
FROM
    t AS t1,
    t AS t2,
    t AS t3;
```

Inlining duplicates the definition of `t` for each reference, which results in the following query:

```sql
SELECT *
FROM
    (⟨complex_query⟩) AS t1(x),
    (⟨complex_query⟩) AS t2(x),
    (⟨complex_query⟩) AS t3(x);
```

If `complex_query` is expensive, materializing it with the `MATERIALIZED` keyword can improve performance. In this case, `complex_query` is evaluated only once.

```sql
WITH t(x) AS MATERIALIZED (⟨complex_query⟩)
SELECT *
FROM
    t AS t1,
    t AS t2,
    t AS t3;
```

To disable materialization, use `NOT MATERIALIZED`:

```sql
WITH t(x) AS NOT MATERIALIZED (⟨complex_query⟩)
SELECT *
FROM
    t AS t1,
    t AS t2,
    t AS t3;
```

Generally, it is not recommended to use explicit materialization hints, as DuckDB's query optimizer is capable of deciding when to materialize or inline a CTE based on the query structure and the heuristics mentioned above. However, in some cases, it may be beneficial to use `MATERIALIZED` or `NOT MATERIALIZED` to control the behavior explicitly.
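
For a concrete, runnable variant of the placeholder queries above, the sketch below uses a stand-in aggregation in place of `⟨complex_query⟩`. The aggregation over `range(1000000)` and the alias `same_value` are illustrative assumptions:

```sql
WITH t(x) AS MATERIALIZED (
    -- Stand-in for an expensive query (illustrative only)
    SELECT sum(i) FROM range(1000000) r(i)
)
SELECT t1.x = t2.x AS same_value
FROM t AS t1, t AS t2;
```

Both references read the same stored result, so `same_value` is `true`. Prefixing the query with `EXPLAIN` is one way to check whether a CTE was materialized or inlined, although the exact shape of the plan may differ between DuckDB versions.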

#### Recursive CTEs {#docs:current:sql:query_syntax:with::recursive-ctes}

`WITH RECURSIVE` allows the definition of CTEs which can refer to themselves. Note that the query must be formulated in a way that ensures termination, otherwise, it may run into an infinite loop.
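
A minimal sketch of a terminating recursive CTE is shown below. The names `countdown` and `n` are illustrative. The `WHERE` condition in the recursive step is what guarantees termination: once no row satisfies it, the step produces no new rows and the recursion stops.

```sql
WITH RECURSIVE countdown(n) AS (
    -- Base case
    SELECT 3
    UNION ALL
    -- Recursive step: stops producing rows once n reaches 0
    SELECT n - 1
    FROM countdown
    WHERE n > 0
)
SELECT n
FROM countdown
ORDER BY n DESC;
-- Returns 3, 2, 1, 0
```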

##### Example: Fibonacci Sequence {#docs:current:sql:query_syntax:with::example-fibonacci-sequence}

`WITH RECURSIVE` can be used to make recursive calculations. For example, here is how `WITH RECURSIVE` could be used to calculate the first ten Fibonacci numbers:

```sql
WITH RECURSIVE FibonacciNumbers (
    RecursionDepth, FibonacciNumber, NextNumber
) AS (
        -- Base case
        SELECT
            0 AS RecursionDepth,
            0 AS FibonacciNumber,
            1 AS NextNumber
        UNION ALL
        -- Recursive step
        SELECT
            fib.RecursionDepth + 1 AS RecursionDepth,
            fib.NextNumber AS FibonacciNumber,
            fib.FibonacciNumber + fib.NextNumber AS NextNumber
        FROM
            FibonacciNumbers fib
        WHERE
            fib.RecursionDepth + 1 < 10
    )
SELECT
    fn.RecursionDepth AS FibonacciNumberIndex,
    fn.FibonacciNumber
FROM
    FibonacciNumbers fn;
```

| FibonacciNumberIndex | FibonacciNumber |
|---------------------:|----------------:|
| 0                    | 0               |
| 1                    | 1               |
| 2                    | 1               |
| 3                    | 2               |
| 4                    | 3               |
| 5                    | 5               |
| 6                    | 8               |
| 7                    | 13              |
| 8                    | 21              |
| 9                    | 34              |

##### Example: Tree Traversal {#docs:current:sql:query_syntax:with::example-tree-traversal}

`WITH RECURSIVE` can be used to traverse trees. For example, take a hierarchy of tags:

<img src="/images/examples/with-recursive-tree-example-light.svg" alt="Example graph" style="width: 700px; text-align: center" class="lightmode-img">


```sql
CREATE TABLE tag (id INTEGER, name VARCHAR, subclassof INTEGER);
INSERT INTO tag VALUES
    (1, 'U2',     5),
    (2, 'Blur',   5),
    (3, 'Oasis',  5),
    (4, '2Pac',   6),
    (5, 'Rock',   7),
    (6, 'Rap',    7),
    (7, 'Music',  9),
    (8, 'Movies', 9),
    (9, 'Art', NULL);
```

The following query returns the path from the node `Oasis` to the root of the tree (`Art`).

```sql
WITH RECURSIVE tag_hierarchy(id, source, path) AS (
        SELECT id, name, [name] AS path
        FROM tag
        WHERE subclassof IS NULL
    UNION ALL
        SELECT tag.id, tag.name, list_prepend(tag.name, tag_hierarchy.path)
        FROM tag, tag_hierarchy
        WHERE tag.subclassof = tag_hierarchy.id
    )
SELECT path
FROM tag_hierarchy
WHERE source = 'Oasis';
```

|           path            |
|---------------------------|
| [Oasis, Rock, Music, Art] |
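
The same table can also be traversed in the opposite direction. As a sketch, the following query starts at a given tag and collects everything below it, assuming the `tag` table created above. The CTE name `descendants` and the `depth` column are illustrative choices:

```sql
WITH RECURSIVE descendants(id, name, depth) AS (
        -- Start at the chosen tag
        SELECT id, name, 0 AS depth
        FROM tag
        WHERE name = 'Music'
    UNION ALL
        -- Add the direct children of the tags found in the previous step
        SELECT tag.id, tag.name, descendants.depth + 1
        FROM tag
        JOIN descendants ON tag.subclassof = descendants.id
    )
SELECT name, depth
FROM descendants
ORDER BY depth, name;
```

| name  | depth |
|-------|------:|
| Music | 0     |
| Rap   | 1     |
| Rock  | 1     |
| 2Pac  | 2     |
| Blur  | 2     |
| Oasis | 2     |
| U2    | 2     |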

##### Graph Traversal {#docs:current:sql:query_syntax:with::graph-traversal}

The `WITH RECURSIVE` clause can be used to express graph traversal on arbitrary graphs. However, if the graph has cycles, the query must perform cycle detection to prevent infinite loops.
One way to achieve this is to store the path of a traversal in a [list](#docs:current:sql:data_types:list) and, before extending the path with a new edge, check whether its endpoint has been visited before (see the example later).

Take the following directed graph from the [LDBC Graphalytics benchmark](https://arxiv.org/pdf/2011.15028.pdf):

<img src="/images/examples/with-recursive-graph-example-light.svg" alt="Example graph" style="width: 700px; text-align: center" class="lightmode-img">


```sql
CREATE TABLE edge (node1id INTEGER, node2id INTEGER);
INSERT INTO edge VALUES
    (1, 3), (1, 5), (2, 4), (2, 5), (2, 10), (3, 1),
    (3, 5), (3, 8), (3, 10), (5, 3), (5, 4), (5, 8),
    (6, 3), (6, 4), (7, 4), (8, 1), (9, 4);
```

Note that the graph contains directed cycles, e.g., between nodes 1, 5 and 8.

###### Enumerate All Paths from a Node {#docs:current:sql:query_syntax:with::enumerate-all-paths-from-a-node}

The following query returns **all paths** starting in node 1:

```sql
WITH RECURSIVE paths(startNode, endNode, path) AS (
        SELECT -- Define the path as the first edge of the traversal
            node1id AS startNode,
            node2id AS endNode,
            [node1id, node2id] AS path
        FROM edge
        WHERE startNode = 1
        UNION ALL
        SELECT -- Concatenate new edge to the path
            paths.startNode AS startNode,
            node2id AS endNode,
            array_append(path, node2id) AS path
        FROM paths
        JOIN edge ON paths.endNode = node1id
        -- Prevent adding a repeated node to the path.
        -- This ensures that no cycles occur.
        WHERE list_position(paths.path, node2id) IS NULL
    )
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;
```

| startNode | endNode |     path      |
|----------:|--------:|---------------|
| 1         | 3       | [1, 3]        |
| 1         | 5       | [1, 5]        |
| 1         | 5       | [1, 3, 5]     |
| 1         | 8       | [1, 3, 8]     |
| 1         | 10      | [1, 3, 10]    |
| 1         | 3       | [1, 5, 3]     |
| 1         | 4       | [1, 5, 4]     |
| 1         | 8       | [1, 5, 8]     |
| 1         | 4       | [1, 3, 5, 4]  |
| 1         | 8       | [1, 3, 5, 8]  |
| 1         | 8       | [1, 5, 3, 8]  |
| 1         | 10      | [1, 5, 3, 10] |

Note that the result of this query is not restricted to shortest paths, e.g., for node 5, the results include paths `[1, 5]` and `[1, 3, 5]`.

###### Enumerate Unweighted Shortest Paths from a Node {#docs:current:sql:query_syntax:with::enumerate-unweighted-shortest-paths-from-a-node}

In most cases, enumerating all paths is not practical or feasible. Instead, only the **(unweighted) shortest paths** are of interest. To find these, the second half of the `WITH RECURSIVE` query should be adjusted such that it only includes a node if it has not yet been visited. This is implemented by using a subquery that checks if any of the previous paths includes the node:

```sql
WITH RECURSIVE paths(startNode, endNode, path) AS (
        SELECT -- Define the path as the first edge of the traversal
            node1id AS startNode,
            node2id AS endNode,
            [node1id, node2id] AS path
        FROM edge
        WHERE startNode = 1
        UNION ALL
        SELECT -- Concatenate new edge to the path
            paths.startNode AS startNode,
            node2id AS endNode,
            array_append(path, node2id) AS path
        FROM paths
        JOIN edge ON paths.endNode = node1id
        -- Prevent adding a node that was visited previously by any path.
        -- This ensures that (1) no cycles occur and (2) only nodes that
        -- were not visited by previous (shorter) paths are added to a path.
        WHERE NOT EXISTS (
                FROM paths previous_paths
                WHERE list_contains(previous_paths.path, node2id)
              )
    )
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;
```

| startNode | endNode |    path    |
|----------:|--------:|------------|
| 1         | 3       | [1, 3]     |
| 1         | 5       | [1, 5]     |
| 1         | 8       | [1, 3, 8]  |
| 1         | 10      | [1, 3, 10] |
| 1         | 4       | [1, 5, 4]  |
| 1         | 8       | [1, 5, 8]  |

###### Enumerate Unweighted Shortest Paths between Two Nodes {#docs:current:sql:query_syntax:with::enumerate-unweighted-shortest-paths-between-two-nodes}

`WITH RECURSIVE` can also be used to find **all (unweighted) shortest paths between two nodes**. To ensure that the recursive query is stopped as soon as we reach the end node, we use a [window function](#docs:current:sql:functions:window_functions) which checks whether the end node is among the newly added nodes.

The following query returns all unweighted shortest paths between nodes 1 (start node) and 8 (end node):

```sql
WITH RECURSIVE paths(startNode, endNode, path, endReached) AS (
   SELECT -- Define the path as the first edge of the traversal
        node1id AS startNode,
        node2id AS endNode,
        [node1id, node2id] AS path,
        (node2id = 8) AS endReached
     FROM edge
     WHERE startNode = 1
   UNION ALL
   SELECT -- Concatenate new edge to the path
        paths.startNode AS startNode,
        node2id AS endNode,
        array_append(path, node2id) AS path,
        max(CASE WHEN node2id = 8 THEN 1 ELSE 0 END)
            OVER (ROWS BETWEEN UNBOUNDED PRECEDING
                           AND UNBOUNDED FOLLOWING) AS endReached
     FROM paths
     JOIN edge ON paths.endNode = node1id
    WHERE NOT EXISTS (
            FROM paths previous_paths
            WHERE list_contains(previous_paths.path, node2id)
          )
      AND paths.endReached = 0
)
SELECT startNode, endNode, path
FROM paths
WHERE endNode = 8
ORDER BY length(path), path;
```

| startNode | endNode |   path    |
|----------:|--------:|-----------|
| 1         | 8       | [1, 3, 8] |
| 1         | 8       | [1, 5, 8] |

##### Accessing the Union Table with `recurring` {#docs:current:sql:query_syntax:with::accessing-the-union-table-with-recurring}

Within the recursive term of a `WITH RECURSIVE` CTE, the CTE name (e.g., `counter`) refers to the rows produced by the *last iteration*. To access *all rows accumulated so far* (the union table), use the `recurring` schema prefix:

```sql
WITH RECURSIVE counter(i) AS (
    SELECT 1
        UNION ALL
    SELECT i + 1
    FROM counter
    WHERE (SELECT max(i) FROM recurring.counter) < 5
)
SELECT *
FROM counter;
```

| i |
|--:|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |

Here, `recurring.counter` gives access to all rows accumulated across all previous iterations, while `counter` in the `FROM` clause only contains the rows from the most recent iteration. This is useful when termination conditions or calculations depend on the full accumulated result rather than just the previous iteration.

#### Recursive CTEs with `USING KEY` {#docs:current:sql:query_syntax:with::recursive-ctes-with-using-key}

> **Deprecated.** DuckDB 1.5.0 deprecated the use of recursive `UNION`s for
> `USING KEY` CTEs in favor of recursive `UNION ALL`s.
> 
> The recursive `UNION`s imply that not all rows that are produced in one
> iteration are passed to the next, as would be the case for regular recursive
> CTEs. Since the opposite is true, i.e., all rows are passed from one iteration
> to the next, going forward DuckDB's `USING KEY` CTEs will require recursive
> `UNION ALL`s instead.
>
> DuckDB 1.5.0 also introduces a new setting to configure the `USING KEY` syntax.
>
> ```sql
> SET deprecated_using_key_syntax = 'DEFAULT';
> SET deprecated_using_key_syntax = 'UNION_AS_UNION_ALL';
> ```
>
> Currently, `DEFAULT` enables both syntax styles, i.e., allows both recursive
> `UNION`s and recursive `UNION ALL`s in `USING KEY` CTEs.
>
> DuckDB 1.5.0 will be the last release supporting the `UNION` syntax without
> explicitly enabling it.
>
> DuckDB 2.0.0 disables the `UNION` syntax by default.
>
> DuckDB 2.1.0 removes the `deprecated_using_key_syntax` flag and fully
> deprecates the `UNION` syntax.

`USING KEY` alters the behavior of a regular recursive CTE.

In each iteration, a regular recursive CTE appends result rows to the union table, which ultimately defines the overall result of the CTE. In contrast, a CTE with `USING KEY` has the ability to update rows that have been placed in the union table in an earlier iteration: if the current iteration produces a row with key `k`, it replaces a row with the same key `k` in the union table (like a dictionary). If no such row exists in the union table yet, the new row is appended to the union table as usual.

This allows a CTE to exercise fine-grained control over the union table contents. Avoiding the append-only behavior can lead to significantly smaller union table sizes. This improves query runtime and memory consumption, and makes it feasible to access the union table while the iteration is still ongoing. In a CTE `WITH RECURSIVE T(...) USING KEY ...`, table `T` denotes the rows added by the last iteration (as is usual for recursive CTEs), while table `recurring.T` denotes the [union table built so far](#docs:current:sql:query_syntax:with::accessing-the-union-table-with-recurring). References to `recurring.T` allow for the elegant and idiomatic translation of rather complex algorithms into readable SQL code.

##### Example: `USING KEY` {#docs:current:sql:query_syntax:with::example-using-key}

This is a recursive CTE where `USING KEY` has a key column (`a`) and a payload column (`b`).
The payload columns correspond to the columns to be overwritten.
In the first iteration we have two different keys, `1` and `2`.
These two keys will generate two new rows, `(1, 3)` and `(2, 4)`.
In the next iteration we produce a new key, `3`, which generates a new row.
We also generate the row `(2, 3)`, where `2` is a key that already exists from the previous iteration.
This will overwrite the old payload `4` with the new payload `3`.

```sql
WITH RECURSIVE tbl(a, b) USING KEY (a) AS (
    SELECT a, b
    FROM (VALUES (1, 3), (2, 4)) t(a, b)
        UNION ALL
    SELECT a + 1, b
    FROM tbl
    WHERE a < 3
)
SELECT *
FROM tbl;
```

| a | b |
|--:|--:|
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |
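
For comparison, here is a sketch of the same query as a regular recursive CTE, i.e., with `USING KEY` removed. An `ORDER BY` is added only to make the output order well-defined. Every row produced by an iteration is appended to the union table and nothing is overwritten:

```sql
WITH RECURSIVE tbl(a, b) AS (
    SELECT a, b
    FROM (VALUES (1, 3), (2, 4)) t(a, b)
        UNION ALL
    SELECT a + 1, b
    FROM tbl
    WHERE a < 3
)
SELECT *
FROM tbl
ORDER BY a, b;
```

| a | b |
|--:|--:|
| 1 | 3 |
| 2 | 3 |
| 2 | 4 |
| 3 | 3 |
| 3 | 4 |

The keys `2` and `3` each appear twice here, whereas the `USING KEY` variant keeps only the most recently produced row per key.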

##### Using `VALUES` {#docs:current:sql:query_syntax:with::using-values}

You can use the `VALUES` clause for the initial (anchor) part of the CTE:

```sql
WITH RECURSIVE tbl(a, b) USING KEY (a) AS (
    VALUES (1, 3), (2, 4)
        UNION ALL
    SELECT a + 1, b
    FROM tbl
    WHERE a < 3
)
SELECT *
FROM tbl;
```
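
For comparison, the following sketch runs the same anchor and recursive parts without `USING KEY`. It is an illustration rather than part of the original example: as a regular recursive CTE only appends rows, the query should return five rows, `(1, 3)`, `(2, 3)`, `(2, 4)`, `(3, 3)` and `(3, 4)`, instead of the three rows produced with `USING KEY`.

```sql
-- Same anchor and recursive parts as above, but without USING KEY:
-- every iteration appends its rows instead of overwriting rows by key.
WITH RECURSIVE tbl(a, b) AS (
    VALUES (1, 3), (2, 4)
        UNION ALL
    SELECT a + 1, b
    FROM tbl
    WHERE a < 3
)
SELECT *
FROM tbl
ORDER BY a, b;
```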

##### Example: `USING KEY` References Union Table {#docs:current:sql:query_syntax:with::example-using-key-references-union-table}

As well as using the union table as a dictionary, we can now reference it in queries. This allows you to use results from not just the previous iteration, but also earlier ones. This new feature makes certain algorithms easier to implement.

One example is the connected components algorithm. For each node, the algorithm determines the node with the lowest ID to which it is connected. To achieve this, we use the entries in the union table to track the lowest ID found for a node. If a new incoming row contains a lower ID, we update this value.

<img src="/images/examples/using-key-graph-example-light.svg" alt="Example graph" style="width: 700px; text-align: center" class="lightmode-img">


```sql
CREATE TABLE nodes (id INTEGER);
INSERT INTO nodes VALUES (1), (2), (3), (4), (5), (6), (7), (8);
CREATE TABLE edges (node1id INTEGER, node2id INTEGER);
INSERT INTO edges VALUES
    (1, 3), (2, 3), (3, 7), (7, 8), (5, 4), (6, 4);
```

```sql
WITH RECURSIVE connected_components(id, comp) USING KEY (id) AS (
    SELECT n.id, n.id AS comp
    FROM nodes AS n
        UNION ALL (
    SELECT DISTINCT ON (previous_iter.id) previous_iter.id, initial_iter.comp
    FROM 
        recurring.connected_components AS previous_iter,
        connected_components AS initial_iter,
        edges AS e
    WHERE ((e.node1id, e.node2id) = (previous_iter.id, initial_iter.id)
       OR (e.node2id, e.node1id) = (previous_iter.id, initial_iter.id))
      AND initial_iter.comp < previous_iter.comp
    ORDER BY initial_iter.id ASC, previous_iter.comp ASC)
)
TABLE connected_components
ORDER BY id;
```

| id | comp |
|---:|-----:|
| 1  | 1    |
| 2  | 1    |
| 3  | 1    |
| 4  | 4    |
| 5  | 4    |
| 6  | 4    |
| 7  | 1    |
| 8  | 1    |

#### Limitations {#docs:current:sql:query_syntax:with::limitations}

DuckDB does not support mutually recursive CTEs. See the [related issue and discussion in the DuckDB repository](https://github.com/duckdb/duckdb/issues/14716#issuecomment-2467952456).

#### Syntax {#docs:current:sql:query_syntax:with::syntax}


### WINDOW Clause {#docs:current:sql:query_syntax:window}

The `WINDOW` clause allows you to specify named windows that can be used within [window functions](#docs:current:sql:functions:window_functions). These are useful when you have multiple window functions, as they allow you to avoid repeating the same window clause.
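
As a minimal sketch of a named window, the query below defines one window and reuses it in two window functions. The window name `per_parity` and the column aliases are chosen for illustration only.

```sql
-- Define the window once and reference it by name in both window functions.
SELECT
    i,
    i % 2 AS parity,
    row_number() OVER per_parity AS position_in_group,
    sum(i) OVER per_parity AS running_sum
FROM range(1, 7) tbl(i)
WINDOW per_parity AS (PARTITION BY i % 2 ORDER BY i)
ORDER BY parity, i;
```

Without the `WINDOW` clause, the `PARTITION BY i % 2 ORDER BY i` specification would have to be repeated in each `OVER` clause.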

#### Syntax {#docs:current:sql:query_syntax:window::syntax}


### QUALIFY Clause {#docs:current:sql:query_syntax:qualify}

The `QUALIFY` clause is used to filter the results of [`WINDOW` functions](#docs:current:sql:functions:window_functions). This filtering of results is similar to how a [`HAVING` clause](#docs:current:sql:query_syntax:having) filters the results of aggregate functions applied based on the [`GROUP BY` clause](#docs:current:sql:query_syntax:groupby).

The `QUALIFY` clause avoids the need for a subquery or [`WITH` clause](#docs:current:sql:query_syntax:with) to perform this filtering (much like `HAVING` avoids a subquery). An example using a `WITH` clause instead of `QUALIFY` is included below the `QUALIFY` examples.

Note that this is filtering based on [`WINDOW` functions](#docs:current:sql:functions:window_functions), not necessarily based on the [`WINDOW` clause](#docs:current:sql:query_syntax:window). The `WINDOW` clause is optional and can be used to simplify the creation of multiple `WINDOW` function expressions.

In a `SELECT` statement, the `QUALIFY` clause is specified after the [`WINDOW` clause](#docs:current:sql:query_syntax:window) (`WINDOW` does not need to be specified) and before the [`ORDER BY`](#docs:current:sql:query_syntax:orderby) clause.
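
The following sketch shows where `QUALIFY` sits in a query that uses `WINDOW`, `QUALIFY` and `ORDER BY` together. The table alias `tbl`, the window name `w` and the column aliases are illustrative.

```sql
-- Clause order: FROM, then WINDOW, then QUALIFY, then ORDER BY.
SELECT
    i,
    i % 3 AS grp,
    row_number() OVER w AS rn
FROM range(9) tbl(i)
WINDOW w AS (PARTITION BY i % 3 ORDER BY i DESC)
QUALIFY rn = 1
ORDER BY grp;
```

This keeps the row with the largest `i` in each group, that is, `6`, `7` and `8` for groups `0`, `1` and `2`.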

#### Examples {#docs:current:sql:query_syntax:qualify::examples}

Each of the following examples produces the same output, shown below.

Filter based on a window function defined in the `QUALIFY` clause:

```sql
SELECT
    schema_name,
    function_name,
    -- In this example the function_rank column in the select clause is for reference
    row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
FROM duckdb_functions()
QUALIFY
    row_number() OVER (PARTITION BY schema_name ORDER BY function_name) < 3;
```

Filter based on a window function defined in the `SELECT` clause:

```sql
SELECT
    schema_name,
    function_name,
    row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
FROM duckdb_functions()
QUALIFY
    function_rank < 3;
```

Filter based on a window function defined in the `QUALIFY` clause, but using the `WINDOW` clause:

```sql
SELECT
    schema_name,
    function_name,
    -- In this example the function_rank column in the select clause is for reference
    row_number() OVER my_window AS function_rank
FROM duckdb_functions()
WINDOW
    my_window AS (PARTITION BY schema_name ORDER BY function_name)
QUALIFY
    row_number() OVER my_window < 3;
```

Filter based on a window function defined in the `SELECT` clause, but using the `WINDOW` clause:

```sql
SELECT
    schema_name,
    function_name,
    row_number() OVER my_window AS function_rank
FROM duckdb_functions()
WINDOW
    my_window AS (PARTITION BY schema_name ORDER BY function_name)
QUALIFY
    function_rank < 3;
```

Equivalent query based on a `WITH` clause (without a `QUALIFY` clause):

```sql
WITH ranked_functions AS (
    SELECT
        schema_name,
        function_name,
        row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
    FROM duckdb_functions()
)
SELECT
    *
FROM ranked_functions
WHERE
    function_rank < 3;
```

| schema_name |  function_name  | function_rank |
|:---|:---|:---|
| main        | !__postfix      | 1             |
| main        | !~~             | 2             |
| pg_catalog  | col_description | 1             |
| pg_catalog  | format_pg_type  | 2             |

#### Syntax {#docs:current:sql:query_syntax:qualify::syntax}


### VALUES Clause {#docs:current:sql:query_syntax:values}

The `VALUES` clause is used to specify a fixed number of rows. The `VALUES` clause can be used as a stand-alone statement, as part of the `FROM` clause, or as input to an `INSERT INTO` statement.

#### Examples {#docs:current:sql:query_syntax:values::examples}

Generate two rows and directly return them:

```sql
VALUES ('Amsterdam', 1), ('London', 2);
```

Generate two rows as part of a `FROM` clause, and rename the columns:

```sql
SELECT *
FROM (VALUES ('Amsterdam', 1), ('London', 2)) cities(name, id);
```

Generate two rows and insert them into a table:

```sql
INSERT INTO cities
VALUES ('Amsterdam', 1), ('London', 2);
```

Create a table directly from a `VALUES` clause:

```sql
CREATE TABLE cities AS
    SELECT *
    FROM (VALUES ('Amsterdam', 1), ('London', 2)) cities(name, id);
```

#### Syntax {#docs:current:sql:query_syntax:values::syntax}


### FILTER Clause {#docs:current:sql:query_syntax:filter}

The `FILTER` clause may optionally follow an aggregate function in a `SELECT` statement. This will filter the rows of data that are fed into the aggregate function in the same way that a `WHERE` clause filters rows, but localized to the specific aggregate function.

The `FILTER` clause is useful in several situations, including evaluating multiple aggregates with different filters and creating a pivoted view of a dataset. `FILTER` provides a cleaner syntax for pivoting data when compared with the more traditional `CASE WHEN` approach discussed below.

Some aggregate functions do not ignore `NULL` values, so a `FILTER` clause returns valid results in cases where the `CASE WHEN` approach does not. This occurs with the functions `first` and `last`, which are desirable in a non-aggregating pivot operation where the goal is to simply re-orient the data into columns rather than re-aggregate it. `FILTER` also improves `NULL` handling when using the `list` and `array_agg` functions, as the `CASE WHEN` approach will include `NULL` values in the list result, while the `FILTER` clause will remove them.
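
The difference for `list` can be sketched with a small input. The column aliases are illustrative, and an `ORDER BY` is added inside the aggregates only to make the element order deterministic.

```sql
SELECT
    -- FILTER removes the non-matching rows before aggregation.
    list(i ORDER BY i) FILTER (i % 2 = 1) AS with_filter,
    -- CASE WHEN turns the non-matching rows into NULLs, which list() keeps.
    list(CASE WHEN i % 2 = 1 THEN i END ORDER BY i) AS with_case
FROM generate_series(1, 5) tbl(i);
```

Here `with_filter` is `[1, 3, 5]`, while `with_case` is `[1, NULL, 3, NULL, 5]`.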

#### Examples {#docs:current:sql:query_syntax:filter::examples}

Return the following:

* The total number of rows
* The number of rows where `i <= 5`
* The number of rows where `i` is odd

```sql
SELECT
    count() AS total_rows,
    count() FILTER (i <= 5) AS lte_five,
    count() FILTER (i % 2 = 1) AS odds
FROM generate_series(1, 10) tbl(i);
```



| total_rows | lte_five | odds |
|:---|:---|:---|
| 10 | 5 | 5 |

> Simply counting rows that satisfy a condition can also be achieved without the `FILTER` clause, using the boolean `sum` aggregate function, e.g., `sum(i <= 5)`.
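
As a short sketch of this equivalence, the two expressions below count the same rows, so both columns should evaluate to `5` for this input. The column aliases are illustrative.

```sql
SELECT
    count() FILTER (i <= 5) AS lte_five_filter,
    -- sum() over a boolean expression counts the rows where it is true.
    sum(i <= 5) AS lte_five_sum
FROM generate_series(1, 10) tbl(i);
```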

Different aggregate functions may be used, and multiple `WHERE` expressions are also permitted:

```sql
SELECT
    sum(i) FILTER (i <= 5) AS lte_five_sum,
    median(i) FILTER (i % 2 = 1) AS odds_median,
    median(i) FILTER (i % 2 = 1 AND i <= 5) AS odds_lte_five_median
FROM generate_series(1, 10) tbl(i);
```



| lte_five_sum | odds_median | odds_lte_five_median |
|:---|:---|:---|
| 15 | 5.0 | 3.0 |

The `FILTER` clause can also be used to pivot data from rows into columns. This is a static pivot, as columns must be defined prior to runtime in SQL. However, this kind of statement can be dynamically generated in a host programming language to leverage DuckDB's SQL engine for rapid, larger-than-memory pivoting.

First generate an example dataset:

```sql
CREATE TEMP TABLE stacked_data AS
    SELECT
        i,
        CASE WHEN i <= rows * 0.25  THEN 2022
             WHEN i <= rows * 0.5   THEN 2023
             WHEN i <= rows * 0.75  THEN 2024
             WHEN i <= rows * 0.875 THEN 2025
             ELSE NULL
             END AS year
    FROM (
        SELECT
            i,
            count(*) OVER () AS rows
        FROM generate_series(1, 100_000_000) tbl(i)
    ) tbl;
```

“Pivot” the data out by year (move each year out to a separate column):

```sql
SELECT
    count(i) FILTER (year = 2022) AS "2022",
    count(i) FILTER (year = 2023) AS "2023",
    count(i) FILTER (year = 2024) AS "2024",
    count(i) FILTER (year = 2025) AS "2025",
    count(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;
```

This syntax produces the same results as the `FILTER` clauses above:

```sql
SELECT
    count(CASE WHEN year = 2022 THEN i END) AS "2022",
    count(CASE WHEN year = 2023 THEN i END) AS "2023",
    count(CASE WHEN year = 2024 THEN i END) AS "2024",
    count(CASE WHEN year = 2025 THEN i END) AS "2025",
    count(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
```



|   2022   |   2023   |   2024   |   2025   |  NULLs   |
|:---|:---|:---|:---|:---|
| 25000000 | 25000000 | 25000000 | 12500000 | 12500000 |

However, the `CASE WHEN` approach will not work as expected when using an aggregate function that does not ignore `NULL` values. The `first` function falls into this category, so `FILTER` is preferred in this case.

“Pivot” the data out by year using `first` with a `FILTER` clause:

```sql
SELECT
    first(i) FILTER (year = 2022) AS "2022",
    first(i) FILTER (year = 2023) AS "2023",
    first(i) FILTER (year = 2024) AS "2024",
    first(i) FILTER (year = 2025) AS "2025",
    first(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;
```



|   2022   |   2023   |   2024   |   2025   |  NULLs   |
|:---|:---|:---|:---|:---|
| 1474561 | 25804801 | 50749441 | 76431361 | 87500001 |

In contrast, the `CASE WHEN` variant below produces `NULL` values whenever the first evaluation of the `CASE WHEN` expression returns `NULL`:

```sql
SELECT
    first(CASE WHEN year = 2022 THEN i END) AS "2022",
    first(CASE WHEN year = 2023 THEN i END) AS "2023",
    first(CASE WHEN year = 2024 THEN i END) AS "2024",
    first(CASE WHEN year = 2025 THEN i END) AS "2025",
    first(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
```



|   2022   |   2023   |   2024   |   2025   |  NULLs   |
|:---|:---|:---|:---|:---|
| 1228801 | NULL | NULL | NULL | NULL  |

#### Aggregate Function Syntax (Including `FILTER` Clause) {#docs:current:sql:query_syntax:filter::aggregate-function-syntax-including-filter-clause}


### Set Operations {#docs:current:sql:query_syntax:setops}

Set operations allow queries to be combined according to [set operation semantics](https://en.wikipedia.org/wiki/Set_(mathematics)#Basic_operations). Set operations refer to the [`UNION [ALL]`](#docs:current:sql:query_syntax:setops::union), [`INTERSECT [ALL]`](#docs:current:sql:query_syntax:setops::intersect) and [`EXCEPT [ALL]`](#docs:current:sql:query_syntax:setops::except) clauses. The vanilla variants use set semantics, i.e., they eliminate duplicates, while the variants with `ALL` use bag semantics.

Traditional set operations unify queries **by column position**, and require the to-be-combined queries to have the same number of input columns. If the columns are not of the same type, casts may be added. The result will use the column names from the first query.
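
A minimal sketch of positional matching: in the query below, the second query's columns `code` and `city` are combined with `id` and `name` purely by position, the result columns are named `id` and `name` after the first query, and the integer `1` and the decimal `2.5` are brought to a common numeric type by a cast. All column names and values are illustrative.

```sql
-- Columns are matched by position, not by name.
SELECT 1 AS id, 'Amsterdam' AS name
UNION ALL
SELECT 2.5 AS code, 'London' AS city;
```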

DuckDB also supports [`UNION [ALL] BY NAME`](#docs:current:sql:query_syntax:setops::union-all-by-name), which matches columns by name instead of by position. `UNION BY NAME` does not require the inputs to have the same number of columns. `NULL` values will be added in case of missing columns.

#### `UNION` {#docs:current:sql:query_syntax:setops::union}

The `UNION` clause can be used to combine rows from multiple queries. The queries are required to return the same number of columns. [Implicit casting](https://duckdb.org/docs/sql/data_types/typecasting#implicit-casting) to one of the returned types is performed to combine columns of different types where necessary. If this is not possible, the `UNION` clause throws an error.

##### Vanilla `UNION` (Set Semantics) {#docs:current:sql:query_syntax:setops::vanilla-union-set-semantics}

The vanilla `UNION` clause follows set semantics, therefore it performs duplicate elimination, i.e., only unique rows will be included in the result.

```sql
SELECT * FROM range(2) t1(x)
UNION
SELECT * FROM range(3) t2(x);
```

| x |
|--:|
| 2 |
| 1 |
| 0 |

##### `UNION ALL` (Bag Semantics) {#docs:current:sql:query_syntax:setops::union-all-bag-semantics}

`UNION ALL` returns all rows of both queries following bag semantics, i.e., *without* duplicate elimination.

```sql
SELECT * FROM range(2) t1(x)
UNION ALL
SELECT * FROM range(3) t2(x);
```

| x |
|--:|
| 0 |
| 1 |
| 0 |
| 1 |
| 2 |

##### `UNION [ALL] BY NAME` {#docs:current:sql:query_syntax:setops::union-all-by-name}

The `UNION [ALL] BY NAME` clause can be used to combine rows from different tables by name, instead of by position. `UNION BY NAME` does not require both queries to have the same number of columns. Any columns that are only found in one of the queries are filled with `NULL` values for the other query.

Take the following tables for example:

```sql
CREATE TABLE capitals (city VARCHAR, country VARCHAR);
INSERT INTO capitals VALUES
    ('Amsterdam', 'NL'),
    ('Berlin', 'Germany');
CREATE TABLE weather (city VARCHAR, degrees INTEGER, date DATE);
INSERT INTO weather VALUES
    ('Amsterdam', 10, '2022-10-14'),
    ('Seattle', 8, '2022-10-12');
```

```sql
SELECT * FROM capitals
UNION BY NAME
SELECT * FROM weather;
```

|   city    | country | degrees |    date    |
|-----------|---------|--------:|------------|
| Seattle   | NULL    | 8       | 2022-10-12 |
| Amsterdam | NL      | NULL    | NULL       |
| Berlin    | Germany | NULL    | NULL       |
| Amsterdam | NULL    | 10      | 2022-10-14 |

`UNION BY NAME` follows set semantics (therefore it performs duplicate elimination), whereas `UNION ALL BY NAME` follows bag semantics.
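
The difference can be sketched with two single-row queries that return the same values with their columns listed in a different order. The literal values are illustrative.

```sql
-- Set semantics: columns are matched by name and the duplicate row
-- is eliminated, so one row is returned.
SELECT 'Amsterdam' AS city, 'NL' AS country
UNION BY NAME
SELECT 'NL' AS country, 'Amsterdam' AS city;

-- Bag semantics: both rows are kept, so two rows are returned.
SELECT 'Amsterdam' AS city, 'NL' AS country
UNION ALL BY NAME
SELECT 'NL' AS country, 'Amsterdam' AS city;
```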

#### `INTERSECT` {#docs:current:sql:query_syntax:setops::intersect}

The `INTERSECT` clause can be used to select all rows that occur in the result of **both** queries.

##### Vanilla `INTERSECT` (Set Semantics) {#docs:current:sql:query_syntax:setops::vanilla-intersect-set-semantics}

Vanilla `INTERSECT` performs duplicate elimination, so only unique rows are returned.

```sql
SELECT * FROM range(2) t1(x)
INTERSECT
SELECT * FROM range(6) t2(x);
```

| x |
|--:|
| 0 |
| 1 |

##### `INTERSECT ALL` (Bag Semantics) {#docs:current:sql:query_syntax:setops::intersect-all-bag-semantics}

`INTERSECT ALL` follows bag semantics, so duplicates are returned.

```sql
SELECT unnest([5, 5, 6, 6, 6, 6, 7, 8]) AS x
INTERSECT ALL
SELECT unnest([5, 6, 6, 7, 7, 9]);
```

| x |
|--:|
| 5 |
| 6 |
| 6 |
| 7 |

#### `EXCEPT` {#docs:current:sql:query_syntax:setops::except}

The `EXCEPT` clause can be used to select all rows that **only** occur in the left query.

##### Vanilla `EXCEPT` (Set Semantics) {#docs:current:sql:query_syntax:setops::vanilla-except-set-semantics}

Vanilla `EXCEPT` follows set semantics: it performs duplicate elimination, so only unique rows are returned.

```sql
SELECT * FROM range(5) t1(x)
EXCEPT
SELECT * FROM range(2) t2(x);
```

| x |
|--:|
| 2 |
| 3 |
| 4 |

##### `EXCEPT ALL` (Bag Semantics) {#docs:current:sql:query_syntax:setops::except-all-bag-semantics}

`EXCEPT ALL` uses bag semantics:

```sql
SELECT unnest([5, 5, 6, 6, 6, 6, 7, 8]) AS x
EXCEPT ALL
SELECT unnest([5, 6, 6, 7, 7, 9]);
```

| x |
|--:|
| 5 |
| 8 |
| 6 |
| 6 |

#### Syntax {#docs:current:sql:query_syntax:setops::syntax}


### Prepared Statements {#docs:current:sql:query_syntax:prepared_statements}

DuckDB supports prepared statements where parameters are substituted when the query is executed.
This can improve readability and is useful for preventing [SQL injections](https://en.wikipedia.org/wiki/SQL_injection).

#### Syntax {#docs:current:sql:query_syntax:prepared_statements::syntax}

There are three syntaxes for denoting parameters in prepared statements:
auto-incremented (`?`),
positional (`$1`),
and named (`$param`).
Note that not all clients support all of these syntaxes, e.g., the [JDBC client](#docs:current:clients:java) only supports auto-incremented parameters in prepared statements.

##### Example Dataset {#docs:current:sql:query_syntax:prepared_statements::example-dataset}

In the following, we introduce the three different syntaxes and illustrate them with examples using the following table.

```sql
CREATE TABLE person (name VARCHAR, age BIGINT);
INSERT INTO person VALUES ('Alice', 37), ('Ana', 35), ('Bob', 41), ('Bea', 25);
```

In our example query, we'll look for people whose name starts with a `B` and are at least 40 years old.
This will return a single row `<'Bob', 41>`.

##### Auto-Incremented Parameters: `?` {#docs:current:sql:query_syntax:prepared_statements::auto-incremented-parameters-}

DuckDB supports using prepared statements with auto-incremented indexing,
i.e., the position of the parameters in the query corresponds to their position in the execution statement.
For example:

```sql
PREPARE query_person AS
    SELECT *
    FROM person
    WHERE starts_with(name, ?)
      AND age >= ?;
```

Using the CLI client, the statement is executed as follows.

```sql
EXECUTE query_person('B', 40);
```

##### Positional Parameters: `$1` {#docs:current:sql:query_syntax:prepared_statements::positional-parameters-1}

Prepared statements can use positional parameters, where parameters are denoted with an integer (`$1`, `$2`).
For example:

```sql
PREPARE query_person AS
    SELECT *
    FROM person
    WHERE starts_with(name, $2)
      AND age >= $1;
```

Using the CLI client, the statement is executed as follows.
Note that the first parameter corresponds to `$1`, the second to `$2`, and so on.

```sql
EXECUTE query_person(40, 'B');
```

##### Named Parameters: `$parameter` {#docs:current:sql:query_syntax:prepared_statements::named-parameters-parameter}

DuckDB also supports named parameters where parameters are denoted with `$parameter_name`.
For example:

```sql
PREPARE query_person AS
    SELECT *
    FROM person
    WHERE starts_with(name, $name_start_letter)
      AND age >= $minimum_age;
```

Using the CLI client, the statement is executed as follows.

```sql
EXECUTE query_person(name_start_letter := 'B', minimum_age := 40);
```

#### Dropping Prepared Statements: `DEALLOCATE` {#docs:current:sql:query_syntax:prepared_statements::dropping-prepared-statements-deallocate}

To drop a prepared statement, use the `DEALLOCATE` statement:

```sql
DEALLOCATE query_person;
```

Alternatively, use:

```sql
DEALLOCATE PREPARE query_person;
```

## Data Types {#sql:data_types}

### Data Types {#docs:current:sql:data_types:overview}

#### General-Purpose Data Types {#docs:current:sql:data_types:overview::general-purpose-data-types}

The table below shows all the built-in general-purpose data types. The alternatives listed in the aliases column can be used to refer to these types as well. Note, however, that the aliases are not part of the SQL standard and hence might not be accepted by other database engines.

| Name                       | Aliases                            | Description                                                                                                |
| :------------------------- | :--------------------------------- | :--------------------------------------------------------------------------------------------------------- |
| `BIGINT`                   | `INT8`, `LONG`                     | Signed eight-byte integer                                                                                  |
| `BIT`                      | `BITSTRING`                        | String of 1s and 0s                                                                                        |
| `BLOB`                     | `BYTEA`, `BINARY`, `VARBINARY`     | Variable-length binary data                                                                                |
| `BIGNUM`                   |                                    | Variable-length integer                                                                                    |
| `BOOLEAN`                  | `BOOL`, `LOGICAL`                  | Logical Boolean (`true` / `false`)                                                                         |
| `DATE`                     |                                    | Calendar date (year, month, day)                                                                           |
| `DECIMAL(prec, scale)`     | `NUMERIC(prec, scale)`             | Fixed-precision number with the given width (precision) and scale, defaults to `prec = 18` and `scale = 3` |
| `DOUBLE`                   | `FLOAT8`                           | Double precision floating-point number (8 bytes)                                                           |
| `FLOAT`                    | `FLOAT4`, `REAL`                   | Single precision floating-point number (4 bytes)                                                           |
| `HUGEINT`                  |                                    | Signed sixteen-byte integer                                                                                |
| `INTEGER`                  | `INT4`, `INT`, `SIGNED`            | Signed four-byte integer                                                                                   |
| `INTERVAL`                 |                                    | Date / time delta                                                                                          |
| `JSON`                     |                                    | JSON object (via the [`json` extension](#docs:current:data:json:overview))                    |
| `SMALLINT`                 | `INT2`, `SHORT`                    | Signed two-byte integer                                                                                    |
| `TIME`                     |                                    | Time of day (no time zone)                                                                                 |
| `TIMESTAMP WITH TIME ZONE` | `TIMESTAMPTZ`                      | Combination of time and date that uses the current time zone                                               |
| `TIMESTAMP`                | `DATETIME`                         | Combination of time and date                                                                               |
| `TINYINT`                  | `INT1`                             | Signed one-byte integer                                                                                    |
| `UBIGINT`                  |                                    | Unsigned eight-byte integer                                                                                |
| `UHUGEINT`                 |                                    | Unsigned sixteen-byte integer                                                                              |
| `UINTEGER`                 |                                    | Unsigned four-byte integer                                                                                 |
| `USMALLINT`                |                                    | Unsigned two-byte integer                                                                                  |
| `UTINYINT`                 |                                    | Unsigned one-byte integer                                                                                  |
| `UUID`                     |                                    | [UUID data type](#docs:current:sql:data_types:numeric::universally-unique-identifiers-uuids)   |
| `VARCHAR`                  | `CHAR`, `BPCHAR`, `TEXT`, `STRING` | Variable-length character string                                                                           |

Implicit and explicit typecasting is possible between numerous types, see the [Typecasting](#docs:current:sql:data_types:typecasting) page for details.
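
As a small sketch of how the aliases from the table above behave, the query below casts literals using alias names and inspects the resulting types with `typeof`. The column aliases are illustrative.

```sql
-- Aliases resolve to the canonical type names listed in the "Name" column.
SELECT
    typeof(42::INT8) AS int8_type,
    typeof('duck'::STRING) AS string_type,
    typeof(true::BOOL) AS bool_type,
    typeof(1.5::FLOAT4) AS float4_type;
```

The four columns report `BIGINT`, `VARCHAR`, `BOOLEAN` and `FLOAT`, respectively.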

#### Nested / Composite Types {#docs:current:sql:data_types:overview::nested--composite-types}

DuckDB supports five nested data types: `ARRAY`, `LIST`, `MAP`, `STRUCT` and `UNION`. The table below also lists the semi-structured `VARIANT` type. Each supports different use cases and has a different structure.

| Name                                                         | Description                                                                                                                                                                                     | Rules when used in a column                                                                                    | Build from values         | Define in DDL/CREATE               |
| :----------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------- | :------------------------ | :--------------------------------- |
| [`ARRAY`](#docs:current:sql:data_types:array)   | An ordered, fixed-length sequence of data values of the same type.                                                                                                                              | Each row must have the same data type within each instance of the `ARRAY` and the same number of elements.     | `[1, 2, 3]`               | `INTEGER[3]`                       |
| [`LIST`](#docs:current:sql:data_types:list)     | An ordered sequence of data values of the same type.                                                                                                                                            | Each row must have the same data type within each instance of the `LIST`, but can have any number of elements. | `[1, 2, 3]`               | `INTEGER[]`                        |
| [`MAP`](#docs:current:sql:data_types:map)       | A dictionary of multiple named values, each key having the same type and each value having the same type. Keys and values can be any type and can be different types from one another.          | Rows may have different keys.                                                                                  | `map([1, 2], ['a', 'b'])` | `MAP(INTEGER, VARCHAR)`            |
| [`STRUCT`](#docs:current:sql:data_types:struct) | A dictionary of multiple named values, where each key is a string, but the value can be a different type for each key.                                                                          | Each row must have the same keys.                                                                              | `{'i': 42, 'j': 'a'}`     | `STRUCT(i INTEGER, j VARCHAR)`     |
| [`UNION`](#docs:current:sql:data_types:union)       | A union of multiple alternative data types, storing one of them in each value at a time. A union also contains a discriminator “tag” value to inspect and access the currently set member type. | Rows may be set to different member types of the union.                                                        | `union_value(num := 2)`   | `UNION(num INTEGER, text VARCHAR)` |
| [`VARIANT`](#docs:current:sql:data_types:variant) | A semi-structured type where each value is self-contained with its own type information.                                                                                                        | Each row may hold a value of a different type.                                                                 | `42::VARIANT`             | `VARIANT`                          |

##### Rules for Case Sensitivity {#docs:current:sql:data_types:overview::rules-for-case-sensitivity}

The keys of `MAP`s are case-sensitive, while keys of `UNION`s and `STRUCT`s are case-insensitive.
For examples, see the [Rules for Case Sensitivity section](#docs:current:sql:dialect:overview::case-sensitivity-of-keys-in-nested-data-structures).
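
A minimal sketch of these rules, using illustrative key names:

```sql
-- STRUCT keys are matched case-insensitively: 'K' resolves to the field 'k'.
SELECT s.K AS struct_value
FROM (SELECT {'k': 42} AS s) AS t;

-- MAP keys are case-sensitive: 'k' and 'K' are two distinct keys.
SELECT map(['k', 'K'], [1, 2]) AS m;
```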

##### Updating Values of Nested Types {#docs:current:sql:data_types:overview::updating-values-of-nested-types}

When performing _updates_ on values of nested types, DuckDB performs a _delete_ operation followed by an _insert_ operation.
When used in a table with ART indexes (either via explicit indexes or primary keys/unique constraints), this can lead to [unexpected constraint violations](#docs:current:sql:indexes::constraint-checking-in-update-statements).

#### Nesting {#docs:current:sql:data_types:overview::nesting}

`ARRAY`, `LIST`, `MAP`, `STRUCT` and `UNION` types can be arbitrarily nested to any depth, so long as the type rules are observed.

Struct with `LIST`s:

```sql
SELECT {'birds': ['duck', 'goose', 'heron'], 'aliens': NULL, 'amphibians': ['frog', 'toad']};
```

Struct with list of `MAP`s:

```sql
SELECT {'test': [MAP([1, 5], [42.1, 45]), MAP([1, 5], [42.1, 45])]};
```

A list of `UNION`s:

```sql
SELECT [union_value(num := 2), union_value(str := 'ABC')::UNION(str VARCHAR, num INTEGER)];
```
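
Nested types can also be used as column types. The following sketch defines a table with one column per nested type, using the DDL forms listed in the table above. The table name `nested_demo` and the column names are hypothetical.

```sql
CREATE TABLE nested_demo (
    arr INTEGER[3],                     -- ARRAY: exactly three elements per row
    lst INTEGER[],                      -- LIST: any number of elements per row
    m   MAP(INTEGER, VARCHAR),
    s   STRUCT(i INTEGER, j VARCHAR),
    u   UNION(num INTEGER, text VARCHAR)
);

INSERT INTO nested_demo VALUES (
    [1, 2, 3],
    [1, 2, 3, 4],
    map([1, 2], ['a', 'b']),
    {'i': 42, 'j': 'a'},
    union_value(num := 2)::UNION(num INTEGER, text VARCHAR)
);

SELECT * FROM nested_demo;
```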

#### Performance Implications {#docs:current:sql:data_types:overview::performance-implications}

The choice of data types can have a strong effect on performance. Please consult the [Performance Guide](#docs:current:guides:performance:schema) for details.

### Array Type {#docs:current:sql:data_types:array}

An `ARRAY` column stores fixed-sized arrays. All fields in the column must have the same length and the same underlying type. Arrays are typically used to store arrays of numbers, but can contain any uniform data type, including `ARRAY`, [`LIST`](#docs:current:sql:data_types:list) and [`STRUCT`](#docs:current:sql:data_types:struct) types.

Arrays can be used to store vectors such as [word embeddings](https://en.wikipedia.org/wiki/Word_embedding) or image embeddings.

To store variable-length lists, use the [`LIST` type](#docs:current:sql:data_types:list). See the [data types overview](#docs:current:sql:data_types:overview) for a comparison between nested data types.

> The `ARRAY` type in PostgreSQL allows variable-length fields. DuckDB's `ARRAY` type is fixed-length.

#### Creating Arrays {#docs:current:sql:data_types:array::creating-arrays}

Arrays can be created using the [`array_value(expr, ...)` function](#docs:current:sql:functions:array::array_valueindex):

```sql
SELECT array_value(1, 2, 3);
```

You can always implicitly cast an array to a list (and use list functions, like `list_extract`, `[i]`):

```sql
SELECT array_value(1, 2, 3)[2];
```

You can cast from a list to an array (the dimensions have to match):

```sql
SELECT [3, 2, 1]::INTEGER[3];
```

Arrays can be nested:

```sql
SELECT array_value(array_value(1, 2), array_value(3, 4), array_value(5, 6));
```

Arrays can store structs:

```sql
SELECT array_value({'a': 1, 'b': 2}, {'a': 3, 'b': 4});
```

#### Defining an Array Field {#docs:current:sql:data_types:array::defining-an-array-field}

Array columns can be declared using the `⟨TYPE_NAME⟩[⟨LENGTH⟩]` syntax. For example, to create a table with an array column holding 3 integers, run:

```sql
CREATE TABLE array_table (id INTEGER, arr INTEGER[3]);
INSERT INTO array_table VALUES (10, [1, 2, 3]), (20, [4, 5, 6]);
```
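
Because the length is part of the column type, values of a different length should be rejected on insert. A short sketch, assuming the `array_table` defined above (the failing statement is commented out so the table contents stay unchanged):

```sql
-- Expected to fail with a conversion error: 2 elements for an INTEGER[3] column
-- INSERT INTO array_table VALUES (30, [7, 8]);

-- The fixed length is visible in the column type
SELECT DISTINCT typeof(arr) AS arr_type FROM array_table;  -- INTEGER[3]
```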

#### Retrieving Values from Arrays {#docs:current:sql:data_types:array::retrieving-values-from-arrays}

Retrieving one or more values from an array can be accomplished using brackets and slicing notation, or through [list functions](#docs:current:sql:functions:list::list-functions) like `list_extract` and `array_extract`. The examples below use the table from [Defining an Array Field](#docs:current:sql:data_types:array::defining-an-array-field).

The following queries for extracting the first element of an array are equivalent:

```sql
SELECT id, arr[1] AS element FROM array_table;
SELECT id, list_extract(arr, 1) AS element FROM array_table;
SELECT id, array_extract(arr, 1) AS element FROM array_table;
```

| id | element |
|---:|--------:|
| 10 | 1       |
| 20 | 4       |

Using the slicing notation returns a `LIST`:

```sql
SELECT id, arr[1:2] AS elements FROM array_table;
```

| id | elements |
|---:|----------|
| 10 | [1, 2]   |
| 20 | [4, 5]   |
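
If a fixed-length value is needed after slicing, the resulting list can be cast back to an array of the matching length. A small sketch, again assuming `array_table` from above (the column aliases are illustrative):

```sql
SELECT
    id,
    typeof(arr[1:2]) AS slice_type,       -- a LIST type: INTEGER[]
    arr[1:2]::INTEGER[2] AS first_two     -- cast back to a fixed-length array
FROM array_table;
```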

#### Functions {#docs:current:sql:data_types:array::functions}

All [`LIST` functions](#docs:current:sql:functions:list) work with the `ARRAY` type. Additionally, several `ARRAY`-native functions are also supported.
See the [`ARRAY` functions](#docs:current:sql:functions:array::array-native-functions).

#### Examples {#docs:current:sql:data_types:array::examples}

Create sample data:

```sql
CREATE TABLE x (i INTEGER, v FLOAT[3]);
CREATE TABLE y (i INTEGER, v FLOAT[3]);
INSERT INTO x VALUES (1, array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT));
INSERT INTO y VALUES (1, array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT));
```

Compute cross product:

```sql
SELECT array_cross_product(x.v, y.v)
FROM x, y
WHERE x.i = y.i;
```

Compute cosine similarity:

```sql
SELECT array_cosine_similarity(x.v, y.v)
FROM x, y
WHERE x.i = y.i;
```
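
Other `ARRAY`-native functions follow the same pattern. As a further sketch on the same sample tables, the inner product and Euclidean distance can be computed with `array_inner_product` and `array_distance`:

```sql
SELECT
    array_inner_product(x.v, y.v) AS dot_product,        -- 1*2 + 2*3 + 3*4 = 20
    array_distance(x.v, y.v) AS euclidean_distance       -- sqrt(3), roughly 1.732
FROM x, y
WHERE x.i = y.i;
```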

#### Ordering {#docs:current:sql:data_types:array::ordering}

The ordering of `ARRAY` instances is defined using a lexicographical order. `NULL` values compare greater than all other values and are considered equal to each other.
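
A brief sketch of what lexicographical order means in practice: elements are compared position by position, and the first difference decides the result.

```sql
SELECT
    array_value(1, 2, 3) < array_value(1, 2, 4) AS third_element_decides,  -- true
    array_value(1, 9, 9) < array_value(2, 0, 0) AS first_element_decides;  -- true
```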

#### See Also {#docs:current:sql:data_types:array::see-also}

For more functions, see [List Functions](#docs:current:sql:functions:list).

### Bitstring Type {#docs:current:sql:data_types:bitstring}

| Name | Aliases | Description |
|:---|:---|:---|
| `BITSTRING` | `BIT` | Variable-length strings of 1s and 0s |

Bitstrings are variable-length strings of 1s and 0s. A bitstring value requires 1 byte for each group of 8 bits, plus a fixed amount to store some metadata.

By default, bitstrings are not padded with zeroes.
Bitstrings can be very large, having the same size restrictions as `BLOB`s.

#### Creating a Bitstring {#docs:current:sql:data_types:bitstring::creating-a-bitstring}

A string encoding a bitstring can be cast to a `BITSTRING`:

```sql
SELECT '101010'::BITSTRING AS b;
```



|   b    |
|--------|
| 101010 |

Creating a `BITSTRING` with a predefined length is possible with the `bitstring` function. The resulting bitstring will be left-padded with zeroes.

```sql
SELECT bitstring('0101011', 12) AS b;
```

|      b       |
|--------------|
| 000000101011 |

Numeric values (integer and float values) can also be converted to a `BITSTRING` via casting. For example:

```sql
SELECT 123::BITSTRING AS b;
```



|                b                 |
|----------------------------------|
| 00000000000000000000000001111011 |

#### Functions {#docs:current:sql:data_types:bitstring::functions}

See [Bitstring Functions](#docs:current:sql:functions:bitstring).
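
As a brief illustration, the sketch below combines a few of these functions and operators. Note that the bitwise operators expect both operands to have the same length:

```sql
SELECT
    bit_length('101010'::BITSTRING) AS n_bits,            -- 6
    bit_count('101010'::BITSTRING) AS n_set_bits,         -- 3
    '1100'::BITSTRING & '1010'::BITSTRING AS and_bits,    -- 1000
    '1100'::BITSTRING | '1010'::BITSTRING AS or_bits;     -- 1110
```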

### Blob Type {#docs:current:sql:data_types:blob}

| Name | Aliases | Description |
|:---|:---|:---|
| `BLOB` | `BYTEA`, `BINARY`, `VARBINARY` | Variable-length binary data |

The blob (**B**inary **L**arge **OB**ject) type represents an arbitrary binary object stored in the database system. The blob type can contain any type of binary data with no restrictions. What the actual bytes represent is opaque to the database system.

Create a `BLOB` value with a single byte (170):

```sql
SELECT '\xAA'::BLOB;
```

Create a `BLOB` value with three bytes (170, 171, 172):

```sql
SELECT '\xAA\xAB\xAC'::BLOB;
```

Create a `BLOB` value with two bytes (65, 66):

```sql
SELECT 'AB'::BLOB;
```

Blobs are typically used to store non-textual objects that the database does not provide explicit support for, such as images. While blobs can hold objects up to 4 GB in size, typically it is not recommended to store very large objects within the database system. In many situations it is better to store the large file on the file system, and store the path to the file in the database system in a `VARCHAR` field.

#### Functions {#docs:current:sql:data_types:blob::functions}

See [Blob Functions](#docs:current:sql:functions:blob).
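
For instance, the following sketch measures a blob's size in bytes and converts between `BLOB` and `VARCHAR` with `decode` and `encode` (the column aliases are illustrative):

```sql
SELECT
    octet_length('\xAA\xAB\xAC'::BLOB) AS n_bytes,   -- 3
    decode('AB'::BLOB) AS as_text,                   -- interprets the bytes as UTF-8 text
    encode('AB') AS as_blob;                         -- converts the string to a BLOB
```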

### Boolean Type {#docs:current:sql:data_types:boolean}

| Name | Aliases | Description |
|:---|:---|:---|
| `BOOLEAN` | `BOOL` | Logical Boolean (`true` / `false`) |

The `BOOLEAN` type represents a statement of truth (“true” or “false”). In SQL, the `BOOLEAN` field can also have a third state “unknown” which is represented by the SQL `NULL` value.

Select the three possible values of a `BOOLEAN` column:

```sql
SELECT true, false, NULL::BOOLEAN;
```

Boolean values can be explicitly created using the literals `true` and `false`. However, they are most often created as a result of comparisons or conjunctions. For example, the comparison `i > 10` results in a Boolean value. Boolean values can be used in the `WHERE` and `HAVING` clauses of a SQL statement to filter out tuples from the result. In this case, tuples for which the predicate evaluates to `true` will pass the filter, and tuples for which the predicate evaluates to `false` or `NULL` will be filtered out. Consider the following example:

Create a table with the values 5, 15 and `NULL`:

```sql
CREATE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (5), (15), (NULL);
```

Select all entries where `i > 10`:

```sql
SELECT * FROM integers WHERE i > 10;
```

In this case 5 and `NULL` are filtered out (`5 > 10` is `false` and `NULL > 10` is `NULL`):

| i  |
|---:|
| 15 |
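
To see the three states directly, the predicate can be selected as a column. The sketch below also shows one way to keep the `NULL` row by testing for it explicitly; it assumes the `integers` table created above:

```sql
-- Returns (5, false), (15, true) and (NULL, NULL)
SELECT i, i > 10 AS is_large FROM integers;

-- Keeps 15 and the NULL row
SELECT * FROM integers WHERE i > 10 OR i IS NULL;
```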

#### Conjunctions {#docs:current:sql:data_types:boolean::conjunctions}

The `AND` / `OR` conjunctions can be used to combine Boolean values.

Below is the truth table for the `AND` conjunction (i.e., `x AND y`).



| `X` | `X AND true` | `X AND false` | `X AND NULL` |
|-------|-------|-------|-------|
| true  | true  | false | NULL  |
| false | false | false | false |
| NULL  | NULL  | false | NULL  |

Below is the truth table for the `OR` conjunction (i.e., `x OR y`).



| `X` | `X OR true` | `X OR false` | `X OR NULL` |
|-------|------|-------|------|
| true  | true | true  | true |
| false | true | false | NULL |
| NULL  | true | NULL  | NULL |
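
The `NULL` rows of both tables can be reproduced with a single query. This is a small sketch with illustrative column aliases:

```sql
SELECT
    NULL::BOOLEAN AND false AS null_and_false,   -- false
    NULL::BOOLEAN AND true  AS null_and_true,    -- NULL
    NULL::BOOLEAN OR true   AS null_or_true,     -- true
    NULL::BOOLEAN OR false  AS null_or_false;    -- NULL
```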

#### Expressions {#docs:current:sql:data_types:boolean::expressions}

See [Logical Operators](#docs:current:sql:expressions:logical_operators) and [Comparison Operators](#docs:current:sql:expressions:comparison_operators).

### Date Types {#docs:current:sql:data_types:date}

| Name   | Aliases | Description                     |
|:-------|:--------|:--------------------------------|
| `DATE` |         | Calendar date (year, month, day) |

A date specifies a combination of year, month and day. DuckDB follows the SQL standard's lead by counting dates exclusively in the Gregorian calendar, even for years before that calendar was in use. Dates can be created using the `DATE` keyword, where the data must be formatted according to the ISO 8601 format (`YYYY-MM-DD`).

```sql
SELECT DATE '1992-09-20';
```

#### Special Values {#docs:current:sql:data_types:date::special-values}

There are also three special date values that can be used on input:

| Input string | Description                       |
|:-------------|:----------------------------------|
| epoch        | 1970-01-01 (Unix system day zero) |
| infinity     | Later than all other dates        |
| -infinity    | Earlier than all other dates      |

The values `infinity` and `-infinity` are specially represented inside the system and will be displayed unchanged;
but `epoch` is simply a notational shorthand that will be converted to the date value when read.

```sql
SELECT
    '-infinity'::DATE AS negative,
    'epoch'::DATE AS epoch,
    'infinity'::DATE AS positive;
```

| negative  |   epoch    | positive |
|-----------|------------|----------|
| -infinity | 1970-01-01 | infinity |
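
The infinite values take part in comparisons like any other date, and `isfinite` can be used to detect them. A short sketch with illustrative column aliases:

```sql
SELECT
    'infinity'::DATE > DATE '9999-12-31' AS later_than_any_date,       -- true
    '-infinity'::DATE < DATE '0001-01-01' AS earlier_than_any_date,    -- true
    isfinite('infinity'::DATE) AS infinity_is_finite,                  -- false
    isfinite('epoch'::DATE) AS epoch_is_finite;                        -- true
```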

#### Functions {#docs:current:sql:data_types:date::functions}

See [Date Functions](#docs:current:sql:functions:date).

### Enum Data Type {#docs:current:sql:data_types:enum}

| Name | Description |
|:--|:-----|
| `ENUM` | Dictionary representing all possible string values of a column |

The enum type represents a dictionary data structure with all possible unique values of a column. For example, a column storing the days of the week can be an enum holding all possible days. Enums are particularly interesting for string columns with low cardinality (i.e., few distinct values). This is because the column only stores a numerical reference to the string in the enum dictionary, which can substantially reduce disk storage and speed up queries.

#### Creating Enums {#docs:current:sql:data_types:enum::creating-enums}

You can create an enum using hardcoded values:

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
-- This statement will fail since enums cannot hold NULL values:
-- CREATE TYPE mood AS ENUM ('sad', NULL);
-- This statement will fail since enum values must be unique:
-- CREATE TYPE mood AS ENUM ('sad', 'sad');
```

You can create enums in a specific schema:

```sql
CREATE SCHEMA my_schema;
CREATE TYPE my_schema.mood AS ENUM ('sad', 'ok', 'happy');
```

Anonymous enums can be created on the fly during [casting](#docs:current:sql:expressions:cast):

```sql
SELECT 'clubs'::ENUM ('spades', 'hearts', 'diamonds', 'clubs');
```

You can also create an enum using a `SELECT` statement that returns a single column of `VARCHAR`s.
The set of values from the select statement will be deduplicated automatically,
and `NULL` values will be ignored:

```sql
CREATE TYPE region AS ENUM (SELECT region FROM sales_data);
```

If you are importing data from a file, you can create an enum for a `VARCHAR` column before importing:

```sql
CREATE TYPE region AS ENUM (SELECT region FROM 'sales_data.csv');
CREATE TABLE sales_data (amount INTEGER, region region);
COPY sales_data FROM 'sales_data.csv';
```

#### Using Enums {#docs:current:sql:data_types:enum::using-enums}

Enum values are case-sensitive, so 'maltese' and 'Maltese' are considered different values:

```sql
CREATE TYPE breed AS ENUM ('maltese', 'Maltese');
-- Will return false
SELECT 'maltese'::breed = 'Maltese'::breed;
-- Will error
SELECT 'MALTESE'::breed;
```

After an enum has been created, it can be used anywhere a standard built-in type is used.
For example, we can create a table with a column that references the enum.

```sql
CREATE TABLE person (
    name TEXT,
    current_mood mood
);
INSERT INTO person VALUES
    ('Pedro', 'happy'),
    ('Mark', NULL),
    ('Pagliacci', 'sad'),
    ('Mr. Mackey', 'ok');
```

The following query will fail since the mood type does not have a `quackity-quack` value.

```sql
INSERT INTO person VALUES ('Hannes', 'quackity-quack');
```
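
To check whether a string is a valid enum value without raising an error, one option is `try_cast`, which returns `NULL` when the conversion fails. A minimal sketch, assuming the `mood` type defined earlier:

```sql
SELECT
    try_cast('happy' AS mood) AS valid_mood,               -- happy
    try_cast('quackity-quack' AS mood) AS invalid_mood;    -- NULL
```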

#### Enums vs. Strings {#docs:current:sql:data_types:enum::enums-vs-strings}

DuckDB enums are automatically cast to `VARCHAR` types whenever necessary.
This characteristic allows for comparisons between different enums, or an enum and a `VARCHAR` column.

It also allows for an enum to be used in any `VARCHAR` function. For example:

```sql
SELECT current_mood, regexp_matches(current_mood, '.*a.*') AS contains_a FROM person;
```

| current_mood | contains_a |
|:-------------|:-----------|
| happy        | true       |
| NULL         | NULL       |
| sad          | true       |
| ok           | false      |

When comparing two different enum types, DuckDB will cast both to strings and perform a string comparison:

```sql
CREATE TYPE new_mood AS ENUM ('happy', 'anxious');
SELECT * FROM person
WHERE current_mood = 'happy'::new_mood;
-- Equivalent to `WHERE current_mood::VARCHAR = 'happy'::VARCHAR`
```

|   name    | current_mood |
|:----------|:-------------|
| Pedro     | happy        |


When comparing an enum to a `VARCHAR`, DuckDB will cast the enum to `VARCHAR` and perform a string comparison:

```sql
SELECT * FROM person
WHERE current_mood = name;
-- Equivalent to `WHERE current_mood::VARCHAR = name`
-- No rows returned
```

When comparing against a constant string, DuckDB applies an optimization:
it rewrites the constant using `try_cast(⟨constant string⟩, enum_type)`, so that physically
an integer comparison is performed instead of a string comparison
(although logically it is still a string comparison):

```sql
SELECT * FROM person
WHERE current_mood = 'sad';
-- Equivalent to `WHERE current_mood::VARCHAR = 'sad'`
```

|   name    | current_mood |
|:----------|:-------------|
| Pagliacci | sad          |


> **Warning.** This means that comparing against a string that is not one of the enum's values always results in `false` (and does not raise an error):

```sql
SELECT * FROM person
WHERE current_mood = 'bogus';
-- Equivalent to `WHERE current_mood::VARCHAR = 'bogus'`
-- No rows returned
```

If you want to enforce type-safety, cast to the enum explicitly:

```sql
SELECT * FROM person
WHERE current_mood = 'bogus'::mood;
-- Conversion Error: Could not convert string 'bogus' to UINT8
```

#### Ordering of Enums {#docs:current:sql:data_types:enum::ordering-of-enums}

Enum values are ordered according to their order in the enum's definition. For example:

```sql
CREATE TYPE priority AS ENUM ('low', 'medium', 'high');
SELECT 'low'::priority < 'high'::priority AS comp;
-- note that 'low'::VARCHAR < 'high'::VARCHAR is false!
```

| comp |
|-----:|
| true |

```sql
SELECT unnest(['medium'::priority, 'high'::priority, 'low'::priority]) AS m
ORDER BY m;
```

|   m    |
|:-------|
| low    |
| medium |
| high   |

> **Warning.** If you compare an enum to a non-enum (e.g., a `VARCHAR` or a different enum type),
the enum will first be cast to a string (as described in the previous section),
and the comparison will be done lexicographically as with strings:

```sql
CREATE TABLE tasks (name TEXT, priority_level priority);
INSERT INTO tasks VALUES ('a', 'low'), ('b', 'medium'), ('c', 'high');
-- WARNING!
-- Equivalent to `WHERE priority_level::VARCHAR >= 'medium'`
SELECT * FROM tasks
WHERE priority_level >= 'medium';  
-- Misses the 'high' priority task!
```

| name | priority_level  |
|:-----|:----------------|
| b    | medium          |


Therefore, to get all priorities at or above `medium`, explicitly cast the string to the enum type:

```sql
SELECT * FROM tasks
WHERE priority_level >= 'medium'::priority;
```

| name | priority_level  |
|:-----|:----------------|
| b    | medium          |
| c    | high            |
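
Sorting on the enum column itself also follows the definition order rather than alphabetical order. A small sketch, assuming the `tasks` table and `priority` type from above:

```sql
-- Returns c (high), b (medium), a (low): definition order, reversed by DESC
SELECT name, priority_level
FROM tasks
ORDER BY priority_level DESC;
```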

#### Functions {#docs:current:sql:data_types:enum::functions}

See [Enum Functions](#docs:current:sql:functions:enum).

For example, show the available values in the `mood` enum using the `enum_range` function:

```sql
SELECT enum_range(NULL::mood) AS my_enum_range;
```

|  my_enum_range     |
|--------------------|
| `[sad, ok, happy]` |
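
A few other enum functions can be sketched the same way, again assuming the `mood` type defined earlier. `enum_code` returns the numeric value that backs an enum value:

```sql
SELECT
    enum_first(NULL::mood) AS first_value,     -- sad
    enum_last(NULL::mood) AS last_value,       -- happy
    enum_code('happy'::mood) AS happy_code;    -- 2
```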


#### Enum Removal {#docs:current:sql:data_types:enum::enum-removal}

Enum types are stored in the catalog, and a catalog dependency is added to each table that uses them. It is possible to drop an enum from the catalog using the following command:

```sql
DROP TYPE ⟨enum_name⟩;
```

Currently, it is possible to drop enums that are used in tables without affecting the tables.

> **Warning.** This behavior of the enum removal feature is subject to change. In future releases, it is expected that any dependent columns must be removed before dropping the enum, or the enum must be dropped with the additional `CASCADE` parameter.

### Geometry Data Type {#docs:current:sql:data_types:geometry}

| Name | Description |
|:--|:-----|
| `GEOMETRY` | Geospatial entity |

The `GEOMETRY` data type is used to store and manipulate geometric objects, such as points, lines, and polygons.

The `GEOMETRY` type was part of the [`spatial` extension](#docs:current:core_extensions:spatial:overview) but became a built-in data type in DuckDB v1.5. Most of the benefits of having `GEOMETRY` as a built-in type (e.g., storage optimizations, statistics, etc.) are therefore only available in databases using [storage version v1.5](#docs:current:internals:storage) and above. However, almost all of the associated functions for working with geometries (e.g., calculating distances, areas, intersections) are still part of `spatial`.

#### Types of Geometries {#docs:current:sql:data_types:geometry::types-of-geometries}

Conceptually, the `GEOMETRY` type follows the core data model defined in the [Simple Features](https://en.wikipedia.org/wiki/Simple_Features) standard, which is widely used in geospatial databases and GIS software. A `GEOMETRY` value can therefore represent 7 types of shapes:

| Geometry Type | Description |
|:--|:--|
| **Point** | A single location in space, defined by its coordinates (e.g., longitude and latitude). |
| **LineString** | A sequence of points connected by straight lines, representing a path or route. |
| **Polygon** | A set of closed rings defined by a sequence of points, representing an area such as a country border or a building footprint. The first ring is the "shell", and "interior" rings represent holes in the polygon. |
| **MultiPoint** | A collection of points. |
| **MultiLineString** | A collection of LineStrings. |
| **MultiPolygon** | A collection of Polygons. |
| **GeometryCollection** | A collection of different geometry types, allowing for complex geometries that combine points, lines, and polygons or even other nested geometry collections. |

The textual representation of geometries uses ["Well-Known Text" (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) format. Geometries can be cast to and from WKT strings, so you can use string literals to create geometries directly in SQL statements.

In the following example, we create a `GEOMETRY` column with the 7 different types of supported geometries:

```sql
CREATE TABLE geometries (
    id INTEGER,
    geom GEOMETRY
);

INSERT INTO geometries VALUES
  (1, 'POINT (30 10)'),
  (2, 'LINESTRING (30 10, 10 30, 40 40)'),
  (3, 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'),
  (4, 'MULTIPOINT ((10 40), (40 30), (20 20), (30 10))'),
  (5, 'MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20))'),
  (6, 'MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10,15 5)))'),
  (7, 'GEOMETRYCOLLECTION (POINT(40 10), LINESTRING(10 10,20 20,10 40), POLYGON((40 40,20 45,45 30,40 40)))');

SELECT * FROM geometries;
----
┌───────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  id   │                                                     geom                                                     │
│ int32 │                                                   geometry                                                   │
├───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│     1 │ POINT (30 10)                                                                                                │
│     2 │ LINESTRING (30 10, 10 30, 40 40)                                                                             │
│     3 │ POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))                                                                │
│     4 │ MULTIPOINT (10 40, 40 30, 20 20, 30 10)                                                                      │
│     5 │ MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20))                                               │
│     6 │ MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))                            │
│     7 │ GEOMETRYCOLLECTION (POINT (40 10), LINESTRING (10 10, 20 20, 10 40), POLYGON ((40 40, 20 45, 45 30, 40 40))) │
└───────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

#### Multi-Dimensional Geometries {#docs:current:sql:data_types:geometry::multi-dimensional-geometries}

The `GEOMETRY` type is primarily used to model shapes in two dimensions (e.g. `X`/`Y` or `longitude`/`latitude`), but it also supports shapes with additional vertex dimensions such as `Z` for elevation or `M` for "measure", or both.

The vertex dimensions of a `GEOMETRY` value must be consistent across all vertices. For example, if one vertex has `X`, `Y`, and `Z` coordinates, then all other vertices in that geometry must also have `X`, `Y`, and `Z` coordinates. This means that you cannot have a mix of 2D and 3D vertices within the same geometry. This also applies for collections of geometries, such as `MULTIPOINT` or `GEOMETRYCOLLECTION`, where all geometries within the collection must have the same vertex dimensions.

Functions that operate on `GEOMETRY` values typically ignore any additional dimensions beyond the `X` and `Y` unless explicitly specified, but they can still be stored and can be retrieved if needed.

In the following example, we create a `GEOMETRY` table with 2D, 3D(Z), 3D(M) and 4D(ZM) points:

```sql
CREATE TABLE points (
    id INTEGER,
    geom GEOMETRY
);

INSERT INTO points VALUES
  (1, 'POINT (30 10)'),
  (2, 'POINT Z (30 10 5)'),
  (3, 'POINT M (30 10 1)'),
  (4, 'POINT ZM (30 10 5 1)');

SELECT * FROM points;
----
┌───────┬──────────────────────┐
│  id   │         geom         │
│ int32 │       geometry       │
├───────┼──────────────────────┤
│     1 │ POINT (30 10)        │
│     2 │ POINT Z (30 10 5)    │
│     3 │ POINT M (30 10 1)    │
│     4 │ POINT ZM (30 10 5 1) │
└───────┴──────────────────────┘

-- But we cannot mix different vertex dimensions within the same geometry!
INSERT INTO points VALUES
  (5, 'MULTIPOINT (POINT (30 10), POINT Z (30 10 5))');
----
Invalid Input Error:
Geometry has inconsistent Z/M dimension
```

#### Empty Geometries {#docs:current:sql:data_types:geometry::empty-geometries}

Geometries can also be "empty" (e.g., `POINT EMPTY`, `LINESTRING EMPTY`, `MULTIPOLYGON EMPTY`, etc.) which means they don't contain any vertices. Empty geometries are still valid geometries and can be used in spatial operations, but they are mostly useful for representing the result of topological operations that don't have a valid geometrical representation (e.g., the intersection of two non-overlapping geometries is an empty geometry).
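
Since geometries can be cast from WKT strings, empty geometries can presumably be written as literals too. The following is a sketch only, assuming a DuckDB version with the built-in `GEOMETRY` type; the column aliases are illustrative and the exact output formatting is not shown:

```sql
SELECT
    'POINT EMPTY'::GEOMETRY AS empty_point,
    'LINESTRING EMPTY'::GEOMETRY AS empty_line;
```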

#### Geometry Storage {#docs:current:sql:data_types:geometry::geometry-storage}

Internally, `GEOMETRY` values are stored as a sequence of bytes, similarly to DuckDB's `BLOB` type. The exact binary format is not yet stabilized and may change in a future release, but as of DuckDB [storage version v1.5](#docs:current:internals:storage) it is based on little-endian [Well-Known Binary (WKB)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary), which is a standard binary encoding for geometries. In older storage versions, geometries were stored in a different custom binary format used by the `spatial` extension; the conversion between the two formats is performed automatically at the storage layer and is not visible to the execution engine or the user.

##### Shredding and Compression {#docs:current:sql:data_types:geometry::shredding-and-compression}

The `GEOMETRY` type supports a storage optimization called "shredding", which improves compression for geometry columns where all values share the same geometry type and vertex dimensions.

When a row group qualifies, DuckDB splits the geometry segment within the row group into primitive `STRUCT`, `LIST`, and `DOUBLE` segments that can be compressed independently using lightweight algorithms, which is far more efficient than storing variable-size binary blobs.

The shredded layout depends on the geometry type:

- `POINT` - `STRUCT(X DOUBLE, Y DOUBLE)` (and/or `Z`, `M`)
- `LINESTRING` - `STRUCT(X DOUBLE, Y DOUBLE)[]`
- `POLYGON` - `STRUCT(X DOUBLE, Y DOUBLE)[][]`
- `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON` - same as above, with one additional level of list nesting
 
Row groups are not shredded if they contain `GEOMETRYCOLLECTION`s, any `EMPTY` geometries, or multiple geometry sub-types.

Additionally, row groups are not shredded if they fall below the minimum size threshold (default: ~25% of the maximum row group size, i.e., 30,000 rows).

This threshold is configurable via the `geometry_minimum_shredding_size` setting. Set it to `0` to always shred, or `-1` to disable shredding entirely.

```sql
-- Disable shredding for geometry columns
SET geometry_minimum_shredding_size = -1;

-- Always shred geometry columns regardless of row group size
SET geometry_minimum_shredding_size = 0;
```

The primary benefit of shredding is significantly improved compression, but in the future we plan to add ways to expose the shredded representation directly to the execution engine without having to "reassemble" the geometry back into binary again.

The following example illustrates the effects of shredding on the storage footprint of a `GEOMETRY` column.

```sql
-- Attach a persistent database with storage version v1.5
ATTACH 'geometry_db.db' AS geometry_db (STORAGE_VERSION 'v1.5.0');

USE geometry_db;

-- Disable shredding completely and create a table with 1 million 2D points
SET geometry_minimum_shredding_size = -1;

CREATE OR REPLACE TABLE points AS SELECT printf('POINT (%d %d)', x, y)::GEOMETRY AS geom 
FROM range(0, 1000) AS rx(x), range(0, 1000) AS ry(y);

-- Checkpoint the database to persist the data and storage layout to disk
CHECKPOINT;

-- Attach a second database
ATTACH 'shredded_db.db' AS shredded_db (STORAGE_VERSION 'v1.5.0');

USE shredded_db;

-- This time, set the minimum shredding size to 0 to always shred geometry columns,
-- and create the same table with 1 million 2D points
SET geometry_minimum_shredding_size = 0;

CREATE OR REPLACE TABLE points AS SELECT printf('POINT (%d %d)', x, y)::GEOMETRY AS geom 
FROM range(0, 1000) AS rx(x), range(0, 1000) AS ry(y);

-- Checkpoint to persist the data and storage layout to disk, and apply shredding
CHECKPOINT;

-- Now compare the on-disk size of both attached databases
SELECT database_name, database_size FROM pragma_database_size();
----
┌───────────────┬───────────────┐
│ database_name │ database_size │
│    varchar    │    varchar    │
├───────────────┼───────────────┤
│ shredded_db   │ 2.2 MiB       │ -- Almost 3x smaller storage thanks to shredding!
│ geometry_db   │ 6.5 MiB       │ 
│ memory        │ 0 bytes       │
└───────────────┴───────────────┘

-- We can inspect what type of segments are used to store the geometry column 
-- in each database using the `pragma_storage_info` function. 

-- The geometry column in `geometry_db` is stored as regular GEOMETRY segments
SELECT DISTINCT(segment_type) FROM pragma_storage_info('geometry_db.points');
----
┌──────────────┐
│ segment_type │
│   varchar    │
├──────────────┤
│ GEOMETRY     │
│ VALIDITY     │
└──────────────┘

-- While the geometry column in `shredded_db` is decomposed into primitive DOUBLE segments,
-- which can be compressed much more efficiently!
SELECT DISTINCT(segment_type) FROM pragma_storage_info('shredded_db.points');
----
┌──────────────┐
│ segment_type │
│   varchar    │
├──────────────┤
│ VALIDITY     │
│ DOUBLE       │
└──────────────┘
```

##### Geometry Statistics {#docs:current:sql:data_types:geometry::geometry-statistics-}

`GEOMETRY` columns contain geometry-specific statistics that track the bounding box of the geometries in each row group, as well as the set of geometry types and vertex dimensions that are present within the row group.

You can inspect the statistics of a column using the `stats()` function:

```sql
CREATE TABLE geometries AS SELECT 'POINT Z (30 10 5)'::GEOMETRY AS geom;

SELECT stats(geom) AS geom_stats FROM geometries;
----
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                             geom_stats                                                                                              │
│                                                                                               varchar                                                                                               │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ [Extent: [X: [30.000000, 30.000000], Y: [10.000000, 10.000000], Z: [5.000000, 5.000000], M: [inf, -inf]], Types: [point_z], Flags: [Has Empty Geom: false, Has No Empty Geom: true, Has Empty Part: │
│  false, Has No Empty Part: true]][Has Null: false, Has No Null: true][Approx Unique: 1]                                                                                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

These statistics can be used by the query optimizer to skip row groups that don't match the geometry type or vertex dimensions required by a query, or to speed up spatial predicates by first checking if the bounding box of the geometries in the row group overlaps with the bounding box of the query geometry.

Currently, only the `&&` operator, which is used to check if the bounding box of a geometry intersects the bounding box of another geometry, can take advantage of geometry statistics when used in a `WHERE` clause. There is ongoing work to add support for more statistics-based optimizations to the functions in the `spatial` extension, such as `ST_Intersects`, `ST_Distance`, etc.

Persisting geometry statistics is only possible in storage versions v1.5 and above. If you are using an older storage version, the geometry statistics turn into "unknown" statistics when checkpointing: the bounding box is set to an infinitely large bounding box and all geometry types and vertex dimensions are marked as possibly present. As a result, the execution engine cannot perform any optimizations based on the geometry statistics.

#### Coordinate Reference Systems {#docs:current:sql:data_types:geometry::coordinate-reference-systems}

As far as the execution engine is concerned, geometries are considered to exist in a Cartesian coordinate system. In practice, however, most geospatial data is associated with a specific **Coordinate Reference System** (CRS) that defines how the coordinates relate to real-world locations on the Earth's surface.

A helpful analogy is to think of CRSs as the equivalent of "time zones", but for geospatial data. Just like how time zones define how local time relates to a standard reference time (e.g., UTC), CRSs define how the coordinates of a geometry relate to a standard reference system (e.g., WGS 84). CRSs are usually either geographic (e.g., WGS 84, which uses latitude and longitude) or projected (e.g., UTM, which uses linear units like meters).

When working with geospatial data, it's important to be aware of the CRS associated with different datasets. Performing spatial operations on geometries in different CRSs without proper transformation will most likely lead to incorrect results.

##### How are Coordinate Reference Systems stored in DuckDB? {#docs:current:sql:data_types:geometry::how-are-coordinate-reference-systems-stored-in-duckdb}

To avoid these kinds of mistakes, DuckDB makes it possible to explicitly associate a CRS with a `GEOMETRY` column. 

This is done by passing a CRS "identifier" as a parameter of the `GEOMETRY` type. For example, a column of type `GEOMETRY('OGC:CRS84')` stores geometries that are associated with the "OGC CRS84" coordinate reference system. 
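
For instance, a table definition along these lines would tie every value in the column to that CRS. This is a minimal sketch; the `cities` table and its columns are hypothetical names chosen for illustration.

```sql
-- Hypothetical table: every geometry in `location` is tagged with OGC:CRS84
CREATE TABLE cities (
    name VARCHAR,
    location GEOMETRY('OGC:CRS84')
);

-- The cast attaches the CRS to the value so that it matches the column type
INSERT INTO cities VALUES
    ('Amsterdam', 'POINT(4.897070 52.377956)'::GEOMETRY('OGC:CRS84'));
```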

CRS identifiers in DuckDB are always strings. `OGC:CRS84` is the identifier for a common geographic coordinate system spanning the whole globe where the `X` coordinate represents longitude and the `Y` coordinate represents latitude. DuckDB only knows this because the identifier `OGC:CRS84` is registered as a _known_ CRS in the system catalog.

By default, only a handful of common CRSs are registered as known, but extensions can also register additional known CRSs. In particular, the `spatial` extension registers over 7000 CRSs from the [EPSG Geodetic Parameter Dataset](https://epsg.org/home.html), which is arguably the most widely used database of coordinate reference systems. 

You can list all available CRSs known to DuckDB using the `duckdb_coordinate_systems()` function:

```sql
SELECT * FROM duckdb_coordinate_systems();
----
┌───────────────┬──────────────┬─────────────┬────────────┬─────────┬───────────┬───────────┬───────────┬───────────────────────────────────────┬───────────────────────────────────────┐
│ database_name │ database_oid │ schema_name │ schema_oid │ crs_oid │ crs_name  │ auth_name │ auth_code │               projjson                │               wkt2_2019               │
│    varchar    │    int64     │   varchar   │   int64    │  int64  │  varchar  │  varchar  │  varchar  │                varchar                │                varchar                │
├───────────────┼──────────────┼─────────────┼────────────┼─────────┼───────────┼───────────┼───────────┼───────────────────────────────────────┼───────────────────────────────────────┤
│ system        │            0 │ main        │          0 │    1354 │ OGC:CRS83 │ OGC       │ CRS83     │ {"$schema":"https://proj.org/schemas… │ GEOGCRS["NAD83 (CRS83)",DATUM["North… │
│ system        │            0 │ main        │          0 │    1353 │ OGC:CRS84 │ OGC       │ CRS84     │ {"$schema":"https://proj.org/schemas… │ GEOGCRS["WGS 84 (CRS84)",ENSEMBLE["W… │
└───────────────┴──────────────┴─────────────┴────────────┴─────────┴───────────┴───────────┴───────────┴───────────────────────────────────────┴───────────────────────────────────────┘
```
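
Because `duckdb_coordinate_systems()` is a table function, its output can be filtered like any other relation. The sketch below uses only columns that appear in the output above; which rows are returned depends on the extensions that are loaded.

```sql
-- Look up a CRS by authority and code
SELECT crs_name, auth_name, auth_code
FROM duckdb_coordinate_systems()
WHERE auth_name = 'OGC' AND auth_code = 'CRS84';
```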

##### Handling Unknown Coordinate Reference Systems {#docs:current:sql:data_types:geometry::handling-unknown-coordinate-reference-systems}

As mentioned above, only coordinate systems that are registered in the system catalog (and therefore "known" to DuckDB) can be used when creating `GEOMETRY` columns.
If you try to create a `GEOMETRY` column with an unknown CRS identifier, either manually or by importing an external geospatial dataset, the statement will fail with an error.

```sql
SELECT 'POINT(1 2)'::GEOMETRY('DUCK:1337') AS my_point;
----
Binder Error:
Encountered unrecognized coordinate system 'DUCK:1337' when trying to create GEOMETRY type
The coordinate system definition may be incomplete or invalid ...
```

This restriction exists because DuckDB needs the complete CRS definition, not just an identifier, to perform coordinate transformations and to export to formats that embed CRS metadata, such as GeoParquet. Without a system catalog entry, there is no way to resolve an identifier to its full definition.

You can set the `ignore_unknown_crs` configuration option to `true` to simply skip any unknown CRSs and create `GEOMETRY` columns without CRS instead.

```sql
-- Ignore any unknown CRS identifiers
SET ignore_unknown_crs = true;

SELECT 'POINT(1 2)'::GEOMETRY('DUCK:1337') AS my_point;
----
┌─────────────┐
│  my_point   │
│  geometry   │ -- The geometry is created, but the CRS is dropped from the type!
├─────────────┤
│ POINT (1 2) │
└─────────────┘
```

Alternatively, if you are trying to define a `GEOMETRY` column yourself, you can provide a complete CRS definition in WKT or PROJJSON format instead of a shorthand identifier as the CRS parameter. However, as complete CRS definitions are usually very large, this gets unwieldy very quickly and is not recommended for interactive use.

It is currently not possible to define a custom CRS from within SQL, or to persist custom CRS definitions in a database such that DuckDB can use them to resolve CRS identifiers for geometry columns, but this is something we are considering for the future.

##### Working with Geometries in Different Coordinate Reference Systems {#docs:current:sql:data_types:geometry::working-with-geometries-in-different-coordinate-reference-systems}

One benefit of tracking CRSs as part of the type system is that it prevents many common mistakes that can occur when working with geometries from different coordinate systems. Most spatial functions that operate on multiple `GEOMETRY` values verify that all input expressions have the same CRS before performing the operation. Similarly, `GEOMETRY` columns can only be implicitly cast to and from other `GEOMETRY` columns if either the source or the target has no CRS specified.

To convert a geometry from one CRS to another, you can use the `ST_Transform(geom, crs)` function from the `spatial` extension. 

```sql
LOAD spatial;

SELECT ST_Transform('POINT(4.897070 52.377956)'::GEOMETRY('OGC:CRS84'), 'EPSG:3857') AS transformed;
----
┌────────────────────────────────────────────┐
│                transformed                 │
│           geometry('epsg:3857')            │
├────────────────────────────────────────────┤
│ POINT (545139.3387790163 6868755.38408516) │
└────────────────────────────────────────────┘
```

You can also use the `ST_SetCRS(geom, crs)` function to assign a CRS to a geometry that doesn't have one, or to reassign a CRS without transforming coordinates (e.g., when the data is already in the correct coordinate system but lacks the correct CRS).

```sql
SELECT ST_SetCRS('POINT(4.897070 52.377956)'::GEOMETRY, 'OGC:CRS84') AS with_crs;
----
┌───────────────────────────┐
│         with_crs          │
│   geometry('ogc:crs84')   │
├───────────────────────────┤
│ POINT (4.89707 52.377956) │
└───────────────────────────┘
```

To remove the CRS from a geometry, either cast it to plain `GEOMETRY` or set the CRS to `''`:

```sql
SELECT 'POINT(4.897070 52.377956)'::GEOMETRY('OGC:CRS84')::GEOMETRY AS no_crs;
----
┌───────────────────────────┐
│          no_crs           │
│         geometry          │
├───────────────────────────┤
│ POINT (4.89707 52.377956) │
└───────────────────────────┘

SELECT ST_SetCRS('POINT(4.897070 52.377956)'::GEOMETRY('OGC:CRS84'), '') AS no_crs;
----
┌───────────────────────────┐
│          no_crs           │
│         geometry          │
├───────────────────────────┤
│ POINT (4.89707 52.377956) │
└───────────────────────────┘
```

You can also use `ST_CRS(geom)` to retrieve the CRS of a geometry:

```sql
SELECT ST_CRS('POINT(4.897070 52.377956)'::GEOMETRY('OGC:CRS84')) AS crs;
----
┌───────────┐
│    crs    │
│  varchar  │
├───────────┤
│ OGC:CRS84 │
└───────────┘
```

#### Functions {#docs:current:sql:data_types:geometry::functions}

- See [geometry functions](#docs:current:sql:functions:geometry) for the list of built-in geometry functions.
- See the documentation of the [`spatial` extension](#docs:current:core_extensions:spatial:overview) for the large set of additional geometry functions provided by the extension, including functions for calculating areas, distances, intersections, unions, and much more.

### Interval Type {#docs:current:sql:data_types:interval}

`INTERVAL`s represent periods of time that can be added to or subtracted from `DATE`, `TIMESTAMP`, `TIMESTAMPTZ`, or `TIME` values.

| Name | Description |
|:---|:---|
| `INTERVAL` | Period of time |

An `INTERVAL` can be constructed by providing amounts together with units.
Units that aren't *months*, *days*, or *microseconds* are converted to equivalent amounts in the next smaller of these three basis units.

```sql
SELECT
    INTERVAL 1 YEAR, -- single unit using YEAR keyword; stored as 12 months
    INTERVAL (random() * 10) YEAR, -- parentheses necessary for variable amounts;
                                   -- stored as integer number of months
    INTERVAL '1 month 1 day', -- string type necessary for multiple units; stored as (1 month, 1 day)
    '16 months'::INTERVAL, -- string cast supported; stored as 16 months
    '48:00:00'::INTERVAL, -- HH:MM:SS string supported; stored as (48 * 60 * 60 * 1e6 microseconds)
;
```

> **Warning.** Decimal values are truncated to integers when used with unit keywords (unless the unit is `SECONDS` or `MILLISECONDS`).
>
> ```sql
> SELECT INTERVAL '1.5' YEARS;
> -- Returns 12 months; equivalent to `to_years(CAST(trunc(1.5) AS INTEGER))`
> ```
>
> For more precision, include the unit in the string or use a more granular unit; e.g., `INTERVAL '1.5 years'` or `INTERVAL 18 MONTHS`.

Three independent basis units are necessary because a month does not correspond to a fixed number of days (February has fewer days than March) and a day doesn't correspond to a fixed number of microseconds (days can be 25 hours or 23 hours long because of daylight saving time).
The division into components makes the `INTERVAL` type suitable for adding or subtracting specific time units to a date. For example, we can generate a table with the first day of every month using the following SQL query:

```sql
SELECT DATE '2000-01-01' + INTERVAL (i) MONTH
FROM range(12) t(i);
```

When `INTERVAL`s are deconstructed via the `datepart` function, the *months* component is additionally split into years and months, and the *microseconds* component is split into hours, minutes and microseconds. The *days* component is not split into additional units. To demonstrate this, the following query generates an `INTERVAL` called `period` by summing random amounts of the three basis units. It then extracts the aforementioned six parts from `period`, adds them back together, and confirms that the result is always equal to the original `period`.

```sql
SELECT
    period = list_reduce(
        [INTERVAL (datepart(part, period) || part) FOR part IN
             ['year', 'month', 'day', 'hour', 'minute', 'microsecond']
        ],
        (i1, i2) -> i1 + i2
    ) -- always true
FROM (
    VALUES (
        INTERVAL (random() * 123_456_789_123) MICROSECONDS
        + INTERVAL (random() * 12_345) DAYS
        + INTERVAL (random() * 12_345) MONTHS
    )
) _(period);
```

> **Warning.** The *microseconds* component is split only into hours, minutes and microseconds, rather than hours, minutes, *seconds* and microseconds.

The following table gives the formulas that `datepart` uses to extract these parts from the three basis units.

| Part                 | Formula                                          |
|----------------------|--------------------------------------------------|
| `year`               | `#months // 12`                                  |
| `month`              | `#months % 12`                                   |
| `day`                | `#days`                                          | 
| `hour`               | `#microseconds // (60 * 60 * 1_000_000)`         |
| `minute`             | `(#microseconds // (60 * 1_000_000)) % 60`       |
| `microsecond`        | `#microseconds % (60 * 1_000_000)`               |

Additionally, `datepart` may be used to extract centuries, decades, quarters, seconds and milliseconds from `INTERVAL`s. However, these parts are not required when reassembling the original `INTERVAL`. In fact, if the previous query additionally extracted any of these additional parts, then the sum of the extracted parts would generally be larger than the original `period`.

| Part                 | Formula                                          |
|----------------------|--------------------------------------------------|
| `century`            | `datepart('year', interval) // 100`              |
| `decade`             | `datepart('year', interval) // 10`               |
| `quarter`            | `datepart('month', interval) // 3 + 1`           | 
| `second`             | `datepart('microsecond', interval) // 1_000_000` |
| `millisecond`        | `datepart('microsecond', interval) // 1_000`     |

> All units use 0-based indexing, except for quarters, which use 1-based indexing.

For example:

```sql
SELECT
    datepart('decade', INTERVAL 12 YEARS), -- returns 1
    datepart('year', INTERVAL 12 YEARS), -- returns 12
    datepart('second', INTERVAL 1_234 MILLISECONDS), -- returns 1
    datepart('microsecond', INTERVAL 1_234 MILLISECONDS), -- returns 1_234_000
;
```
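
As a further illustration of the reassembly formulas, the following sketch builds an interval from known amounts of the three basis units (14 months, 3 days, and 3 723 500 000 microseconds) and extracts the six parts. The expected values in the comments are derived by hand from the formulas above rather than copied from a recorded session.

```sql
SELECT
    datepart('year', period),        -- 14 // 12 = 1
    datepart('month', period),       -- 14 % 12 = 2
    datepart('day', period),         -- 3
    datepart('hour', period),        -- 3_723_500_000 // 3_600_000_000 = 1
    datepart('minute', period),      -- (3_723_500_000 // 60_000_000) % 60 = 2
    datepart('microsecond', period)  -- 3_723_500_000 % 60_000_000 = 3_500_000
FROM (
    VALUES (
        INTERVAL 14 MONTHS
        + INTERVAL 3 DAYS
        + INTERVAL 3_723_500 MILLISECONDS
    )
) _(period);
```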

#### Arithmetic with Timestamps, Dates and Intervals {#docs:current:sql:data_types:interval::arithmetic-with-timestamps-dates-and-intervals}

`INTERVAL`s can be added to and subtracted from `TIMESTAMP`s, `TIMESTAMPTZ`s, `DATE`s, and `TIME`s using the `+` and `-` operators.

```sql
SELECT
    DATE '2000-01-01' + INTERVAL 1 YEAR,
    TIMESTAMP '2000-01-01 01:33:30' - INTERVAL '1 month 13 hours',
    TIME '02:00:00' - INTERVAL '3 days 23 hours', -- wraps; equals TIME '03:00:00'
;
```

> Adding an `INTERVAL` to a `DATE` returns a `TIMESTAMP` even when the `INTERVAL` has no microseconds component. The result is the same as if the `DATE` was cast to a `TIMESTAMP` (which sets the time component to `00:00:00`) before adding the `INTERVAL`.

Conversely, subtracting two `TIMESTAMP`s or two `TIMESTAMPTZ`s from one another creates an `INTERVAL` describing the difference between the timestamps with only the *days and microseconds* components. For example:

```sql
SELECT
    TIMESTAMP '2000-02-06 12:00:00' - TIMESTAMP '2000-01-01 11:00:00', -- 36 days 1 hour
    TIMESTAMP '2000-02-01' + (TIMESTAMP '2000-02-01' - TIMESTAMP '2000-01-01'), -- '2000-03-03', NOT '2000-03-01'
;
```

Subtracting two `DATE`s from one another does not create an `INTERVAL` but rather returns the number of days between the given dates as an integer value.
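
A small sketch of date subtraction: February 2000 has 29 days, so the difference below is expected to be 29. The `typeof` call is included only to show that the result is an integer type rather than an `INTERVAL`.

```sql
SELECT
    DATE '2000-03-01' - DATE '2000-02-01' AS days_between,         -- 29
    typeof(DATE '2000-03-01' - DATE '2000-02-01') AS result_type;  -- an integer type, not INTERVAL
```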

> **Warning.** Extracting a part of the `INTERVAL` difference between two `TIMESTAMP`s is not equivalent to computing the number of partition boundaries between the two `TIMESTAMP`s for the corresponding unit, as computed by the `datediff` function:
>
> ```sql
> SELECT
>     datediff('day', TIMESTAMP '2020-01-01 01:00:00', TIMESTAMP '2020-01-02 00:00:00'), -- 1
>     datepart('day', TIMESTAMP '2020-01-02 00:00:00' - TIMESTAMP '2020-01-01 01:00:00'), -- 0
> ;
> ```

#### Equality and Comparison {#docs:current:sql:data_types:interval::equality-and-comparison}

For equality and ordering comparisons only, the total number of microseconds in an `INTERVAL` is computed by converting the days basis unit to `24 * 60 * 60 * 1e6` microseconds and the months basis unit to 30 days, or `30 * 24 * 60 * 60 * 1e6` microseconds.

As a result, `INTERVAL`s can compare equal even when they are functionally different, and the ordering of `INTERVAL`s is not always preserved when they are added to dates or timestamps.

For example:

* `INTERVAL 30 DAYS = INTERVAL 1 MONTH`
* but `DATE '2020-01-01' + INTERVAL 30 DAYS != DATE '2020-01-01' + INTERVAL 1 MONTH`.

and

* `INTERVAL '30 days 12 hours' > INTERVAL 1 MONTH`
* but `DATE '2020-01-01' + INTERVAL '30 days 12 hours' < DATE '2020-01-01' + INTERVAL 1 MONTH`.
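
These comparisons can be checked directly. The sketch below simply evaluates the four expressions listed above; per that list, all four columns are expected to be `true`. The column aliases are arbitrary.

```sql
SELECT
    INTERVAL 30 DAYS = INTERVAL 1 MONTH AS equal_intervals,
    (DATE '2020-01-01' + INTERVAL 30 DAYS)
        != (DATE '2020-01-01' + INTERVAL 1 MONTH) AS different_results,
    INTERVAL '30 days 12 hours' > INTERVAL 1 MONTH AS greater_interval,
    (DATE '2020-01-01' + INTERVAL '30 days 12 hours')
        < (DATE '2020-01-01' + INTERVAL 1 MONTH) AS earlier_result;
```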

#### Functions {#docs:current:sql:data_types:interval::functions}

See the [Date Part Functions page](#docs:current:sql:functions:datepart) for a list of available date parts for use with an `INTERVAL`.

See the [Interval Operators page](#docs:current:sql:functions:interval) for functions that operate on intervals.

### List Type {#docs:current:sql:data_types:list}

A `LIST` column encodes lists of values. Fields in the column can have values with different lengths, but they must all have the same underlying type. `LIST`s are typically used to store arrays of numbers, but can contain any uniform data type, including other `LIST`s and `STRUCT`s.

`LIST`s are similar to PostgreSQL's `ARRAY` type. DuckDB uses the `LIST` terminology, but some [`array_` functions](#docs:current:sql:functions:list) are provided for PostgreSQL compatibility.

See the [data types overview](#docs:current:sql:data_types:overview) for a comparison between nested data types.

> For storing fixed-length lists, DuckDB uses the [`ARRAY` type](#docs:current:sql:data_types:array).

#### Creating Lists {#docs:current:sql:data_types:list::creating-lists}

Lists can be created using the [`list_value(expr, ...)`](#docs:current:sql:functions:list::list_valueany-) function or the equivalent bracket notation `[expr, ...]`. The expressions can be constants or arbitrary expressions. To create a list from a table column, use the [`list`](#docs:current:sql:functions:aggregates::general-aggregate-functions) aggregate function.

List of integers:

```sql
SELECT [1, 2, 3];
```

List of strings with a `NULL` value:

```sql
SELECT ['duck', 'goose', NULL, 'heron'];
```

List of lists with `NULL` values:

```sql
SELECT [['duck', 'goose', 'heron'], NULL, ['frog', 'toad'], []];
```

Create a list with the `list_value` function:

```sql
SELECT list_value(1, 2, 3);
```

Create a table with an `INTEGER` list column and a `VARCHAR` list column:

```sql
CREATE TABLE list_table (int_list INTEGER[], varchar_list VARCHAR[]);
```
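
To build a list from the rows of a column, the `list` aggregate mentioned above can be used. A minimal sketch, using `range` as a stand-in for a real table and an `ORDER BY` inside the aggregate to make the element order deterministic:

```sql
SELECT list(i ORDER BY i) AS collected
FROM range(3) t(i);
-- [0, 1, 2]
```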

#### Retrieving from Lists {#docs:current:sql:data_types:list::retrieving-from-lists}

Retrieving one or more values from a list can be accomplished using brackets and slicing notation, or through [list functions](#docs:current:sql:functions:list) like `list_extract`. Multiple equivalent functions are provided as aliases for compatibility with systems that refer to lists as arrays, for example, `array_slice`.





| Example                                  | Result     |
|:-----------------------------------------|:-----------|
| SELECT ['a', 'b', 'c'][3]                | 'c'        |
| SELECT ['a', 'b', 'c'][-1]               | 'c'        |
| SELECT ['a', 'b', 'c'][2 + 1]            | 'c'        |
| SELECT list_extract(['a', 'b', 'c'], 3)  | 'c'        |
| SELECT ['a', 'b', 'c'][1:2]              | ['a', 'b'] |
| SELECT ['a', 'b', 'c'][:2]               | ['a', 'b'] |
| SELECT ['a', 'b', 'c'][-2:]              | ['b', 'c'] |
| SELECT list_slice(['a', 'b', 'c'], 2, 3) | ['b', 'c'] |



#### Comparison and Ordering {#docs:current:sql:data_types:list::comparison-and-ordering}

The `LIST` type can be compared using all the [comparison operators](#docs:current:sql:expressions:comparison_operators).
These comparisons can be used in [logical expressions](#docs:current:sql:expressions:logical_operators)
such as `WHERE` and `HAVING` clauses, and return [`BOOLEAN` values](#docs:current:sql:data_types:boolean).

The `LIST` ordering is defined positionally using the following rules, where `min_len = min(len(l1), len(l2))`.

* **Equality.** `l1` and `l2` are equal if they have the same length and, for each `i` in `[1, min_len]`, `l1[i] = l2[i]`.
* **Less Than.** For the first index `i` in `[1, min_len]` where `l1[i] != l2[i]`:
  If `l1[i] < l2[i]`, `l1` is less than `l2`. If no such index exists, the shorter list is less than the longer one.

`NULL` values are compared following PostgreSQL's semantics.
Lower nesting levels are used for tie-breaking.

Here are some queries returning `true` for the comparison.

```sql
SELECT [1, 2] < [1, 3] AS result;
```

```sql
SELECT [[1], [2, 4, 5]] < [[2]] AS result;
```

```sql
SELECT [ ] < [1] AS result;
```

These queries return `false`.

```sql
SELECT [ ] < [ ] AS result;
```

```sql
SELECT [1, 2] < [1] AS result;
```

This query returns `NULL`.

```sql
SELECT [1, 2] < [1, NULL, 4] AS result;
```

#### Functions {#docs:current:sql:data_types:list::functions}

See [List Functions](#docs:current:sql:functions:list).

### Literal Types {#docs:current:sql:data_types:literal_types}

DuckDB has special literal types for representing `NULL`, integer and string literals in queries. These have their own binding and conversion rules.

> Prior to DuckDB version 0.10.0, integer and string literals behaved identically to the `INTEGER` and `VARCHAR` types.

#### Null Literals {#docs:current:sql:data_types:literal_types::null-literals}

The `NULL` literal is denoted with the keyword `NULL`. The `NULL` literal can be implicitly converted to any other type.

#### Integer Literals {#docs:current:sql:data_types:literal_types::integer-literals}

Integer literals are denoted as a sequence of one or more decimal digits. At runtime, these result in values of the `INTEGER_LITERAL` type. `INTEGER_LITERAL` types can be implicitly converted to any [integer type](#docs:current:sql:data_types:numeric::integer-types) in which the value fits. For example, the integer literal `42` can be implicitly converted to a `TINYINT`, but the integer literal `1000` cannot be.
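
A small sketch of this rule, using a hypothetical table `small_numbers` with a `TINYINT` column (range -128 to 127). The second `INSERT` is commented out because it is expected to fail; the exact error message is not reproduced here.

```sql
CREATE TABLE small_numbers (x TINYINT);

-- 42 fits into TINYINT, so the literal is converted implicitly
INSERT INTO small_numbers VALUES (42);

-- 1000 does not fit into TINYINT, so this statement is expected to raise an error
-- INSERT INTO small_numbers VALUES (1000);
```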

> DuckDB does not support hexadecimal or binary literals directly. However, strings or string literals in hexadecimal or binary notation, with `0x` or `0b` prefixes respectively, can be cast to integer types, e.g., `'0xFF'::INT = 255` or `'0b101'::INT = 5`.

#### Other Numeric Literals {#docs:current:sql:data_types:literal_types::other-numeric-literals}

Non-integer numeric literals can be denoted with decimal notation, using the period character (`.`) to separate the integer part and the decimal part of the number.
Either the integer part or the decimal part may be omitted:

```sql
SELECT 1.5;          -- 1.5
SELECT .50;          -- 0.5
SELECT 2.;           -- 2.0
```

Non-integer numeric literals can also be denoted using [_E notation_](https://en.wikipedia.org/wiki/Scientific_notation#E_notation). In E notation, an integer or decimal literal is followed by an exponential part, which is denoted by `e` or `E`, followed by a literal integer indicating the exponent.
The exponential part indicates that the preceding value should be multiplied by 10 raised to the power of the exponent:

```sql
SELECT 1e2;           -- 100
SELECT 6.02214e23;    -- Avogadro's constant
SELECT 1e-10;         -- 1 ångström
```

#### Underscores in Numeric Literals {#docs:current:sql:data_types:literal_types::underscores-in-numeric-literals}

DuckDB's SQL dialect allows using the underscore character `_` in numeric literals as an optional separator. The rules for using underscores are as follows:

* Underscores are allowed in integer, decimal, hexadecimal and binary notation.
* Underscores cannot be the first or last character in a literal.
* Underscores must have digits on both sides, i.e., there cannot be multiple underscores in a row, and underscores cannot appear immediately before or after a decimal point or an exponent marker.

Examples:

```sql
SELECT 100_000_000;          -- 100000000
SELECT '0xFF_FF'::INTEGER;   -- 65535
SELECT 1_2.1_2E0_1;          -- 121.2
SELECT '0b0_1_0_1'::INTEGER; -- 5
```

#### String Literals {#docs:current:sql:data_types:literal_types::string-literals}

String literals are delimited using single quotes (`'`, apostrophe) and result in `STRING_LITERAL` values.
Note that double quotes (`"`) cannot be used as the string delimiter character: instead, double quotes are used to delimit [quoted identifiers](#docs:current:sql:dialect:keywords_and_identifiers::identifiers).

##### Implicit String Literal Concatenation {#docs:current:sql:data_types:literal_types::implicit-string-literal-concatenation}

Consecutive single-quoted string literals separated only by whitespace that contains at least one newline are implicitly concatenated:

```sql
SELECT 'Hello'
    ' '
    'World' AS greeting;
```

is equivalent to:

```sql
SELECT 'Hello'
    || ' '
    || 'World' AS greeting;
```

They both return the following result:

|  greeting   |
|-------------|
| Hello World |

Note that implicit concatenation only works if there is at least one newline between the literals. Using adjacent string literals separated by whitespace without a newline results in a syntax error:

```sql
SELECT 'Hello' ' ' 'World' AS greeting;
```

```console
Parser Error:
syntax error at or near "' '"

LINE 1: SELECT 'Hello' ' ' 'World' AS greeting;
                       ^
```

Also note that implicit concatenation only works with single-quoted string literals, and does not work with other kinds of string values.

##### Implicit String Conversion {#docs:current:sql:data_types:literal_types::implicit-string-conversion}

`STRING_LITERAL` instances can be implicitly converted to _any_ other type.

For example, we can compare string literals with dates:

```sql
SELECT d > '1992-01-01' AS result
FROM (VALUES (DATE '1992-01-01')) t(d);
```

| result |
|:-------|
| false  |

However, we cannot compare `VARCHAR` values with dates.

```sql
SELECT d > '1992-01-01'::VARCHAR
FROM (VALUES (DATE '1992-01-01')) t(d);
```

```console
Binder Error:
Cannot compare values of type DATE and type VARCHAR - an explicit cast is required
```

##### Escape String Literals {#docs:current:sql:data_types:literal_types::escape-string-literals}

To escape a single quote (apostrophe) character in a string literal, use `''`. For example, `SELECT '''' AS s` returns `'`.
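
For instance, a doubled quote inside a longer literal yields a single apostrophe in the result:

```sql
SELECT 'It''s a duck' AS s;
-- It's a duck
```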

To enable some common escape sequences, such as `\n` for the newline character, prefix a string literal with `e` (or `E`).

```sql
SELECT e'Hello\nworld' AS msg;
```



```text
┌──────────────┐
│     msg      │
│   varchar    │
├──────────────┤
│ Hello\nworld │
└──────────────┘
```

The following backslash escape sequences are supported:

| Escape sequence | Name | ASCII code |
|:--|:--|--:|
| `\b` | backspace | 8 |
| `\f` | form feed | 12 |
| `\n` | newline | 10 |
| `\r` | carriage return |  13 |
| `\t` | tab | 9 |

##### Dollar-Quoted String Literals {#docs:current:sql:data_types:literal_types::dollar-quoted-string-literals}

DuckDB supports dollar-quoted string literals, which are surrounded by double-dollar symbols (`$$`):

```sql
SELECT $$Hello
world$$ AS msg;
```



```text
┌──────────────┐
│     msg      │
│   varchar    │
├──────────────┤
│ Hello\nworld │
└──────────────┘
```

```sql
SELECT $$The price is $9.95$$ AS msg;
```

|        msg         |
|--------------------|
| The price is $9.95 |

Additionally, you can insert an alphanumeric tag between the two dollar symbols to allow for the use of regular double-dollar symbols *within* the string literal:

```sql
SELECT $tag$ this string can contain newlines,
'single quotes',
"double quotes",
and $$dollar quotes$$ $tag$ AS msg;
```



```text
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              msg                                               │
│                                            varchar                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────┤
│  this string can contain newlines,\n'single quotes',\n"double quotes",\nand $$dollar quotes$$  │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
```

[Implicit concatenation](#docs:current:sql:data_types:literal_types::implicit-string-literal-concatenation) only works for single-quoted string literals, not with dollar-quoted ones.

### Map Type {#docs:current:sql:data_types:map}

`MAP`s are similar to `STRUCT`s in that they are an ordered list of key-value pairs. However, `MAP`s do not need to have the same keys present for each row, and thus are suitable for use cases where the schema is unknown beforehand or varies per row.

`MAP`s must have a single type for all keys, and a single type for all values. Keys and values can be any type, and the type of the keys does not need to match the type of the values (e.g., a `MAP` of `VARCHAR` to `INT` is valid). `MAP`s may not have duplicate keys. `MAP`s return `NULL` if a key is not found rather than throwing an error as structs do.

In contrast, `STRUCT`s must have string keys, but each value may have a different type. See the [data types overview](#docs:current:sql:data_types:overview) for a comparison between nested data types.

To construct a `MAP`, use the curly-bracket syntax (`{...}`) preceded by the `MAP` keyword.

#### Creating Maps {#docs:current:sql:data_types:map::creating-maps}

A map with `VARCHAR` keys and `INTEGER` values. This returns `{key1=10, key2=20, key3=30}`:

```sql
SELECT MAP {'key1': 10, 'key2': 20, 'key3': 30};
```

Alternatively, use the `map_from_entries` function. This returns `{key1=10, key2=20, key3=30}`:

```sql
SELECT map_from_entries([('key1', 10), ('key2', 20), ('key3', 30)]);
```

A map can also be created from two lists, one of keys and one of values. This returns `{key1=10, key2=20, key3=30}`:

```sql
SELECT MAP(['key1', 'key2', 'key3'], [10, 20, 30]);
```

A map can also use `INTEGER` keys and `NUMERIC` values. This returns `{1=42.001, 5=-32.100}`:

```sql
SELECT MAP {1: 42.001, 5: -32.1};
```

Keys and/or values can also be nested types. This returns `{[a, b]=[1.1, 2.2], [c, d]=[3.3, 4.4]}`:

```sql
SELECT MAP {['a', 'b']: [1.1, 2.2], ['c', 'd']: [3.3, 4.4]};
```

Create a table with a map column that has `INTEGER` keys and `DOUBLE` values:

```sql
CREATE TABLE tbl (col MAP(INTEGER, DOUBLE));
```
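
As a sketch of how such a column might be used, the statements below insert maps with different keys into `tbl` and look up one key per row. The inserted values are made up for illustration; as noted above, rows whose map lacks the key yield `NULL`.

```sql
-- Each row may carry a different set of keys
INSERT INTO tbl VALUES
    (MAP {1: 1.5, 2: 2.5}),
    (MAP {3: 3.5});

-- Bracket lookup returns the value for key 1, or NULL when the key is absent
SELECT col[1] AS value_for_key_1
FROM tbl;
```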

#### Retrieving from Maps {#docs:current:sql:data_types:map::retrieving-from-maps}

`MAP` values can be retrieved using the `map_extract_value` function or bracket notation:

```sql
SELECT MAP {'key1': 5, 'key2': 43}['key1'];
```

```text
5
```

If the key has the wrong type, an error is thrown. If it has the correct type but is merely not contained in the map, a `NULL` value is returned:

```sql
SELECT MAP {'key1': 5, 'key2': 43}['key3'];
```

```text
NULL
```

The `map_extract` function (and its synonym `element_at`) can be used to retrieve a value wrapped in a list; it returns an empty list if the key is not contained in the map:

```sql
SELECT map_extract(MAP {'key1': 5, 'key2': 43}, 'key1');
```

```text
[5]
```

```sql
SELECT map_extract(MAP {'key1': 5, 'key2': 43}, 'key3');
```

```text
[]
```

#### Comparison Operators {#docs:current:sql:data_types:map::comparison-operators}

Nested types can be compared using all the [comparison operators](#docs:current:sql:expressions:comparison_operators).
These comparisons can be used in [logical expressions](#docs:current:sql:expressions:logical_operators)
for both `WHERE` and `HAVING` clauses, as well as for creating [Boolean values](#docs:current:sql:data_types:boolean).

The ordering is defined positionally in the same way that words can be ordered in a dictionary.
`NULL` values compare greater than all other values and are considered equal to each other.

At the top level, `NULL` nested values obey standard SQL `NULL` comparison rules:
comparing a `NULL` nested value to a non-`NULL` nested value produces a `NULL` result.
Comparing nested value _members_, however, uses the internal nested value rules for `NULL`s,
and a `NULL` nested value member will compare above a non-`NULL` nested value member.

#### Functions {#docs:current:sql:data_types:map::functions}

See [Map Functions](#docs:current:sql:functions:map).

### NULL Values {#docs:current:sql:data_types:nulls}

`NULL` values are special values that are used to represent missing data in SQL. Columns of any type can contain `NULL` values. Logically, a `NULL` value can be seen as “the value of this field is unknown”.

A `NULL` value can be inserted into any field that does not have the `NOT NULL` qualifier:

```sql
CREATE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (NULL);
```

`NULL` values have special semantics in many parts of the query as well as in many functions:

> Any comparison with a `NULL` value returns `NULL`, including `NULL = NULL`.

You can use `IS NOT DISTINCT FROM` to perform an equality comparison where `NULL` values compare equal to each other. Use `IS (NOT) NULL` to check if a value is `NULL`.

```sql
SELECT NULL = NULL;
```

```text
NULL
```

```sql
SELECT NULL IS NOT DISTINCT FROM NULL;
```

```text
true
```

```sql
SELECT NULL IS NULL;
```

```text
true
```
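
Because a comparison with `NULL` yields `NULL`, a `WHERE` clause using `= NULL` never keeps any rows. The sketch below illustrates this with a hypothetical table `null_demo`:

```sql
CREATE OR REPLACE TABLE null_demo (i INTEGER);
INSERT INTO null_demo VALUES (1), (NULL);

-- `i = NULL` evaluates to NULL for every row, so no row passes the filter.
SELECT count(*) AS matches FROM null_demo WHERE i = NULL;                    -- 0

-- Both of these filters keep the row that contains NULL.
SELECT count(*) AS matches FROM null_demo WHERE i IS NULL;                   -- 1
SELECT count(*) AS matches FROM null_demo WHERE i IS NOT DISTINCT FROM NULL; -- 1
```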

#### NULL and Functions {#docs:current:sql:data_types:nulls::null-and-functions}

A function that has an input argument as `NULL` **usually** returns `NULL`.

```sql
SELECT cos(NULL);
```

```text
NULL
```

The `coalesce` function is an exception to this: it takes any number of arguments, and returns for each row the first argument that is not `NULL`. If all arguments are `NULL`, `coalesce` also returns `NULL`.

```sql
SELECT coalesce(NULL, NULL, 1);
```

```text
1
```

```sql
SELECT coalesce(10, 20);
```

```text
10
```

```sql
SELECT coalesce(NULL, NULL);
```

```text
NULL
```

The `ifnull` function is a two-argument version of `coalesce`.

```sql
SELECT ifnull(NULL, 'default_string');
```

```text
default_string
```

```sql
SELECT ifnull(1, 'default_string');
```

```text
1
```

#### `NULL` and `AND` / `OR` {#docs:current:sql:data_types:nulls::null-and-and--or}

`NULL` values have special behavior when used with `AND` and `OR`.
For details, see the [Boolean Type documentation](#docs:current:sql:data_types:boolean).

#### `NULL` and `IN` / `NOT IN` {#docs:current:sql:data_types:nulls::null-and-in--not-in}

The behavior of `... IN ⟨something with a NULL⟩`{:.language-sql .highlight} is different from `... IN ⟨something with no NULLs⟩`{:.language-sql .highlight}.
For details, see the [`IN` documentation](#docs:current:sql:expressions:in).

#### `NULL` and Aggregate Functions {#docs:current:sql:data_types:nulls::null-and-aggregate-functions}

`NULL` values are ignored in most aggregate functions.

Aggregate functions that do not ignore `NULL` values include: `first`, `last`, `list` and `array_agg`. To exclude `NULL` values from those aggregate functions, the [`FILTER` clause](#docs:current:sql:query_syntax:filter) can be used.

```sql
CREATE OR REPLACE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (1), (10), (NULL);
```

```sql
SELECT min(i) FROM integers;
```

```text
1
```

```sql
SELECT max(i) FROM integers;
```

```text
10
```
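
The difference between aggregates that skip `NULL` values and those that keep them can be seen with a short sketch. It uses a hypothetical table `agg_demo` with the same rows as above. The `list` results are not shown, since element order is not guaranteed without an `ORDER BY`:

```sql
CREATE OR REPLACE TABLE agg_demo (i INTEGER);
INSERT INTO agg_demo VALUES (1), (10), (NULL);

-- count(*) counts rows, while count(i) and sum(i) skip the NULL value.
SELECT count(*) AS all_rows, count(i) AS non_null_rows, sum(i) AS total
FROM agg_demo;
-- all_rows = 3, non_null_rows = 2, total = 11

-- list() keeps the NULL value unless a FILTER clause removes it.
SELECT
    list(i) AS with_null,
    list(i) FILTER (WHERE i IS NOT NULL) AS without_null
FROM agg_demo;
```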

### Numeric Types {#docs:current:sql:data_types:numeric}

#### Fixed-Width Integer Types {#docs:current:sql:data_types:numeric::fixed-width-integer-types}

The types `TINYINT`, `SMALLINT`, `INTEGER`, `BIGINT` and `HUGEINT` store whole numbers, that is, numbers without fractional components, of various ranges. Attempts to store values outside of the allowed range will result in an error.
The types `UTINYINT`, `USMALLINT`, `UINTEGER`, `UBIGINT` and `UHUGEINT` store whole unsigned numbers. Attempts to store negative numbers or values outside of the allowed range will result in an error.



| Name        | Aliases                          |     Min |       Max | Size in bytes |
| :---------- | :------------------------------- | ------: | --------: | ------------: |
| `TINYINT`   | `INT1`                           |   - 2^7 |   2^7 - 1 |             1 |
| `SMALLINT`  | `INT2`, `INT16`, `SHORT`         |  - 2^15 |  2^15 - 1 |             2 |
| `INTEGER`   | `INT4`, `INT32`, `INT`, `SIGNED` |  - 2^31 |  2^31 - 1 |             4 |
| `BIGINT`    | `INT8`, `INT64`, `LONG`          |  - 2^63 |  2^63 - 1 |             8 |
| `HUGEINT`   | `INT128`                         | - 2^127 | 2^127 - 1 |            16 |
| `UTINYINT`  | `UINT8`                          |       0 |   2^8 - 1 |             1 |
| `USMALLINT` | `UINT16`                         |       0 |  2^16 - 1 |             2 |
| `UINTEGER`  | `UINT32`                         |       0 |  2^32 - 1 |             4 |
| `UBIGINT`   | `UINT64`                         |       0 |  2^64 - 1 |             8 |
| `UHUGEINT`  | `UINT128`                        |       0 | 2^128 - 1 |            16 |

> `INT8` is a 64-bit integer, and is not the signed equivalent of `UINT8`, an unsigned, 8-bit integer. The type aliases `INT1`, `INT2`, `INT4` and `INT8` for signed integers were inherited from PostgreSQL, where digits in these names indicate their size in *bytes*, whereas the type aliases for their unsigned equivalents, `UINT8`, `UINT16`, `UINT32` and `UINT64`, indicate their size in *bits* following the C/C++ convention.

The `INTEGER` type is the common choice, as it offers the best balance between range, storage size, and performance. The `SMALLINT` type is generally only used if disk space is at a premium. The `BIGINT` and `HUGEINT` types are designed to be used when the range of the `INTEGER` type is insufficient.
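
The alias naming and range checks described above can be verified with `typeof` and a few casts. This is a minimal sketch; the statements marked as failing are expected to raise an out-of-range conversion error rather than return a value:

```sql
-- INT8 is an alias for the 8-byte BIGINT, while UINT8 is the 8-bit UTINYINT.
SELECT typeof(1::INT8) AS int8_type, typeof(1::UINT8) AS uint8_type;
-- int8_type = BIGINT, uint8_type = UTINYINT

-- 127 is the largest TINYINT value (2^7 - 1).
SELECT 127::TINYINT;

-- Expected to fail: 128 is outside the TINYINT range.
SELECT 128::TINYINT;

-- Expected to fail: unsigned types cannot store negative numbers.
SELECT (-1)::UTINYINT;
```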

#### Variable-Length Integers {#docs:current:sql:data_types:numeric::variable-length-integers}

The integer types described above each have a fixed storage size, regardless of the value stored: `UTINYINT` takes 1 byte, `SMALLINT` takes 2 bytes, and so on.
Sometimes, however, you need numbers that are larger than what a `HUGEINT` supports. In these situations, you can use the `BIGNUM` type, which stores positive numbers in a similar fashion to the other integer types, but uses three additional bytes to store the required size and a sign bit. A number with `N` decimal digits requires approximately `0.415 * N + 3` bytes when stored in a `BIGNUM`.

Unlike variable-length integer implementations in other systems, `BIGNUM` has limits: the maximal and minimal representable values are approximately `±4.27e20201778`. These are numbers with 20,201,779 decimal digits, and storing a single such number requires 8 megabytes.

#### Fixed-Point Decimals {#docs:current:sql:data_types:numeric::fixed-point-decimals}

The data type `DECIMAL(WIDTH, SCALE)` (also available under the alias `NUMERIC(WIDTH, SCALE)`) represents an exact fixed-point decimal value. When creating a value of type `DECIMAL`, the `WIDTH` and `SCALE` can be specified to define which size of decimal values can be held in the field. The `WIDTH` determines how many digits can be held in total, and the `SCALE` determines the number of digits after the decimal point. For example, the type `DECIMAL(3, 2)` can fit the value `1.23`, but cannot fit the value `12.3` or the value `1.234`. If neither is specified, the default is `DECIMAL(18, 3)`.

Addition, subtraction and multiplication of two fixed-point decimals returns another fixed-point decimal with the required `WIDTH` and `SCALE` to contain the exact result, or throws an error if the required `WIDTH` would exceed the maximal supported `WIDTH`, which is currently 38.

Division of fixed-point decimals does not typically produce numbers with finite decimal expansion. Therefore, DuckDB uses approximate [floating-point arithmetic](#::floating-point-types) for all divisions that involve fixed-point decimals and accordingly returns floating-point data types.
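
A minimal sketch of these rules is shown below. It relies on `typeof` to inspect the result types instead of stating them, since the exact `WIDTH` and `SCALE` chosen for a result depend on the operands:

```sql
-- Fits: three digits in total, two of them after the decimal point.
SELECT 1.23::DECIMAL(3, 2) AS fits;

-- Expected to fail: 12.3 needs two digits before the decimal point,
-- but DECIMAL(3, 2) leaves room for only one.
SELECT 12.3::DECIMAL(3, 2);

-- Addition and multiplication return another DECIMAL type,
-- whereas division returns a floating-point type.
SELECT
    typeof(1.5::DECIMAL(4, 1) + 2.25::DECIMAL(4, 2)) AS sum_type,
    typeof(1.5::DECIMAL(4, 1) * 2.25::DECIMAL(4, 2)) AS product_type,
    typeof(1.5::DECIMAL(4, 1) / 2.25::DECIMAL(4, 2)) AS quotient_type;
```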

Internally, decimals are represented as integers depending on their specified `WIDTH`.

| Width | Internal | Size (bytes) |
| :---- | :------- | -----------: |
| 1-4   | `INT16`  |            2 |
| 5-9   | `INT32`  |            4 |
| 10-18 | `INT64`  |            8 |
| 19-38 | `INT128` |           16 |

Performance can be impacted by using decimals that are larger than required. In particular, decimal values with a width above 18 are slow, as arithmetic involving the `INT128` type is much more expensive than operations involving the `INT32` or `INT64` types. It is therefore recommended to stick with a `WIDTH` of `18` or below, unless there is a good reason why this is insufficient.

#### Floating-Point Types {#docs:current:sql:data_types:numeric::floating-point-types}

The data types `FLOAT` and `DOUBLE` are variable-precision numeric types. In practice, these types are usually implementations of IEEE Standard 754 for Binary Floating-Point Arithmetic (single and double precision, respectively), to the extent that the underlying processor, operating system, and compiler support it.

| Name     | Aliases          | Description                                      |
| :------- | :--------------- | :----------------------------------------------- |
| `FLOAT`  | `FLOAT4`, `REAL` | Single precision floating-point number (4 bytes) |
| `DOUBLE` | `FLOAT8`         | Double precision floating-point number (8 bytes) |

As with fixed-point data types, conversion from literals or casts from other data types to floating-point types stores inputs that cannot be represented exactly as approximations. However, it can be harder to predict which inputs are affected by this. For example, it is not surprising that `1.3::DECIMAL(1, 0) - 0.7::DECIMAL(1, 0) != 0.6::DECIMAL(1, 0)` but it may be surprising that `1.3::FLOAT - 0.7::FLOAT != 0.6::FLOAT`.

Additionally, whereas multiplication, addition and subtraction of fixed-point decimal data types is exact, these operations are only approximate on floating-point binary data types.

For more complex mathematical operations, however, floating-point arithmetic is used internally and more precise results can be obtained if intermediate steps are _not_ cast to fixed-point formats of the same width as the inputs and outputs. For example, `(10::FLOAT / 3::FLOAT)::FLOAT * 3 = 10` whereas `(10::DECIMAL(18, 3) / 3::DECIMAL(18, 3))::DECIMAL(18, 3) * 3 = 9.999`.
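
The comparisons quoted in this section can be run directly. The following sketch restates them as queries; the column aliases are illustrative:

```sql
SELECT
    -- Each literal is rounded to zero decimal places first, so this compares 1 - 1 with 1.
    (1.3::DECIMAL(1, 0) - 0.7::DECIMAL(1, 0)) != 0.6::DECIMAL(1, 0) AS decimal_differs,
    -- The binary approximations of 1.3, 0.7 and 0.6 do not line up exactly.
    (1.3::FLOAT - 0.7::FLOAT) != 0.6::FLOAT AS float_differs;

-- Division followed by multiplication, once in floating point and once
-- with the intermediate result cast back to a fixed-point type.
SELECT
    (10::FLOAT / 3::FLOAT)::FLOAT * 3 AS float_result,
    (10::DECIMAL(18, 3) / 3::DECIMAL(18, 3))::DECIMAL(18, 3) * 3 AS decimal_result;
```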

In general:

* If you require exact storage of numbers with a known number of decimal digits and require exact additions, subtractions and multiplications (such as for monetary amounts), use the [`DECIMAL` data type](#::fixed-point-decimals) or its `NUMERIC` alias instead.
* If you want to do fast or complicated calculations, the floating-point data types may be more appropriate. However, if you use the results for anything important, you should evaluate your implementation carefully for corner cases (ranges, infinities, underflows, invalid operations) that may be handled differently from what you expect and you should familiarize yourself with common floating-point pitfalls. The article [“What Every Computer Scientist Should Know About Floating-Point Arithmetic” by David Goldberg](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) and the [floating point series on Bruce Dawson's blog](https://randomascii.wordpress.com/2017/06/19/sometimes-floating-point-math-is-perfect/) provide excellent starting points.

On most platforms, the `FLOAT` type has a range of at least 1E-37 to 1E+37 with a precision of at least 6 decimal digits. The `DOUBLE` type typically has a range of around 1E-307 to 1E+308 with a precision of at least 15 digits. Positive numbers outside of these ranges (and negative numbers outside the mirrored ranges) may cause errors on some platforms but will usually be converted to zero or infinity, respectively.

In addition to ordinary numeric values, the floating-point types have several special values representing IEEE 754 special values:

* `Infinity`: infinity
* `-Infinity`: negative infinity
* `NaN`: not a number

On machines with the required CPU/FPU support, DuckDB follows the IEEE 754 specification regarding these special values, with two exceptions:

* `NaN` compares equal to `NaN` and greater than any other floating point number.
* Some floating point functions, like `sqrt` / `sin` / `asin` throw errors rather than return `NaN` for values outside their ranges of definition.

To insert these values as literals in a SQL command, you must put quotes around them. You may abbreviate `Infinity` as `Inf`, and you may use any capitalization. For example:

```sql
SELECT
    sqrt(2) > '-inf',
    'nan' > sqrt(2);
```



| `(sqrt(2) > '-inf')` | `('nan' > sqrt(2))` |
| -------------------: | ------------------: |
|                 true |                true |
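
The two exceptions listed above can be observed with explicit casts. This is a sketch based on the behavior described in this section; the last statement is expected to raise an error instead of returning `NaN`:

```sql
SELECT
    'NaN'::DOUBLE = 'NaN'::DOUBLE AS nan_equals_nan,      -- NaN compares equal to NaN
    'NaN'::DOUBLE > 'Infinity'::DOUBLE AS nan_above_inf,  -- NaN compares greater than any other value
    '-Inf'::DOUBLE < 'inf'::DOUBLE AS neg_inf_below_inf;  -- abbreviated spellings, mixed capitalization

-- Expected to fail: sqrt is not defined for negative inputs.
SELECT sqrt(-1);
```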

#### Universally Unique Identifiers (`UUID`s) {#docs:current:sql:data_types:numeric::universally-unique-identifiers--uuids}

DuckDB supports [universally unique identifiers (UUIDs)](https://en.wikipedia.org/wiki/Universally_unique_identifier) through the `UUID` type.
These use 128 bits and are represented internally as `HUGEINT` values.
When printed, they are shown with lowercase hexadecimal characters, separated by dashes as follows: `⟨12345678⟩-⟨1234⟩-⟨1234⟩-⟨1234⟩-⟨1234567890ab⟩`{:.language-sql .highlight} (using 36 characters in total including the dashes). For example, `4ac7a9e9-607c-4c8a-84f3-843f0191e3fd` is a valid UUID.

DuckDB supports generating UUIDv4 and [UUIDv7](https://uuid7.com/) identifiers.
To retrieve the version of a UUID value, use the [`uuid_extract_version` function](#docs:current:sql:functions:utility::uuid_extract_versionuuid).

##### UUIDv4 {#docs:current:sql:data_types:numeric::uuidv4}

To generate a UUIDv4 value, use the
[`uuid()` function](#docs:current:sql:functions:utility::uuid) or one of its aliases,
[`uuidv4()`](#docs:current:sql:functions:utility::uuidv4) and [`gen_random_uuid()`](#docs:current:sql:functions:utility::gen_random_uuid).

##### UUIDv7 {#docs:current:sql:data_types:numeric::uuidv7}

To generate a UUIDv7 value, use the [`uuidv7()`](#docs:current:sql:functions:utility::uuidv7) function.
To retrieve the timestamp from a UUIDv7 value, use the [`uuid_extract_timestamp` function](#docs:current:sql:functions:utility::uuid_extract_timestampuuidv7):

```sql
SELECT uuid_extract_timestamp(uuidv7()) AS ts;
```

| ts                        |
| ------------------------- |
| 2025-04-19 15:51:20.07+00 |
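
The version of a generated value can be checked with `uuid_extract_version`, as mentioned above. A minimal sketch with illustrative column aliases:

```sql
SELECT
    uuid_extract_version(uuid()) AS v4_version,   -- 4
    uuid_extract_version(uuidv7()) AS v7_version; -- 7
```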

#### Functions {#docs:current:sql:data_types:numeric::functions}

See [Numeric Functions and Operators](#docs:current:sql:functions:numeric).

### Struct Data Type {#docs:current:sql:data_types:struct}

Conceptually, a `STRUCT` column contains an ordered list of columns called “entries”. The entries are referenced by name using strings. This document refers to those entry names as keys. The names of the struct entries are part of the *schema*, so each row in a `STRUCT` column must have the same keys and the same layout. The names of the struct entries are case-insensitive.

`STRUCT`s are typically used to nest multiple columns into a single column, and the nested column can be of any type, including other `STRUCT`s and `LIST`s.

`STRUCT`s are similar to PostgreSQL's `ROW` type. The key difference is that DuckDB `STRUCT`s require the same keys in each row of a `STRUCT` column. This allows DuckDB to provide significantly improved performance by fully utilizing its vectorized execution engine, and also enforces type consistency for improved correctness. DuckDB includes a `row` function as a special way to produce a `STRUCT`, but does not have a `ROW` data type. See an example below and the [`STRUCT` functions documentation](#docs:current:sql:functions:struct) for details.

See the [data types overview](#docs:current:sql:data_types:overview) for a comparison between nested data types.

#### Creating Structs {#docs:current:sql:data_types:struct::creating-structs}

Structs can be created using the [`struct_pack(name := expr, ...)`](#docs:current:sql:functions:struct) function, the equivalent array notation `{'name': expr, ...}`, using a row variable, or using the `row` function.

Create a struct using the `struct_pack` function. Note the lack of single quotes around the keys and the use of the `:=` operator:

```sql
SELECT struct_pack(key1 := 'value1', key2 := 42) AS s;
```

Create a struct using the array notation:

```sql
SELECT {'key1': 'value1', 'key2': 42} AS s;
```

Create a struct using a row variable:

```sql
SELECT d AS s FROM (SELECT 'value1' AS key1, 42 AS key2) d;
```

Create a struct of integers:

```sql
SELECT {'x': 1, 'y': 2, 'z': 3} AS s;
```

Create a struct of strings with a `NULL` value:

```sql
SELECT {'yes': 'duck', 'maybe': 'goose', 'huh': NULL, 'no': 'heron'} AS s;
```

Create a struct with a different type for each key:

```sql
SELECT {'key1': 'string', 'key2': 1, 'key3': 12.345} AS s;
```

Create a struct of structs with `NULL` values:

```sql
SELECT {
        'birds': {'yes': 'duck', 'maybe': 'goose', 'huh': NULL, 'no': 'heron'},
        'aliens': NULL,
        'amphibians': {'yes': 'frog', 'maybe': 'salamander', 'huh': 'dragon', 'no': 'toad'}
    } AS s;
```

#### Adding or Updating Fields of Structs {#docs:current:sql:data_types:struct::adding-or-updating-fields-of-structs}

To add new fields or update existing ones, you can use `struct_update`:

```sql
SELECT struct_update({'a': 1, 'b': 2}, b := 3, c := 4) AS s;
```

Alternatively, `struct_insert` also allows adding new fields but not updating existing ones.
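
For comparison, a minimal sketch of `struct_insert` adding a new field is shown below. Passing a name that already exists in the struct (for example, `b := 3` here) is expected to raise an error instead of updating the field:

```sql
SELECT struct_insert({'a': 1, 'b': 2}, c := 3) AS s;
```

This returns `{'a': 1, 'b': 2, 'c': 3}`.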

#### Retrieving from Structs {#docs:current:sql:data_types:struct::retrieving-from-structs}

Retrieving a value from a struct can be accomplished using dot notation, bracket notation, or through [struct functions](#docs:current:sql:functions:struct) like `struct_extract`.

Use dot notation to retrieve the value at a key's location. In the following query, the subquery generates a struct column `a`, which we then query with `a.x`.

```sql
SELECT a.x FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);
```

If a key contains a space, simply wrap it in double quotes (`"`).

```sql
SELECT a."x space" FROM (SELECT {'x space': 1, 'y': 2, 'z': 3} AS a);
```

Bracket notation may also be used. Note that this uses single quotes (`'`) since the goal is to specify a certain string key, and only constant expressions may be used inside the brackets:

```sql
SELECT a['x space'] FROM (SELECT {'x space': 1, 'y': 2, 'z': 3} AS a);
```

The `struct_extract` function is also equivalent. This returns 1:

```sql
SELECT struct_extract({'x space': 1, 'y': 2, 'z': 3}, 'x space');
```

##### `unnest` / `STRUCT.*` {#docs:current:sql:data_types:struct::unnest--struct}

Rather than retrieving a single key from a struct, the `unnest` special function can be used to retrieve all keys from a struct as separate columns.
This is particularly useful when a prior operation creates a struct of unknown shape, or if a query must handle any potential struct keys:

```sql
SELECT unnest(a)
FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);
```

| x | y | z |
|--:|--:|--:|
| 1 | 2 | 3 |

The same can be achieved with the star notation (`*`), which additionally allows [modifications of the returned columns](#docs:current:sql:expressions:star):

```sql
SELECT a.* EXCLUDE ('y')
FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);
```

| x | z |
|--:|--:|
| 1 | 3 |

> **Warning.** The star notation is currently limited to top-level struct columns and non-aggregate expressions.

#### Dot Notation Order of Operations {#docs:current:sql:data_types:struct::dot-notation-order-of-operations}

Referring to structs with dot notation can be ambiguous with referring to schemas and tables. In general, DuckDB looks for columns first, then for struct keys within columns. For each of the cases below, DuckDB tries the listed interpretations in order and uses the first one that matches:

##### No Dots {#docs:current:sql:data_types:struct::no-dots}

```sql
SELECT part1
FROM tbl;
```

1. `part1` is a column

##### One Dot {#docs:current:sql:data_types:struct::one-dot}

```sql
SELECT part1.part2
FROM tbl;
```

1. `part1` is a table, `part2` is a column
2. `part1` is a column, `part2` is a property of that column

##### Two (or More) Dots {#docs:current:sql:data_types:struct::two-or-more-dots}

```sql
SELECT part1.part2.part3
FROM tbl;
```

1. `part1` is a schema, `part2` is a table, `part3` is a column
2. `part1` is a table, `part2` is a column, `part3` is a property of that column
3. `part1` is a column, `part2` is a property of that column, `part3` is a property of that column

Any extra parts (e.g., `.part4.part5`, etc.) are always treated as properties.
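
The resolution order can be tried out with a nested struct column. In this sketch, the table `dots` and the field names `inner_s` and `v` are hypothetical:

```sql
CREATE OR REPLACE TABLE dots (s STRUCT(inner_s STRUCT(v INTEGER)));
INSERT INTO dots VALUES ({'inner_s': {'v': 7}});

-- Column, then two properties.
SELECT s.inner_s.v FROM dots;

-- Table, column, property; the extra part `v` is treated as a property.
SELECT dots.s.inner_s.v FROM dots;
```

Both queries are expected to return `7`.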

#### Creating Structs with the `row` Function {#docs:current:sql:data_types:struct::creating-structs-with-the-row-function}

The `row` function can be used to automatically convert multiple columns to a single struct column.
When using `row`, the keys will be empty strings, allowing for easy insertion into a table with a struct column.
Columns, however, cannot be initialized with the `row` function, and must be explicitly named.
For example, inserting values into a struct column using the `row` function:

```sql
CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER));
INSERT INTO t1 VALUES (row('a', 42));
SELECT * FROM t1;
```

The table will contain a single entry:

```text
{'v': a, 'i': 42}
```

The following produces the same result as above:

```sql
CREATE OR REPLACE TABLE t1 AS (
    SELECT row('a', 42)::STRUCT(v VARCHAR, i INTEGER) AS s
);
```

Initializing a struct column with the `row` function will fail:

```sql
CREATE TABLE t2 AS SELECT row('a');
```

```console
Invalid Input Error:
A table cannot be created from an unnamed struct
```

When casting between structs, the names of at least one field have to match. Therefore, the following query will fail:

```sql
SELECT a::STRUCT(y INTEGER) AS b
FROM
    (SELECT {'x': 42} AS a);
```

```console
Binder Error:
STRUCT to STRUCT cast must have at least one matching member
```

A workaround for this is to use [`struct_pack`](#::creating-structs) instead:

```sql
SELECT struct_pack(y := a.x) AS b
FROM
    (SELECT {'x': 42} AS a);
```

The `row` function can be used to return unnamed structs. For example:

```sql
SELECT row(x, x + 1, y) FROM (SELECT 1 AS x, 'a' AS y) AS s;
```

This produces `(1, 2, a)`.

If using multiple expressions when creating a struct, the `row` function is optional. The following query returns the same result as the previous one:

```sql
SELECT (x, x + 1, y) AS s FROM (SELECT 1 AS x, 'a' AS y);
```

#### Comparison and Ordering {#docs:current:sql:data_types:struct::comparison-and-ordering}

The `STRUCT` type can be compared using all the [comparison operators](#docs:current:sql:expressions:comparison_operators).
These comparisons can be used in [logical expressions](#docs:current:sql:expressions:logical_operators)
such as `WHERE` and `HAVING` clauses, and return [`BOOLEAN` values](#docs:current:sql:data_types:boolean).

Comparisons are done in lexicographical order, with individual entries being compared as usual except that `NULL` values are treated as larger than all other values.

Specifically:

* If all values of `s1` and `s2` compare equal, then `s1` and `s2` compare equal.
* Otherwise, if `s1.value[i] < s2.value[i] OR s2.value[i] IS NULL` for the first index `i` where `s1.value[i] != s2.value[i]`, then `s1` is less than `s2`, and vice versa.

Structs of different types are implicitly cast to a struct type with the union of the involved keys, following the rules for [combination casting](#docs:current:sql:data_types:typecasting::structs).

The following queries return `true`:

```sql
SELECT {'k1': 0, 'k2': 0} < {'k1': 1, 'k2': 0};
```

```sql
SELECT {'k1': 'hello'} < {'k1': 'world'};
```

```sql
SELECT {'k1': 0, 'k2': 0} < {'k1': 0, 'k2': NULL};
```

```sql
SELECT {'k1': 0} < {'k2': 0};
```

```sql
SELECT {'k1': 0, 'k2': 0} < {'k2': 0, 'k3': 0};
```

```sql
SELECT {'k1': 1, 'k2': 0} > {'k3': 0, 'k1': 0};
```

The following queries return `false`:

```sql
SELECT {'k1': 1, 'k2': 0} < {'k1': 0, 'k2': 1};
```

```sql
SELECT {'k1': [0]} < {'k1': [0, 0]};
```

```sql
SELECT {'k1': 1} > {'k2': 0};
```

```sql
SELECT {'k1': 0, 'k2': 0} < {'k3': 0, 'k1': 1};
```

```sql
SELECT {'k1': 1, 'k2': 0} > {'k2': 0, 'k3': 0};
```

#### Updating the Schema {#docs:current:sql:data_types:struct::updating-the-schema}

Starting with DuckDB v1.3.0, it is possible to update the sub-schema of structs
using the [`ALTER TABLE` statement](#docs:current:sql:statements:alter_table).

To follow the examples, initialize the `test` table as follows:

```sql
CREATE TABLE test (s STRUCT(i INTEGER, j INTEGER));
INSERT INTO test VALUES (ROW(1, 1)), (ROW(2, 2));
```

##### Adding a Field {#docs:current:sql:data_types:struct::adding-a-field}

Add field `k INTEGER` to struct `s` in table `test`:

```sql
ALTER TABLE test ADD COLUMN s.k INTEGER;
FROM test;
```

```text
┌─────────────────────────────────────────┐
│                    s                    │
│ struct(i integer, j integer, k integer) │
├─────────────────────────────────────────┤
│ {'i': 1, 'j': 1, 'k': NULL}             │
│ {'i': 2, 'j': 2, 'k': NULL}             │
└─────────────────────────────────────────┘
```

##### Dropping a Field {#docs:current:sql:data_types:struct::dropping-a-field}

Drop field `i` from struct `s` in table `test`:

```sql
ALTER TABLE test DROP COLUMN s.i;
FROM test;
```

```text
┌──────────────────────────────┐
│              s               │
│ struct(j integer, k integer) │
├──────────────────────────────┤
│ {'j': 1, 'k': NULL}          │
│ {'j': 2, 'k': NULL}          │
└──────────────────────────────┘
```

##### Renaming a Field {#docs:current:sql:data_types:struct::renaming-a-field}

Rename field `j` of struct `s` to `v1` in table `test`:

```sql
ALTER TABLE test RENAME s.j TO v1;
FROM test;
```

```text
┌───────────────────────────────┐
│               s               │
│ struct(v1 integer, k integer) │
├───────────────────────────────┤
│ {'v1': 1, 'k': NULL}          │
│ {'v1': 2, 'k': NULL}          │
└───────────────────────────────┘
```

#### Functions {#docs:current:sql:data_types:struct::functions}

See [Struct Functions](#docs:current:sql:functions:struct).

### Text Types {#docs:current:sql:data_types:text}

In DuckDB, strings can be stored in the `VARCHAR` field.
The field allows storage of Unicode characters. Internally, the data is encoded as UTF-8.

| Name | Aliases | Description |
|:---|:---|:---|
| `VARCHAR` | `CHAR`, `BPCHAR`, `STRING`, `TEXT` | Variable-length character string |
| `VARCHAR(n)` | `CHAR(n)`, `BPCHAR(n)`, `STRING(n)`, `TEXT(n)` | Variable-length character string. The maximum length `n` has no effect and is only provided for compatibility |

#### Specifying a Length Limit {#docs:current:sql:data_types:text::specifying-a-length-limit}

Specifying the length for the `VARCHAR`, `STRING` and `TEXT` types is not required and has no effect on the system. Specifying the length will not improve performance or reduce storage space of the strings in the database. These variants are supported for compatibility with other systems that do require a length to be specified for strings.

If you wish to restrict the number of characters in a `VARCHAR` column for data integrity reasons the `CHECK` constraint should be used, for example:

```sql
CREATE TABLE strings (
    val VARCHAR CHECK (length(val) <= 10) -- val has a maximum length of 10
);
```
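
A short sketch of both points follows: a declared length does not truncate or reject longer strings, while a `CHECK` constraint does reject them. The table name `strings_demo` is hypothetical:

```sql
-- The length in VARCHAR(3) has no effect, so the full string is kept.
SELECT 'abcdef'::VARCHAR(3) AS s;

CREATE OR REPLACE TABLE strings_demo (
    val VARCHAR CHECK (length(val) <= 10)
);

-- Accepted: 5 characters.
INSERT INTO strings_demo VALUES ('short');

-- Expected to fail with a constraint error: 18 characters.
INSERT INTO strings_demo VALUES ('much too long text');
```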

#### Specifying a Compression Type {#docs:current:sql:data_types:text::specifying-a-compression-type}

You can specify a compression type for a string with the `USING COMPRESSION` clause.
For example, to apply zstd compression, run:

```sql
CREATE TABLE tbl (s VARCHAR USING COMPRESSION zstd);
```

#### Text Type Values {#docs:current:sql:data_types:text::text-type-values}

Values of the text type are character strings, also known as string values or simply strings. At runtime, string values are constructed in one of the following ways:

* referencing columns whose declared or implied type is the text data type
* [string literals](#docs:current:sql:data_types:literal_types::string-literals)
* [casting](#docs:current:sql:expressions:cast::explicit-casting) expressions to a text type
* applying a [string operator](#docs:current:sql:functions:text::text-functions-and-operators), or invoking a function that returns a text type value

#### Strings with Special Characters {#docs:current:sql:data_types:text::strings-with-special-characters}

To use special characters in a string, use [escape string literals](#docs:current:sql:data_types:literal_types::escape-string-literals) or [dollar-quoted string literals](#docs:current:sql:data_types:literal_types::dollar-quoted-string-literals). Alternatively, you can use concatenation and the [`chr` character function](#docs:current:sql:functions:text):

```sql
SELECT 'Hello' || chr(10) || 'world' AS msg;
```

```text
┌──────────────┐
│     msg      │
│   varchar    │
├──────────────┤
│ Hello\nworld │
└──────────────┘
```
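
For comparison, similar strings can be written with the literal forms mentioned above. This is a minimal sketch; the column aliases are illustrative:

```sql
SELECT
    E'Hello\nworld' AS escaped,                 -- escape string literal: \n is a newline
    $$Hello 'quoted' world$$ AS dollar_quoted;  -- dollar-quoted literal: single quotes need no escaping
```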

#### Functions {#docs:current:sql:data_types:text::functions}

See [Text Functions](#docs:current:sql:functions:text) and [Pattern Matching](#docs:current:sql:functions:pattern_matching).

### Time Types {#docs:current:sql:data_types:time}

The `TIME` and `TIMETZ` types specify the hour, minute, second, and microsecond of a day. The `TIME_NS` type stores the time of day with nanosecond precision.

| Name      | Aliases                  | Description                        |
| :-------- | :----------------------- | :--------------------------------- |
| `TIME`    | `TIME WITHOUT TIME ZONE` | Time of day                        |
| `TIMETZ`  | `TIME WITH TIME ZONE`    | Time of day, with time zone offset |
| `TIME_NS` |                          | Time of day, nanosecond precision  |

Instances can be created using the type names as a keyword, where the data must be formatted according to the ISO 8601 format (`hh:mm:ss[.zzzzzz[zzz]][+-TT[:tt]]`).

```sql
SELECT TIME '1992-09-20 11:30:00.123456';
```

```text
11:30:00.123456
```

```sql
SELECT TIMETZ '1992-09-20 11:30:00.123456';
```

```text
11:30:00.123456+00
```

```sql
SELECT TIMETZ '1992-09-20 11:30:00.123456-02:00';
```

```text
13:30:00.123456+00
```

```sql
SELECT TIMETZ '1992-09-20 11:30:00.123456+05:30';
```

```text
06:00:00.123456+00
```

```sql
SELECT '15:30:00.123456789'::TIME_NS;
```

```text
15:30:00.123456789
```

`TIME_NS` values can also be read from Parquet when the type is [`TIME` with unit `NANOS`](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time).

> **Warning.** The `TIME` type should only be used in rare cases, where the date part of the timestamp can be disregarded.
> Most applications should use the [`TIMESTAMP` types](#docs:current:sql:data_types:timestamp) to represent their timestamps.

### Timestamp Types {#docs:current:sql:data_types:timestamp}

Timestamps represent points in time. As such, they combine [`DATE`](#docs:current:sql:data_types:date) and [`TIME`](#docs:current:sql:data_types:time) information.
They can be created using the type name followed by a string formatted according to the ISO 8601 format, `YYYY-MM-DD hh:mm:ss[.zzzzzzzzz][+-TT[:tt]]`, which is also the format we use in this documentation. Decimal places beyond the supported precision are ignored.

#### Timestamp Types {#docs:current:sql:data_types:timestamp::timestamp-types}

| Name | Aliases | Description |
|:---|:---|:---|
| `TIMESTAMP_NS` |                                           | Naive timestamp with nanosecond precision              |
| `TIMESTAMP`    | `DATETIME`, `TIMESTAMP WITHOUT TIME ZONE` | Naive timestamp with microsecond precision             |
| `TIMESTAMP_MS` |                                           | Naive timestamp with millisecond precision             |
| `TIMESTAMP_S`  |                                           | Naive timestamp with second precision                  |
| `TIMESTAMPTZ`  | `TIMESTAMP WITH TIME ZONE`                | Time zone aware timestamp with microsecond precision   |

> **Warning.** Since there is not currently a `TIMESTAMP_NS WITH TIME ZONE` data type, external columns with nanosecond precision and `WITH TIME ZONE` semantics, e.g., [Parquet timestamp columns with `isAdjustedToUTC=true`](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#instant-semantics-timestamps-normalized-to-utc), are converted to `TIMESTAMP WITH TIME ZONE` and thus lose precision when read using DuckDB.

```sql
SELECT TIMESTAMP_NS '1992-09-20 11:30:00.123456789';
```

```text
1992-09-20 11:30:00.123456789
```

```sql
SELECT TIMESTAMP '1992-09-20 11:30:00.123456789';
```

```text
1992-09-20 11:30:00.123456
```

```sql
SELECT TIMESTAMP_MS '1992-09-20 11:30:00.123456789';
```

```text
1992-09-20 11:30:00.123
```

```sql
SELECT TIMESTAMP_S '1992-09-20 11:30:00.123456789';
```

```text
1992-09-20 11:30:00
```

```sql
SELECT TIMESTAMPTZ '1992-09-20 11:30:00.123456789';
```

```text
1992-09-20 11:30:00.123456+00
```

```sql
SELECT TIMESTAMPTZ '1992-09-20 12:30:00.123456789+01:00';
```

```text
1992-09-20 11:30:00.123456+00
```

DuckDB distinguishes between timestamps `WITHOUT TIME ZONE` and `WITH TIME ZONE` (the only current representative of the latter is `TIMESTAMP WITH TIME ZONE`).

Despite the name, a `TIMESTAMP WITH TIME ZONE` does not store time zone information. Instead, it only stores the `INT64` number of non-leap microseconds since the Unix epoch `1970-01-01 00:00:00+00`, and thus unambiguously identifies a point in absolute time, or [*instant*](#docs:current:sql:data_types:timestamp::instants). The reason for the labels *time zone aware* and `WITH TIME ZONE` is that timestamp arithmetic, [*binning*](#docs:current:sql:data_types:timestamp::temporal-binning), and string formatting for this type are performed in a [configured time zone](#docs:current:sql:data_types:timestamp::time-zone-support), which defaults to the system time zone and is just `UTC+00:00` in the examples above.
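As a minimal sketch of this point (an illustration added here, reusing the literals from the examples above), two strings written with different UTC offsets that denote the same instant compare as equal, because only the instant is stored:

```sql
-- The offset in each string is used only to compute the stored instant;
-- it is not kept afterwards, so both values are identical.
SELECT TIMESTAMPTZ '1992-09-20 11:30:00+00'
     = TIMESTAMPTZ '1992-09-20 12:30:00+01:00' AS same_instant;
```

```text
true
```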

The corresponding `TIMESTAMP WITHOUT TIME ZONE` stores the same `INT64`, but arithmetic, binning and string formatting follow the straightforward rules of Coordinated Universal Time (UTC) without offsets or time zones. Accordingly, `TIMESTAMP`s could be interpreted as UTC timestamps, but more commonly they are used to represent *local* observations of time recorded in an unspecified time zone, and operations on these types can be interpreted as simply manipulating tuple fields following nominal temporal logic.
It is a common data cleaning problem to disambiguate such observations, which may also be stored in raw strings without time zone specification or UTC offsets, into unambiguous `TIMESTAMP WITH TIME ZONE` instants. One possible solution to this is to append UTC offsets to strings, followed by an explicit cast to `TIMESTAMP WITH TIME ZONE`. Alternatively, a `TIMESTAMP WITHOUT TIME ZONE` may be created first and then be combined with a time zone specification to obtain a time zone aware `TIMESTAMP WITH TIME ZONE`.
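The two approaches can be sketched as follows. The raw string and the choice of `-07:00` / `'America/Denver'` are assumptions made for illustration only; the second query relies on the `timezone` function from the `ICU` extension.

```sql
-- Approach 1: append a UTC offset to the raw string, then cast explicitly.
SELECT ('2001-02-16 20:38:40' || '-07:00')::TIMESTAMPTZ AS via_offset;

-- Approach 2: parse as a naive TIMESTAMP first, then attach a time zone.
SELECT timezone('America/Denver', '2001-02-16 20:38:40'::TIMESTAMP) AS via_zone;
```

Both queries should yield the same instant here. A fixed offset does not follow daylight saving time changes, so the second approach is usually the safer choice when the observations span such a change.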

#### Conversion between Strings and Naïve / Time Zone-Aware Timestamps {#docs:current:sql:data_types:timestamp::conversion-between-strings-and-nave--time-zone-aware-timestamps}

The conversion between strings *without* UTC offsets or IANA time zone names and `WITHOUT TIME ZONE` types is unambiguous and straightforward.
The conversion between strings *with* UTC offsets or time zone names and `WITH TIME ZONE` types is also unambiguous, but requires the `ICU` extension to handle time zone names.

When strings *without* UTC offsets or time zone names are converted to a `WITH TIME ZONE` type, the string is interpreted in the configured time zone. 
When strings with UTC offsets are passed to a `WITHOUT TIME ZONE` type, the offsets or time zone specifications are ignored.
When strings with time zone names other than `UTC` are passed to a `WITHOUT TIME ZONE` type, an error is thrown. 
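For example, the first rule can be sketched like this, assuming the `ICU` extension is available and using `'America/Denver'` purely as an illustrative setting:

```sql
-- Configure the session time zone (requires the ICU extension).
SET TimeZone = 'America/Denver';

-- The string carries no offset, so it is read as Denver local time.
SELECT '2001-02-16 20:38:40'::TIMESTAMPTZ AS interpreted_locally;
```

With this setting, the value should be displayed as `2001-02-16 20:38:40-07`, i.e., the same wall-clock time with the Denver offset attached.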

Finally, when `WITH TIME ZONE` and `WITHOUT TIME ZONE` types are converted to each other via explicit or implicit casts, the translation uses the configured time zone. To use an alternative time zone, the `timezone` function provided by the `ICU` extension may be used:

```sql
SELECT
    timezone('America/Denver', TIMESTAMP '2001-02-16 20:38:40') AS aware1,
    timezone('America/Denver', TIMESTAMPTZ '2001-02-16 04:38:40') AS naive1,
    timezone('UTC', TIMESTAMP '2001-02-16 20:38:40+00:00') AS aware2,
    timezone('UTC', TIMESTAMPTZ '2001-02-16 04:38:40 Europe/Berlin') AS naive2;
```



|         aware1         |       naive1        |         aware2         |       naive2        |
|------------------------|---------------------|------------------------|---------------------|
| 2001-02-17 04:38:40+01 | 2001-02-15 20:38:40 | 2001-02-16 21:38:40+01 | 2001-02-16 03:38:40 |

Note that `TIMESTAMP`s are displayed without a time zone specification in the results, following ISO 8601 rules for local times, while time zone aware `TIMESTAMPTZ`s are displayed with the UTC offset of the configured time zone, which is `'Europe/Berlin'` in this example. At all instants involved, the UTC offsets of `'America/Denver'` and `'Europe/Berlin'` are `-07:00` and `+01:00`, respectively.

#### Special Values {#docs:current:sql:data_types:timestamp::special-values}

Three special strings can be used to create timestamps:

| Input string | Description                                      |
|:-------------|:-------------------------------------------------|
| `epoch`      | 1970-01-01 00:00:00[+00] (Unix system time zero) |
| `infinity`   | Later than all other timestamps                  |
| `-infinity`  | Earlier than all other timestamps                |

The values `infinity` and `-infinity` are special-cased and are displayed unchanged, whereas the value `epoch` is simply a notational shorthand that is converted to the corresponding timestamp value when read.

```sql
SELECT
    '-infinity'::TIMESTAMP AS Negative,
    'epoch'::TIMESTAMP AS Epoch,
    'infinity'::TIMESTAMP AS Positive;
```

| Negative  | Epoch               | Positive |
|:----------|:--------------------|:---------|
| -infinity | 1970-01-01 00:00:00 | infinity |

#### Functions {#docs:current:sql:data_types:timestamp::functions}

See [Timestamp Functions](#docs:current:sql:functions:timestamp).

#### Time Zones {#docs:current:sql:data_types:timestamp::time-zones}

To understand time zones and the `WITH TIME ZONE` types, it helps to start with two concepts: *instants* and *temporal binning*.

##### Instants {#docs:current:sql:data_types:timestamp::instants}

An instant is a point in absolute time, usually given as a count of some time increment from a fixed point in time (called the *epoch*). This is similar to how positions on the earth's surface are given using latitude and longitude relative to the equator and the Greenwich Meridian. In DuckDB, the fixed point is the Unix epoch `1970-01-01 00:00:00+00:00`, and the increment is in seconds, milliseconds, microseconds, or nanoseconds, depending on the specific data type.
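As a small sketch of this idea, the `epoch_ms`, `epoch_us` and `epoch_ns` functions return the count of increments since the Unix epoch. The timestamp one second after the epoch is chosen here only to keep the numbers easy to read:

```sql
-- One second after the Unix epoch, expressed in different increments.
SELECT
    epoch_ms(TIMESTAMP '1970-01-01 00:00:01') AS millis,
    epoch_us(TIMESTAMP '1970-01-01 00:00:01') AS micros,
    epoch_ns(TIMESTAMP '1970-01-01 00:00:01') AS nanos;
```

| millis | micros  |   nanos    |
|-------:|--------:|-----------:|
| 1000   | 1000000 | 1000000000 |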

##### Temporal Binning {#docs:current:sql:data_types:timestamp::temporal-binning}

Binning is a common practice with continuous data: A range of possible values is broken up into contiguous subsets and the binning operation maps actual values to the *bin* they fall into. *Temporal binning* is simply applying this practice to instants; for example, by binning instants into years, months and days.
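A minimal sketch of binning a naive timestamp, using the `date_part` and `date_trunc` functions (the column aliases are illustrative names only):

```sql
SELECT
    -- Which year and month bins does the value fall into?
    date_part('year', TIMESTAMP '1992-09-20 11:30:00') AS year_bin,
    date_part('month', TIMESTAMP '1992-09-20 11:30:00') AS month_bin,
    -- Where does the enclosing hour bin start?
    date_trunc('hour', TIMESTAMP '1992-09-20 11:30:00') AS hour_start;
```

| year_bin | month_bin |     hour_start      |
|---------:|----------:|---------------------|
| 1992     | 9         | 1992-09-20 11:00:00 |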

![](../images/blog/timezones/tz-instants-light.svg)



Temporal binning rules are complex, and generally come in two sets: *time zones* and *calendars*.
For most tasks, the calendar will just be the widely used Gregorian calendar,
but time zones apply locale-specific rules and can vary widely.
For example, here is what binning for the `'America/Los_Angeles'` time zone looks like near the epoch:

![](../images/blog/timezones/tz-timezone-light.svg)



The most common temporal binning problem occurs when daylight saving time changes.
The example below contains a daylight saving time change where the "hour" bin is two hours long.
To distinguish the two hours, another range of bins containing the offset from UTC is needed:

![](../images/blog/timezones/tz-daylight-light.svg)
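The same effect can be sketched in SQL. The date below (the end of daylight saving time in the United States in 2021) and the column aliases are assumptions chosen for illustration, and the queries require the `ICU` extension:

```sql
SET TimeZone = 'America/Los_Angeles';

-- Two instants one hour apart, on either side of the change.
SELECT
    ts,
    date_part('hour', ts) AS hour_bin,
    date_part('timezone_hour', ts) AS offset_hours
FROM (VALUES
    (TIMESTAMPTZ '2021-11-07 08:30:00+00'),
    (TIMESTAMPTZ '2021-11-07 09:30:00+00')
) AS t(ts)
ORDER BY ts;
```

Both rows should fall into the same "hour" bin (`1`), and only the UTC offset (`-7` versus `-8`) tells them apart.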



##### Time Zone Support {#docs:current:sql:data_types:timestamp::time-zone-support}

The `TIMESTAMPTZ` type can be binned into calendar and clock bins using a suitable extension.
The built-in [ICU extension](#docs:current:core_extensions:icu) implements all the binning and arithmetic functions using the
[International Components for Unicode](https://icu.unicode.org) time zone and calendar functions.

To set the time zone to use, first load the ICU extension. The ICU extension comes pre-bundled with several DuckDB clients (including Python, R, JDBC and ODBC), so this step can be skipped in those cases. In other cases you might first need to install and load the ICU extension.

```sql
INSTALL icu;
LOAD icu;
```

Next, use the `SET TimeZone` command:

```sql
SET TimeZone = 'America/Los_Angeles';
```

Time binning operations for `TIMESTAMPTZ` will then be implemented using the given time zone.
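As a brief sketch of what this means, the same instant can land in different bins depending on the setting. The instant (the Unix epoch) and the alias `hour_bin` are chosen here for illustration:

```sql
SET TimeZone = 'UTC';
-- The epoch is midnight in UTC, so the hour bin is 0.
SELECT date_part('hour', TIMESTAMPTZ '1970-01-01 00:00:00+00') AS hour_bin;

SET TimeZone = 'America/Los_Angeles';
-- The same instant is 16:00 on the previous day in Los Angeles, so the hour bin is 16.
SELECT date_part('hour', TIMESTAMPTZ '1970-01-01 00:00:00+00') AS hour_bin;
```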

A list of available time zones can be pulled from the `pg_timezone_names()` table function:

```sql
SELECT
    name,
    abbrev,
    utc_offset
FROM pg_timezone_names()
ORDER BY
    name;
```

You can also find a reference table of [available time zones](#docs:current:sql:data_types:timezones).

#### Calendar Support {#docs:current:sql:data_types:timestamp::calendar-support}

The [ICU extension](#docs:current:core_extensions:icu) also supports non-Gregorian calendars using the `SET Calendar` command.
Note that the `INSTALL` and `LOAD` steps are only required if the DuckDB client does not bundle the ICU extension.

```sql
INSTALL icu;
LOAD icu;
SET Calendar = 'japanese';
```

Time binning operations for `TIMESTAMPTZ` will then be implemented using the given calendar.
In this example, the `era` part will now report the Japanese imperial era number.
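A short sketch of how the setting affects the `era` part. The timestamp is an arbitrary example, and the output is omitted because the reported era numbers depend on the ICU calendar data:

```sql
SET Calendar = 'japanese';
-- Era as reported by the Japanese calendar.
SELECT date_part('era', TIMESTAMPTZ '2019-05-01 12:00:00+00') AS era_japanese;

SET Calendar = 'gregorian';
-- Era as reported by the Gregorian calendar for the same instant.
SELECT date_part('era', TIMESTAMPTZ '2019-05-01 12:00:00+00') AS era_gregorian;
```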

A list of available calendars can be pulled from the `icu_calendar_names()` table function:

```sql
SELECT name
FROM icu_calendar_names()
ORDER BY 1;
```

#### Settings {#docs:current:sql:data_types:timestamp::settings}

The current values of the `TimeZone` and `Calendar` settings are determined by ICU when it starts up.
They can be queried using the `duckdb_settings()` table function:

```sql
SELECT *
FROM duckdb_settings()
WHERE name = 'TimeZone';
```

|   name   |      value       |      description      | input_type |
|----------|------------------|-----------------------|------------|
| TimeZone | Europe/Amsterdam | The current time zone | VARCHAR    |

```sql
SELECT *
FROM duckdb_settings()
WHERE name = 'Calendar';
```

|   name   |   value   |     description      | input_type |
|----------|-----------|----------------------|------------|
| Calendar | gregorian | The current calendar | VARCHAR    |

> If you find that your binning operations are not behaving as you expect, check the `TimeZone` and `Calendar` values and adjust them if needed.
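One possible way to carry out such a check is sketched below. It uses the `current_setting` function as a shorter alternative to `duckdb_settings()`, and the values `'UTC'` and `'gregorian'` are example choices rather than recommendations:

```sql
-- Inspect both settings in a single row.
SELECT
    current_setting('TimeZone') AS time_zone,
    current_setting('Calendar') AS calendar;

-- Adjust them explicitly if they are not what the analysis expects.
SET TimeZone = 'UTC';
SET Calendar = 'gregorian';
```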

### Time Zone Reference List {#docs:current:sql:data_types:timezones}

An up-to-date version of this list can be pulled from the `pg_timezone_names()` table function:

```sql
SELECT name, abbrev
FROM pg_timezone_names()
ORDER BY name;
```



|               name               |              abbrev              |
|----------------------------------|----------------------------------|
| ACT                              | ACT                              |
| AET                              | AET                              |
| AGT                              | AGT                              |
| ART                              | ART                              |
| AST                              | AST                              |
| Africa/Abidjan                   | Iceland                          |
| Africa/Accra                     | Iceland                          |
| Africa/Addis_Ababa               | EAT                              |
| Africa/Algiers                   | Africa/Algiers                   |
| Africa/Asmara                    | EAT                              |
| Africa/Asmera                    | EAT                              |
| Africa/Bamako                    | Iceland                          |
| Africa/Bangui                    | Africa/Bangui                    |
| Africa/Banjul                    | Iceland                          |
| Africa/Bissau                    | Africa/Bissau                    |
| Africa/Blantyre                  | CAT                              |
| Africa/Brazzaville               | Africa/Brazzaville               |
| Africa/Bujumbura                 | CAT                              |
| Africa/Cairo                     | ART                              |
| Africa/Casablanca                | Africa/Casablanca                |
| Africa/Ceuta                     | Africa/Ceuta                     |
| Africa/Conakry                   | Iceland                          |
| Africa/Dakar                     | Iceland                          |
| Africa/Dar_es_Salaam             | EAT                              |
| Africa/Djibouti                  | EAT                              |
| Africa/Douala                    | Africa/Douala                    |
| Africa/El_Aaiun                  | Africa/El_Aaiun                  |
| Africa/Freetown                  | Iceland                          |
| Africa/Gaborone                  | CAT                              |
| Africa/Harare                    | CAT                              |
| Africa/Johannesburg              | Africa/Johannesburg              |
| Africa/Juba                      | Africa/Juba                      |
| Africa/Kampala                   | EAT                              |
| Africa/Khartoum                  | Africa/Khartoum                  |
| Africa/Kigali                    | CAT                              |
| Africa/Kinshasa                  | Africa/Kinshasa                  |
| Africa/Lagos                     | Africa/Lagos                     |
| Africa/Libreville                | Africa/Libreville                |
| Africa/Lome                      | Iceland                          |
| Africa/Luanda                    | Africa/Luanda                    |
| Africa/Lubumbashi                | CAT                              |
| Africa/Lusaka                    | CAT                              |
| Africa/Malabo                    | Africa/Malabo                    |
| Africa/Maputo                    | CAT                              |
| Africa/Maseru                    | Africa/Maseru                    |
| Africa/Mbabane                   | Africa/Mbabane                   |
| Africa/Mogadishu                 | EAT                              |
| Africa/Monrovia                  | Africa/Monrovia                  |
| Africa/Nairobi                   | EAT                              |
| Africa/Ndjamena                  | Africa/Ndjamena                  |
| Africa/Niamey                    | Africa/Niamey                    |
| Africa/Nouakchott                | Iceland                          |
| Africa/Ouagadougou               | Iceland                          |
| Africa/Porto-Novo                | Africa/Porto-Novo                |
| Africa/Sao_Tome                  | Africa/Sao_Tome                  |
| Africa/Timbuktu                  | Iceland                          |
| Africa/Tripoli                   | Libya                            |
| Africa/Tunis                     | Africa/Tunis                     |
| Africa/Windhoek                  | Africa/Windhoek                  |
| America/Adak                     | America/Adak                     |
| America/Anchorage                | AST                              |
| America/Anguilla                 | PRT                              |
| America/Antigua                  | PRT                              |
| America/Araguaina                | America/Araguaina                |
| America/Argentina/Buenos_Aires   | AGT                              |
| America/Argentina/Catamarca      | America/Argentina/Catamarca      |
| America/Argentina/ComodRivadavia | America/Argentina/ComodRivadavia |
| America/Argentina/Cordoba        | America/Argentina/Cordoba        |
| America/Argentina/Jujuy          | America/Argentina/Jujuy          |
| America/Argentina/La_Rioja       | America/Argentina/La_Rioja       |
| America/Argentina/Mendoza        | America/Argentina/Mendoza        |
| America/Argentina/Rio_Gallegos   | America/Argentina/Rio_Gallegos   |
| America/Argentina/Salta          | America/Argentina/Salta          |
| America/Argentina/San_Juan       | America/Argentina/San_Juan       |
| America/Argentina/San_Luis       | America/Argentina/San_Luis       |
| America/Argentina/Tucuman        | America/Argentina/Tucuman        |
| America/Argentina/Ushuaia        | America/Argentina/Ushuaia        |
| America/Aruba                    | PRT                              |
| America/Asuncion                 | America/Asuncion                 |
| America/Atikokan                 | EST                              |
| America/Atka                     | America/Atka                     |
| America/Bahia                    | America/Bahia                    |
| America/Bahia_Banderas           | America/Bahia_Banderas           |
| America/Barbados                 | America/Barbados                 |
| America/Belem                    | America/Belem                    |
| America/Belize                   | America/Belize                   |
| America/Blanc-Sablon             | PRT                              |
| America/Boa_Vista                | America/Boa_Vista                |
| America/Bogota                   | America/Bogota                   |
| America/Boise                    | America/Boise                    |
| America/Buenos_Aires             | AGT                              |
| America/Cambridge_Bay            | America/Cambridge_Bay            |
| America/Campo_Grande             | America/Campo_Grande             |
| America/Cancun                   | America/Cancun                   |
| America/Caracas                  | America/Caracas                  |
| America/Catamarca                | America/Catamarca                |
| America/Cayenne                  | America/Cayenne                  |
| America/Cayman                   | EST                              |
| America/Chicago                  | CST                              |
| America/Chihuahua                | America/Chihuahua                |
| America/Ciudad_Juarez            | America/Ciudad_Juarez            |
| America/Coral_Harbour            | EST                              |
| America/Cordoba                  | America/Cordoba                  |
| America/Costa_Rica               | America/Costa_Rica               |
| America/Creston                  | MST                              |
| America/Cuiaba                   | America/Cuiaba                   |
| America/Curacao                  | PRT                              |
| America/Danmarkshavn             | America/Danmarkshavn             |
| America/Dawson                   | America/Dawson                   |
| America/Dawson_Creek             | America/Dawson_Creek             |
| America/Denver                   | Navajo                           |
| America/Detroit                  | America/Detroit                  |
| America/Dominica                 | PRT                              |
| America/Edmonton                 | America/Edmonton                 |
| America/Eirunepe                 | America/Eirunepe                 |
| America/El_Salvador              | America/El_Salvador              |
| America/Ensenada                 | America/Ensenada                 |
| America/Fort_Nelson              | America/Fort_Nelson              |
| America/Fort_Wayne               | IET                              |
| America/Fortaleza                | America/Fortaleza                |
| America/Glace_Bay                | America/Glace_Bay                |
| America/Godthab                  | America/Godthab                  |
| America/Goose_Bay                | America/Goose_Bay                |
| America/Grand_Turk               | America/Grand_Turk               |
| America/Grenada                  | PRT                              |
| America/Guadeloupe               | PRT                              |
| America/Guatemala                | America/Guatemala                |
| America/Guayaquil                | America/Guayaquil                |
| America/Guyana                   | America/Guyana                   |
| America/Halifax                  | America/Halifax                  |
| America/Havana                   | Cuba                             |
| America/Hermosillo               | America/Hermosillo               |
| America/Indiana/Indianapolis     | IET                              |
| America/Indiana/Knox             | America/Indiana/Knox             |
| America/Indiana/Marengo          | America/Indiana/Marengo          |
| America/Indiana/Petersburg       | America/Indiana/Petersburg       |
| America/Indiana/Tell_City        | America/Indiana/Tell_City        |
| America/Indiana/Vevay            | America/Indiana/Vevay            |
| America/Indiana/Vincennes        | America/Indiana/Vincennes        |
| America/Indiana/Winamac          | America/Indiana/Winamac          |
| America/Indianapolis             | IET                              |
| America/Inuvik                   | America/Inuvik                   |
| America/Iqaluit                  | America/Iqaluit                  |
| America/Jamaica                  | Jamaica                          |
| America/Jujuy                    | America/Jujuy                    |
| America/Juneau                   | America/Juneau                   |
| America/Kentucky/Louisville      | America/Kentucky/Louisville      |
| America/Kentucky/Monticello      | America/Kentucky/Monticello      |
| America/Knox_IN                  | America/Knox_IN                  |
| America/Kralendijk               | PRT                              |
| America/La_Paz                   | America/La_Paz                   |
| America/Lima                     | America/Lima                     |
| America/Los_Angeles              | PST                              |
| America/Louisville               | America/Louisville               |
| America/Lower_Princes            | PRT                              |
| America/Maceio                   | America/Maceio                   |
| America/Managua                  | America/Managua                  |
| America/Manaus                   | America/Manaus                   |
| America/Marigot                  | PRT                              |
| America/Martinique               | America/Martinique               |
| America/Matamoros                | America/Matamoros                |
| America/Mazatlan                 | America/Mazatlan                 |
| America/Mendoza                  | America/Mendoza                  |
| America/Menominee                | America/Menominee                |
| America/Merida                   | America/Merida                   |
| America/Metlakatla               | America/Metlakatla               |
| America/Mexico_City              | America/Mexico_City              |
| America/Miquelon                 | America/Miquelon                 |
| America/Moncton                  | America/Moncton                  |
| America/Monterrey                | America/Monterrey                |
| America/Montevideo               | America/Montevideo               |
| America/Montreal                 | America/Montreal                 |
| America/Montserrat               | PRT                              |
| America/Nassau                   | America/Nassau                   |
| America/New_York                 | EST5EDT                          |
| America/Nipigon                  | America/Nipigon                  |
| America/Nome                     | America/Nome                     |
| America/Noronha                  | America/Noronha                  |
| America/North_Dakota/Beulah      | America/North_Dakota/Beulah      |
| America/North_Dakota/Center      | America/North_Dakota/Center      |
| America/North_Dakota/New_Salem   | America/North_Dakota/New_Salem   |
| America/Nuuk                     | America/Nuuk                     |
| America/Ojinaga                  | America/Ojinaga                  |
| America/Panama                   | EST                              |
| America/Pangnirtung              | America/Pangnirtung              |
| America/Paramaribo               | America/Paramaribo               |
| America/Phoenix                  | MST                              |
| America/Port-au-Prince           | America/Port-au-Prince           |
| America/Port_of_Spain            | PRT                              |
| America/Porto_Acre               | America/Porto_Acre               |
| America/Porto_Velho              | America/Porto_Velho              |
| America/Puerto_Rico              | PRT                              |
| America/Punta_Arenas             | America/Punta_Arenas             |
| America/Rainy_River              | America/Rainy_River              |
| America/Rankin_Inlet             | America/Rankin_Inlet             |
| America/Recife                   | America/Recife                   |
| America/Regina                   | America/Regina                   |
| America/Resolute                 | America/Resolute                 |
| America/Rio_Branco               | America/Rio_Branco               |
| America/Rosario                  | America/Rosario                  |
| America/Santa_Isabel             | America/Santa_Isabel             |
| America/Santarem                 | America/Santarem                 |
| America/Santiago                 | America/Santiago                 |
| America/Santo_Domingo            | America/Santo_Domingo            |
| America/Sao_Paulo                | BET                              |
| America/Scoresbysund             | America/Scoresbysund             |
| America/Shiprock                 | Navajo                           |
| America/Sitka                    | America/Sitka                    |
| America/St_Barthelemy            | PRT                              |
| America/St_Johns                 | CNT                              |
| America/St_Kitts                 | PRT                              |
| America/St_Lucia                 | PRT                              |
| America/St_Thomas                | PRT                              |
| America/St_Vincent               | PRT                              |
| America/Swift_Current            | America/Swift_Current            |
| America/Tegucigalpa              | America/Tegucigalpa              |
| America/Thule                    | America/Thule                    |
| America/Thunder_Bay              | America/Thunder_Bay              |
| America/Tijuana                  | America/Tijuana                  |
| America/Toronto                  | America/Toronto                  |
| America/Tortola                  | PRT                              |
| America/Vancouver                | America/Vancouver                |
| America/Virgin                   | PRT                              |
| America/Whitehorse               | America/Whitehorse               |
| America/Winnipeg                 | America/Winnipeg                 |
| America/Yakutat                  | America/Yakutat                  |
| America/Yellowknife              | America/Yellowknife              |
| Antarctica/Casey                 | Antarctica/Casey                 |
| Antarctica/Davis                 | Antarctica/Davis                 |
| Antarctica/DumontDUrville        | Antarctica/DumontDUrville        |
| Antarctica/Macquarie             | Antarctica/Macquarie             |
| Antarctica/Mawson                | Antarctica/Mawson                |
| Antarctica/McMurdo               | NZ                               |
| Antarctica/Palmer                | Antarctica/Palmer                |
| Antarctica/Rothera               | Antarctica/Rothera               |
| Antarctica/South_Pole            | NZ                               |
| Antarctica/Syowa                 | Antarctica/Syowa                 |
| Antarctica/Troll                 | Antarctica/Troll                 |
| Antarctica/Vostok                | Antarctica/Vostok                |
| Arctic/Longyearbyen              | Arctic/Longyearbyen              |
| Asia/Aden                        | Asia/Aden                        |
| Asia/Almaty                      | Asia/Almaty                      |
| Asia/Amman                       | Asia/Amman                       |
| Asia/Anadyr                      | Asia/Anadyr                      |
| Asia/Aqtau                       | Asia/Aqtau                       |
| Asia/Aqtobe                      | Asia/Aqtobe                      |
| Asia/Ashgabat                    | Asia/Ashgabat                    |
| Asia/Ashkhabad                   | Asia/Ashkhabad                   |
| Asia/Atyrau                      | Asia/Atyrau                      |
| Asia/Baghdad                     | Asia/Baghdad                     |
| Asia/Bahrain                     | Asia/Bahrain                     |
| Asia/Baku                        | Asia/Baku                        |
| Asia/Bangkok                     | Asia/Bangkok                     |
| Asia/Barnaul                     | Asia/Barnaul                     |
| Asia/Beirut                      | Asia/Beirut                      |
| Asia/Bishkek                     | Asia/Bishkek                     |
| Asia/Brunei                      | Asia/Brunei                      |
| Asia/Calcutta                    | IST                              |
| Asia/Chita                       | Asia/Chita                       |
| Asia/Choibalsan                  | Asia/Choibalsan                  |
| Asia/Chongqing                   | CTT                              |
| Asia/Chungking                   | CTT                              |
| Asia/Colombo                     | Asia/Colombo                     |
| Asia/Dacca                       | BST                              |
| Asia/Damascus                    | Asia/Damascus                    |
| Asia/Dhaka                       | BST                              |
| Asia/Dili                        | Asia/Dili                        |
| Asia/Dubai                       | Asia/Dubai                       |
| Asia/Dushanbe                    | Asia/Dushanbe                    |
| Asia/Famagusta                   | Asia/Famagusta                   |
| Asia/Gaza                        | Asia/Gaza                        |
| Asia/Harbin                      | CTT                              |
| Asia/Hebron                      | Asia/Hebron                      |
| Asia/Ho_Chi_Minh                 | VST                              |
| Asia/Hong_Kong                   | Hongkong                         |
| Asia/Hovd                        | Asia/Hovd                        |
| Asia/Irkutsk                     | Asia/Irkutsk                     |
| Asia/Istanbul                    | Turkey                           |
| Asia/Jakarta                     | Asia/Jakarta                     |
| Asia/Jayapura                    | Asia/Jayapura                    |
| Asia/Jerusalem                   | Israel                           |
| Asia/Kabul                       | Asia/Kabul                       |
| Asia/Kamchatka                   | Asia/Kamchatka                   |
| Asia/Karachi                     | PLT                              |
| Asia/Kashgar                     | Asia/Kashgar                     |
| Asia/Kathmandu                   | Asia/Kathmandu                   |
| Asia/Katmandu                    | Asia/Katmandu                    |
| Asia/Khandyga                    | Asia/Khandyga                    |
| Asia/Kolkata                     | IST                              |
| Asia/Krasnoyarsk                 | Asia/Krasnoyarsk                 |
| Asia/Kuala_Lumpur                | Singapore                        |
| Asia/Kuching                     | Asia/Kuching                     |
| Asia/Kuwait                      | Asia/Kuwait                      |
| Asia/Macao                       | Asia/Macao                       |
| Asia/Macau                       | Asia/Macau                       |
| Asia/Magadan                     | Asia/Magadan                     |
| Asia/Makassar                    | Asia/Makassar                    |
| Asia/Manila                      | Asia/Manila                      |
| Asia/Muscat                      | Asia/Muscat                      |
| Asia/Nicosia                     | Asia/Nicosia                     |
| Asia/Novokuznetsk                | Asia/Novokuznetsk                |
| Asia/Novosibirsk                 | Asia/Novosibirsk                 |
| Asia/Omsk                        | Asia/Omsk                        |
| Asia/Oral                        | Asia/Oral                        |
| Asia/Phnom_Penh                  | Asia/Phnom_Penh                  |
| Asia/Pontianak                   | Asia/Pontianak                   |
| Asia/Pyongyang                   | Asia/Pyongyang                   |
| Asia/Qatar                       | Asia/Qatar                       |
| Asia/Qostanay                    | Asia/Qostanay                    |
| Asia/Qyzylorda                   | Asia/Qyzylorda                   |
| Asia/Rangoon                     | Asia/Rangoon                     |
| Asia/Riyadh                      | Asia/Riyadh                      |
| Asia/Saigon                      | VST                              |
| Asia/Sakhalin                    | Asia/Sakhalin                    |
| Asia/Samarkand                   | Asia/Samarkand                   |
| Asia/Seoul                       | ROK                              |
| Asia/Shanghai                    | CTT                              |
| Asia/Singapore                   | Singapore                        |
| Asia/Srednekolymsk               | Asia/Srednekolymsk               |
| Asia/Taipei                      | ROC                              |
| Asia/Tashkent                    | Asia/Tashkent                    |
| Asia/Tbilisi                     | Asia/Tbilisi                     |
| Asia/Tehran                      | Iran                             |
| Asia/Tel_Aviv                    | Israel                           |
| Asia/Thimbu                      | Asia/Thimbu                      |
| Asia/Thimphu                     | Asia/Thimphu                     |
| Asia/Tokyo                       | JST                              |
| Asia/Tomsk                       | Asia/Tomsk                       |
| Asia/Ujung_Pandang               | Asia/Ujung_Pandang               |
| Asia/Ulaanbaatar                 | Asia/Ulaanbaatar                 |
| Asia/Ulan_Bator                  | Asia/Ulan_Bator                  |
| Asia/Urumqi                      | Asia/Urumqi                      |
| Asia/Ust-Nera                    | Asia/Ust-Nera                    |
| Asia/Vientiane                   | Asia/Vientiane                   |
| Asia/Vladivostok                 | Asia/Vladivostok                 |
| Asia/Yakutsk                     | Asia/Yakutsk                     |
| Asia/Yangon                      | Asia/Yangon                      |
| Asia/Yekaterinburg               | Asia/Yekaterinburg               |
| Asia/Yerevan                     | NET                              |
| Atlantic/Azores                  | Atlantic/Azores                  |
| Atlantic/Bermuda                 | Atlantic/Bermuda                 |
| Atlantic/Canary                  | Atlantic/Canary                  |
| Atlantic/Cape_Verde              | Atlantic/Cape_Verde              |
| Atlantic/Faeroe                  | Atlantic/Faeroe                  |
| Atlantic/Faroe                   | Atlantic/Faroe                   |
| Atlantic/Jan_Mayen               | Atlantic/Jan_Mayen               |
| Atlantic/Madeira                 | Atlantic/Madeira                 |
| Atlantic/Reykjavik               | Iceland                          |
| Atlantic/South_Georgia           | Atlantic/South_Georgia           |
| Atlantic/St_Helena               | Iceland                          |
| Atlantic/Stanley                 | Atlantic/Stanley                 |
| Australia/ACT                    | AET                              |
| Australia/Adelaide               | Australia/Adelaide               |
| Australia/Brisbane               | Australia/Brisbane               |
| Australia/Broken_Hill            | Australia/Broken_Hill            |
| Australia/Canberra               | AET                              |
| Australia/Currie                 | Australia/Currie                 |
| Australia/Darwin                 | ACT                              |
| Australia/Eucla                  | Australia/Eucla                  |
| Australia/Hobart                 | Australia/Hobart                 |
| Australia/LHI                    | Australia/LHI                    |
| Australia/Lindeman               | Australia/Lindeman               |
| Australia/Lord_Howe              | Australia/Lord_Howe              |
| Australia/Melbourne              | Australia/Melbourne              |
| Australia/NSW                    | AET                              |
| Australia/North                  | ACT                              |
| Australia/Perth                  | Australia/Perth                  |
| Australia/Queensland             | Australia/Queensland             |
| Australia/South                  | Australia/South                  |
| Australia/Sydney                 | AET                              |
| Australia/Tasmania               | Australia/Tasmania               |
| Australia/Victoria               | Australia/Victoria               |
| Australia/West                   | Australia/West                   |
| Australia/Yancowinna             | Australia/Yancowinna             |
| BET                              | BET                              |
| BST                              | BST                              |
| Brazil/Acre                      | Brazil/Acre                      |
| Brazil/DeNoronha                 | Brazil/DeNoronha                 |
| Brazil/East                      | BET                              |
| Brazil/West                      | Brazil/West                      |
| CAT                              | CAT                              |
| CET                              | CET                              |
| CNT                              | CNT                              |
| CST                              | CST                              |
| CST6CDT                          | CST                              |
| CTT                              | CTT                              |
| Canada/Atlantic                  | Canada/Atlantic                  |
| Canada/Central                   | Canada/Central                   |
| Canada/East-Saskatchewan         | Canada/East-Saskatchewan         |
| Canada/Eastern                   | Canada/Eastern                   |
| Canada/Mountain                  | Canada/Mountain                  |
| Canada/Newfoundland              | CNT                              |
| Canada/Pacific                   | Canada/Pacific                   |
| Canada/Saskatchewan              | Canada/Saskatchewan              |
| Canada/Yukon                     | Canada/Yukon                     |
| Chile/Continental                | Chile/Continental                |
| Chile/EasterIsland               | Chile/EasterIsland               |
| Cuba                             | Cuba                             |
| EAT                              | EAT                              |
| ECT                              | ECT                              |
| EET                              | EET                              |
| EST                              | EST                              |
| EST5EDT                          | EST5EDT                          |
| Egypt                            | ART                              |
| Eire                             | Eire                             |
| Etc/GMT                          | GMT                              |
| Etc/GMT+0                        | GMT                              |
| Etc/GMT+1                        | Etc/GMT+1                        |
| Etc/GMT+10                       | Etc/GMT+10                       |
| Etc/GMT+11                       | Etc/GMT+11                       |
| Etc/GMT+12                       | Etc/GMT+12                       |
| Etc/GMT+2                        | Etc/GMT+2                        |
| Etc/GMT+3                        | Etc/GMT+3                        |
| Etc/GMT+4                        | Etc/GMT+4                        |
| Etc/GMT+5                        | Etc/GMT+5                        |
| Etc/GMT+6                        | Etc/GMT+6                        |
| Etc/GMT+7                        | Etc/GMT+7                        |
| Etc/GMT+8                        | Etc/GMT+8                        |
| Etc/GMT+9                        | Etc/GMT+9                        |
| Etc/GMT-0                        | GMT                              |
| Etc/GMT-1                        | Etc/GMT-1                        |
| Etc/GMT-10                       | Etc/GMT-10                       |
| Etc/GMT-11                       | Etc/GMT-11                       |
| Etc/GMT-12                       | Etc/GMT-12                       |
| Etc/GMT-13                       | Etc/GMT-13                       |
| Etc/GMT-14                       | Etc/GMT-14                       |
| Etc/GMT-2                        | Etc/GMT-2                        |
| Etc/GMT-3                        | Etc/GMT-3                        |
| Etc/GMT-4                        | Etc/GMT-4                        |
| Etc/GMT-5                        | Etc/GMT-5                        |
| Etc/GMT-6                        | Etc/GMT-6                        |
| Etc/GMT-7                        | Etc/GMT-7                        |
| Etc/GMT-8                        | Etc/GMT-8                        |
| Etc/GMT-9                        | Etc/GMT-9                        |
| Etc/GMT0                         | GMT                              |
| Etc/Greenwich                    | GMT                              |
| Etc/UCT                          | UCT                              |
| Etc/UTC                          | UCT                              |
| Etc/Universal                    | UCT                              |
| Etc/Zulu                         | UCT                              |
| Europe/Amsterdam                 | CET                              |
| Europe/Andorra                   | Europe/Andorra                   |
| Europe/Astrakhan                 | Europe/Astrakhan                 |
| Europe/Athens                    | EET                              |
| Europe/Belfast                   | GB                               |
| Europe/Belgrade                  | Europe/Belgrade                  |
| Europe/Berlin                    | Europe/Berlin                    |
| Europe/Bratislava                | Europe/Bratislava                |
| Europe/Brussels                  | CET                              |
| Europe/Bucharest                 | Europe/Bucharest                 |
| Europe/Budapest                  | Europe/Budapest                  |
| Europe/Busingen                  | Europe/Busingen                  |
| Europe/Chisinau                  | Europe/Chisinau                  |
| Europe/Copenhagen                | Europe/Copenhagen                |
| Europe/Dublin                    | Eire                             |
| Europe/Gibraltar                 | Europe/Gibraltar                 |
| Europe/Guernsey                  | GB                               |
| Europe/Helsinki                  | Europe/Helsinki                  |
| Europe/Isle_of_Man               | GB                               |
| Europe/Istanbul                  | Turkey                           |
| Europe/Jersey                    | GB                               |
| Europe/Kaliningrad               | Europe/Kaliningrad               |
| Europe/Kiev                      | Europe/Kiev                      |
| Europe/Kirov                     | Europe/Kirov                     |
| Europe/Kyiv                      | Europe/Kyiv                      |
| Europe/Lisbon                    | WET                              |
| Europe/Ljubljana                 | Europe/Ljubljana                 |
| Europe/London                    | GB                               |
| Europe/Luxembourg                | CET                              |
| Europe/Madrid                    | Europe/Madrid                    |
| Europe/Malta                     | Europe/Malta                     |
| Europe/Mariehamn                 | Europe/Mariehamn                 |
| Europe/Minsk                     | Europe/Minsk                     |
| Europe/Monaco                    | ECT                              |
| Europe/Moscow                    | W-SU                             |
| Europe/Nicosia                   | Europe/Nicosia                   |
| Europe/Oslo                      | Europe/Oslo                      |
| Europe/Paris                     | ECT                              |
| Europe/Podgorica                 | Europe/Podgorica                 |
| Europe/Prague                    | Europe/Prague                    |
| Europe/Riga                      | Europe/Riga                      |
| Europe/Rome                      | Europe/Rome                      |
| Europe/Samara                    | Europe/Samara                    |
| Europe/San_Marino                | Europe/San_Marino                |
| Europe/Sarajevo                  | Europe/Sarajevo                  |
| Europe/Saratov                   | Europe/Saratov                   |
| Europe/Simferopol                | Europe/Simferopol                |
| Europe/Skopje                    | Europe/Skopje                    |
| Europe/Sofia                     | Europe/Sofia                     |
| Europe/Stockholm                 | Europe/Stockholm                 |
| Europe/Tallinn                   | Europe/Tallinn                   |
| Europe/Tirane                    | Europe/Tirane                    |
| Europe/Tiraspol                  | Europe/Tiraspol                  |
| Europe/Ulyanovsk                 | Europe/Ulyanovsk                 |
| Europe/Uzhgorod                  | Europe/Uzhgorod                  |
| Europe/Vaduz                     | Europe/Vaduz                     |
| Europe/Vatican                   | Europe/Vatican                   |
| Europe/Vienna                    | Europe/Vienna                    |
| Europe/Vilnius                   | Europe/Vilnius                   |
| Europe/Volgograd                 | Europe/Volgograd                 |
| Europe/Warsaw                    | Poland                           |
| Europe/Zagreb                    | Europe/Zagreb                    |
| Europe/Zaporozhye                | Europe/Zaporozhye                |
| Europe/Zurich                    | Europe/Zurich                    |
| Factory                          | Factory                          |
| GB                               | GB                               |
| GB-Eire                          | GB                               |
| GMT                              | GMT                              |
| GMT+0                            | GMT                              |
| GMT-0                            | GMT                              |
| GMT0                             | GMT                              |
| Greenwich                        | GMT                              |
| HST                              | HST                              |
| Hongkong                         | Hongkong                         |
| IET                              | IET                              |
| IST                              | IST                              |
| Iceland                          | Iceland                          |
| Indian/Antananarivo              | EAT                              |
| Indian/Chagos                    | Indian/Chagos                    |
| Indian/Christmas                 | Indian/Christmas                 |
| Indian/Cocos                     | Indian/Cocos                     |
| Indian/Comoro                    | EAT                              |
| Indian/Kerguelen                 | Indian/Kerguelen                 |
| Indian/Mahe                      | Indian/Mahe                      |
| Indian/Maldives                  | Indian/Maldives                  |
| Indian/Mauritius                 | Indian/Mauritius                 |
| Indian/Mayotte                   | EAT                              |
| Indian/Reunion                   | Indian/Reunion                   |
| Iran                             | Iran                             |
| Israel                           | Israel                           |
| JST                              | JST                              |
| Jamaica                          | Jamaica                          |
| Japan                            | JST                              |
| Kwajalein                        | Kwajalein                        |
| Libya                            | Libya                            |
| MET                              | CET                              |
| MIT                              | MIT                              |
| MST                              | MST                              |
| MST7MDT                          | Navajo                           |
| Mexico/BajaNorte                 | Mexico/BajaNorte                 |
| Mexico/BajaSur                   | Mexico/BajaSur                   |
| Mexico/General                   | Mexico/General                   |
| NET                              | NET                              |
| NST                              | NZ                               |
| NZ                               | NZ                               |
| NZ-CHAT                          | NZ-CHAT                          |
| Navajo                           | Navajo                           |
| PLT                              | PLT                              |
| PNT                              | MST                              |
| PRC                              | CTT                              |
| PRT                              | PRT                              |
| PST                              | PST                              |
| PST8PDT                          | PST                              |
| Pacific/Apia                     | MIT                              |
| Pacific/Auckland                 | NZ                               |
| Pacific/Bougainville             | Pacific/Bougainville             |
| Pacific/Chatham                  | NZ-CHAT                          |
| Pacific/Chuuk                    | Pacific/Chuuk                    |
| Pacific/Easter                   | Pacific/Easter                   |
| Pacific/Efate                    | Pacific/Efate                    |
| Pacific/Enderbury                | Pacific/Enderbury                |
| Pacific/Fakaofo                  | Pacific/Fakaofo                  |
| Pacific/Fiji                     | Pacific/Fiji                     |
| Pacific/Funafuti                 | Pacific/Funafuti                 |
| Pacific/Galapagos                | Pacific/Galapagos                |
| Pacific/Gambier                  | Pacific/Gambier                  |
| Pacific/Guadalcanal              | SST                              |
| Pacific/Guam                     | Pacific/Guam                     |
| Pacific/Honolulu                 | HST                              |
| Pacific/Johnston                 | HST                              |
| Pacific/Kanton                   | Pacific/Kanton                   |
| Pacific/Kiritimati               | Pacific/Kiritimati               |
| Pacific/Kosrae                   | Pacific/Kosrae                   |
| Pacific/Kwajalein                | Kwajalein                        |
| Pacific/Majuro                   | Pacific/Majuro                   |
| Pacific/Marquesas                | Pacific/Marquesas                |
| Pacific/Midway                   | Pacific/Midway                   |
| Pacific/Nauru                    | Pacific/Nauru                    |
| Pacific/Niue                     | Pacific/Niue                     |
| Pacific/Norfolk                  | Pacific/Norfolk                  |
| Pacific/Noumea                   | Pacific/Noumea                   |
| Pacific/Pago_Pago                | Pacific/Pago_Pago                |
| Pacific/Palau                    | Pacific/Palau                    |
| Pacific/Pitcairn                 | Pacific/Pitcairn                 |
| Pacific/Pohnpei                  | SST                              |
| Pacific/Ponape                   | SST                              |
| Pacific/Port_Moresby             | Pacific/Port_Moresby             |
| Pacific/Rarotonga                | Pacific/Rarotonga                |
| Pacific/Saipan                   | Pacific/Saipan                   |
| Pacific/Samoa                    | Pacific/Samoa                    |
| Pacific/Tahiti                   | Pacific/Tahiti                   |
| Pacific/Tarawa                   | Pacific/Tarawa                   |
| Pacific/Tongatapu                | Pacific/Tongatapu                |
| Pacific/Truk                     | Pacific/Truk                     |
| Pacific/Wake                     | Pacific/Wake                     |
| Pacific/Wallis                   | Pacific/Wallis                   |
| Pacific/Yap                      | Pacific/Yap                      |
| Poland                           | Poland                           |
| Portugal                         | WET                              |
| ROC                              | ROC                              |
| ROK                              | ROK                              |
| SST                              | SST                              |
| Singapore                        | Singapore                        |
| SystemV/AST4                     | SystemV/AST4                     |
| SystemV/AST4ADT                  | SystemV/AST4ADT                  |
| SystemV/CST6                     | SystemV/CST6                     |
| SystemV/CST6CDT                  | SystemV/CST6CDT                  |
| SystemV/EST5                     | SystemV/EST5                     |
| SystemV/EST5EDT                  | SystemV/EST5EDT                  |
| SystemV/HST10                    | SystemV/HST10                    |
| SystemV/MST7                     | SystemV/MST7                     |
| SystemV/MST7MDT                  | SystemV/MST7MDT                  |
| SystemV/PST8                     | SystemV/PST8                     |
| SystemV/PST8PDT                  | SystemV/PST8PDT                  |
| SystemV/YST9                     | SystemV/YST9                     |
| SystemV/YST9YDT                  | SystemV/YST9YDT                  |
| Turkey                           | Turkey                           |
| UCT                              | UCT                              |
| US/Alaska                        | AST                              |
| US/Aleutian                      | US/Aleutian                      |
| US/Arizona                       | MST                              |
| US/Central                       | CST                              |
| US/East-Indiana                  | IET                              |
| US/Eastern                       | EST5EDT                          |
| US/Hawaii                        | HST                              |
| US/Indiana-Starke                | US/Indiana-Starke                |
| US/Michigan                      | US/Michigan                      |
| US/Mountain                      | Navajo                           |
| US/Pacific                       | PST                              |
| US/Pacific-New                   | PST                              |
| US/Samoa                         | US/Samoa                         |
| UTC                              | UCT                              |
| Universal                        | UCT                              |
| VST                              | VST                              |
| W-SU                             | W-SU                             |
| WET                              | WET                              |
| Zulu                             | UCT                              |

### Union Type {#docs:current:sql:data_types:union}

A `UNION` *type* (not to be confused with the SQL [`UNION` operator](#docs:current:sql:query_syntax:setops::union-all-by-name)) is a nested type capable of holding one of multiple “alternative” values, much like the `union` in C. The main difference is that these `UNION` types are *tagged unions* and thus always carry a discriminator “tag” that signals which alternative is currently held, even if the inner value itself is null. `UNION` types are thus more similar to C++17's `std::variant`, Rust's `enum` or the “sum type” present in most functional languages.

`UNION` types must always have at least one member, and while they can contain multiple members of the same type, the tag names must be unique. `UNION` types can have at most 256 members.

Under the hood, `UNION` types are implemented on top of `STRUCT` types, and simply keep the “tag” as the first entry.

`UNION` values can be created with the [`union_value(tag := expr)`](#docs:current:sql:functions:union) function or by [casting from a member type](#docs:current:sql:data_types:union::casting-to-unions).

#### Example {#docs:current:sql:data_types:union::example}

Create a table with a `UNION` column:

```sql
CREATE TABLE tbl1 (u UNION(num INTEGER, str VARCHAR));
INSERT INTO tbl1 VALUES (1), ('two'), (union_value(str := 'three'));
```

Any type can be implicitly cast to a `UNION` containing the type. Any `UNION` can also be implicitly cast to another `UNION` if the source `UNION` members are a subset of the target's, provided the cast is unambiguous.

`UNION` uses the member types' `VARCHAR` cast functions when casting to `VARCHAR`:

```sql
SELECT u FROM tbl1;
```

|   u   |
|-------|
| 1     |
| two   |
| three |

Select all the `str` members:

```sql
SELECT union_extract(u, 'str') AS str
FROM tbl1;
```

|  str  |
|-------|
| NULL  |
| two   |
| three |

Alternatively, you can use “dot syntax”, similarly to [`STRUCT`s](#docs:current:sql:data_types:struct):

```sql
SELECT u.str
FROM tbl1;
```

|  str  |
|-------|
| NULL  |
| two   |
| three |

Select the currently active tag from the `UNION` as an `ENUM`:

```sql
SELECT union_tag(u) AS t
FROM tbl1;
```

|  t  |
|-----|
| num |
| str |
| str |

#### Union Casts {#docs:current:sql:data_types:union::union-casts}

Compared to other nested types, `UNION`s allow a set of implicit casts to facilitate unintrusive and natural usage when working with their members as “subtypes”.
However, these casts have been designed with two principles in mind: to avoid ambiguity and to avoid casts that could lead to loss of information. This prevents `UNION`s from being completely “transparent”, while still allowing `UNION` types to have a “supertype” relationship with their members.

Thus `UNION` types can't be implicitly cast to any of their member types in general, since the information in the other members not matching the target type would be “lost”. If you want to coerce a `UNION` into one of its members, you should use the `union_extract` function explicitly instead.

The only exception to this is when casting a `UNION` to `VARCHAR`, in which case the members will all use their corresponding `VARCHAR` casts. Since everything can be cast to `VARCHAR`, this is “safe” in a sense.
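
The two rules above can be sketched as follows. The table name `union_demo` is a hypothetical name chosen for illustration; the functions are the ones already used in the example section.

```sql
-- Hypothetical demo table, mirroring the earlier example.
CREATE TABLE union_demo (u UNION(num INTEGER, str VARCHAR));
INSERT INTO union_demo VALUES (1), ('two');

-- Coerce the UNION into its num member explicitly;
-- rows whose active member is str yield NULL.
SELECT union_extract(u, 'num') AS num
FROM union_demo;

-- Casting to VARCHAR is permitted whichever member is active.
SELECT CAST(u AS VARCHAR) AS u_text
FROM union_demo;
```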

##### Casting to Unions {#docs:current:sql:data_types:union::casting-to-unions}

A type can always be implicitly cast to a `UNION` if it can be implicitly cast to one of the `UNION` member types.

* If there are multiple candidates, the built-in implicit casting priority rules determine the target type. For example, a `FLOAT` → `UNION(i INTEGER, v VARCHAR)` cast will always cast the `FLOAT` to the `INTEGER` member before `VARCHAR`.
* If the cast is still ambiguous, i.e., there are multiple candidates with the same implicit casting priority, an error is raised. This usually happens when the `UNION` contains multiple members of the same type, e.g., a `FLOAT` → `UNION(i INTEGER, num INTEGER)` cast is always ambiguous.

So how do we disambiguate if we want to create a `UNION` with multiple members of the same type? By using the `union_value` function, which takes a keyword argument specifying the tag. For example, `union_value(num := 2::INTEGER)` will create a `UNION` with a single member of type `INTEGER` with the tag `num`. This can then be used to disambiguate in an explicit (or implicit, read on below!) `UNION` to `UNION` cast, like `CAST(union_value(b := 2) AS UNION(a INTEGER, b INTEGER))`.
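
As a minimal sketch of this disambiguation, the query below builds a single-member `UNION` with `union_value` and casts it to a two-member `UNION` whose members share a type. The column aliases and the subquery alias `t` are illustrative only.

```sql
-- The tag b selects the target member even though a and b are both INTEGER.
SELECT
    union_tag(u) AS active_tag,
    union_extract(u, 'b') AS b_value
FROM (
    SELECT CAST(union_value(b := 2) AS UNION(a INTEGER, b INTEGER)) AS u
) AS t;
```

Based on the rules described above, the active tag should be reported as `b`.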

##### Casting between Unions {#docs:current:sql:data_types:union::casting-between-unions}

`UNION` types can be cast between each other if the source type is a “subset” of the target type. In other words, all the tags in the source `UNION` must be present in the target `UNION`, and all the types of the matching tags must be implicitly castable between source and target. In essence, this means that `UNION` types are covariant with respect to their members.

| Ok | Source                 | Target                 | Comments                               |
|----|------------------------|------------------------|----------------------------------------|
| ✅ | `UNION(a A, b B)`      | `UNION(a A, b B, c C)` |                                        |
| ✅ | `UNION(a A, b B)`      | `UNION(a A, b C)`      | if `B` can be implicitly cast to `C`   |
| ❌ | `UNION(a A, b B, c C)` | `UNION(a A, b B)`      |                                        |
| ❌ | `UNION(a A, b B)`      | `UNION(a A, b C)`      | if `B` can't be implicitly cast to `C` |
| ❌ | `UNION(A, B, D)`       | `UNION(A, B, C)`       |                                        |
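
The first two rows of the table can be tried out with concrete types. This is a sketch with `INTEGER`, `BIGINT` and `VARCHAR` standing in for the placeholder types `A`, `B` and `C`; the column aliases are illustrative.

```sql
-- Source tags are a subset of the target tags: the cast is allowed.
SELECT CAST(union_value(a := 1) AS UNION(a INTEGER, b VARCHAR)) AS widened;

-- Same tag, different member type: allowed because INTEGER
-- can be implicitly cast to BIGINT.
SELECT CAST(union_value(a := 1) AS UNION(a BIGINT)) AS retyped;

-- Dropping a tag is expected to be rejected (third row of the table),
-- so the statement is left commented out:
-- SELECT CAST(
--     CAST(union_value(b := 'x') AS UNION(a INTEGER, b VARCHAR))
--     AS UNION(a INTEGER)
-- );
```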

#### Comparison and Sorting {#docs:current:sql:data_types:union::comparison-and-sorting}

Since `UNION` types are implemented on top of `STRUCT` types internally, they can be used with all the comparison operators as well as in both `WHERE` and `HAVING` clauses with the [same semantics as `STRUCT`s](#docs:current:sql:data_types:struct::comparison-operators). The “tag” is always stored as the first struct entry, which ensures that the `UNION` types are compared and ordered by “tag” first.
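
A short sketch of tag-first ordering, using a hypothetical table `union_sort_demo`. Given the description above, rows holding the `num` member are expected to sort before rows holding `str`, with ties inside a tag resolved by the member value.

```sql
-- Hypothetical demo table; rows are inserted out of order on purpose.
CREATE TABLE union_sort_demo (u UNION(num INTEGER, str VARCHAR));
INSERT INTO union_sort_demo VALUES ('two'), (1), ('three');

-- Show the active tag next to each value to make the ordering visible.
SELECT union_tag(u) AS tag, u
FROM union_sort_demo
ORDER BY u;
```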

#### Functions {#docs:current:sql:data_types:union::functions}

See [Union Functions](#docs:current:sql:functions:union).

### Typecasting {#docs:current:sql:data_types:typecasting}

Typecasting is an operation that converts a value in one particular data type to the closest corresponding value in another data type.
Like other SQL engines, DuckDB supports both implicit and explicit typecasting.

#### Explicit Casting {#docs:current:sql:data_types:typecasting::explicit-casting}

Explicit typecasting is performed by using a `CAST` expression. For example, `CAST(col AS VARCHAR)` or `col::VARCHAR` explicitly cast the column `col` to `VARCHAR`. See the [cast page](#docs:current:sql:expressions:cast) for more information.

#### Implicit Casting {#docs:current:sql:data_types:typecasting::implicit-casting}

In many situations, the system will add casts by itself. This is called *implicit* casting and happens, for example, when a function is called with an argument that does not match the type of the function but can be cast to the required type.

Implicit casts can only be added for a limited number of type combinations, and are generally only possible when the cast cannot fail. For example, an implicit cast can be added from `INTEGER` to `DOUBLE` – but not from `DOUBLE` to `INTEGER`.

Consider the function `sin(DOUBLE)`. This function takes a column of type `DOUBLE` as its input argument. However, it can also be called with an integer: `sin(1)`. The integer is converted into a double before being passed to the `sin` function.
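
As a small illustration of these rules, the following sketch checks the result type of `sin(1)` and probes both cast directions with the `can_cast_implicitly` function. The `NULL::TYPE` arguments are only placeholders that carry a type, and the column aliases are made up for this example.

```sql
-- sin expects a DOUBLE; the integer argument is cast implicitly.
SELECT typeof(sin(1)) AS result_type;                                      -- DOUBLE

-- INTEGER to DOUBLE cannot fail, so the implicit cast is allowed ...
SELECT can_cast_implicitly(NULL::INTEGER, NULL::DOUBLE) AS int_to_double;  -- true

-- ... whereas DOUBLE to INTEGER requires an explicit cast.
SELECT can_cast_implicitly(NULL::DOUBLE, NULL::INTEGER) AS double_to_int;  -- false
```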

> **Tip.** To check whether a type can be implicitly cast to another type, use the [`can_cast_implicitly` function](#docs:current:sql:functions:utility::can_cast_implicitlysource_value-target_value).

##### Combination Casting {#docs:current:sql:data_types:typecasting::combination-casting}

When values of different types need to be combined to an unspecified joint parent type, the system will perform implicit casts to an automatically selected parent type. For example, `list_value(1::INT64, 1::UINT64)` creates a list of type `INT128[]`. The implicit casts performed in this situation are sometimes more lenient than regular implicit casts. For example, a `BOOL` value may be cast to `INT` (with `true` mapping to `1` and `false` to `0`) even though this is not possible for regular implicit casts.

This *combination casting* occurs for comparisons (`=` / `<` / `>`), set operations (`UNION` / `EXCEPT` / `INTERSECT`), and nested type constructors (`list_value` / `[...]` / `MAP`).
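
One way to observe combination casting is to wrap the combined expression in `typeof`. The queries below are a sketch only: the aliases are illustrative, and the exact type names printed (for example, `HUGEINT` as the display name of `INT128`) may depend on the DuckDB version.

```sql
-- Nested type constructor: a signed and an unsigned 64-bit integer
-- are combined into a common parent type.
SELECT typeof(list_value(1::INT64, 1::UINT64)) AS mixed_ints;

-- Combination casting is more lenient than regular implicit casting:
-- a BOOLEAN can be combined with an INTEGER.
SELECT typeof([true, 42::INTEGER]) AS bool_and_int;

-- Comparison: the two sides are cast to a joint type before comparing.
SELECT 1 = true AS int_equals_bool;
```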

#### Casting Operations Matrix {#docs:current:sql:data_types:typecasting::casting-operations-matrix}

Values of a particular data type cannot always be cast to any arbitrary target data type. The only exception is the `NULL` value – which can always be converted between types.
The following matrix describes which conversions are supported.
When implicit casting is allowed, it implies that explicit casting is also possible.

![Typecasting matrix](../images/typecasting-matrix.png)

Even though a casting operation is supported based on the source and target data type, it does not necessarily mean the cast operation will succeed at runtime.

> **Deprecated.** Prior to version 0.10.0, DuckDB allowed any type to be implicitly cast to `VARCHAR` during function binding.
> Version 0.10.0 introduced a [breaking change which no longer allows implicit casts to `VARCHAR`](https://duckdb.org/2024/02/13/announcing-duckdb-0100#breaking-sql-changes).
> The [`old_implicit_casting` configuration option](#docs:current:configuration:pragmas::implicit-casting-to-varchar) can be used to revert to the old behavior.
> However, please note that this flag will be deprecated in the future.

##### Lossy Casts {#docs:current:sql:data_types:typecasting::lossy-casts}

Casting operations that result in loss of precision are allowed. For example, it is possible to explicitly cast a numeric type with fractional digits – such as `DECIMAL`, `FLOAT` or `DOUBLE` – to an integral type like `INTEGER` or `BIGINT`. The number will be rounded.

```sql
SELECT CAST(3.1 AS INTEGER);  -- 3
SELECT CAST(3.5 AS INTEGER);  -- 4
SELECT CAST(-1.7 AS INTEGER); -- -2
```

##### Overflows {#docs:current:sql:data_types:typecasting::overflows}

Casting operations that would result in a value overflow throw an error. For example, the value `999` is too large to be represented by the `TINYINT` data type. Therefore, an attempt to cast that value to that type results in a runtime error:

```sql
SELECT CAST(999 AS TINYINT);
```

```console
Conversion Error:
Type INT32 with value 999 can't be cast because the value is out of range for the destination type INT8
```

So even though the cast operation from `INTEGER` to `TINYINT` is supported, it is not possible for this particular value. [`TRY_CAST`](#docs:current:sql:expressions:cast) can be used to convert the value into `NULL` instead of throwing an error.
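
For instance, the overflowing cast from above can be rewritten with `TRY_CAST` as follows. The aliases are illustrative only.

```sql
-- 999 does not fit into TINYINT (range -128 to 127), so TRY_CAST yields NULL.
SELECT TRY_CAST(999 AS TINYINT) AS out_of_range;  -- NULL

-- A value within range is converted as usual.
SELECT TRY_CAST(127 AS TINYINT) AS in_range;      -- 127
```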

##### Varchar {#docs:current:sql:data_types:typecasting::varchar}

The [`VARCHAR`](#docs:current:sql:data_types:text) type acts as a universal target: any arbitrary value of any arbitrary type can always be cast to the `VARCHAR` type. This type is also used for displaying values in the shell.

```sql
SELECT CAST(42.5 AS VARCHAR);
```

Casting from `VARCHAR` to another data type is supported, but can raise an error at runtime if DuckDB cannot parse and convert the provided text to the target data type.

```sql
SELECT CAST('NotANumber' AS INTEGER);
```

In general, casting to `VARCHAR` is a lossless operation and any type can be cast back to the original type after being converted into text.

```sql
SELECT CAST(CAST([1, 2, 3] AS VARCHAR) AS INTEGER[]);
```

##### Literal Types {#docs:current:sql:data_types:typecasting::literal-types}

Integer literals (such as `42`) and string literals (such as `'string'`) have special implicit casting rules. See the [literal types page](#docs:current:sql:data_types:literal_types) for more information.

##### Lists / Arrays {#docs:current:sql:data_types:typecasting::lists--arrays}

Lists can be explicitly cast to other lists using the same casting rules. The cast is applied to the children of the list. For example, if we convert an `INTEGER[]` list to a `VARCHAR[]` list, the child `INTEGER` elements are individually cast to `VARCHAR` and a new list is constructed.

```sql
SELECT CAST([1, 2, 3] AS VARCHAR[]);
```

##### Arrays {#docs:current:sql:data_types:typecasting::arrays}

Arrays follow the same casting rules as lists. In addition, arrays can be implicitly cast to lists of the same type. For example, an `INTEGER[3]` array can be implicitly cast to an `INTEGER[]` list.
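
The following sketch shows an explicit array-to-list cast and checks the implicit cast with `can_cast_implicitly`. The aliases are illustrative, and the last query relies on the rule stated above rather than on a separately verified result.

```sql
-- A fixed-size array of three integers.
SELECT typeof([1, 2, 3]::INTEGER[3]) AS array_type;            -- INTEGER[3]

-- Explicitly casting the array to a variable-length list.
SELECT typeof([1, 2, 3]::INTEGER[3]::INTEGER[]) AS list_type;  -- INTEGER[]

-- Checking whether the array type can be implicitly cast to the list type.
SELECT can_cast_implicitly(NULL::INTEGER[3], NULL::INTEGER[]) AS array_to_list;
```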

##### Structs {#docs:current:sql:data_types:typecasting::structs}

Structs can be cast to other structs as long as they share at least one field.

> The rationale behind this requirement is to help avoid unintended errors. If two structs do not have any fields in common, then the cast was likely not intended.

```sql
SELECT CAST({'a': 42} AS STRUCT(a VARCHAR));
```

Fields that exist in the target struct, but that do not exist in the source struct, default to `NULL`.

```sql
SELECT CAST({'a': 42} AS STRUCT(a VARCHAR, b VARCHAR));
```

Fields that only exist in the source struct are ignored.

```sql
SELECT CAST({'a': 42, 'b': 43} AS STRUCT(a VARCHAR));
```

The field names can also appear in a different order. The fields of the struct are reordered based on their names.

```sql
SELECT CAST({'a': 42, 'b': 84} AS STRUCT(b VARCHAR, a VARCHAR));
```

For [combination casting](#docs:current:sql:data_types:typecasting::combination-casting), the fields of the resulting struct are the superset of all fields of the input structs.
This logic also applies recursively to potentially nested structs.

```sql
SELECT {'outer1': {'inner1': 42, 'inner2': 42}} AS c
UNION
SELECT {'outer1': {'inner2': 'hello', 'inner3': 'world'}, 'outer2': '100'} AS c;
```

```sql
SELECT [{'a': 42}, {'b': 84}];
```
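
To inspect which struct type combination casting produces, one option is to wrap the expression in `typeof`. This is a sketch only: the alias is made up, and the exact type string printed may differ between DuckDB versions.

```sql
-- The list elements have different fields ('a' and 'b');
-- the combined element type contains the superset of both.
SELECT typeof([{'a': 42}, {'b': 84}]) AS combined_type;
```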

##### Unions {#docs:current:sql:data_types:typecasting::unions}

Union casting rules can be found on the [`UNION` type page](#docs:current:sql:data_types:union::casting-to-unions).

## Expressions {#sql:expressions}

### Expressions {#docs:current:sql:expressions:overview}

An expression is a combination of values, operators and functions. Expressions are highly composable, and range from very simple to arbitrarily complex. They can be found in many different parts of SQL statements. This section describes the different types of operators and functions that can be used within expressions.

### CASE Expression {#docs:current:sql:expressions:case}



The `CASE` expression performs a switch based on a condition. The basic form is identical to the ternary condition used in many programming languages (`CASE WHEN cond THEN a ELSE b END` is equivalent to `cond ? a : b`). With a single condition this can be expressed with `IF(cond, a, b)`.

```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, CASE WHEN i > 2 THEN 1 ELSE 0 END AS test
FROM integers;
```

| i | test |
|--:|-----:|
| 1 | 0    |
| 2 | 0    |
| 3 | 1    |

This is equivalent to:

```sql
SELECT i, IF(i > 2, 1, 0) AS test
FROM integers;
```

The `WHEN cond THEN expr` part of the `CASE` expression can be chained. The conditions are checked in order for each tuple, and the expression belonging to the first condition that evaluates to true is evaluated and returned.

```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, CASE WHEN i = 1 THEN 10 WHEN i = 2 THEN 20 ELSE 0 END AS test
FROM integers;
```

| i | test |
|--:|-----:|
| 1 | 10   |
| 2 | 20   |
| 3 | 0    |

The `ELSE` clause of the `CASE` expression is optional. If no `ELSE` clause is provided and none of the conditions match, the `CASE` expression will return `NULL`.

```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, CASE WHEN i = 1 THEN 10 END AS test
FROM integers;
```

| i | test |
|--:|-----:|
| 1 | 10   |
| 2 | NULL |
| 3 | NULL |

It is also possible to provide an individual expression after the `CASE` but before the `WHEN`. When this is done, the `CASE` expression is effectively transformed into a `switch` statement.

```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, CASE i WHEN 1 THEN 10 WHEN 2 THEN 20 WHEN 3 THEN 30 END AS test
FROM integers;
```

| i | test |
|--:|-----:|
| 1 | 10   |
| 2 | 20   |
| 3 | 30   |

This is equivalent to:

```sql
SELECT i, CASE WHEN i = 1 THEN 10 WHEN i = 2 THEN 20 WHEN i = 3 THEN 30 END AS test
FROM integers;
```

#### `SWITCH` Expression {#docs:current:sql:expressions:case::switch-expression}

The `SWITCH` expression is syntactic sugar for the `CASE` expression. It takes an expression, a [`MAP`](#docs:current:sql:data_types:map) of values to results, and an optional default value.

```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, SWITCH(i, MAP {1: 'one', 2: 'two', 3: 'three'}) AS test
FROM integers;
```

| i | test  |
|--:|-------|
| 1 | one   |
| 2 | two   |
| 3 | three |

A default value can be provided as the third argument, which is returned when none of the map keys match:

```sql
SELECT i, SWITCH(i, MAP {1: 'one', 2: 'two'}, 'other') AS test
FROM integers;
```

| i | test  |
|--:|-------|
| 1 | one   |
| 2 | two   |
| 3 | other |

### Casting {#docs:current:sql:expressions:cast}



Casting refers to the operation of converting a value in a particular data type to the corresponding value in another data type.
Casting can occur either implicitly or explicitly. The syntax described here performs an explicit cast. More information on casting can be found on the [typecasting page](#docs:current:sql:data_types:typecasting).

#### Explicit Casting {#docs:current:sql:expressions:cast::explicit-casting}

The standard SQL syntax for explicit casting is `CAST(expr AS TYPENAME)`, where `TYPENAME` is a name (or alias) of one of [DuckDB's data types](#docs:current:sql:data_types:overview). DuckDB also supports the shorthand `expr::TYPENAME`, which is also present in PostgreSQL.

```sql
SELECT CAST(i AS VARCHAR) AS i
FROM generate_series(1, 3) tbl(i);
```

| i |
|---|
| 1 |
| 2 |
| 3 |

```sql
SELECT i::DOUBLE AS i
FROM generate_series(1, 3) tbl(i);
```

|  i  |
|----:|
| 1.0 |
| 2.0 |
| 3.0 |

##### Casting Rules {#docs:current:sql:expressions:cast::casting-rules}

Not all casts are possible. For example, it is not possible to convert an `INTEGER` to a `DATE`. Casts may also throw errors when the cast could not be successfully performed. For example, trying to cast the string `'hello'` to an `INTEGER` will result in an error being thrown.

```sql
SELECT CAST('hello' AS INTEGER);
```

```console
Conversion Error:
Could not convert string 'hello' to INT32
```

The exact behavior of the cast depends on the source and destination types. For example, when casting from `VARCHAR` to any other type, DuckDB attempts to parse the string and convert it to the target type.
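
A few string conversions that succeed, plus one that does not, are sketched below. The literals and aliases are illustrative only.

```sql
-- Strings that can be parsed as the target type are converted.
SELECT CAST('42' AS INTEGER) AS n;          -- 42
SELECT CAST('2024-02-13' AS DATE) AS d;     -- 2024-02-13
SELECT CAST('true' AS BOOLEAN) AS b;        -- true

-- February 30 is not a valid date: CAST would raise a Conversion Error,
-- while TRY_CAST returns NULL.
SELECT TRY_CAST('2024-02-30' AS DATE) AS invalid_date;  -- NULL
```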

##### `TRY_CAST` {#docs:current:sql:expressions:cast::try_cast}

`TRY_CAST` can be used when the preferred behavior is not to throw an error, but instead to return a `NULL` value. `TRY_CAST` will never throw an error, and will instead return `NULL` if a cast is not possible.

```sql
SELECT TRY_CAST('hello' AS INTEGER) AS i;
```

|  i   |
|------|
| NULL |

#### `cast_to_type` Function {#docs:current:sql:expressions:cast::cast_to_type-function}

The `cast_to_type` function allows generating a cast from an expression to the type of another column.
For example:

```sql
SELECT cast_to_type('42', NULL::INTEGER) AS res;
```

```text
┌───────┐
│  res  │
│ int32 │
├───────┤
│  42   │
└───────┘
```

This function is primarily useful in [macros](#docs:current:guides:snippets:sharing_macros), as it allows you to maintain types.
This helps with making generic macros that operate on different types. For example, the following macro adds to a number if the input is an `INTEGER`:

```sql
CREATE TABLE tbl (i INT, s VARCHAR);
INSERT INTO tbl VALUES (42, 'hello world');

CREATE MACRO conditional_add(col, nr) AS
    CASE
        WHEN typeof(col) == 'INTEGER' THEN cast_to_type(col::INTEGER + nr, col)
        ELSE col
    END;
SELECT conditional_add(COLUMNS(*), 100) FROM tbl;
```

```text
┌───────┬─────────────┐
│   i   │      s      │
│ int32 │   varchar   │
├───────┼─────────────┤
│  142  │ hello world │
└───────┴─────────────┘
```

Note that the `CASE` expression needs to return the same type in all code paths. We can perform the addition on any input column by adding a cast to the desired type – but we need to cast the result of the addition back to the source type to make the binding work.

### Collations {#docs:current:sql:expressions:collations}



Collations provide rules for how text should be sorted or compared in the execution engine. Collations are useful for localization, as the rules for how text should be ordered are different for different languages or for different countries. These orderings are often incompatible with one another. For example, in English the letter `y` comes between `x` and `z`. However, in Lithuanian the letter `y` comes between the `i` and `j`. For that reason, different collations are supported. The user must choose which collation they want to use when performing sorting and comparison operations.

By default, the `BINARY` collation is used. That means that strings are ordered and compared based only on their binary contents. This makes sense for standard ASCII characters (i.e., the letters A-Z and numbers 0-9), but generally does not make much sense for special Unicode characters. It is, however, by far the fastest method of performing ordering and comparisons. Hence it is recommended to stick with the `BINARY` collation unless required otherwise.

> The `BINARY` collation is also available under the aliases `C` and `POSIX`.

> **Warning.** Collation support in DuckDB has [some known limitations](https://github.com/duckdb/duckdb/issues?q=is%3Aissue+is%3Aopen+collation+) and has [several planned improvements](https://github.com/duckdb/duckdb/issues/604).

#### Using Collations {#docs:current:sql:expressions:collations::using-collations}

In the stand-alone installation of DuckDB three collations are included: `NOCASE`, `NOACCENT` and `NFC`. The `NOCASE` collation compares characters as equal regardless of their casing. The `NOACCENT` collation compares characters as equal regardless of their accents. The `NFC` collation performs NFC-normalized comparisons, see [Unicode normalization](https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization) for more information.

```sql
SELECT 'hello' = 'hElLO';
```

```text
false
```

```sql
SELECT 'hello' COLLATE NOCASE = 'hElLO';
```

```text
true
```

```sql
SELECT 'hello' = 'hëllo';
```

```text
false
```

```sql
SELECT 'hello' COLLATE NOACCENT = 'hëllo';
```

```text
true
```

Collations can be combined by chaining them using the dot operator. Note, however, that not all collations can be combined together. In general, the `NOCASE` collation can be combined with any other collator, but most other collations cannot be combined.

```sql
SELECT 'hello' COLLATE NOCASE = 'hElLÖ';
```

```text
false
```

```sql
SELECT 'hello' COLLATE NOACCENT = 'hElLÖ';
```

```text
false
```

```sql
SELECT 'hello' COLLATE NOCASE.NOACCENT = 'hElLÖ';
```

```text
true
```
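
Collations affect sorting as well as equality comparisons. The sketch below orders the same three strings under the default `BINARY` collation and under `NOCASE`. The inline table `t(s)` is made up for this example, and the first query assumes that `default_collation` has not been changed.

```sql
-- BINARY: uppercase ASCII letters sort before lowercase ones.
SELECT s
FROM (VALUES ('b'), ('A'), ('C')) t(s)
ORDER BY s;                  -- A, C, b

-- NOCASE: casing is ignored when ordering.
SELECT s
FROM (VALUES ('b'), ('A'), ('C')) t(s)
ORDER BY s COLLATE NOCASE;   -- A, b, C
```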

#### Default Collations {#docs:current:sql:expressions:collations::default-collations}

The collations we have seen so far have all been specified *per expression*. It is also possible to specify a default collator, either on the global database level or on a base table column. The `default_collation` setting can be used to specify the global default collator. This is the collator that will be used if no other one is specified.

```sql
SET default_collation = NOCASE;
SELECT 'hello' = 'HeLlo';
```

```text
true
```

Collations can also be specified per-column when creating a table. When that column is then used in a comparison, the per-column collation is used to perform that comparison.

```sql
CREATE TABLE names (name VARCHAR COLLATE NOACCENT);
INSERT INTO names VALUES ('hännes');
```

```sql
SELECT name
FROM names
WHERE name = 'hannes';
```

```text
hännes
```

Be careful here, however, as different collations cannot be combined. This can be problematic when you want to compare columns that have a different collation specified.

```sql
SELECT name
FROM names
WHERE name = 'hannes' COLLATE NOCASE;
```

```console
ERROR: Cannot combine types with different collation!
```

```sql
CREATE TABLE other_names (name VARCHAR COLLATE NOCASE);
INSERT INTO other_names VALUES ('HÄNNES');
```

```sql
SELECT names.name AS name, other_names.name AS other_name
FROM names, other_names
WHERE names.name = other_names.name;
```

```console
ERROR: Cannot combine types with different collation!
```

We need to manually override the collation:

```sql
SELECT names.name AS name, other_names.name AS other_name
FROM names, other_names
WHERE names.name COLLATE NOACCENT.NOCASE = other_names.name COLLATE NOACCENT.NOCASE;
```

|  name  | other_name |
|--------|------------|
| hännes | HÄNNES     |

#### ICU Collations {#docs:current:sql:expressions:collations::icu-collations}

The collations we have seen so far are not region-dependent, and do not follow any specific regional rules. If you wish to follow the rules of a specific region or language, you will need to use one of the ICU collations. For that, you need to [load the ICU extension](#docs:current:core_extensions:icu::installing-and-loading).

Loading this extension will add a number of language and region specific collations to your database. These can be queried using the `PRAGMA collations` command, or by querying the `pragma_collations` function.

```sql
PRAGMA collations;
SELECT list(collname) FROM pragma_collations();
```

```text
[af, am, ar, ar_sa, as, az, be, bg, bn, bo, br, bs, ca, ceb, chr, cs, cy, da, de, de_at, dsb, dz, ee, el, en, en_us, eo, es, et, fa, fa_af, ff, fi, fil, fo, fr, fr_ca, fy, ga, gl, gu, ha, haw, he, he_il, hi, hr, hsb, hu, hy, icu_noaccent, id, id_id, ig, is, it, ja, ka, kk, kl, km, kn, ko, kok, ku, ky, lb, lkt, ln, lo, lt, lv, mk, ml, mn, mr, ms, mt, my, nb, nb_no, ne, nfc, nl, nn, noaccent, nocase, om, or, pa, pa_in, pl, ps, pt, ro, ru, sa, se, si, sk, sl, smn, sq, sr, sr_ba, sr_me, sr_rs, sv, sw, ta, te, th, tk, to, tr, ug, uk, ur, uz, vi, wae, wo, xh, yi, yo, yue, yue_cn, zh, zh_cn, zh_hk, zh_mo, zh_sg, zh_tw, zu]
```

These collations can be used in the same way as the collations shown above. They can also be combined with the `NOCASE` collation. For example, to use the German collation rules you could use the following code snippet:

```sql
CREATE TABLE strings (s VARCHAR COLLATE DE);
INSERT INTO strings VALUES ('Gabel'), ('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), ('Götz');
SELECT * FROM strings ORDER BY s;
```

```text
"Gabel", "Göbel", "Goethe", "Goldmann", "Göthe", "Götz"
```

### Comparisons {#docs:current:sql:expressions:comparison_operators}

#### Comparison Operators {#docs:current:sql:expressions:comparison_operators::comparison-operators}



The table below shows the standard comparison operators.
Whenever either of the input arguments is `NULL`, the output of the comparison is `NULL`.

| Operator | Description | Example | Result |
|:---|:---|:---|:---|
| `<` | less than | `2 < 3` | `true` |
| `>` | greater than | `2 > 3` | `false` |
| `<=` | less than or equal to | `2 <= 3` | `true` |
| `>=` | greater than or equal to | `4 >= NULL` | `NULL` |
| `=` or `==` | equal | `NULL = NULL` | `NULL` |
| `<>` or `!=` | not equal | `2 <> 2` | `false` |

The table below shows the standard distinction operators.
These operators treat `NULL` values as equal.

| Operator | Description | Example | Result |
|:---|:---|:---|:-|
| `IS DISTINCT FROM` | not equal, including `NULL` | `2 IS DISTINCT FROM NULL` | `true` |
| `IS NOT DISTINCT FROM` | equal, including `NULL` | `NULL IS NOT DISTINCT FROM NULL` | `true` |

##### Combination Casting {#docs:current:sql:expressions:comparison_operators::combination-casting}

When performing comparison on different types, DuckDB performs [Combination Casting](#docs:current:sql:data_types:typecasting::combination-casting).
These casts were introduced to make interactive querying more convenient and are in line with the casts performed by several programming languages but are often not compatible with PostgreSQL's behavior. For example, the following expressions evaluate and return `true` in DuckDB but fail in PostgreSQL.

```sql
SELECT 1 = true;
SELECT 1 = '1.1';
```

> It is not possible to enforce stricter type-checking for DuckDB's comparison operators. If you require stricter type-checking, consider creating a [macro](#docs:current:sql:statements:create_macro) with the [`typeof` function](#docs:current:sql:functions:utility::typeofexpression) or implementing a [user-defined function](#docs:current:clients:python:function).

#### `BETWEEN` and `IS [NOT] NULL` {#docs:current:sql:expressions:comparison_operators::between-and-is-not-null}



Besides the standard comparison operators there are also the `BETWEEN` and `IS (NOT) NULL` operators. These behave much like operators, but have special syntax mandated by the SQL standard. They are shown in the table below.

Note that `BETWEEN` and `NOT BETWEEN` are only equivalent to the examples below when `a`, `x` and `y` are all of the same type, as `BETWEEN` will cast all of its inputs to the same type.

| Predicate | Description |
|:---|:---|
| `a BETWEEN x AND y` | equivalent to `x <= a AND a <= y` |
| `a NOT BETWEEN x AND y` | equivalent to `x > a OR a > y` |
| `expression IS NULL` | `true` if expression is `NULL`, `false` otherwise |
| `expression ISNULL` | alias for `IS NULL` (non-standard) |
| `expression IS NOT NULL` | `false` if expression is `NULL`, `true` otherwise |
| `expression NOTNULL` | alias for `IS NOT NULL` (non-standard) |

> For the expression `BETWEEN x AND y`, `x` is used as the lower bound and `y` is used as the upper bound. Therefore, if `x > y`, the result will always be `false`.

### IN Operator {#docs:current:sql:expressions:in}

The `IN` operator checks containment of the left expression inside the _collection_ on the right hand side (RHS).
Supported collections on the RHS are tuples, lists, maps and subqueries that return a single column.



#### `IN (val1, val2, ...)` (Tuple) {#docs:current:sql:expressions:in::in-val1-val2--tuple}

The `IN` operator on a tuple `(val1, val2, ...)` returns `true` if the expression is present in the RHS, `false` if the expression is not in the RHS and the RHS has no `NULL` values, or `NULL` if the expression is not in the RHS and the RHS has `NULL` values.

```sql
SELECT 'Math' IN ('CS', 'Math');
```

```text
true
```

```sql
SELECT 'English' IN ('CS', 'Math');
```

```text
false
```

```sql
SELECT 'Math' IN ('CS', 'Math', NULL);
```

```text
true
```

```sql
SELECT 'English' IN ('CS', 'Math', NULL);
```

```text
NULL
```

#### `IN [val1, val2, ...]` (List) {#docs:current:sql:expressions:in::in-val1-val2--list}

The `IN` operator works on lists according to the semantics used in Python.
Unlike for the [`IN tuple` operator](#docs:current:sql:expressions:in::in-val1-val2--tuple), the presence of `NULL` values on the right-hand side of the expression does not make a difference in the result:

```sql
SELECT 'Math' IN ['CS', 'Math', NULL];
```

```text
true
```

```sql
SELECT 'English' IN ['CS', 'Math', NULL];
```

```text
false
```

#### `IN` Map {#docs:current:sql:expressions:in::in-map}

The `IN` operator works on [maps](#docs:current:sql:data_types:map) according to the semantics used in Python, i.e., it checks for the presence of keys (not values):

```sql
SELECT 'key1' IN MAP {'key1': 50, 'key2': 75};
```

```text
true
```

```sql
SELECT 'key3' IN MAP {'key1': 50, 'key2': 75};
```

```text
false
```

#### `IN` Subquery {#docs:current:sql:expressions:in::in-subquery}

The `IN` operator works with [subqueries](#docs:current:sql:expressions:subqueries) that return a single column.
For example:

```sql
SELECT 42 IN (SELECT unnest([32, 42, 52]) AS x);
```

```text
true
```

If the subquery returns more than one column, a Binder Error is thrown:

```sql
SELECT 42 IN (SELECT unnest([32, 42, 52]) AS x, 'a' AS y);
```

```console
Binder Error:
Subquery returns 2 columns - expected 1
```

#### `IN` String {#docs:current:sql:expressions:in::in-string}

The `IN` operator can be used as a shorthand for the [`contains` string function](#docs:current:sql:functions:text::containsstring-search_string).
For example:

```sql
SELECT 'Hello' IN 'Hello World';
```

```text
true
```

#### `NOT IN` {#docs:current:sql:expressions:in::not-in}

`NOT IN` can be used to check if an element is not present in the set.
`x NOT IN y` is equivalent to `NOT (x IN y)`.
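
Because of this equivalence, `NOT IN` on a tuple inherits the `NULL` handling of `IN`. A short sketch, reusing the values from the tuple examples above:

```sql
-- No NULL in the tuple: the result is a plain boolean.
SELECT 'English' NOT IN ('CS', 'Math');        -- true

-- 'English' IN (...) is NULL here, and NOT NULL is still NULL.
SELECT 'English' NOT IN ('CS', 'Math', NULL);  -- NULL

-- A match is found, so the NULL does not matter.
SELECT 'Math' NOT IN ('CS', 'Math', NULL);     -- false
```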

### Logical Operators {#docs:current:sql:expressions:logical_operators}



The following logical operators are available: `AND`, `OR` and `NOT`. SQL uses a three-valued logic system with `true`, `false` and `NULL`. Note that logical operators involving `NULL` do not always evaluate to `NULL`. For example, `NULL AND false` will evaluate to `false`, and `NULL OR true` will evaluate to `true`. Below are the complete truth tables.

#### Binary Operators: `AND` and `OR` {#docs:current:sql:expressions:logical_operators::binary-operators-and-and-or}



| `a` | `b` | `a AND b` | `a OR b` |
|:---|:---|:---|:---|
| true | true | true | true |
| true | false | false | true |
| true | NULL | NULL | true |
| false | false | false | false |
| false | NULL | false | NULL |
| NULL | NULL | NULL | NULL |

#### Unary Operator: `NOT` {#docs:current:sql:expressions:logical_operators::unary-operator-not}



| `a` | `NOT a` |
|:---|:---|
| true | false |
| false | true |
| NULL | NULL |

The operators `AND` and `OR` are commutative, that is, you can switch the left and right operand without affecting the result.
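
Some rows of the truth tables can be reproduced directly. In this sketch, `NULL` is cast to `BOOLEAN` explicitly to make the operand type clear; the aliases are illustrative.

```sql
-- false decides the outcome of AND, regardless of the NULL operand.
SELECT CAST(NULL AS BOOLEAN) AND false AS null_and_false;  -- false

-- true decides the outcome of OR, regardless of the NULL operand.
SELECT CAST(NULL AS BOOLEAN) OR true AS null_or_true;      -- true

-- Otherwise the unknown value propagates.
SELECT CAST(NULL AS BOOLEAN) AND true AS null_and_true;    -- NULL
SELECT NOT CAST(NULL AS BOOLEAN) AS not_null;              -- NULL
```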

### Star Expression {#docs:current:sql:expressions:star}

#### Syntax {#docs:current:sql:expressions:star::syntax}



The `*` expression can be used in a `SELECT` statement to select all columns that are projected in the `FROM` clause.

```sql
SELECT *
FROM tbl;
```

##### `TABLE.*` and `STRUCT.*` {#docs:current:sql:expressions:star::table-and-struct}

The `*` expression can be prepended by a table name to select only columns from that table.

```sql
SELECT tbl.*
FROM tbl
JOIN other_tbl USING (id);
```

Similarly, the `*` expression can also be used to retrieve all keys from a struct as separate columns.
This is particularly useful when a prior operation creates a struct of unknown shape, or if a query must handle any potential struct keys.
See the [`STRUCT` data type](#docs:current:sql:data_types:struct) and [`STRUCT` functions](#docs:current:sql:functions:struct) pages for more details on working with structs.

For example:

```sql
SELECT st.* FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS st);
```

| x | y | z |
|--:|--:|--:|
| 1 | 2 | 3 |


##### `EXCLUDE` Clause {#docs:current:sql:expressions:star::exclude-clause}

`EXCLUDE` allows you to exclude specific columns from the `*` expression.

```sql
SELECT * EXCLUDE (col)
FROM tbl;
```

##### `REPLACE` Clause {#docs:current:sql:expressions:star::replace-clause}

`REPLACE` allows you to replace specific columns by alternative expressions.

```sql
SELECT * REPLACE (col1 / 1_000 AS col1, col2 / 1_000 AS col2)
FROM tbl;
```

##### `RENAME` Clause {#docs:current:sql:expressions:star::rename-clause}

`RENAME` allows you to rename specific columns.

```sql
SELECT * RENAME (col1 AS height, col2 AS width)
FROM tbl;
```
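
To make the three modifiers concrete, the sketch below applies each of them to a single-row inline table. The table `t(x, y, z)` and its values are made up for this example.

```sql
-- EXCLUDE drops column z from the star expansion.
SELECT * EXCLUDE (z)
FROM (VALUES (1, 2, 3)) t(x, y, z);        -- columns: x = 1, y = 2

-- REPLACE keeps the column name x but changes its value.
SELECT * REPLACE (x * 10 AS x)
FROM (VALUES (1, 2, 3)) t(x, y, z);        -- columns: x = 10, y = 2, z = 3

-- RENAME keeps the value of x but exposes it under the name a.
SELECT * RENAME (x AS a)
FROM (VALUES (1, 2, 3)) t(x, y, z);        -- columns: a = 1, y = 2, z = 3
```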

##### Column Filtering via Pattern Matching Operators {#docs:current:sql:expressions:star::column-filtering-via-pattern-matching-operators}

The [pattern matching operators](#docs:current:sql:functions:pattern_matching) `LIKE`, `GLOB`, `SIMILAR TO` and their variants allow you to select columns by matching their names to patterns.

```sql
SELECT * LIKE 'col%'
FROM tbl;
```

```sql
SELECT * GLOB 'col*'
FROM tbl;
```

```sql
SELECT * SIMILAR TO 'col.'
FROM tbl;
```

The `NOT` variants of these operators are also supported to exclude columns that match the pattern:

```sql
SELECT * NOT SIMILAR TO 'col.'
FROM tbl;
```
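
As a concrete sketch, the queries below run two of these filters against an inline subquery with three columns. The column names `col1`, `col2` and `other` are made up for this example.

```sql
-- Selects the columns whose names start with 'col'.
SELECT * LIKE 'col%'
FROM (SELECT 1 AS col1, 2 AS col2, 3 AS other);   -- columns: col1, col2

-- Excludes the columns matching 'col' followed by exactly one character.
SELECT * NOT SIMILAR TO 'col.'
FROM (SELECT 1 AS col1, 2 AS col2, 3 AS other);   -- columns: other
```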

#### `COLUMNS` Expression {#docs:current:sql:expressions:star::columns-expression}


The `COLUMNS` expression is similar to the regular star expression, but additionally allows you to execute the same expression on the resulting columns.

```sql
CREATE TABLE numbers (id INTEGER, number INTEGER);
INSERT INTO numbers VALUES (1, 10), (2, 20), (3, NULL);
SELECT min(COLUMNS(*)), count(COLUMNS(*)) FROM numbers;
```

| id | number | id | number |
|---:|-------:|---:|-------:|
| 1  | 10     | 3  | 2      |

```sql
SELECT
    min(COLUMNS(* REPLACE (number + id AS number))),
    count(COLUMNS(* EXCLUDE (number)))
FROM numbers;
```

| id | min(number := (number + id)) | id |
|---:|-----------------------------:|---:|
| 1  | 11                           | 3  |

`COLUMNS` expressions can also be combined, as long as they contain the same star expression:

```sql
SELECT COLUMNS(*) + COLUMNS(*) FROM numbers;
```

| id | number |
|---:|-------:|
| 2  | 20     |
| 4  | 40     |
| 6  | NULL   |


##### `COLUMNS` Expression in a `WHERE` Clause {#docs:current:sql:expressions:star::columns-expression-in-a-where-clause}

`COLUMNS` expressions can also be used in `WHERE` clauses. The conditions are applied to all columns and are combined using the logical `AND` operator.

```sql
SELECT *
FROM (
    SELECT 'a', 'a'
    UNION ALL
    SELECT 'a', 'b'
    UNION ALL
    SELECT 'b', 'b'
) _(x, y)
WHERE COLUMNS(*) = 'a'; -- equivalent to: x = 'a' AND y = 'a'
```

| x | y |
|--:|--:|
| a | a |

To combine conditions using the logical `OR` operator, you can `UNPACK` the `COLUMNS` expression into the variadic `greatest` function.

```sql
SELECT *
FROM (
    SELECT 'a', 'a'
    UNION ALL
    SELECT 'a', 'b'
    UNION ALL
    SELECT 'b', 'b'
) _(x, y)
WHERE greatest(UNPACK(COLUMNS(*) = 'a')); -- equivalent to: x = 'a' OR y = 'a'
```

| x | y |
|--:|--:|
| a | a |
| a | b |

##### `COLUMNS` Expression in `DISTINCT ON` {#docs:current:sql:expressions:star::columns-expression-in-distinct-on}

`COLUMNS` expressions can be used in [`DISTINCT ON`](#docs:current:sql:query_syntax:select::distinct-on-clause) clauses to specify distinct columns by pattern:

```sql
SELECT DISTINCT ON (COLUMNS('x|y')) *
FROM (VALUES (1, 2, 'a'), (1, 2, 'b'), (3, 4, 'c')) t(x, y, z);
```

| x | y | z |
|--:|--:|---|
| 1 | 2 | a |
| 3 | 4 | c |

##### Regular Expressions in a `COLUMNS` Expression {#docs:current:sql:expressions:star::regular-expressions-in-a-columns-expression}

`COLUMNS` expressions don't currently support the pattern matching operators, but they do support regular expression matching by simply passing a string constant in place of the star:

```sql
SELECT COLUMNS('(id|numbers?)') FROM numbers;
```

| id | number |
|---:|-------:|
| 1  | 10     |
| 2  | 20     |
| 3  | NULL   |

##### Renaming Columns with Regular Expressions in a `COLUMNS` Expression {#docs:current:sql:expressions:star::renaming-columns-with-regular-expressions-in-a-columns-expression}

The matches of capture groups in regular expressions can be used to rename matching columns.
The capture groups are one-indexed; `\0` is the original column name.

For example, to select the first three letters of column names, run:

```sql
SELECT COLUMNS('(\w{3}).*') AS '\1' FROM numbers;
```

| id | num  |
|---:|-----:|
| 1  | 10   |
| 2  | 20   |
| 3  | NULL |

To remove a colon (`:`) character in the middle of a column name, run:

```sql
CREATE TABLE tbl ("Foo:Bar" INTEGER, "Foo:Baz" INTEGER, "Foo:Qux" INTEGER);
SELECT COLUMNS('(\w*):(\w*)') AS '\1\2' FROM tbl;
```
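
The same capture groups can be recombined with other text in the alias. The following variation is a hypothetical sketch that reuses the `tbl` table created above and joins the two groups with an underscore, which is expected to produce column names such as `Foo_Bar`:

```sql
-- Variation for illustration: keep both capture groups, but join them with an underscore.
SELECT COLUMNS('(\w*):(\w*)') AS '\1_\2' FROM tbl;
```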

To add the original column name to the expression alias, run:

```sql
SELECT min(COLUMNS(*)) AS "min_\0" FROM numbers;
```

| min_id | min_number |
|-------:|-----------:|
|      1 |         10 |

##### `COLUMNS` Lambda Function {#docs:current:sql:expressions:star::columns-lambda-function}

`COLUMNS` also supports passing in a lambda function. The lambda function will be evaluated for all columns present in the `FROM` clause, and only columns that match the lambda function will be returned. This allows the execution of arbitrary expressions in order to select and rename columns.

```sql
SELECT COLUMNS(lambda c: c LIKE '%num%') FROM numbers;
```

| number |
|-------:|
| 10     |
| 20     |
| NULL   |


##### `COLUMNS` List {#docs:current:sql:expressions:star::columns-list}

`COLUMNS` also supports passing in a list of column names.

```sql
SELECT COLUMNS(['id', 'number']) FROM numbers;
```

| id | number |
|---:|-------:|
| 1  | 10     |
| 2  | 20     |
| 3  | NULL   |

#### Unpacking a `COLUMNS` Expression {#docs:current:sql:expressions:star::unpacking-a-columns-expression}

By wrapping a `COLUMNS` expression in `UNPACK`, the columns expand into a parent expression, much like the [iterable unpacking behavior in Python](https://peps.python.org/pep-3132/).

Without `UNPACK`, operations on the `COLUMNS` expression are applied to each column separately:

```sql
SELECT coalesce(COLUMNS(['a', 'b', 'c'])) AS result
FROM (SELECT NULL a, 42 b, true c);
```

| result | result | result |
|--------|-------:|-------:|
| NULL   | 42     | true   |

With `UNPACK`, the `COLUMNS` expression is expanded into its parent expression, `coalesce` in the example above, which results in a single column:

```sql
SELECT coalesce(UNPACK(COLUMNS(['a', 'b', 'c']))) AS result
FROM (SELECT NULL AS a, 42 AS b, true AS c);
```

| result |
|-------:|
| 42     |

The `UNPACK` keyword may be replaced by `*`, [matching Python syntax](https://peps.python.org/pep-3132/), when it is applied directly to the `COLUMNS` expression without any intermediate operations.

```sql
SELECT coalesce(*COLUMNS(*)) AS result
FROM (SELECT NULL a, 42 AS b, true AS c);
```

| result |
|-------:|
| 42     |

> **Warning.** In the following example, replacing `UNPACK` by `*` results in a syntax error:
> 
> ```sql
> SELECT greatest(UNPACK(COLUMNS(*) + 1)) AS result
> FROM (SELECT 1 AS a, 2 AS b, 3 AS c);
> ```
> 
> | result |
> |-------:|
> | 4      |

#### `STRUCT.*` {#docs:current:sql:expressions:star::struct}

The `*` expression can also be used to retrieve all keys from a struct as separate columns.
This is particularly useful when a prior operation creates a struct of unknown shape, or if a query must handle any potential struct keys.
See the [`STRUCT` data type](#docs:current:sql:data_types:struct) and [`STRUCT` functions](#docs:current:sql:functions:struct) pages for more details on working with structs.

For example:

```sql
SELECT st.* FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS st);
```

| x | y | z |
|--:|--:|--:|
| 1 | 2 | 3 |

### Subqueries {#docs:current:sql:expressions:subqueries}

Subqueries are parenthesized query expressions that appear as part of a larger, outer query. Subqueries are usually based on `SELECT ... FROM`, but in DuckDB other query constructs such as [`PIVOT`](#docs:current:sql:statements:pivot) can also appear as a subquery.

#### Scalar Subquery {#docs:current:sql:expressions:subqueries::scalar-subquery}



Scalar subqueries are subqueries that return a single value. They can be used anywhere an expression can be used. If a scalar subquery returns more than a single value, an error is raised (unless `scalar_subquery_error_on_multiple_rows` is set to `false`, in which case a row is selected randomly).

Consider the following table:

##### Grades {#docs:current:sql:expressions:subqueries::grades}

| grade | course |
|---:|:---|
| 7 | Math |
| 9 | Math |
| 8 | CS |

```sql
CREATE TABLE grades (grade INTEGER, course VARCHAR);
INSERT INTO grades VALUES (7, 'Math'), (9, 'Math'), (8, 'CS');
```

We can run the following query to obtain the minimum grade:

```sql
SELECT min(grade) FROM grades;
```

| min(grade) |
|-----------:|
| 7          |

By using a scalar subquery in the `WHERE` clause, we can figure out for which course this grade was obtained:

```sql
SELECT course FROM grades WHERE grade = (SELECT min(grade) FROM grades);
```

| course |
|--------|
| Math   |
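
Because a scalar subquery is an ordinary expression, it can also appear in the `SELECT` list. The following sketch reuses the `grades` table from above; the `above_min` alias is chosen here for illustration only:

```sql
-- Compare each grade to the overall minimum computed by a scalar subquery.
SELECT course, grade, grade - (SELECT min(grade) FROM grades) AS above_min
FROM grades
ORDER BY grade;
```

| course | grade | above_min |
|--------|------:|----------:|
| Math   | 7     | 0         |
| CS     | 8     | 1         |
| Math   | 9     | 2         |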

#### `ARRAY` Subqueries {#docs:current:sql:expressions:subqueries::array-subqueries}

Subqueries that return multiple values can be wrapped with `ARRAY` to collect all results in a list.

```sql
SELECT ARRAY(SELECT grade FROM grades) AS all_grades;
```

| all_grades |
|-----------:|
| [7, 9, 8]  |
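
If the order of the list elements matters, one option is to add an `ORDER BY` clause to the subquery. The sketch below assumes that the ordering of the subquery carries over to the resulting list; the `grades_desc` alias is illustrative:

```sql
-- Collect the grades from highest to lowest (assumes the subquery order is preserved in the list).
SELECT ARRAY(SELECT grade FROM grades ORDER BY grade DESC) AS grades_desc;
```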



#### Subquery Comparisons: `ALL`, `ANY` and `SOME` {#docs:current:sql:expressions:subqueries::subquery-comparisons-all-any-and-some}

In the section on [scalar subqueries](#::scalar-subquery), a scalar expression was compared directly to a subquery using the equality [comparison operator](#docs:current:sql:expressions:comparison_operators::comparison-operators) (`=`).
Such direct comparisons only make sense with scalar subqueries.

Scalar expressions can still be compared to single-column subqueries returning multiple rows by specifying a quantifier. Available quantifiers are `ALL`, `ANY` and `SOME`. The quantifiers `ANY` and `SOME` are equivalent.

##### `ALL` {#docs:current:sql:expressions:subqueries::all}

The `ALL` quantifier specifies that the comparison as a whole evaluates to `true` when the individual comparison results of _the expression at the left hand side of the comparison operator_ with each of the values from _the subquery at the right hand side of the comparison operator_ **all** evaluate to `true`:

```sql
SELECT 6 <= ALL (SELECT grade FROM grades) AS adequate;
```

returns:

| adequate |
|----------|
| true     |

because 6 is less than or equal to each of the subquery results 7, 8 and 9.

However, the following query

```sql
SELECT 8 >= ALL (SELECT grade FROM grades) AS excellent;
```

returns

| excellent |
|-----------|
| false     |

because 8 is not greater than or equal to the subquery result 9. Since not all comparisons evaluate to `true`, `>= ALL` as a whole evaluates to `false`.

##### `ANY` {#docs:current:sql:expressions:subqueries::any}

The `ANY` quantifier specifies that the comparison as a whole evaluates to `true` when at least one of the individual comparison results evaluates to `true`.
For example:

```sql
SELECT 5 >= ANY (SELECT grade FROM grades) AS fail;
```

returns

| fail  |
|-------|
| false |

because no result of the subquery is less than or equal to 5.

The quantifier `SOME` may be used instead of `ANY`: `ANY` and `SOME` are interchangeable.
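
As a short sketch on the same `grades` table (the column aliases are illustrative), the following query shows a comparison that succeeds with `SOME`, and an `= ANY` test next to the same test written with `IN`:

```sql
SELECT
    8 >= SOME (SELECT grade FROM grades) AS pass,       -- true: 8 >= 7 and 8 >= 8
    9 = ANY (SELECT grade FROM grades) AS has_nine,     -- true: 9 is one of the grades
    9 IN (SELECT grade FROM grades) AS has_nine_via_in; -- true: the same test written with IN
```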

#### `EXISTS` {#docs:current:sql:expressions:subqueries::exists}



The `EXISTS` operator tests for the existence of any row inside the subquery. It returns true when the subquery returns one or more records, and false otherwise. The `EXISTS` operator is generally most useful as a *correlated* subquery to express semijoin operations. However, it can be used as an uncorrelated subquery as well.

For example, we can use it to figure out if there are any grades present for a given course:

```sql
SELECT EXISTS (FROM grades WHERE course = 'Math') AS math_grades_present;
```

| math_grades_present |
|--------------------:|
| true                |

```sql
SELECT EXISTS (FROM grades WHERE course = 'History') AS history_grades_present;
```

| history_grades_present |
|-----------------------:|
| false                  |

> The subqueries in the examples above make use of the fact that you can omit the `SELECT *` in DuckDB thanks to the [`FROM`-first syntax](#docs:current:sql:query_syntax:from). The `SELECT` clause is required in subqueries by other SQL systems but cannot fulfill any purpose in `EXISTS` and `NOT EXISTS` subqueries.

##### `NOT EXISTS` {#docs:current:sql:expressions:subqueries::not-exists}

The `NOT EXISTS` operator tests for the absence of any row inside the subquery. It returns true when the subquery returns an empty result, and false otherwise. The `NOT EXISTS` operator is generally most useful as a *correlated* subquery to express antijoin operations. For example, to find Person nodes without an interest:

```sql
CREATE TABLE Person (id BIGINT, name VARCHAR);
CREATE TABLE interest (PersonId BIGINT, topic VARCHAR);

INSERT INTO Person VALUES (1, 'Jane'), (2, 'Joe');
INSERT INTO interest VALUES (2, 'Music');

SELECT *
FROM Person
WHERE NOT EXISTS (FROM interest WHERE interest.PersonId = Person.id);
```

| id | name |
|---:|------|
| 1  | Jane |

> DuckDB automatically detects when a `NOT EXISTS` query expresses an antijoin operation. There is no need to manually rewrite such queries to use `LEFT OUTER JOIN ... WHERE ... IS NULL`.
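
For comparison, the semijoin counterpart can be sketched with a correlated `EXISTS` on the same `Person` and `interest` tables. It returns the persons that have at least one interest:

```sql
-- Semijoin: keep only persons for which a matching interest row exists.
SELECT *
FROM Person
WHERE EXISTS (FROM interest WHERE interest.PersonId = Person.id);
```

| id | name |
|---:|------|
| 2  | Joe  |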

#### `IN` Operator {#docs:current:sql:expressions:subqueries::in-operator}



The `IN` operator checks containment of the left expression inside the result defined by the subquery or the set of expressions on the right hand side (RHS). The `IN` operator returns true if the expression is present in the RHS, false if the expression is not in the RHS and the RHS has no `NULL` values, or `NULL` if the expression is not in the RHS and the RHS has `NULL` values.

We can use the `IN` operator in a similar manner as we used the `EXISTS` operator:

```sql
SELECT 'Math' IN (SELECT course FROM grades) AS math_grades_present;
```

| math_grades_present |
|--------------------:|
| true                |
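
The three possible outcomes described above can be illustrated with literal lists on the right hand side. This is a minimal sketch and the column aliases are illustrative:

```sql
SELECT
    1 IN (1, 2, NULL) AS found,     -- true: 1 is present in the RHS
    3 IN (1, 2) AS not_found,       -- false: 3 is absent and the RHS has no NULL values
    3 IN (1, 2, NULL) AS unknown;   -- NULL: 3 is absent and the RHS contains a NULL value
```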

#### Correlated Subqueries {#docs:current:sql:expressions:subqueries::correlated-subqueries}

All the subqueries presented here so far have been **uncorrelated** subqueries, where the subqueries themselves are entirely self-contained and can be run without the parent query. There exists a second type of subqueries called **correlated** subqueries. For correlated subqueries, the subquery uses values from the parent query.

Conceptually, the subqueries are run once for every single row in the parent query. Perhaps a simple way of envisioning this is that the correlated subquery is a **function** that is applied to every row in the source dataset.

For example, suppose that we want to find the minimum grade for every course. We could do that as follows:

```sql
SELECT *
FROM grades grades_parent
WHERE grade =
    (SELECT min(grade)
     FROM grades
     WHERE grades.course = grades_parent.course);
```

| grade | course |
|------:|--------|
| 7     | Math   |
| 8     | CS     |

The subquery uses a column from the parent query (`grades_parent.course`). Conceptually, we can see the subquery as a function where the correlated column is a parameter to that function:

```sql
SELECT min(grade)
FROM grades
WHERE course = ?;
```

Now when we execute this function for each of the rows, we can see that for `Math` this will return `7`, and for `CS` it will return `8`. We then compare it against the grade for that actual row. As a result, the row `(Math, 9)` will be filtered out, as `9 <> 7`.
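
The same result can also be expressed without a correlated subquery. The sketch below is one possible alternative, not the form used above: it computes the per-course minimum with a window function and filters on it with the `QUALIFY` clause.

```sql
-- Alternative formulation: keep the rows whose grade equals the minimum grade of their course.
SELECT *
FROM grades
QUALIFY grade = min(grade) OVER (PARTITION BY course);
```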

#### Returning Each Row of the Subquery as a Struct {#docs:current:sql:expressions:subqueries::returning-each-row-of-the-subquery-as-a-struct}

Using the name of a subquery in the `SELECT` clause (without referring to a specific column) turns each row of the subquery into a struct whose fields correspond to the columns of the subquery. For example:

```sql
SELECT t
FROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t;
```



|           t           |
|-----------------------|
| {'x': 41, 'y': hello} |
| {'x': 42, 'y': hello} |
| {'x': 43, 'y': hello} |

### TRY Expression {#docs:current:sql:expressions:try}

The `TRY` expression ensures that errors caused by the input rows in the child (scalar) expression result in `NULL` for those rows, instead of causing the query to throw an error.

> The `TRY` expression was inspired by the [`TRY_CAST` expression](#docs:current:sql:expressions:cast::try_cast).

#### Examples {#docs:current:sql:expressions:try::examples}

The following calls return errors when invoked without the `TRY` expression.
When they are wrapped into a `TRY` expression, they return `NULL`:

##### Casting {#docs:current:sql:expressions:try::casting}

###### Without `TRY` {#docs:current:sql:expressions:try::without-try}

```sql
SELECT 'abc'::INTEGER;
```

```console
Conversion Error:
Could not convert string 'abc' to INT32
```

###### With `TRY` {#docs:current:sql:expressions:try::with-try}

```sql
SELECT TRY('abc'::INTEGER);
```

```text
NULL
```

##### Logarithm on Zero {#docs:current:sql:expressions:try::logarithm-on-zero}

###### Without `TRY` {#docs:current:sql:expressions:try::without-try}

```sql
SELECT ln(0);
```

```console
Out of Range Error:
cannot take logarithm of zero
```

###### With `TRY` {#docs:current:sql:expressions:try::with-try}

```sql
SELECT TRY(ln(0));
```

```text
NULL
```

##### Casting Multiple Rows {#docs:current:sql:expressions:try::casting-multiple-rows}

###### Without `TRY` {#docs:current:sql:expressions:try::without-try}

```sql
WITH cte AS (FROM (VALUES ('123'), ('test'), ('235')) t(a))
SELECT a::INTEGER AS x FROM cte;
```

```console
Conversion Error:
Could not convert string 'test' to INT32
```

###### With `TRY` {#docs:current:sql:expressions:try::with-try}

```sql
WITH cte AS (FROM (VALUES ('123'), ('test'), ('235')) t(a))
SELECT TRY(a::INTEGER) AS x FROM cte;
```



|  x   |
|-----:|
| 123  |
| NULL |
| 235  |
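
Since `TRY` yields `NULL` for failing rows, it can be combined with `coalesce` to substitute a default value. The sketch below uses `-1` as an arbitrary fallback chosen for illustration and shows `TRY_CAST` alongside for comparison:

```sql
WITH cte AS (FROM (VALUES ('123'), ('test'), ('235')) t(a))
SELECT
    coalesce(TRY(a::INTEGER), -1) AS x, -- fall back to -1 when the cast fails
    TRY_CAST(a AS INTEGER) AS y         -- for casts, TRY_CAST also returns NULL on failure
FROM cte;
```

|  x  |  y   |
|----:|-----:|
| 123 | 123  |
| -1  | NULL |
| 235 | 235  |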

#### Limitations {#docs:current:sql:expressions:try::limitations}

`TRY` cannot be used in combination with a volatile function, an aggregate function, or a [scalar subquery](#docs:current:sql:expressions:subqueries::scalar-subquery).
For example:

```sql
SELECT TRY(random());
```

```console
Binder Error:
TRY can not be used in combination with a volatile function
```

## Functions {#sql:functions}

### Functions {#docs:current:sql:functions:overview}

#### Function Syntax {#docs:current:sql:functions:overview::function-syntax}



#### Function Chaining via the Dot Operator {#docs:current:sql:functions:overview::function-chaining-via-the-dot-operator}

DuckDB supports the dot syntax for function chaining. This allows the function call `fn(arg1, arg2, arg3, ...)` to be rewritten as `arg1.fn(arg2, arg3, ...)`. For example, take the following use of the [`replace` function](#docs:current:sql:functions:text::replacestring-source-target):

```sql
SELECT replace(goose_name, 'goose', 'duck') AS duck_name
FROM unnest(['African goose', 'Faroese goose', 'Hungarian goose', 'Pomeranian goose']) breed(goose_name);
```

This can be rewritten as follows:

```sql
SELECT goose_name.replace('goose', 'duck') AS duck_name
FROM unnest(['African goose', 'Faroese goose', 'Hungarian goose', 'Pomeranian goose']) breed(goose_name);
```
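
Chained calls can themselves be chained further. As a sketch, the following variation additionally applies the `upper` function to the result of `replace`:

```sql
-- Each call receives the result of the previous call as its first argument.
SELECT goose_name.replace('goose', 'duck').upper() AS duck_name
FROM unnest(['African goose', 'Faroese goose', 'Hungarian goose', 'Pomeranian goose']) breed(goose_name);
```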

##### Using with Literals and Arrays {#docs:current:sql:functions:overview::using-with-literals-and-arrays}

To apply function chaining to a literal or to the result of an array access operation, you must surround the argument with parentheses, e.g.:

```sql
SELECT ('hello world').replace(' ', '_');
```

```sql
SELECT (2).sqrt();
```

```sql
SELECT (m[1]).map_entries()
FROM (VALUES ([MAP {'hello': 42}, MAP {'world': 42}])) t(m);
```

In the absence of these parentheses, DuckDB will return a `Parser Error` for the function call:

```console
Parser Error:
syntax error at or near "("
```

##### Limitations {#docs:current:sql:functions:overview::limitations}

Function chaining via the dot operator is limited to *scalar* functions and is not supported for *table* functions.
For example, the following call returns a `Parser Error`:

```sql
SELECT * FROM ('my_file.parquet').read_parquet(); -- does not work
```

Additionally, the functions `coalesce` and `ifnull` cannot be used with function chaining for the time being:

```sql
SELECT (2).coalesce(0); -- does not work
SELECT (2).ifnull(0); -- does not work
```

#### Query Functions {#docs:current:sql:functions:overview::query-functions}

The `duckdb_functions()` table function shows the list of functions currently built into the system.

```sql
SELECT DISTINCT ON(function_name)
    function_name,
    function_type,
    return_type,
    parameters,
    parameter_types,
    description
FROM duckdb_functions()
WHERE function_type = 'scalar'
  AND function_name LIKE 'b%'
ORDER BY function_name;
```

| function_name | function_type | return_type | parameters             | parameter_types                  | description                                                                                                                              |
| ------------- | ------------- | ----------- | ---------------------- | -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| bar           | scalar        | VARCHAR     | [x, min, max, width]   | [DOUBLE, DOUBLE, DOUBLE, DOUBLE] | Draws a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80                   |
| base64        | scalar        | VARCHAR     | [blob]                 | [BLOB]                           | Convert a blob to a base64 encoded string                                                                                                |
| bin           | scalar        | VARCHAR     | [value]                | [VARCHAR]                        | Converts the value to binary representation                                                                                              |
| bit_count     | scalar        | TINYINT     | [x]                    | [TINYINT]                        | Returns the number of bits that are set                                                                                                  |
| bit_length    | scalar        | BIGINT      | [col0]                 | [VARCHAR]                        | Number of bits in a string                                                                                                               |
| bit_position  | scalar        | INTEGER     | [substring, bitstring] | [BIT, BIT]                       | Returns first starting index of the specified substring within bits, or zero if it is not present. The first (leftmost) bit is indexed 1 |
| bitstring     | scalar        | BIT         | [bitstring, length]    | [VARCHAR, INTEGER]               | Pads the bitstring until the specified length                                                                                            |

> Currently, the description and parameter names of functions are not available in the `duckdb_functions()` function.
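
The output of `duckdb_functions()` can be aggregated like any other table. As a sketch, the following query counts the distinct function names per function type (the `function_count` alias is illustrative, and the numbers depend on the DuckDB version and the loaded extensions):

```sql
-- Count distinct function names for each function type.
SELECT function_type, count(DISTINCT function_name) AS function_count
FROM duckdb_functions()
GROUP BY function_type
ORDER BY function_count DESC;
```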

### Aggregate Functions {#docs:current:sql:functions:aggregates}



#### Examples {#docs:current:sql:functions:aggregates::examples}

Produce a single row containing the sum of the `amount` column:

```sql
SELECT sum(amount)
FROM sales;
```

Produce one row per unique region, containing the sum of `amount` for each group:

```sql
SELECT region, sum(amount)
FROM sales
GROUP BY region;
```

Return only the regions that have a sum of `amount` higher than 100:

```sql
SELECT region
FROM sales
GROUP BY region
HAVING sum(amount) > 100;
```

Return the number of unique values in the `region` column:

```sql
SELECT count(DISTINCT region)
FROM sales;
```

Return two values, the total sum of `amount` and the sum of `amount` excluding rows where the region is `north`, using the [`FILTER` clause](#docs:current:sql:query_syntax:filter):

```sql
SELECT sum(amount), sum(amount) FILTER (region != 'north')
FROM sales;
```

Return a list of all regions in descending order of the `amount` column:

```sql
SELECT list(region ORDER BY amount DESC)
FROM sales;
```

Return the amount of the first sale using the `first()` aggregate function:

```sql
SELECT first(amount ORDER BY date ASC)
FROM sales;
```
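
The examples above do not define the `sales` table. To try them out, a minimal table can be created as follows. The schema and the rows are assumptions made for illustration and are not part of the original examples:

```sql
-- Hypothetical schema and data for experimenting with the queries above.
CREATE TABLE sales (region VARCHAR, amount INTEGER, date DATE);
INSERT INTO sales VALUES
    ('north', 50, DATE '2024-01-01'),
    ('north', 70, DATE '2024-01-03'),
    ('south', 30, DATE '2024-01-02');

-- With these rows, the total is 150 and the total excluding 'north' is 30.
SELECT sum(amount), sum(amount) FILTER (region != 'north')
FROM sales;
```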

#### Syntax {#docs:current:sql:functions:aggregates::syntax}



Aggregates are functions that *combine* multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the `SELECT` and `HAVING` clauses of a SQL query.

##### `DISTINCT` Clause in Aggregate Functions {#docs:current:sql:functions:aggregates::distinct-clause-in-aggregate-functions}

When the `DISTINCT` clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the `count` aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.
There are some aggregates that are insensitive to duplicate values (e.g., `min` and `max`) and for them this clause is parsed and ignored.
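
As a small sketch on inline values (the column aliases are illustrative), the effect of `DISTINCT` is visible for `count` and `sum` but not for `max`:

```sql
SELECT
    count(x) AS n,                   -- 3: all rows are counted
    count(DISTINCT x) AS n_distinct, -- 2: the distinct values are 1 and 2
    sum(x) AS total,                 -- 4: 1 + 1 + 2
    sum(DISTINCT x) AS total_distinct, -- 3: 1 + 2
    max(DISTINCT x) AS largest       -- 2: same result as max(x)
FROM (VALUES (1), (1), (2)) t(x);
```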

##### `ORDER BY` Clause in Aggregate Functions {#docs:current:sql:functions:aggregates::order-by-clause-in-aggregate-functions}

An `ORDER BY` clause can be provided after the last argument of the function call. Note the lack of the comma separator before the clause.

```sql
SELECT ⟨aggregate_function⟩(⟨arg⟩, ⟨sep⟩ ORDER BY ⟨ordering_criteria⟩);
```

This clause ensures that the values being aggregated are sorted before applying the function.
Most aggregate functions are order-insensitive, and for them this clause is parsed and discarded.
However, there are some order-sensitive aggregates that can have non-deterministic results without ordering, e.g., `first`, `last`, `list` and `string_agg` / `group_concat` / `listagg`.
These can be made deterministic by ordering the arguments.

For example:

```sql
CREATE TABLE tbl AS
    SELECT s FROM range(1, 4) r(s);

SELECT string_agg(s, ', ' ORDER BY s DESC) AS countdown
FROM tbl;
```

| countdown |
|-----------|
| 3, 2, 1   |

##### Handling `NULL` Values {#docs:current:sql:functions:aggregates::handling-null-values}

All general aggregate functions ignore `NULL`s, except for [`list`](#::listarg) ([`array_agg`](#::listarg)), [`first`](#::firstarg) ([`arbitrary`](#::firstarg)) and [`last`](#::lastarg).
To exclude `NULL`s from `list`, you can use a [`FILTER` clause](#docs:current:sql:query_syntax:filter).
To ignore `NULL`s from `first`, you can use the [`any_value` aggregate](#::any_valuearg).

All general aggregate functions except [`count`](#::countarg) return `NULL` on empty groups.
In particular, [`list`](#::listarg) does *not* return an empty list, [`sum`](#::sumarg) does *not* return zero, and [`string_agg`](#::string_aggarg-sep) does *not* return an empty string in this case.
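
The rules above can be checked with two small queries on inline values. This is a sketch and the column aliases are illustrative:

```sql
-- NULL values are skipped by sum and count(arg), but list keeps them unless they are filtered out.
SELECT
    sum(x) AS s,                              -- 4
    count(x) AS c,                            -- 2
    list(x) AS l,                             -- contains 1, 3 and the NULL value
    list(x) FILTER (x IS NOT NULL) AS l_no_null -- contains only 1 and 3
FROM (VALUES (1), (NULL), (3)) t(x);

-- On an empty input, count returns 0 while sum and list return NULL.
SELECT count(x) AS c, sum(x) AS s, list(x) AS l
FROM (VALUES (1)) t(x)
WHERE false;
```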

#### General Aggregate Functions {#docs:current:sql:functions:aggregates::general-aggregate-functions}

The table below shows the available general aggregate functions.

| Function | Description |
|:--|:--------|
| [`any_value(arg)`](#::any_valuearg) | Returns the first non-null value from `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_max(arg, val)`](#::arg_maxarg-val) | Finds the row with the maximum `val` and calculates the `arg` expression at that row. Rows where the value of the `arg` or `val` expression is `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_max(arg, val, n)`](#::arg_maxarg-val-n) | The generalized case of [`arg_max`](#::arg_maxarg-val) for `n` values: returns a `LIST` containing the `arg` expressions for the top `n` rows ordered by `val` descending. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_max_null(arg, val)`](#::arg_max_nullarg-val) | Finds the row with the maximum `val` and calculates the `arg` expression at that row. Rows where the `val` expression evaluates to `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_min(arg, val)`](#::arg_minarg-val) | Finds the row with the minimum `val` and calculates the `arg` expression at that row. Rows where the value of the `arg` or `val` expression is `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_min(arg, val, n)`](#::arg_minarg-val-n) | Returns a `LIST` containing the `arg` expressions for the "bottom" `n` rows ordered by `val` ascending. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_min_null(arg, val)`](#::arg_min_nullarg-val) | Finds the row with the minimum `val` and calculates the `arg` expression at that row. Rows where the `val` expression evaluates to `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`avg(arg)`](#::avgarg) | Calculates the average of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`bit_and(arg)`](#::bit_andarg) | Returns the bitwise AND of all bits in a given expression. |
| [`bit_or(arg)`](#::bit_orarg) | Returns the bitwise OR of all bits in a given expression. |
| [`bit_xor(arg)`](#::bit_xorarg) | Returns the bitwise XOR of all bits in a given expression. |
| [`bitstring_agg(arg)`](#::bitstring_aggarg) | Returns a bitstring whose length corresponds to the range of the non-null (integer) values, with bits set at the location of each (distinct) value. |
| [`bool_and(arg)`](#::bool_andarg) | Returns `true` if every input value is `true`, otherwise `false`. |
| [`bool_or(arg)`](#::bool_orarg) | Returns `true` if any input value is `true`, otherwise `false`. |
| [`count()`](#::count) | Returns the number of rows. |
| [`count(arg)`](#::countarg) | Returns the number of rows where `arg` is not `NULL`. |
| [`countif(arg)`](#::countifarg) | Returns the number of rows where `arg` is `true`. |
| [`favg(arg)`](#::favgarg) | Calculates the average using a more accurate floating point summation (Kahan Sum). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`first(arg)`](#::firstarg) | Returns the first value (null or non-null) from `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`fsum(arg)`](#::fsumarg) | Calculates the sum using a more accurate floating point summation (Kahan Sum). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`geometric_mean(arg)`](#::geometric_meanarg) | Calculates the geometric mean of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`histogram(arg)`](#::histogramarg) | Returns a `MAP` of key-value pairs representing buckets and counts. |
| [`histogram(arg, boundaries)`](#::histogramarg-boundaries) | Returns a `MAP` of key-value pairs representing the provided upper `boundaries` and counts of elements in the corresponding bins (left-open and right-closed partitions) of the datatype. A boundary at the largest value of the datatype is automatically added when elements larger than all provided `boundaries` appear, see [`is_histogram_other_bin`](#docs:current:sql:functions:utility::is_histogram_other_binarg). Boundaries may be provided, e.g., via [`equi_width_bins`](#docs:current:sql:functions:utility::equi_width_binsminmaxbincountnice). |
| [`histogram_exact(arg, elements)`](#::histogram_exactarg-elements) | Returns a `MAP` of key-value pairs representing the requested elements and their counts. A catch-all element specific to the data-type is automatically added to count other elements when they appear, see [`is_histogram_other_bin`](#docs:current:sql:functions:utility::is_histogram_other_binarg). |
| [`histogram_values(source, col_name, technique, bin_count)`](#::histogram_valuessource-col_name-technique-bin_count) | Returns the upper boundaries of the bins and their counts. |
| [`last(arg)`](#::lastarg) | Returns the last value of a column. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`list(arg)`](#::listarg) | Returns a `LIST` containing all the values of a column. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`max(arg)`](#::maxarg) | Returns the maximum value present in `arg`. This function is [unaffected by distinctness](#::distinct-clause-in-aggregate-functions). |
| [`max(arg, n)`](#::maxarg-n) | Returns a `LIST` containing the `arg` values for the "top" `n` rows ordered by `arg` descending. |
| [`min(arg)`](#::minarg) | Returns the minimum value present in `arg`. This function is [unaffected by distinctness](#::distinct-clause-in-aggregate-functions). |
| [`min(arg, n)`](#::minarg-n) | Returns a `LIST` containing the `arg` values for the "bottom" `n` rows ordered by `arg` ascending. |
| [`product(arg)`](#::productarg) | Calculates the product of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`string_agg(arg)`](#::string_aggarg-sep) | Concatenates the column string values with a comma separator (`,`). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`string_agg(arg, sep)`](#::string_aggarg-sep) | Concatenates the column string values with a separator. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`sum(arg)`](#::sumarg) | Calculates the sum of all non-null values in `arg` / counts `true` values when `arg` is boolean. The floating-point versions of this function are [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`weighted_avg(arg, weight)`](#::weighted_avgarg-weight) | Calculates the weighted average of all non-null values in `arg`, where each value is scaled by its corresponding `weight`. If `weight` is `NULL`, the corresponding `arg` value will be skipped. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |

###### `any_value(arg)` {#docs:current:sql:functions:aggregates::any_valuearg}



|   |   |
|:--|:--------|
| **Description** |Returns the first non-`NULL` value from `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `any_value(A)` |

###### `arg_max(arg, val)` {#docs:current:sql:functions:aggregates::arg_maxarg-val}



|   |   |
|:--|:--------|
| **Description** |Finds the row with the maximum `val` and calculates the `arg` expression at that row. Rows where the value of the `arg` or `val` expression is `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_max(A, B)` |
| **Alias(es)** | `argmax(arg, val)`, `max_by(arg, val)` |

###### `arg_max(arg, val, n)` {#docs:current:sql:functions:aggregates::arg_maxarg-val-n}



|   |   |
|:--|:--------|
| **Description** |The generalized case of [`arg_max`](#::arg_maxarg-val) for `n` values: returns a `LIST` containing the `arg` expressions for the top `n` rows ordered by `val` descending. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_max(A, B, 2)` |
| **Alias(es)** | `argmax(arg, val, n)`, `max_by(arg, val, n)` |

###### `arg_max_null(arg, val)` {#docs:current:sql:functions:aggregates::arg_max_nullarg-val}



|   |   |
|:--|:--------|
| **Description** |Finds the row with the maximum `val` and calculates the `arg` expression at that row. Rows where the `val` expression evaluates to `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_max_null(A, B)` |

###### `arg_min(arg, val)` {#docs:current:sql:functions:aggregates::arg_minarg-val}



|   |   |
|:--|:--------|
| **Description** |Finds the row with the minimum `val` and calculates the `arg` expression at that row. Rows where the value of the `arg` or `val` expression is `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_min(A, B)` |
| **Alias(es)** | `argmin(arg, val)`, `min_by(arg, val)` |

###### `arg_min(arg, val, n)` {#docs:current:sql:functions:aggregates::arg_minarg-val-n}



|   |   |
|:--|:--------|
| **Description** |The generalized case of [`arg_min`](#::arg_minarg-val) for `n` values: returns a `LIST` containing the `arg` expressions for the bottom `n` rows ordered by `val` ascending. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_min(A, B, 2)` |
| **Alias(es)** | `argmin(arg, val, n)`, `min_by(arg, val, n)` |

###### `arg_min_null(arg, val)` {#docs:current:sql:functions:aggregates::arg_min_nullarg-val}



|   |   |
|:--|:--------|
| **Description** |Finds the row with the minimum `val` and calculates the `arg` expression at that row. Rows where the `val` expression evaluates to `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_min_null(A, B)` |

###### `avg(arg)` {#docs:current:sql:functions:aggregates::avgarg}



|   |   |
|:--|:--------|
| **Description** |Calculates the average of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `avg(A)` |
| **Alias(es)** | `mean` |

###### `bit_and(arg)` {#docs:current:sql:functions:aggregates::bit_andarg}



|   |   |
|:--|:--------|
| **Description** |Returns the bitwise `AND` of all bits in a given expression. |
| **Example** | `bit_and(A)` |

###### `bit_or(arg)` {#docs:current:sql:functions:aggregates::bit_orarg}



|   |   |
|:--|:--------|
| **Description** |Returns the bitwise `OR` of all bits in a given expression. |
| **Example** | `bit_or(A)` |

###### `bit_xor(arg)` {#docs:current:sql:functions:aggregates::bit_xorarg}



|   |   |
|:--|:--------|
| **Description** |Returns the bitwise `XOR` of all bits in a given expression. |
| **Example** | `bit_xor(A)` |

###### `bitstring_agg(arg)` {#docs:current:sql:functions:aggregates::bitstring_aggarg}



|   |   |
|:--|:--------|
| **Description** |Returns a bitstring whose length corresponds to the range of the non-null (integer) values, with bits set at the location of each (distinct) value. |
| **Example** | `bitstring_agg(A)` |

###### `bool_and(arg)` {#docs:current:sql:functions:aggregates::bool_andarg}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if every input value is `true`, otherwise `false`. |
| **Example** | `bool_and(A)` |

###### `bool_or(arg)` {#docs:current:sql:functions:aggregates::bool_orarg}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if any input value is `true`, otherwise `false`. |
| **Example** | `bool_or(A)` |

###### `count()` {#docs:current:sql:functions:aggregates::count}



|   |   |
|:--|:--------|
| **Description** |Returns the number of rows. |
| **Example** | `count()` |
| **Alias(es)** | `count(*)` |

###### `count(arg)` {#docs:current:sql:functions:aggregates::countarg}



|   |   |
|:--|:--------|
| **Description** |Returns the number of rows where `arg` is not `NULL`. |
| **Example** | `count(A)` |

###### `countif(arg)` {#docs:current:sql:functions:aggregates::countifarg}



|   |   |
|:--|:--------|
| **Description** |Returns the number of rows where `arg` is `true`. |
| **Example** | `countif(A)` |

###### `favg(arg)` {#docs:current:sql:functions:aggregates::favgarg}



|   |   |
|:--|:--------|
| **Description** |Calculates the average using a more accurate floating point summation (Kahan Sum). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `favg(A)` |

###### `first(arg)` {#docs:current:sql:functions:aggregates::firstarg}



|   |   |
|:--|:--------|
| **Description** |Returns the first value (null or non-null) from `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `first(A)` |
| **Alias(es)** | `arbitrary(A)` |

###### `fsum(arg)` {#docs:current:sql:functions:aggregates::fsumarg}



|   |   |
|:--|:--------|
| **Description** |Calculates the sum using a more accurate floating point summation (Kahan Sum). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `fsum(A)` |
| **Alias(es)** | `sumkahan`, `kahan_sum` |
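
For readers unfamiliar with Kahan summation, the following is a minimal Python sketch of the general technique. It is not DuckDB's code; the function names are hypothetical and the input values are chosen only to make the rounding effect visible.

```python
from typing import Iterable


def naive_sum(values: Iterable[float]) -> float:
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values: Iterable[float]) -> float:
    """Compensated summation that carries the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; the difference
        # from `adjusted` is the rounding error of this step.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


values = [1.0] + [1e-16] * 10
print(naive_sum(values))  # the small terms are lost one by one
print(kahan_sum(values))  # close to 1 + 1e-15
```

The compensated result still depends on the order of the inputs, which matches the note that `fsum` is affected by ordering.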

###### `geometric_mean(arg)` {#docs:current:sql:functions:aggregates::geometric_meanarg}



|   |   |
|:--|:--------|
| **Description** |Calculates the geometric mean of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `geometric_mean(A)` |
| **Alias(es)** | `geomean(A)` |

###### `histogram(arg)` {#docs:current:sql:functions:aggregates::histogramarg}



|   |   |
|:--|:--------|
| **Description** |Returns a `MAP` of key-value pairs representing buckets and counts. |
| **Example** | `histogram(A)` |

###### `histogram(arg, boundaries)` {#docs:current:sql:functions:aggregates::histogramarg-boundaries}



|   |   |
|:--|:--------|
| **Description** |Returns a `MAP` of key-value pairs representing the provided upper `boundaries` and counts of elements in the corresponding bins (left-open and right-closed partitions) of the datatype. A boundary at the largest value of the datatype is automatically added when elements larger than all provided `boundaries` appear, see [`is_histogram_other_bin`](#docs:current:sql:functions:utility::is_histogram_other_binarg). Boundaries may be provided, e.g., via [`equi_width_bins`](#docs:current:sql:functions:utility::equi_width_binsminmaxbincountnice). |
| **Example** | `histogram(A, [0, 1, 10])` |
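
The left-open, right-closed binning can be sketched in Python as follows. This is an illustration under stated assumptions: the function name is hypothetical, `float("inf")` stands in for the "largest value of the datatype" key, and reporting empty bins with a count of zero is an assumption rather than documented behavior.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Optional

OTHER_BIN = float("inf")  # stand-in for the catch-all upper boundary


def histogram_with_boundaries(values: Iterable[Optional[float]],
                              boundaries: List[float]) -> Dict[float, int]:
    """Count values into (previous, boundary] bins keyed by their upper boundary."""
    upper_bounds = sorted(boundaries)
    counts: Dict[float, int] = {bound: 0 for bound in upper_bounds}
    for value in values:
        if value is None:
            continue
        # bisect_left finds the first boundary >= value, i.e., the right-closed bin.
        index = bisect_left(upper_bounds, value)
        key = upper_bounds[index] if index < len(upper_bounds) else OTHER_BIN
        counts[key] = counts.get(key, 0) + 1
    return counts


print(histogram_with_boundaries([-1, 0, 1, 5, 10, 42], [0, 1, 10]))
# {0: 2, 1: 1, 10: 2, inf: 1}
```

The value `10` lands in the bin keyed `10` because bins are closed on the right, while `42` exceeds every boundary and triggers the extra catch-all bin.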

###### `histogram_exact(arg, elements)` {#docs:current:sql:functions:aggregates::histogram_exactarg-elements}



|   |   |
|:--|:--------|
| **Description** |Returns a `MAP` of key-value pairs representing the requested elements and their counts. A catch-all element specific to the data-type is automatically added to count other elements when they appear, see [`is_histogram_other_bin`](#docs:current:sql:functions:utility::is_histogram_other_binarg). |
| **Example** | `histogram_exact(A, ['a', 'b', 'c'])` |

###### `histogram_values(source, col_name, technique, bin_count)` {#docs:current:sql:functions:aggregates::histogram_valuessource-col_name-technique-bin_count}



|   |   |
|:--|:--------|
| **Description** |Returns the upper boundaries of the bins and their counts. |
| **Example** | `histogram_values(integers, i, bin_count := 2)` |

###### `last(arg)` {#docs:current:sql:functions:aggregates::lastarg}



|   |   |
|:--|:--------|
| **Description** |Returns the last value of a column. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `last(A)` |

###### `list(arg)` {#docs:current:sql:functions:aggregates::listarg}



|   |   |
|:--|:--------|
| **Description** |Returns a `LIST` containing all the values of a column. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `list(A)` |
| **Alias(es)** | `array_agg` |

###### `max(arg)` {#docs:current:sql:functions:aggregates::maxarg}



|   |   |
|:--|:--------|
| **Description** |Returns the maximum value present in `arg`. This function is [unaffected by distinctness](#::distinct-clause-in-aggregate-functions). |
| **Example** | `max(A)` |

###### `max(arg, n)` {#docs:current:sql:functions:aggregates::maxarg-n}



|   |   |
|:--|:--------|
| **Description** | Returns a `LIST` containing the `arg` values for the "top" `n` rows ordered by `arg` descending. |
| **Example** | `max(A, 2)` |

###### `min(arg)` {#docs:current:sql:functions:aggregates::minarg}



|   |   |
|:--|:--------|
| **Description** |Returns the minimum value present in `arg`. This function is [unaffected by distinctness](#::distinct-clause-in-aggregate-functions). |
| **Example** | `min(A)` |

###### `min(arg, n)` {#docs:current:sql:functions:aggregates::minarg-n}



|   |   |
|:--|:--------|
| **Description** |Returns a `LIST` containing the `arg` values for the "bottom" `n` rows ordered by `arg` ascending. |
| **Example** | `min(A, 2)` |

###### `product(arg)` {#docs:current:sql:functions:aggregates::productarg}



|   |   |
|:--|:--------|
| **Description** |Calculates the product of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `product(A)` |

###### `string_agg(arg)` {#docs:current:sql:functions:aggregates::string_aggarg}



|   |   |
|:--|:--------|
| **Description** |Concatenates the column string values with a comma separator (`,`). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `string_agg(S)` |
| **Alias(es)** | `group_concat(arg)`, `listagg(arg)` |

###### `string_agg(arg, sep)` {#docs:current:sql:functions:aggregates::string_aggarg-sep}



|   |   |
|:--|:--------|
| **Description** |Concatenates the column string values with a separator. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `string_agg(S, ',')` |
| **Alias(es)** | `group_concat(arg, sep)`, `listagg(arg, sep)` |

###### `sum(arg)` {#docs:current:sql:functions:aggregates::sumarg}



|   |   |
|:--|:--------|
| **Description** |Calculates the sum of all non-null values in `arg`, or counts the `true` values when `arg` is boolean. The floating-point versions of this function are [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `sum(A)` |

###### `weighted_avg(arg, weight)` {#docs:current:sql:functions:aggregates::weighted_avgarg-weight}



|   |   |
|:--|:--------|
| **Description** |Calculates the weighted average of all non-null values in `arg`, where each value is scaled by its corresponding `weight`. If `weight` is `NULL`, the corresponding `arg` value will be skipped. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `weighted_avg(A, W)` |
| **Alias(es)** | `wavg(arg, weight)` |
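
As a worked illustration of the weighted average, here is a small Python sketch. It is not DuckDB's implementation: `None` stands in for `NULL`, the function name mirrors the SQL one for readability only, and returning `None` when the weights sum to zero is an assumption.

```python
from typing import Iterable, Optional, Tuple


def weighted_avg(pairs: Iterable[Tuple[Optional[float], Optional[float]]]) -> Optional[float]:
    """Average arg values scaled by weight, skipping rows with a NULL arg or weight."""
    kept = [(arg, weight) for arg, weight in pairs
            if arg is not None and weight is not None]
    total_weight = sum(weight for _, weight in kept)
    if not kept or total_weight == 0:
        return None  # assumption: no usable rows yields NULL
    return sum(arg * weight for arg, weight in kept) / total_weight


rows = [(10.0, 1), (20.0, 3), (30.0, None), (None, 5)]
print(weighted_avg(rows))  # (10*1 + 20*3) / (1 + 3) = 17.5
```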

#### Approximate Aggregates {#docs:current:sql:functions:aggregates::approximate-aggregates}

The table below shows the available approximate aggregate functions.

| Function | Description | Example |
|:---|:---|:---|
| `approx_count_distinct(x)` | Calculates the approximate count of distinct elements using HyperLogLog. | `approx_count_distinct(A)` |
| `approx_quantile(x, pos)` | Calculates the approximate quantile using T-Digest. | `approx_quantile(A, 0.5)` |
| `approx_top_k(arg, k)` | Calculates a `LIST` of the `k` approximately most frequent values of `arg` using Filtered Space-Saving. | `approx_top_k(A, 3)` |
| `reservoir_quantile(x, quantile, sample_size = 8192)` | Calculates the approximate quantile using reservoir sampling. The sample size is optional and defaults to 8192. | `reservoir_quantile(A, 0.5, 1024)` |
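
To show the idea behind a reservoir-based quantile, the sketch below uses the textbook "Algorithm R" in Python. It is an assumption-laden illustration, not DuckDB's implementation: the helper names, the `seed` parameter, and the way the quantile is read from the sorted sample are all hypothetical.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(values: Iterable[float], sample_size: int,
                     rng: random.Random) -> List[float]:
    """Keep a uniform random sample of at most sample_size values (Algorithm R)."""
    reservoir: List[float] = []
    for seen, value in enumerate(values):
        if seen < sample_size:
            reservoir.append(value)
        else:
            slot = rng.randint(0, seen)  # inclusive on both ends
            if slot < sample_size:
                reservoir[slot] = value
    return reservoir


def reservoir_quantile(values: Iterable[Optional[float]], quantile: float,
                       sample_size: int = 8192, seed: int = 0) -> Optional[float]:
    """Approximate a quantile by reading it off a sorted reservoir sample."""
    rng = random.Random(seed)
    non_null = (v for v in values if v is not None)
    sample = sorted(reservoir_sample(non_null, sample_size, rng))
    if not sample:
        return None
    index = min(int(quantile * len(sample)), len(sample) - 1)
    return sample[index]


print(reservoir_quantile(range(100), 0.5))         # exact: the input fits in the reservoir
print(reservoir_quantile(range(100_000), 0.5, 1024))  # approximate
```

When the input has no more rows than the sample size, the reservoir holds every value and the answer is exact. Larger inputs trade accuracy for bounded memory.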

#### Statistical Aggregates {#docs:current:sql:functions:aggregates::statistical-aggregates}

The table below shows the available statistical aggregate functions.
They all ignore `NULL` values (in the case of a single input column `x`), or pairs where either input is `NULL` (in the case of two input columns `y` and `x`).

| Function | Description |
|:--|:--------|
| [`corr(y, x)`](#::corry-x) | The correlation coefficient. |
| [`covar_pop(y, x)`](#::covar_popy-x) | The population covariance, which does not include bias correction. |
| [`covar_samp(y, x)`](#::covar_sampy-x) | The sample covariance, which includes Bessel's bias correction. |
| [`entropy(x)`](#::entropyx) | The log-2 entropy of the input values, based on how often each distinct value occurs. |
| [`kurtosis_pop(x)`](#::kurtosis_popx) | The excess kurtosis (Fisher’s definition) without bias correction. |
| [`kurtosis(x)`](#::kurtosisx) | The excess kurtosis (Fisher's definition) with bias correction according to the sample size. |
| [`mad(x)`](#::madx) | The median absolute deviation. Temporal types return a positive `INTERVAL`. |
| [`median(x)`](#::medianx) | The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value. |
| [`mode(x)`](#::modex)| The most frequent value. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`quantile_cont(x, pos)`](#::quantile_contx-pos) | The interpolated `pos`-quantile of `x` for `-1 <= pos <= 1`. Returns the `pos * (n_nonnull_values - 1)`th (zero-indexed, in the specified order) value of `x` or an interpolation between the adjacent values if the index is not an integer. Values of `pos` between `-1` and `0` correspond to counting backwards from `1`. More precisely, `quantile_cont(x, -y) = quantile_cont(x, 1 - y)`. Intuitively, arranges the values of `x` as equispaced *points* on a line, starting at 0 and ending at 1, and returns the (interpolated) value at `pos`. This is Type 7 in Hyndman & Fan (1996). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding interpolated quantiles. |
| [`quantile_disc(x, pos)`](#::quantile_discx-pos) | The discrete `pos`-quantile of `x` for `0 <= pos <= 1`. Returns the `greatest(ceil(pos * n_nonnull_values) - 1, 0)`th (zero-indexed, in the specified order) value of `x`. Intuitively, assigns to each value of `x` an equisized *sub-interval* (left-open and right-closed except for the initial interval) of the interval `[0, 1]`, and picks the value of the sub-interval that contains `pos`. This is Type 1 in Hyndman & Fan (1996). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding discrete quantiles. |
| [`regr_avgx(y, x)`](#::regr_avgxy-x) | The average of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| [`regr_avgy(y, x)`](#::regr_avgyy-x) | The average of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| [`regr_count(y, x)`](#::regr_county-x) | The number of non-`NULL` pairs. |
| [`regr_intercept(y, x)`](#::regr_intercepty-x) | The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable. |
| [`regr_r2(y, x)`](#::regr_r2y-x) | The squared Pearson correlation coefficient between y and x. Also: The coefficient of determination in a linear regression, where x is the independent variable and y is the dependent variable. |
| [`regr_slope(y, x)`](#::regr_slopey-x) | The slope of the linear regression line, where x is the independent variable and y is the dependent variable. |
| [`regr_sxx(y, x)`](#::regr_sxxy-x) | The sample variance, which includes Bessel's bias correction, of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| [`regr_sxy(y, x)`](#::regr_sxyy-x) | The sample covariance, which includes Bessel's bias correction. |
| [`regr_syy(y, x)`](#::regr_syyy-x) | The sample variance, which includes Bessel's bias correction, of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| [`skewness(x)`](#::skewnessx) | The skewness. |
| [`sem(x)`](#::semx) | The standard error of the mean. |
| [`stddev_pop(x)`](#::stddev_popx) | The population standard deviation. |
| [`stddev_samp(x)`](#::stddev_sampx) | The sample standard deviation. |
| [`var_pop(x)`](#::var_popx) | The population variance, which does not include bias correction. |
| [`var_samp(x)`](#::var_sampx) | The sample variance, which includes Bessel's bias correction. |

###### `corr(y, x)` {#docs:current:sql:functions:aggregates::corry-x}



|   |   |
|:--|:--------|
| **Description** |The correlation coefficient. |
| **Formula** | `covar_pop(y, x) / (stddev_pop(x) * stddev_pop(y))` |

###### `covar_pop(y, x)` {#docs:current:sql:functions:aggregates::covar_popy-x}



|   |   |
|:--|:--------|
| **Description** |The population covariance, which does not include bias correction. |
| **Formula** | `(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / regr_count(y, x)`, `covar_samp(y, x) * (1 - 1 / regr_count(y, x))` |

###### `covar_samp(y, x)` {#docs:current:sql:functions:aggregates::covar_sampy-x}



|   |   |
|:--|:--------|
| **Description** |The sample covariance, which includes Bessel's bias correction. |
| **Formula** | `(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / (regr_count(y, x) - 1)`, `covar_pop(y, x) / (1 - 1 / regr_count(y, x))` |
| **Alias(es)** | `regr_sxy(y, x)` |
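
The covariance and correlation formulas above can be checked numerically with a short Python sketch. This is plain arithmetic, not DuckDB's implementation. The function name is hypothetical, `None` stands in for `NULL`, and the local `n` plays the role of `regr_count(y, x)`.

```python
import math
from typing import Iterable, Optional, Tuple


def covariance_stats(pairs: Iterable[Tuple[Optional[float], Optional[float]]]):
    """Return (covar_pop, covar_samp, corr) for (y, x) pairs, skipping pairs with a NULL."""
    kept = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(kept)
    if n < 2:
        raise ValueError("this sketch needs at least two non-NULL pairs")
    sum_x = sum(x for _, x in kept)
    sum_y = sum(y for y, _ in kept)
    sum_xy = sum(x * y for y, x in kept)
    centered = sum_xy - sum_x * sum_y / n
    covar_pop = centered / n
    covar_samp = centered / (n - 1)
    var_pop_x = sum(x * x for _, x in kept) / n - (sum_x / n) ** 2
    var_pop_y = sum(y * y for y, _ in kept) / n - (sum_y / n) ** 2
    corr = covar_pop / (math.sqrt(var_pop_x) * math.sqrt(var_pop_y))
    return covar_pop, covar_samp, corr


pairs = [(2.0, 1.0), (4.0, 2.0), (None, 3.0), (9.0, 4.0)]
covar_pop, covar_samp, corr = covariance_stats(pairs)
print(covar_pop, covar_samp, corr)
# The second formula for covar_pop holds up to rounding:
print(math.isclose(covar_pop, covar_samp * (1 - 1 / 3)))  # True
```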

###### `entropy(x)` {#docs:current:sql:functions:aggregates::entropyx}



|   |   |
|:--|:--------|
| **Description** |The log-2 entropy of the input values, based on how often each distinct value occurs. |
| **Formula** | - |
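
Since no formula is listed here, the following Python sketch shows the usual Shannon entropy in bits over the frequencies of distinct values. That this matches DuckDB's exact computation is an assumption, as is returning `0.0` for empty input.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def entropy(values: Iterable[Optional[Hashable]]) -> float:
    """Shannon entropy in bits of the distribution of non-NULL values."""
    kept = [v for v in values if v is not None]
    if not kept:
        return 0.0  # assumption for empty input
    total = len(kept)
    counts = Counter(kept)
    return -sum((count / total) * math.log2(count / total)
                for count in counts.values())


print(entropy(["a", "a", "b", "b", None]))  # 1.0: two equally likely values
```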

###### `kurtosis_pop(x)` {#docs:current:sql:functions:aggregates::kurtosis_popx}



|   |   |
|:--|:--------|
| **Description** |The excess kurtosis (Fisher’s definition) without bias correction. |
| **Formula** | - |

###### `kurtosis(x)` {#docs:current:sql:functions:aggregates::kurtosisx}



|   |   |
|:--|:--------|
| **Description** |The excess kurtosis (Fisher's definition) with bias correction according to the sample size. |
| **Formula** | - |

###### `mad(x)` {#docs:current:sql:functions:aggregates::madx}



|   |   |
|:--|:--------|
| **Description** |The median absolute deviation. Temporal types return a positive `INTERVAL`. |
| **Formula** | `median(abs(x - median(x)))` |
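
The formula can be traced with a few lines of Python using the standard library's `statistics.median`. This sketch covers numeric inputs only (not temporal types), and the function name simply mirrors the SQL one.

```python
from statistics import median
from typing import Iterable, Optional


def mad(values: Iterable[Optional[float]]) -> Optional[float]:
    """Median absolute deviation: median(abs(x - median(x))) over non-NULL values."""
    kept = [v for v in values if v is not None]
    if not kept:
        return None
    center = median(kept)
    return median(abs(v - center) for v in kept)


print(mad([1, 2, 3, 4, 100, None]))  # 1
```

The outlier `100` barely affects the result, which is the usual reason to prefer this measure over the standard deviation for skewed data.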

###### `median(x)` {#docs:current:sql:functions:aggregates::medianx}



|   |   |
|:--|:--------|
| **Description** |The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value. |
| **Formula** | `quantile_cont(x, 0.5)` |

###### `mode(x)` {#docs:current:sql:functions:aggregates::modex}



|   |   |
|:--|:--------|
| **Description** |The most frequent value. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Formula** | - |

###### `quantile_cont(x, pos)` {#docs:current:sql:functions:aggregates::quantile_contx-pos}



|   |   |
|:--|:--------|
| **Description** |The interpolated `pos`-quantile of `x` for `-1 <= pos <= 1`. Returns the `pos * (n_nonnull_values - 1)`th (zero-indexed, in the specified order) value of `x` or an interpolation between the adjacent values if the index is not an integer. Values of `pos` between `-1` and `0` correspond to counting backwards from `1`. More precisely, `quantile_cont(x, -y) = quantile_cont(x, 1 - y)`. Intuitively, arranges the values of `x` as equispaced *points* on a line, starting at 0 and ending at 1, and returns the (interpolated) value at `pos`. This is Type 7 in Hyndman & Fan (1996). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding interpolated quantiles. |
| **Formula** | - |
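
The Type 7 interpolation described above can be sketched in Python for a single `pos` between 0 and 1. This is an illustration of the arithmetic rather than DuckDB's implementation, and it assumes ascending order and numeric inputs.

```python
import math
from typing import Iterable, Optional


def quantile_cont(values: Iterable[Optional[float]], pos: float) -> Optional[float]:
    """Interpolated quantile at index pos * (n - 1) of the sorted non-NULL values."""
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    index = pos * (len(ordered) - 1)
    lower = math.floor(index)
    upper = math.ceil(index)
    fraction = index - lower
    # Linear interpolation between the two neighboring values.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


data = [10, 20, None, 30, 40]
print(quantile_cont(data, 0.5))   # index 1.5 -> 25.0
print(quantile_cont(data, 0.25))  # index 0.75 -> 17.5
```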

###### `quantile_disc(x, pos)` {#docs:current:sql:functions:aggregates::quantile_discx-pos}



|   |   |
|:--|:--------|
| **Description** |The discrete `pos`-quantile of `x` for `0 <= pos <= 1`. Returns the `greatest(ceil(pos * n_nonnull_values) - 1, 0)`th (zero-indexed, in the specified order) value of `x`. Intuitively, assigns to each value of `x` an equisized *sub-interval* (left-open and right-closed except for the initial interval) of the interval `[0, 1]`, and picks the value of the sub-interval that contains `pos`. This is Type 1 in Hyndman & Fan (1996). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding discrete quantiles. |
| **Formula** | - |
| **Alias(es)** | `quantile` |
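
The index rule `greatest(ceil(pos * n_nonnull_values) - 1, 0)` can be followed in a short Python sketch. It is an illustration only, assuming ascending order and a single `pos` value.

```python
import math
from typing import Iterable, Optional


def quantile_disc(values: Iterable[Optional[float]], pos: float) -> Optional[float]:
    """Discrete quantile: pick the value at greatest(ceil(pos * n) - 1, 0)."""
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    index = max(math.ceil(pos * len(ordered)) - 1, 0)
    return ordered[index]


data = [10, 20, None, 30, 40]
print(quantile_disc(data, 0.5))   # ceil(2.0) - 1 = 1 -> 20
print(quantile_disc(data, 0.51))  # ceil(2.04) - 1 = 2 -> 30
print(quantile_disc(data, 0.0))   # greatest(-1, 0) = 0 -> 10
```

Unlike the interpolated variant, the result is always one of the input values, which is why this form also works for ordinal data.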

###### `regr_avgx(y, x)` {#docs:current:sql:functions:aggregates::regr_avgxy-x}



|   |   |
|:--|:--------|
| **Description** |The average of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |

###### `regr_avgy(y, x)` {#docs:current:sql:functions:aggregates::regr_avgyy-x}



|   |   |
|:--|:--------|
| **Description** |The average of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |

###### `regr_count(y, x)` {#docs:current:sql:functions:aggregates::regr_county-x}



|   |   |
|:--|:--------|
| **Description** |The number of non-`NULL` pairs. |
| **Formula** | - |

###### `regr_intercept(y, x)` {#docs:current:sql:functions:aggregates::regr_intercepty-x}



|   |   |
|:--|:--------|
| **Description** |The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable. |
| **Formula** | `regr_avgy(y, x) - regr_slope(y, x) * regr_avgx(y, x)` |

###### `regr_r2(y, x)` {#docs:current:sql:functions:aggregates::regr_r2y-x}



|   |   |
|:--|:--------|
| **Description** |The squared Pearson correlation coefficient between y and x. Also: The coefficient of determination in a linear regression, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |

###### `regr_slope(y, x)` {#docs:current:sql:functions:aggregates::regr_slopey-x}



|   |   |
|:--|:--------|
| **Description** |Returns the slope of the linear regression line, where x is the independent variable and y is the dependent variable. |
| **Formula** | `regr_sxy(y, x) / regr_sxx(y, x)` |
| **Alias(es)** | - |
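
The slope and intercept formulas can be verified on a small data set with the Python sketch below. It is not DuckDB's implementation. The function name is hypothetical, and the sums of centered products are used directly because any common normalization factor cancels in the ratio `regr_sxy / regr_sxx`.

```python
from typing import Iterable, Optional, Tuple


def regression_line(pairs: Iterable[Tuple[Optional[float], Optional[float]]]):
    """Return (slope, intercept) of the least-squares line for (y, x) pairs."""
    kept = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(kept)
    avg_x = sum(x for _, x in kept) / n  # regr_avgx
    avg_y = sum(y for y, _ in kept) / n  # regr_avgy
    sxy = sum((x - avg_x) * (y - avg_y) for y, x in kept)
    sxx = sum((x - avg_x) ** 2 for _, x in kept)
    slope = sxy / sxx
    intercept = avg_y - slope * avg_x
    return slope, intercept


# Points on the line y = 2x + 1, plus one pair that is skipped because y is NULL.
pairs = [(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (None, 4.0)]
print(regression_line(pairs))  # (2.0, 1.0)
```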

###### `regr_sxx(y, x)` {#docs:current:sql:functions:aggregates::regr_sxxy-x}



|   |   |
|:--|:--------|
| **Description** |The sample variance, which includes Bessel's bias correction, of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |

###### `regr_sxy(y, x)` {#docs:current:sql:functions:aggregates::regr_sxyy-x}



|   |   |
|:--|:--------|
| **Description** |The sample covariance, which includes Bessel's bias correction. |
| **Formula** | `(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / (regr_count(y, x) - 1)`, `covar_pop(y, x) / (1 - 1 / regr_count(y, x))` |
| **Alias(es)** | `covar_samp(y, x)` |

###### `regr_syy(y, x)` {#docs:current:sql:functions:aggregates::regr_syyy-x}



|   |   |
|:--|:--------|
| **Description** |The sample variance, which includes Bessel's bias correction, of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |

###### `sem(x)` {#docs:current:sql:functions:aggregates::semx}



|   |   |
|:--|:--------|
| **Description** |The standard error of the mean. |
| **Formula** | - |

###### `skewness(x)` {#docs:current:sql:functions:aggregates::skewnessx}



|   |   |
|:--|:--------|
| **Description** |The skewness. |
| **Formula** | - |

###### `stddev_pop(x)` {#docs:current:sql:functions:aggregates::stddev_popx}



|   |   |
|:--|:--------|
| **Description** |The population standard deviation. |
| **Formula** | `sqrt(var_pop(x))` |

###### `stddev_samp(x)` {#docs:current:sql:functions:aggregates::stddev_sampx}



|   |   |
|:--|:--------|
| **Description** |The sample standard deviation. |
| **Formula** | `sqrt(var_samp(x))`|
| **Alias(es)** | `stddev(x)`|

###### `var_pop(x)` {#docs:current:sql:functions:aggregates::var_popx}



|   |   |
|:--|:--------|
| **Description** |The population variance, which does not include bias correction. |
| **Formula** | `(sum(x^2) - sum(x)^2 / count(x)) / count(x)`, `var_samp(x) * (1 - 1 / count(x))` |

###### `var_samp(x)` {#docs:current:sql:functions:aggregates::var_sampx}



|   |   |
|:--|:--------|
| **Description** |The sample variance, which includes Bessel's bias correction. |
| **Formula** | `(sum(x^2) - sum(x)^2 / count(x)) / (count(x) - 1)`, `var_pop(x) / (1 - 1 / count(x))` |
| **Alias(es)** | `variance(x)` |
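
The relationship between the population and sample variance, and the derived standard deviations, can be checked with a short Python sketch. This is plain arithmetic following the formulas above, not DuckDB's implementation, and the function name is hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def variances(values: Iterable[Optional[float]]) -> Tuple[Optional[float], Optional[float]]:
    """Return (var_pop, var_samp) of the non-NULL values."""
    kept = [v for v in values if v is not None]
    n = len(kept)
    if n == 0:
        return None, None
    centered = sum(v * v for v in kept) - sum(kept) ** 2 / n
    var_pop = centered / n
    var_samp = centered / (n - 1) if n > 1 else None  # undefined for a single value
    return var_pop, var_samp


var_pop, var_samp = variances([2.0, 4.0, None, 4.0, 6.0])
print(var_pop, var_samp)                        # 2.0 and 8/3
print(math.sqrt(var_pop), math.sqrt(var_samp))  # stddev_pop and stddev_samp
```

With four non-`NULL` values, `var_pop` equals `var_samp * (1 - 1 / 4)` up to rounding, which is Bessel's correction read in reverse.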

#### Ordered Set Aggregate Functions {#docs:current:sql:functions:aggregates::ordered-set-aggregate-functions}

The table below shows the available “ordered set” aggregate functions.
These functions are specified using the `WITHIN GROUP (ORDER BY sort_expression)` syntax,
and they are converted to an equivalent aggregate function that takes the ordering expression
as the first argument.

| Function | Equivalent |
|:---|:---|
| <code>mode() WITHIN GROUP (ORDER BY column [(ASC&#124;DESC)])</code> | <code>mode(column ORDER BY column [(ASC&#124;DESC)])</code> |
| <code>percentile_cont(fraction) WITHIN GROUP (ORDER BY column [(ASC&#124;DESC)])</code> | <code>quantile_cont(column, fraction ORDER BY column [(ASC&#124;DESC)])</code> |
| <code>percentile_cont(fractions) WITHIN GROUP (ORDER BY column [(ASC&#124;DESC)])</code> | <code>quantile_cont(column, fractions ORDER BY column [(ASC&#124;DESC)])</code> |
| <code>percentile_disc(fraction) WITHIN GROUP (ORDER BY column [(ASC&#124;DESC)])</code> | <code>quantile_disc(column, fraction ORDER BY column [(ASC&#124;DESC)])</code> |
| <code>percentile_disc(fractions) WITHIN GROUP (ORDER BY column [(ASC&#124;DESC)])</code> | <code>quantile_disc(column, fractions ORDER BY column [(ASC&#124;DESC)])</code> |

#### Miscellaneous Aggregate Functions {#docs:current:sql:functions:aggregates::miscellaneous-aggregate-functions}

| Function | Description | Alias |
|:--|:---|:--|
| `grouping()` | For queries with `GROUP BY` and either [`ROLLUP` or `GROUPING SETS`](#docs:current:sql:query_syntax:grouping_sets::identifying-grouping-sets-with-grouping_id): Returns an integer identifying which of the argument expressions were used to group on to create the current super-aggregate row. | `grouping_id()` |

### Array Functions {#docs:current:sql:functions:array}



All [`LIST` functions](#docs:current:sql:functions:list) work with the [`ARRAY` data type](#docs:current:sql:data_types:array). Additionally, several `ARRAY`-native functions are also supported.

#### Array-Native Functions {#docs:current:sql:functions:array::array-native-functions}



| Function | Description |
|:--|:-------|
| [`array_cosine_distance(array1, array2)`](#::array_cosine_distancearray1-array2) | Computes the cosine distance between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_cosine_similarity(array1, array2)`](#::array_cosine_similarityarray1-array2) | Computes the cosine similarity between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_cross_product(array, array)`](#::array_cross_productarray-array) | Computes the cross product of two arrays of size 3. The array elements cannot be `NULL`. |
| [`array_distance(array1, array2)`](#::array_distancearray1-array2) | Computes the Euclidean distance between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_dot_product(array1, array2)`](#::array_inner_productarray1-array2) | Alias for `array_inner_product`. |
| [`array_inner_product(array1, array2)`](#::array_inner_productarray1-array2) | Computes the inner product between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_negative_dot_product(array1, array2)`](#::array_negative_inner_productarray1-array2) | Alias for `array_negative_inner_product`. |
| [`array_negative_inner_product(array1, array2)`](#::array_negative_inner_productarray1-array2) | Computes the negative inner product between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_value(arg, ...)`](#::array_valuearg-) | Creates an `ARRAY` containing the argument values. |



###### `array_cosine_distance(array1, array2)` {#docs:current:sql:functions:array::array_cosine_distancearray1-array2}



|   |   |
|:--|:--------|
| **Description** |Computes the cosine distance between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_cosine_distance(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `0.007416606` |

###### `array_cosine_similarity(array1, array2)` {#docs:current:sql:functions:array::array_cosine_similarityarray1-array2}



|   |   |
|:--|:--------|
| **Description** |Computes the cosine similarity between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_cosine_similarity(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `0.9925834` |
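
The documented results for cosine similarity and cosine distance can be reproduced with plain Python arithmetic. The sketch below uses double precision, so it agrees with the `FLOAT` results shown only to about six decimal places. The function names are hypothetical.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("arrays must have the same size")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product divided by the product of the Euclidean norms."""
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norms


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return 1.0 - cosine_similarity(a, b)


a, b = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
print(inner_product(a, b))      # 20.0
print(cosine_similarity(a, b))  # approximately 0.9925834
print(cosine_distance(a, b))    # approximately 0.0074166
```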

###### `array_cross_product(array, array)` {#docs:current:sql:functions:array::array_cross_productarray-array}



|   |   |
|:--|:--------|
| **Description** |Computes the cross product of two arrays of size 3. The array elements cannot be `NULL`. |
| **Example** | `array_cross_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `[-1.0, 2.0, -1.0]` |
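
The result above follows from the standard component formula for the three-dimensional cross product, sketched here in Python with a hypothetical helper name.

```python
from typing import List, Sequence


def cross_product(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Cross product of two 3-element vectors."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("both arrays must have size 3")
    ax, ay, az = a
    bx, by, bz = b
    return [ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx]


print(cross_product([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # [-1.0, 2.0, -1.0]
```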

###### `array_distance(array1, array2)` {#docs:current:sql:functions:array::array_distancearray1-array2}



|   |   |
|:--|:--------|
| **Description** |Computes the Euclidean distance between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_distance(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `1.7320508` |

###### `array_inner_product(array1, array2)` {#docs:current:sql:functions:array::array_inner_productarray1-array2}



|   |   |
|:--|:--------|
| **Description** |Computes the inner product between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_inner_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `20.0` |
| **Alias** | `array_dot_product` |

###### `array_negative_inner_product(array1, array2)` {#docs:current:sql:functions:array::array_negative_inner_productarray1-array2}



|   |   |
|:--|:--------|
| **Description** |Computes the negative inner product between two arrays of the same size. The array elements cannot be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_negative_inner_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `-20.0` |
| **Alias** | `array_negative_dot_product` |

###### `array_value(arg, ...)` {#docs:current:sql:functions:array::array_valuearg-}



|   |   |
|:--|:--------|
| **Description** |Creates an `ARRAY` containing the argument values. |
| **Example** | `array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT)` |
| **Result** | `[1.0, 2.0, 3.0]` |


### Bitstring Functions {#docs:current:sql:functions:bitstring}



This section describes functions and operators for examining and manipulating [`BITSTRING`](#docs:current:sql:data_types:bitstring) values.
Bitstrings must be of equal length when performing the bitwise operations AND, OR and XOR. When bit shifting, the original length of the bitstring is preserved.

#### Bitstring Operators {#docs:current:sql:functions:bitstring::bitstring-operators}

The table below shows the available mathematical operators for the `BIT` type.



| Operator | Description | Example | Result |
|:---|:---|:---|---:|
| `&` | Bitwise AND | `'10101'::BITSTRING & '10001'::BITSTRING` | `10001` |
| `|` | Bitwise OR | `'1011'::BITSTRING | '0001'::BITSTRING` | `1011` |
| `xor` | Bitwise XOR | `xor('101'::BITSTRING, '001'::BITSTRING)` | `100` |
| `~` | Bitwise NOT | `~('101'::BITSTRING)` | `010` |
| `<<` | Bitwise shift left | `'1001011'::BITSTRING << 3` | `1011000` |
| `>>` | Bitwise shift right | `'1001011'::BITSTRING >> 3` | `0001001` |



#### Bitstring Functions {#docs:current:sql:functions:bitstring::bitstring-functions}

The table below shows the available scalar functions for `BIT` type.

| Name | Description |
|:--|:-------|
| [`bit_count(bitstring)`](#::bit_countbitstring) | Returns the number of set bits in the bitstring. |
| [`bit_length(bitstring)`](#::bit_lengthbitstring) | Returns the number of bits in the bitstring. |
| [`bit_position(substring, bitstring)`](#::bit_positionsubstring-bitstring) | Returns the first starting index of the specified substring within the bitstring, or zero if it is not present. The first (leftmost) bit is indexed 1. |
| [`bitstring(bitstring, length)`](#::bitstringbitstring-length) | Returns a bitstring of determined length. |
| [`get_bit(bitstring, index)`](#::get_bitbitstring-index) | Extracts the nth bit from bitstring; the first (leftmost) bit is indexed 0. |
| [`length(bitstring)`](#::lengthbitstring) | Alias for `bit_length`. |
| [`octet_length(bitstring)`](#::octet_lengthbitstring) | Returns the number of bytes in the bitstring. |
| [`set_bit(bitstring, index, new_value)`](#::set_bitbitstring-index-new_value) | Sets the nth bit in bitstring to `new_value`; the first (leftmost) bit is indexed 0. Returns a new bitstring. |

###### `bit_count(bitstring)` {#docs:current:sql:functions:bitstring::bit_countbitstring}



|   |   |
|:--|:--------|
| **Description** |Returns the number of set bits in the bitstring. |
| **Example** | `bit_count('1101011'::BITSTRING)` |
| **Result** | `5` |

###### `bit_length(bitstring)` {#docs:current:sql:functions:bitstring::bit_lengthbitstring}



|   |   |
|:--|:--------|
| **Description** |Returns the number of bits in the bitstring. |
| **Example** | `bit_length('1101011'::BITSTRING)` |
| **Result** | `7` |

###### `bit_position(substring, bitstring)` {#docs:current:sql:functions:bitstring::bit_positionsubstring-bitstring}



|   |   |
|:--|:--------|
| **Description** |Returns the first starting index of the specified substring within the bitstring, or zero if it is not present. The first (leftmost) bit is indexed 1. |
| **Example** | `bit_position('010'::BITSTRING, '1110101'::BITSTRING)` |
| **Result** | `4` |

###### `bitstring(bitstring, length)` {#docs:current:sql:functions:bitstring::bitstringbitstring-length}



|   |   |
|:--|:--------|
| **Description** |Returns a bitstring of determined length. |
| **Example** | `bitstring('1010'::BITSTRING, 7)` |
| **Result** | `0001010` |

###### `get_bit(bitstring, index)` {#docs:current:sql:functions:bitstring::get_bitbitstring-index}



|   |   |
|:--|:--------|
| **Description** |Extracts the nth bit from bitstring; the first (leftmost) bit is indexed 0. |
| **Example** | `get_bit('0110010'::BITSTRING, 2)` |
| **Result** | `1` |

###### `length(bitstring)` {#docs:current:sql:functions:bitstring::lengthbitstring}



|   |   |
|:--|:--------|
| **Description** |Alias for `bit_length`. |
| **Example** | `length('1101011'::BITSTRING)` |
| **Result** | `7` |

###### `octet_length(bitstring)` {#docs:current:sql:functions:bitstring::octet_lengthbitstring}



|   |   |
|:--|:--------|
| **Description** |Returns the number of bytes in the bitstring. |
| **Example** | `octet_length('1101011'::BITSTRING)` |
| **Result** | `1` |

###### `set_bit(bitstring, index, new_value)` {#docs:current:sql:functions:bitstring::set_bitbitstring-index-new_value}



|   |   |
|:--|:--------|
| **Description** |Sets the nth bit in bitstring to `new_value`; the first (leftmost) bit is indexed 0. Returns a new bitstring. |
| **Example** | `set_bit('0110010'::BITSTRING, 2, 0)` |
| **Result** | `0100010` |
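
Note that `bit_position` counts from 1, while `get_bit` and `set_bit` count from 0. A minimal sketch of the difference, reusing the bitstring from the `bit_position` example above (the column aliases are illustrative only):

```sql
SELECT
    -- 1-based: the substring '010' starts at the fourth bit.
    bit_position('010'::BITSTRING, '1110101'::BITSTRING) AS position_one_based,
    -- 0-based: index 3 addresses that same fourth bit.
    get_bit('1110101'::BITSTRING, 3) AS bit_zero_based;
```

Under these conventions, a position returned by `bit_position` needs 1 subtracted before it is passed to `get_bit` or `set_bit`.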

#### Bitstring Aggregate Functions {#docs:current:sql:functions:bitstring::bitstring-aggregate-functions}

These aggregate functions are available for `BIT` type.

| Name | Description |
|:--|:-------|
| [`bit_and(arg)`](#::bit_andarg) | Returns the bitwise AND operation performed on all bitstrings in a given expression. |
| [`bit_or(arg)`](#::bit_orarg) | Returns the bitwise OR operation performed on all bitstrings in a given expression. |
| [`bit_xor(arg)`](#::bit_xorarg) | Returns the bitwise XOR operation performed on all bitstrings in a given expression. |
| [`bitstring_agg(arg)`](#::bitstring_aggarg) | Returns a bitstring with bits set for each distinct position defined in `arg`. |
| [`bitstring_agg(arg, min, max)`](#::bitstring_aggarg-min-max) | Returns a bitstring with bits set for each distinct position defined in `arg`. All positions must be within the range [`min`, `max`] or an `Out of Range Error` will be thrown. |

###### `bit_and(arg)` {#docs:current:sql:functions:bitstring::bit_andarg}



|   |   |
|:--|:--------|
| **Description** |Returns the bitwise AND operation performed on all bitstrings in a given expression. |
| **Example** | `bit_and(A)` |

###### `bit_or(arg)` {#docs:current:sql:functions:bitstring::bit_orarg}



|   |   |
|:--|:--------|
| **Description** |Returns the bitwise OR operation performed on all bitstrings in a given expression. |
| **Example** | `bit_or(A)` |

###### `bit_xor(arg)` {#docs:current:sql:functions:bitstring::bit_xorarg}



|   |   |
|:--|:--------|
| **Description** |Returns the bitwise XOR operation performed on all bitstrings in a given expression. |
| **Example** | `bit_xor(A)` |

###### `bitstring_agg(arg)` {#docs:current:sql:functions:bitstring::bitstring_aggarg}



|   |   |
|:--|:--------|
| **Description** |The `bitstring_agg` function takes any integer type as input and returns a bitstring with bits set for each distinct value. The left-most bit represents the smallest value in the column and the right-most bit the maximum value. If possible, the min and max are retrieved from the column statistics. Otherwise, it is also possible to provide the min and max values. |
| **Example** | `bitstring_agg(A)` |

> **Tip.** The combination of `bit_count` and `bitstring_agg` can be used as an alternative to `count(DISTINCT ...)`, with possible performance improvements in cases of low cardinality and dense values.
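
As a rough sketch of the tip above, the following query computes a distinct count both ways. The `readings` table and its `sensor_id` column are hypothetical and exist only for this illustration:

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE readings (sensor_id INTEGER);
INSERT INTO readings VALUES (1), (3), (3), (4), (7), (7);

-- Both expressions count the distinct values of sensor_id.
SELECT
    count(DISTINCT sensor_id) AS distinct_count,
    bit_count(bitstring_agg(sensor_id)) AS distinct_count_via_bits
FROM readings;
```

Both columns are expected to agree. If the minimum and maximum cannot be retrieved from the column statistics, the `bitstring_agg(arg, min, max)` form can be used instead.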

###### `bitstring_agg(arg, min, max)` {#docs:current:sql:functions:bitstring::bitstring_aggarg-min-max}



|   |   |
|:--|:--------|
| **Description** |Returns a bitstring with bits set for each distinct position defined in `arg`. All positions must be within the range [`min`, `max`] or an `Out of Range Error` will be thrown. |
| **Example** | `bitstring_agg(A, 1, 42)` |

### Blob Functions {#docs:current:sql:functions:blob}



This section describes functions and operators for examining and manipulating [`BLOB` values](#docs:current:sql:data_types:blob).




| Function | Description |
|:--|:-------|
| [`arg1 || arg2`](#::arg1--arg2) | Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:current:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:current:sql:functions:list::list_concatlist_1--list_n). |
| [`base64(blob)`](#::to_base64blob) | Alias for `to_base64`. |
| [`decode(blob[, on_error])`](#::decodeblob-on_error) | Converts `blob` to `VARCHAR`. The optional `on_error` parameter controls handling of invalid UTF-8: `'strict'` (default, throws error), `'replace'` (replaces invalid characters with `?`), or `'ignore'` (removes invalid characters). |
| [`encode(string)`](#::encodestring) | Converts the `string` to `BLOB`. Converts UTF-8 characters into literal encoding. |
| [`from_base64(string)`](#::from_base64string) | Converts a base64 encoded `string` to a `BLOB`. |
| [`from_binary(value)`](#::unbinvalue) | Alias for `unbin`. |
| [`from_hex(value)`](#::unhexvalue) | Alias for `unhex`. |
| [`hex(blob)`](#::hexblob) | Converts `blob` to `VARCHAR` using hexadecimal encoding. |
| [`md5(blob)`](#::md5blob) | Returns the MD5 hash of the `blob` as a `VARCHAR`. |
| [`md5_number(blob)`](#::md5_numberblob) | Returns the MD5 hash of the `blob` as a `HUGEINT`. |
| [`octet_length(blob)`](#::octet_lengthblob) | Number of bytes in `blob`. |
| [`read_blob(source)`](#::read_blobsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `BLOB`. See the [`read_blob` guide](#docs:current:guides:file_formats:read_file::read_blob) for more details. |
| [`repeat(blob, count)`](#::repeatblob-count) | Repeats the `blob` `count` number of times. |
| [`sha1(blob)`](#::sha1blob) | Returns a `VARCHAR` with the SHA-1 hash of the `blob`. |
| [`sha256(blob)`](#::sha256blob) | Returns a `VARCHAR` with the SHA-256 hash of the `blob`. |
| [`to_base64(blob)`](#::to_base64blob) | Converts a `blob` to a base64 encoded string. |
| [`to_hex(blob)`](#::hexblob) | Alias for `hex`. |
| [`unbin(value)`](#::unbinvalue) | Converts a `value` from binary representation to a blob. |
| [`unhex(value)`](#::unhexvalue) | Converts a `value` from hexadecimal representation to a blob. |



###### `arg1 || arg2` {#docs:current:sql:functions:blob::arg1--arg2}



|   |   |
|:--|:--------|
| **Description** |Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:current:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:current:sql:functions:list::list_concatlist_1--list_n). |
| **Example 1** | `'Duck' || 'DB'` |
| **Result** | `DuckDB` |
| **Example 2** | `[1, 2, 3] || [4, 5, 6]` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
| **Example 3** | `'\xAA'::BLOB || '\xBB'::BLOB` |
| **Result** | `\xAA\xBB` |

###### `decode(blob[, on_error])` {#docs:current:sql:functions:blob::decodeblob-on_error}



|   |   |
|:--|:--------|
| **Description** |Converts `blob` to `VARCHAR`. The optional `on_error` parameter controls handling of invalid UTF-8: `'strict'` (default, throws error), `'replace'` (replaces invalid characters with `?`), or `'ignore'` (removes invalid characters). |
| **Example 1** | `decode('\xC3\xBC'::BLOB)` |
| **Result** | `ü` |
| **Example 2** | `decode('\xAA'::BLOB, 'replace')` |
| **Result** | `?` |

###### `encode(string)` {#docs:current:sql:functions:blob::encodestring}



|   |   |
|:--|:--------|
| **Description** |Converts the `string` to `BLOB`. Converts UTF-8 characters into literal encoding. |
| **Example** | `encode('my_string_with_ü')` |
| **Result** | `my_string_with_\xC3\xBC` |

###### `from_base64(string)` {#docs:current:sql:functions:blob::from_base64string}



|   |   |
|:--|:--------|
| **Description** |Converts a base64 encoded `string` to a `BLOB`. |
| **Example** | `from_base64('QQ==')` |
| **Result** | `A` |

###### `hex(blob)` {#docs:current:sql:functions:blob::hexblob}



|   |   |
|:--|:--------|
| **Description** |Converts `blob` to `VARCHAR` using hexadecimal encoding. |
| **Example** | `hex('\xAA\xBB'::BLOB)` |
| **Result** | `AABB` |
| **Alias** | `to_hex` |

###### `md5(blob)` {#docs:current:sql:functions:blob::md5blob}



|   |   |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `blob` as a `VARCHAR`. |
| **Example** | `md5('\xAA\xBB'::BLOB)` |
| **Result** | `58cea1f6b2b06520613e09af90dc1c47` |

###### `md5_number(blob)` {#docs:current:sql:functions:blob::md5_numberblob}



|   |   |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `blob` as a `HUGEINT`. |
| **Example** | `md5_number('\xAA\xBB'::BLOB)` |
| **Result** | `94525045605907259200829535064523132504` |

###### `octet_length(blob)` {#docs:current:sql:functions:blob::octet_lengthblob}



|   |   |
|:--|:--------|
| **Description** |Number of bytes in `blob`. |
| **Example** | `octet_length('\xAA\xBB'::BLOB)` |
| **Result** | `2` |

###### `read_blob(source)` {#docs:current:sql:functions:blob::read_blobsource}



|   |   |
|:--|:--------|
| **Description** |Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `BLOB`. See the [`read_blob` guide](#docs:current:guides:file_formats:read_file::read_blob) for more details. |
| **Example** | `read_blob('hello.bin')` |
| **Result** | `hello\x0A` |

###### `repeat(blob, count)` {#docs:current:sql:functions:blob::repeatblob-count}



|   |   |
|:--|:--------|
| **Description** |Repeats the `blob` `count` number of times. |
| **Example** | `repeat('\xAA\xBB'::BLOB, 5)` |
| **Result** | `\xAA\xBB\xAA\xBB\xAA\xBB\xAA\xBB\xAA\xBB` |

###### `sha1(blob)` {#docs:current:sql:functions:blob::sha1blob}



|   |   |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-1 hash of the `blob`. |
| **Example** | `sha1('\xAA\xBB'::BLOB)` |
| **Result** | `65b1e351a6cbfeb41c927222bc9ef53aad3396b0` |

###### `sha256(blob)` {#docs:current:sql:functions:blob::sha256blob}



|   |   |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-256 hash of the `blob`. |
| **Example** | `sha256('\xAA\xBB'::BLOB)` |
| **Result** | `d798d1fac6bd4bb1c11f50312760351013379a0ab6f0a8c0af8a506b96b2525a` |

###### `to_base64(blob)` {#docs:current:sql:functions:blob::to_base64blob}



|   |   |
|:--|:--------|
| **Description** |Converts a `blob` to a base64 encoded string. |
| **Example** | `to_base64('A'::BLOB)` |
| **Result** | `QQ==` |
| **Alias** | `base64` |

###### `unbin(value)` {#docs:current:sql:functions:blob::unbinvalue}



|   |   |
|:--|:--------|
| **Description** |Converts a `value` from binary representation to a blob. |
| **Example** | `unbin('0110')` |
| **Result** | `\x06` |
| **Alias** | `from_binary` |

###### `unhex(value)` {#docs:current:sql:functions:blob::unhexvalue}



|   |   |
|:--|:--------|
| **Description** |Converts a `value` from hexadecimal representation to a blob. |
| **Example** | `unhex('2A')` |
| **Result** | `*` |
| **Alias** | `from_hex` |
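
Several of the functions above come in inverse pairs. The query below is a small sketch that chains each pair, reusing inputs from the individual examples; the column aliases are illustrative only:

```sql
SELECT
    -- hex and unhex are inverse conversions.
    hex(unhex('2A')) AS hex_round_trip,
    -- to_base64 and from_base64 are inverse conversions.
    to_base64(from_base64('QQ==')) AS base64_round_trip,
    -- encode and decode convert between VARCHAR and BLOB.
    decode(encode('my_string_with_ü')) AS text_round_trip;
```

Each column should return the value it started from.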


### Date Format Functions {#docs:current:sql:functions:dateformat}

The `strftime` and `strptime` functions can be used to convert between [`DATE`](#docs:current:sql:data_types:date) / [`TIMESTAMP`](#docs:current:sql:data_types:timestamp) values and strings. This is often required when parsing CSV files, displaying output to the user or transferring information between programs. Because there are many possible date representations, these functions accept a [format string](#::format-specifiers) that describes how the date or timestamp should be structured.

#### `strftime` Examples {#docs:current:sql:functions:dateformat::strftime-examples}

The [`strftime(timestamp, format)` function](#docs:current:sql:functions:timestamp::strftimetimestamp-format) converts timestamps or dates to strings according to the specified pattern.

```sql
SELECT strftime(DATE '1992-03-02', '%d/%m/%Y');
```

```text
02/03/1992
```

```sql
SELECT strftime(TIMESTAMP '1992-03-02 20:32:45', '%A, %-d %B %Y - %I:%M:%S %p');
```

```text
Monday, 2 March 1992 - 08:32:45 PM
```

#### `strptime` Examples {#docs:current:sql:functions:dateformat::strptime-examples}

The [`strptime(text, format)` function](#docs:current:sql:functions:timestamp::strptimetext-format) converts strings to timestamps according to the specified pattern.

```sql
SELECT strptime('02/03/1992', '%d/%m/%Y');
```

```text
1992-03-02 00:00:00
```

```sql
SELECT strptime('Monday, 2 March 1992 - 08:32:45 PM', '%A, %-d %B %Y - %I:%M:%S %p');
```

```text
1992-03-02 20:32:45
```

The `strptime` function throws an error on failure:

```sql
SELECT strptime('02/50/1992', '%d/%m/%Y') AS x;
```

```console
Invalid Input Error: Could not parse string "02/50/1992" according to format specifier "%d/%m/%Y"
02/50/1992
   ^
Error: Month out of range, expected a value between 1 and 12
```

To return `NULL` on failure, use the [`try_strptime` function](#docs:current:sql:functions:timestamp::try_strptimetext-format):
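
For instance, assuming the same invalid input as in the failing query above, the call could look like this:

```sql
-- Same input and format as the failing strptime call above.
SELECT try_strptime('02/50/1992', '%d/%m/%Y') AS x;
```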

```text
NULL
```

#### CSV Parsing {#docs:current:sql:functions:dateformat::csv-parsing}

The date formats can also be specified during CSV parsing, either in the [`COPY` statement](#docs:current:sql:statements:copy) or in the `read_csv` function. This can be done by specifying a `DATEFORMAT`, a `TIMESTAMPFORMAT`, or both. `DATEFORMAT` is used for converting dates, and `TIMESTAMPFORMAT` is used for converting timestamps. Below are some examples of how to use them.

In a `COPY` statement:

```sql
COPY dates FROM 'test.csv' (DATEFORMAT '%d/%m/%Y', TIMESTAMPFORMAT '%A, %-d %B %Y - %I:%M:%S %p');
```

In a `read_csv` function:

```sql
SELECT *
FROM read_csv('test.csv', dateformat = '%m/%d/%Y', timestampformat = '%A, %-d %B %Y - %I:%M:%S %p');
```

#### Format Specifiers {#docs:current:sql:functions:dateformat::format-specifiers}

Below is a full list of all available format specifiers.

| Specifier | Description | Example |
|:-|:------|:---|
| `%a` | Abbreviated weekday name. | Sun, Mon, ... |
| `%A` | Full weekday name. | Sunday, Monday, ... |
| `%b` | Abbreviated month name. | Jan, Feb, ..., Dec |
| `%B` | Full month name. | January, February, ... |
| `%c` | ISO date and time representation. | 1992-03-02 10:30:20 |
| `%d` | Day of the month as a zero-padded decimal. | 01, 02, ..., 31 |
| `%-d` | Day of the month as a decimal number. | 1, 2, ..., 31 |
| `%f` | Microsecond as a decimal number, zero-padded on the left. | 000000 - 999999 |
| `%g` | Millisecond as a decimal number, zero-padded on the left. | 000 - 999 |
| `%G` | ISO 8601 year with century representing the year that contains the greater part of the ISO week (see `%V`). | 0001, 0002, ..., 2013, 2014, ..., 9998, 9999 |
| `%H` | Hour (24-hour clock) as a zero-padded decimal number. | 00, 01, ..., 23 |
| `%-H` | Hour (24-hour clock) as a decimal number. | 0, 1, ..., 23 |
| `%I` | Hour (12-hour clock) as a zero-padded decimal number. | 01, 02, ..., 12 |
| `%-I` | Hour (12-hour clock) as a decimal number. | 1, 2, ..., 12 |
| `%j` | Day of the year as a zero-padded decimal number. | 001, 002, ..., 366 |
| `%-j` | Day of the year as a decimal number. | 1, 2, ..., 366 |
| `%m` | Month as a zero-padded decimal number. | 01, 02, ..., 12 |
| `%-m` | Month as a decimal number. | 1, 2, ..., 12 |
| `%M` | Minute as a zero-padded decimal number. | 00, 01, ..., 59 |
| `%-M` | Minute as a decimal number. | 0, 1, ..., 59 |
| `%n` | Nanosecond as a decimal number, zero-padded on the left. | 000000000 - 999999999 |
| `%p` | Locale's AM or PM. | AM, PM |
| `%S` | Second as a zero-padded decimal number. | 00, 01, ..., 59 |
| `%-S` | Second as a decimal number. | 0, 1, ..., 59 |
| `%u` | ISO 8601 weekday as a decimal number where 1 is Monday. | 1, 2, ..., 7 |
| `%U` | Week number of the year. Week 01 starts on the first Sunday of the year, so there can be week 00. Note that this is not compliant with the week date standard in ISO-8601. | 00, 01, ..., 53 |
| `%V` | ISO 8601 week as a decimal number with Monday as the first day of the week. Week 01 is the week containing Jan 4. Note that `%V` is incompatible with year directive `%Y`. Use the ISO year `%G` instead. | 01, ..., 53 |
| `%w` | Weekday as a decimal number. | 0, 1, ..., 6 |
| `%W` | Week number of the year. Week 01 starts on the first Monday of the year, so there can be week 00. Note that this is not compliant with the week date standard in ISO-8601. | 00, 01, ..., 53 |
| `%x` | ISO date representation. | 1992-03-02 |
| `%X` | ISO time representation. | 10:30:20 |
| `%y` | Year without century as a zero-padded decimal number. Numbers 00 to 68 are turned into 2000 to 2068. Numbers 69 to 99 are turned into 1969 to 1999. | 00, 01, ..., 99 |
| `%-y` | Year without century as a decimal number. Numbers 0 to 68 are turned into 2000 to 2068. Numbers 69 to 99 are turned into 1969 to 1999. | 0, 1, ..., 99 |
| `%Y` | Year with century as a decimal number. | 2013, 2019 etc. |
| `%z` | [Time offset from UTC](https://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC) in the form ±HH:MM, ±HHMM, or ±HH. | -0700 |
| `%Z` | Time zone name. | Europe/Amsterdam  |
| `%%` | A literal `%` character. | % |
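
The note on `%V` matters around the turn of the year, when the ISO year and the calendar year can differ. A minimal sketch, using a date chosen for illustration (the column aliases are not part of any API):

```sql
SELECT
    -- Calendar year of the date.
    strftime(DATE '2021-01-01', '%Y') AS calendar_year,
    -- ISO year and ISO week: on the ISO 8601 calendar this date
    -- falls in week 53 of 2020.
    strftime(DATE '2021-01-01', '%G-W%V') AS iso_year_week;
```

Combining `%V` with `%Y` here would pair week 53 with the year 2021, which is why `%G` is the appropriate year directive for ISO weeks.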

### Date Functions {#docs:current:sql:functions:date}



This section describes functions and operators for examining and manipulating [`DATE`](#docs:current:sql:data_types:date) values.

#### Date Operators {#docs:current:sql:functions:date::date-operators}

The table below shows the available mathematical operators for `DATE` types.

| Operator | Description | Example | Result |
| :------- | :---------- | :------ | :----- |
| `+` | addition of days (integers) | `DATE '1992-03-22' + 5` | `1992-03-27` |
| `+` | addition of an `INTERVAL` | `DATE '1992-03-22' + INTERVAL 5 DAY` | `1992-03-27 00:00:00` |
| `+` | addition of a variable `INTERVAL` | `SELECT DATE '1992-03-22' + INTERVAL (d.days) DAY FROM (VALUES (5), (11)) d(days)` | `1992-03-27 00:00:00` and `1992-04-02 00:00:00` |
| `-` | subtraction of `DATE`s | `DATE '1992-03-27' - DATE '1992-03-22'` | `5` |
| `-` | subtraction of an `INTERVAL` | `DATE '1992-03-27' - INTERVAL 5 DAY` | `1992-03-22 00:00:00` |
| `-` | subtraction of a variable `INTERVAL` | `SELECT DATE '1992-03-27' - INTERVAL (d.days) DAY FROM (VALUES (5), (11)) d(days)` | `1992-03-22 00:00:00` and `1992-03-16 00:00:00` |

Adding to or subtracting from [infinite values](#docs:current:sql:data_types:date::special-values) produces the same infinite value.
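
The results in the operator table suggest that adding an integer keeps the `DATE` type, while adding an `INTERVAL` yields a timestamp. The sketch below checks this with `typeof` and shows a cast back to `DATE`; the column aliases are illustrative only:

```sql
SELECT
    -- Integer arithmetic on a date.
    typeof(DATE '1992-03-22' + 5) AS plus_integer,
    -- Interval arithmetic on a date.
    typeof(DATE '1992-03-22' + INTERVAL 5 DAY) AS plus_interval,
    -- Cast the interval result back when only the date is needed.
    (DATE '1992-03-22' + INTERVAL 5 DAY)::DATE AS back_to_date;
```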

#### Date Functions {#docs:current:sql:functions:date::date-functions}

The table below shows the available functions for `DATE` types.
Dates can also be manipulated with the [timestamp functions](#docs:current:sql:functions:timestamp) through type promotion.

| Name                                                                                | Description                                                                                                                                                                                                                                                 |
| :---------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`date_add(date, interval)`](#::date_adddate-interval)                                | Add the interval to the date and return a `DATETIME` value.                                                                                                                                                                                                 |
| [`date_diff(part, startdate, enddate)`](#::date_diffpart-startdate-enddate)           | The number of [`part`](#docs:current:sql:functions:datepart) boundaries between `startdate` and `enddate`, inclusive of the larger date and exclusive of the smaller date.                                                                     |
| [`date_part(part, date)`](#::date_partpart-date)                                      | Get [subfield](#docs:current:sql:functions:datepart) (equivalent to `extract`).                                                                                                                                                                |
| [`date_sub(part, startdate, enddate)`](#::date_subpart-startdate-enddate)             | The signed length of the interval between `startdate` and `enddate`, truncated to whole multiples of [`part`](#docs:current:sql:functions:datepart).                                                                                           |
| [`date_trunc(part, date)`](#::date_truncpart-date)                                    | Truncate to specified [precision](#docs:current:sql:functions:datepart).                                                                                                                                                                       |
| [`dayname(date)`](#::daynamedate)                                                     | The (English) name of the weekday.                                                                                                                                                                                                                          |
| [`days_in_month(date)`](#::days_in_monthdate)                                         | The number of days in the month of the given date.                                                                                                                                                                                                          |
| [`extract(part from date)`](#::extractpart-from-date)                                 | Get [subfield](#docs:current:sql:functions:datepart) from a date.                                                                                                                                                                              |
| [`greatest(date, date)`](#::greatestdate-date)                                        | The later of two dates.                                                                                                                                                                                                                                     |
| [`isfinite(date)`](#::isfinitedate)                                                   | Returns `true` if the date is finite, `false` otherwise.                                                                                                                                                                                                    |
| [`isinf(date)`](#::isinfdate)                                                         | Returns `true` if the date is infinite, `false` otherwise.                                                                                                                                                                                                  |
| [`julian(date)`](#::juliandate)                                                       | Extract the Julian Day number from a date.                                                                                                                                                                                                                  |
| [`last_day(date)`](#::last_daydate)                                                   | The last day of the corresponding month in the date.                                                                                                                                                                                                        |
| [`least(date, date)`](#::leastdate-date)                                              | The earlier of two dates.                                                                                                                                                                                                                                   |
| [`make_date(year, month, day)`](#::make_dateyear-month-day)                           | The date for the given parts.                                                                                                                                                                                                                               |
| [`monthname(date)`](#::monthnamedate)                                                 | The (English) name of the month.                                                                                                                                                                                                                            |
| [`strftime(date, format)`](#::strftimedate-format)                                    | Converts a date to a string according to the [format string](#docs:current:sql:functions:dateformat).                                                                                                                                          |
| [`time_bucket(bucket_width, date[, offset])`](#::time_bucketbucket_width-date-offset) | Truncate `date` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03[ + offset]`. Note that `2000-01-03` is a Monday.                            |
| [`time_bucket(bucket_width, date[, origin])`](#::time_bucketbucket_width-date-origin) | Truncate `date` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01` when `bucket_width` is a number of months or coarser units, else `2000-01-03`. Note that `2000-01-03` is a Monday.        |
| [`today()`](#::today)                                                                 | Current date (start of current transaction) in the local time zone.                                                                                                                                                                                         |

###### `date_add(date, interval)` {#docs:current:sql:functions:date::date_adddate-interval}



|   |   |
|:--|:--------|
| **Description** |Add the interval to the date and return a `DATETIME` value. |
| **Example** | `date_add(DATE '1992-09-15', INTERVAL 2 MONTH)` |
| **Result** | `1992-11-15 00:00:00` |

###### `date_diff(part, startdate, enddate)` {#docs:current:sql:functions:date::date_diffpart-startdate-enddate}



|   |   |
|:--|:--------|
| **Description** |The number of [`part`](#docs:current:sql:functions:datepart) boundaries between `startdate` and `enddate`, inclusive of the larger date and exclusive of the smaller date. |
| **Example** | `date_diff('month', DATE '1992-09-15', DATE '1992-11-14')` |
| **Result** | `2` |
| **Alias** | `datediff` |

###### `date_part(part, date)` {#docs:current:sql:functions:date::date_partpart-date}



|   |   |
|:--|:--------|
| **Description** |Get the [subfield](#docs:current:sql:functions:datepart) (equivalent to `extract`). |
| **Example** | `date_part('year', DATE '1992-09-20')` |
| **Result** | `1992` |
| **Alias** | `datepart` |

###### `date_sub(part, startdate, enddate)` {#docs:current:sql:functions:date::date_subpart-startdate-enddate}



|   |   |
|:--|:--------|
| **Description** |The signed length of the interval between `startdate` and `enddate`, truncated to whole multiples of [`part`](#docs:current:sql:functions:datepart). |
| **Example** | `date_sub('month', DATE '1992-09-15', DATE '1992-11-14')` |
| **Result** | `1` |
| **Alias** | `datesub` |
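
The difference between `date_diff` and `date_sub` is easiest to see side by side. The following sketch runs both on the inputs used in the examples above; the column aliases are illustrative only:

```sql
SELECT
    -- Counts the month boundaries crossed between the two dates.
    date_diff('month', DATE '1992-09-15', DATE '1992-11-14') AS month_boundaries_crossed,
    -- Counts the whole months that fit between the two dates.
    date_sub('month', DATE '1992-09-15', DATE '1992-11-14') AS whole_months_elapsed;
```

With these inputs, the first column is `2` (the boundaries at October 1 and November 1 are crossed) and the second is `1` (only one full month fits between September 15 and November 14).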

###### `date_trunc(part, date)` {#docs:current:sql:functions:date::date_truncpart-date}



|   |   |
|:--|:--------|
| **Description** |Truncate to specified [precision](#docs:current:sql:functions:datepart). Always returns a `TIMESTAMP`, even when the input is a `DATE`. |
| **Example** | `date_trunc('month', DATE '1992-03-07')` |
| **Result** | `1992-03-01 00:00:00` |
| **Alias** | `datetrunc` |

###### `dayname(date)` {#docs:current:sql:functions:date::daynamedate}



|   |   |
|:--|:--------|
| **Description** |The (English) name of the weekday. |
| **Example** | `dayname(DATE '1992-09-20')` |
| **Result** | `Sunday` |

###### `days_in_month(date)` {#docs:current:sql:functions:date::days_in_monthdate}



|   |   |
|:--|:--------|
| **Description** |The number of days in the month of the given date. |
| **Example** | `days_in_month(DATE '1992-02-15')` |
| **Result** | `29` |

###### `extract(part from date)` {#docs:current:sql:functions:date::extractpart-from-date}



|   |   |
|:--|:--------|
| **Description** |Get [subfield](#docs:current:sql:functions:datepart) from a date. |
| **Example** | `extract('year' FROM DATE '1992-09-20')` |
| **Result** | `1992` |

###### `greatest(date, date)` {#docs:current:sql:functions:date::greatestdate-date}



|   |   |
|:--|:--------|
| **Description** |The later of two dates. |
| **Example** | `greatest(DATE '1992-09-20', DATE '1992-03-07')` |
| **Result** | `1992-09-20` |

###### `isfinite(date)` {#docs:current:sql:functions:date::isfinitedate}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the date is finite, `false` otherwise. |
| **Example** | `isfinite(DATE '1992-03-07')` |
| **Result** | `true` |

###### `isinf(date)` {#docs:current:sql:functions:date::isinfdate}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the date is infinite, `false` otherwise. |
| **Example** | `isinf(DATE '-infinity')` |
| **Result** | `true` |

###### `julian(date)` {#docs:current:sql:functions:date::juliandate}



|   |   |
|:--|:--------|
| **Description** |Extract the Julian Day number from a date. |
| **Example** | `julian(DATE '1992-09-20')` |
| **Result** | `2448886.0` |

###### `last_day(date)` {#docs:current:sql:functions:date::last_daydate}



|   |   |
|:--|:--------|
| **Description** |The last day of the corresponding month in the date. |
| **Example** | `last_day(DATE '1992-09-20')` |
| **Result** | `1992-09-30` |

###### `least(date, date)` {#docs:current:sql:functions:date::leastdate-date}



|   |   |
|:--|:--------|
| **Description** |The earlier of two dates. |
| **Example** | `least(DATE '1992-09-20', DATE '1992-03-07')` |
| **Result** | `1992-03-07` |

###### `make_date(year, month, day)` {#docs:current:sql:functions:date::make_dateyear-month-day}



|   |   |
|:--|:--------|
| **Description** |The date for the given parts. |
| **Example** | `make_date(1992, 9, 20)` |
| **Result** | `1992-09-20` |

###### `monthname(date)` {#docs:current:sql:functions:date::monthnamedate}



|   |   |
|:--|:--------|
| **Description** |The (English) name of the month. |
| **Example** | `monthname(DATE '1992-09-20')` |
| **Result** | `September` |

###### `strftime(date, format)` {#docs:current:sql:functions:date::strftimedate-format}



|   |   |
|:--|:--------|
| **Description** |Converts a date to a string according to the [format string](#docs:current:sql:functions:dateformat). |
| **Example** | `strftime(DATE '1992-01-01', '%a, %-d %B %Y')` |
| **Result** | `Wed, 1 January 1992` |

###### `time_bucket(bucket_width, date[, offset])` {#docs:current:sql:functions:date::time_bucketbucket_width-date-offset}



|   |   |
|:--|:--------|
| **Description** |Truncate `date` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03[ + offset]`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 months', DATE '1992-04-20', INTERVAL '1 month')` |
| **Result** | `1992-04-01` |

###### `time_bucket(bucket_width, date[, origin])` {#docs:current:sql:functions:date::time_bucketbucket_width-date-origin}



|   |   |
|:--|:--------|
| **Description** |Truncate `date` to a grid of width `bucket_width`. The grid is anchored at the `origin` date, which defaults to `2000-01-01` when `bucket_width` is a number of months or coarser units, else `2000-01-03`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 weeks', DATE '1992-04-20', DATE '1992-04-01')` |
| **Result** | `1992-04-15` |

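The two `time_bucket` variants differ only in how the grid is positioned. As a minimal sketch, the query below runs both documented examples together; the column aliases are illustrative.

```sql
SELECT
    -- Two-month buckets, shifted one month from the default anchor,
    -- so bucket edges fall on February 1, April 1, June 1, ...
    time_bucket(INTERVAL '2 months', DATE '1992-04-20', INTERVAL '1 month') AS by_offset, -- 1992-04-01
    -- Two-week buckets counted from an explicit origin of April 1, 1992,
    -- so bucket edges fall on April 1, April 15, April 29, ...
    time_bucket(INTERVAL '2 weeks', DATE '1992-04-20', DATE '1992-04-01') AS by_origin;   -- 1992-04-15
```
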
###### `today()` {#docs:current:sql:functions:date::today}



|   |   |
|:--|:--------|
| **Description** |Current date (start of current transaction) in the local time zone. |
| **Example** | `today()` |
| **Result** | `2022-10-08` |
| **Alias** | `current_date` (no parentheses necessary) |

#### Date Part Extraction Functions {#docs:current:sql:functions:date::date-part-extraction-functions}

There are also dedicated extraction functions to get the [subfields](#docs:current:sql:functions:datepart::part-functions).
A few examples include extracting the day from a date, or the day of the week from a date.

Functions applied to infinite dates will either return the same infinite dates
(e.g., `greatest`) or `NULL` (e.g., `date_part`) depending on what “makes sense”.
In general, if the function needs to examine the parts of the infinite date, the result will be `NULL`.

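The rule for infinite dates can be checked with a short query. This is a sketch based on the behavior described above; the column aliases are illustrative.

```sql
SELECT
    -- Comparison does not examine the date parts, so the infinite date is returned
    greatest(DATE 'infinity', DATE '1992-09-20') AS latest,
    -- Extracting a part requires examining the date, so the result is NULL
    date_part('year', DATE 'infinity') AS year_part,
    -- isinf reports whether the value is infinite
    isinf(DATE 'infinity') AS is_infinite;
```
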
### Date Part Functions {#docs:current:sql:functions:datepart}



The `date_part`, `date_trunc` and `date_diff` functions can be used to extract or manipulate parts of temporal types such as [`TIMESTAMP`](#docs:current:sql:data_types:timestamp), [`TIMESTAMPTZ`](#docs:current:sql:data_types:timestamp), [`DATE`](#docs:current:sql:data_types:date) and [`INTERVAL`](#docs:current:sql:data_types:interval).

The parts to be extracted or manipulated are specified by one of the strings in the tables below.
The example column provides the corresponding parts of the timestamp `2021-08-03 11:59:44.123456`.
Only the entries of the first table can be extracted from `INTERVAL`s or used to construct them.

> Except for `julian` and `epoch`, which return `DOUBLE`s, all parts are extracted as integers. Since there are no infinite integer values in DuckDB, `NULL`s are returned for infinite timestamps.

#### Part Specifiers Usable as Date Part Specifiers and in Intervals {#docs:current:sql:functions:datepart::part-specifiers-usable-as-date-part-specifiers-and-in-intervals}

| Specifier | Description | Synonyms | Example |
|:--|:--|:---|--:|
| `century` | Gregorian century | `cent`, `centuries`, `c` | `21` |
| `day` | Gregorian day | `days`, `d`, `dayofmonth` | `3` |
| `decade` | Gregorian decade | `dec`, `decades`, `decs` | `202` |
| `hour` | Hours | `hr`, `hours`, `hrs`, `h` | `11` |
| `microseconds` | Sub-minute microseconds | `microsecond`, `us`, `usec`, `usecs`, `usecond`, `useconds` | `44123456` |
| `millennium` | Gregorian millennium | `mil`, `millenniums`, `millenia`, `mils`, `millenium` | `3` |
| `milliseconds` | Sub-minute milliseconds | `millisecond`, `ms`, `msec`, `msecs`, `msecond`, `mseconds` | `44123` |
| `minute` | Minutes | `min`, `minutes`, `mins`, `m` | `59` |
| `month` | Gregorian month | `mon`, `months`, `mons` | `8` |
| `quarter` | Quarter of the year (1-4) | `quarters` | `3` |
| `second` | Seconds | `sec`, `seconds`, `secs`, `s` | `44` |
| `year` | Gregorian year | `yr`, `y`, `years`, `yrs` | `2021` |

#### Part Specifiers Only Usable as Date Part Specifiers {#docs:current:sql:functions:datepart::part-specifiers-only-usable-as-date-part-specifiers}

| Specifier | Description | Synonyms | Example |
|:--|:--|:---|--:|
| `dayofweek` | Day of the week (Sunday = 0, Saturday = 6) | `weekday`, `dow` | `2` |
| `dayofyear` | Day of the year (1-365/366) | `doy` | `215` |
| `epoch` | Seconds since 1970-01-01 | | `1627991984.123456` |
| `era` | Gregorian era (CE/AD, BCE/BC) | | `1` |
| `isodow` | ISO day of the week (Monday = 1, Sunday = 7) | | `2` |
| `isoyear` | ISO Year number (Starts on Monday of week containing Jan 4th) | | `2021` |
| `julian` | Julian Day number. | | `2459430.4998162435` |
| `timezone_hour` | Time zone offset hour portion | | `0` |
| `timezone_minute` | Time zone offset minute portion | | `0` |
| `timezone` | Time zone offset in seconds | | `0` |
| `week` | Week number | `weeks`, `w` | `31` |
| `yearweek` | ISO year and week number in `YYYYWW` format | | `202131` |

Note that the time zone parts are all zero unless a time zone extension such as [ICU](#docs:current:core_extensions:icu)
has been installed to support `TIMESTAMP WITH TIME ZONE`.

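As an illustration, the specifiers and their synonyms can be passed to `date_part` interchangeably. The sketch below uses the example timestamp from the tables above; the subquery and column aliases are illustrative.

```sql
SELECT
    date_part('decade', ts) AS decade,       -- 202
    date_part('dow', ts) AS day_of_week,     -- 2, using the `dow` synonym of `dayofweek`
    date_part('doy', ts) AS day_of_year,     -- 215, using the `doy` synonym of `dayofyear`
    date_part('yearweek', ts) AS year_week   -- 202131
FROM (SELECT TIMESTAMP '2021-08-03 11:59:44.123456' AS ts);
```
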
#### Part Functions {#docs:current:sql:functions:datepart::part-functions}

There are dedicated extraction functions to get certain subfields:

| Name | Description |
|:--|:-------|
| [`century(date)`](#::centurydate) | Century. |
| [`day(date)`](#::daydate) | Day. |
| [`dayofmonth(date)`](#::dayofmonthdate) | Day (synonym). |
| [`dayofweek(date)`](#::dayofweekdate) | Numeric weekday (Sunday = 0, Saturday = 6). |
| [`dayofyear(date)`](#::dayofyeardate) | Day of the year (starts from 1, i.e., January 1 = 1). |
| [`decade(date)`](#::decadedate) | Decade (year / 10). |
| [`epoch(date)`](#::epochdate) | Seconds since 1970-01-01. |
| [`era(date)`](#::eradate) | Calendar era. |
| [`hour(date)`](#::hourdate) | Hours. |
| [`isodow(date)`](#::isodowdate) | Numeric ISO weekday (Monday = 1, Sunday = 7). |
| [`isoyear(date)`](#::isoyeardate) | ISO Year number (Starts on Monday of week containing Jan 4th). |
| [`julian(date)`](#::juliandate) | `DOUBLE` Julian Day number. |
| [`microsecond(date)`](#::microseconddate) | Sub-minute microseconds. |
| [`millennium(date)`](#::millenniumdate) | Millennium. |
| [`millisecond(date)`](#::milliseconddate) | Sub-minute milliseconds. |
| [`minute(date)`](#::minutedate) | Minutes. |
| [`month(date)`](#::monthdate) | Month. |
| [`quarter(date)`](#::quarterdate) | Quarter. |
| [`second(date)`](#::seconddate) | Seconds. |
| [`timezone_hour(date)`](#::timezone_hourdate) | Time zone offset hour portion. |
| [`timezone_minute(date)`](#::timezone_minutedate) | Time zone offset minutes portion. |
| [`timezone(date)`](#::timezonedate) | Time zone offset in seconds. |
| [`week(date)`](#::weekdate) | ISO Week. |
| [`weekday(date)`](#::weekdaydate) | Numeric weekday synonym (Sunday = 0, Saturday = 6). |
| [`weekofyear(date)`](#::weekofyeardate) | ISO Week (synonym). |
| [`year(date)`](#::yeardate) | Year. |
| [`yearweek(date)`](#::yearweekdate) | `BIGINT` of combined ISO Year number and 2-digit version of ISO Week number. |

###### `century(date)` {#docs:current:sql:functions:datepart::centurydate}



|   |   |
|:--|:--------|
| **Description** |Century. |
| **Example** | `century(DATE '1992-02-15')` |
| **Result** | `20` |

###### `day(date)` {#docs:current:sql:functions:datepart::daydate}



|   |   |
|:--|:--------|
| **Description** |Day. |
| **Example** | `day(DATE '1992-02-15')` |
| **Result** | `15` |

###### `dayofmonth(date)` {#docs:current:sql:functions:datepart::dayofmonthdate}



|   |   |
|:--|:--------|
| **Description** |Day (synonym). |
| **Example** | `dayofmonth(DATE '1992-02-15')` |
| **Result** | `15` |

###### `dayofweek(date)` {#docs:current:sql:functions:datepart::dayofweekdate}



|   |   |
|:--|:--------|
| **Description** |Numeric weekday (Sunday = 0, Saturday = 6). |
| **Example** | `dayofweek(DATE '1992-02-15')` |
| **Result** | `6` |

###### `dayofyear(date)` {#docs:current:sql:functions:datepart::dayofyeardate}



|   |   |
|:--|:--------|
| **Description** |Day of the year (starts from 1, i.e., January 1 = 1). |
| **Example** | `dayofyear(DATE '1992-02-15')` |
| **Result** | `46` |

###### `decade(date)` {#docs:current:sql:functions:datepart::decadedate}



|   |   |
|:--|:--------|
| **Description** |Decade (year / 10). |
| **Example** | `decade(DATE '1992-02-15')` |
| **Result** | `199` |

###### `epoch(date)` {#docs:current:sql:functions:datepart::epochdate}



|   |   |
|:--|:--------|
| **Description** |Seconds since 1970-01-01. |
| **Example** | `epoch(DATE '1992-02-15')` |
| **Result** | `698112000` |

###### `era(date)` {#docs:current:sql:functions:datepart::eradate}



|   |   |
|:--|:--------|
| **Description** |Calendar era. |
| **Example** | `era(DATE '0044-03-15 (BC)')` |
| **Result** | `0` |

###### `hour(date)` {#docs:current:sql:functions:datepart::hourdate}



|   |   |
|:--|:--------|
| **Description** |Hours. |
| **Example** | `hour(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `11` |

###### `isodow(date)` {#docs:current:sql:functions:datepart::isodowdate}



|   |   |
|:--|:--------|
| **Description** |Numeric ISO weekday (Monday = 1, Sunday = 7). |
| **Example** | `isodow(DATE '1992-02-15')` |
| **Result** | `6` |

###### `isoyear(date)` {#docs:current:sql:functions:datepart::isoyeardate}



|   |   |
|:--|:--------|
| **Description** |ISO Year number (Starts on Monday of week containing Jan 4th). |
| **Example** | `isoyear(DATE '2022-01-01')` |
| **Result** | `2021` |


###### `julian(date)` {#docs:current:sql:functions:datepart::juliandate}



|   |   |
|:--|:--------|
| **Description** |`DOUBLE` Julian Day number. |
| **Example** | `julian(DATE '1992-09-20')` |
| **Result** | `2448886.0` |


###### `microsecond(date)` {#docs:current:sql:functions:datepart::microseconddate}



|   |   |
|:--|:--------|
| **Description** |Sub-minute microseconds. |
| **Example** | `microsecond(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `44123456` |

###### `millennium(date)` {#docs:current:sql:functions:datepart::millenniumdate}



|   |   |
|:--|:--------|
| **Description** |Millennium. |
| **Example** | `millennium(DATE '1992-02-15')` |
| **Result** | `2` |

###### `millisecond(date)` {#docs:current:sql:functions:datepart::milliseconddate}



|   |   |
|:--|:--------|
| **Description** |Sub-minute milliseconds. |
| **Example** | `millisecond(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `44123` |

###### `minute(date)` {#docs:current:sql:functions:datepart::minutedate}



|   |   |
|:--|:--------|
| **Description** |Minutes. |
| **Example** | `minute(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `59` |

###### `month(date)` {#docs:current:sql:functions:datepart::monthdate}



|   |   |
|:--|:--------|
| **Description** |Month. |
| **Example** | `month(DATE '1992-02-15')` |
| **Result** | `2` |

###### `quarter(date)` {#docs:current:sql:functions:datepart::quarterdate}



|   |   |
|:--|:--------|
| **Description** |Quarter. |
| **Example** | `quarter(DATE '1992-02-15')` |
| **Result** | `1` |

###### `second(date)` {#docs:current:sql:functions:datepart::seconddate}



|   |   |
|:--|:--------|
| **Description** |Seconds. |
| **Example** | `second(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `44` |

###### `timezone_hour(date)` {#docs:current:sql:functions:datepart::timezone_hourdate}



|   |   |
|:--|:--------|
| **Description** |Time zone offset hour portion. |
| **Example** | `timezone_hour(DATE '1992-02-15')` |
| **Result** | `0` |

###### `timezone_minute(date)` {#docs:current:sql:functions:datepart::timezone_minutedate}



|   |   |
|:--|:--------|
| **Description** |Time zone offset minutes portion. |
| **Example** | `timezone_minute(DATE '1992-02-15')` |
| **Result** | `0` |

###### `timezone(date)` {#docs:current:sql:functions:datepart::timezonedate}



|   |   |
|:--|:--------|
| **Description** |Time zone offset in seconds. |
| **Example** | `timezone(DATE '1992-02-15')` |
| **Result** | `0` |

###### `week(date)` {#docs:current:sql:functions:datepart::weekdate}



|   |   |
|:--|:--------|
| **Description** |ISO Week. |
| **Example** | `week(DATE '1992-02-15')` |
| **Result** | `7` |

###### `weekday(date)` {#docs:current:sql:functions:datepart::weekdaydate}



|   |   |
|:--|:--------|
| **Description** |Numeric weekday synonym (Sunday = 0, Saturday = 6). |
| **Example** | `weekday(DATE '1992-02-15')` |
| **Result** | `6` |

###### `weekofyear(date)` {#docs:current:sql:functions:datepart::weekofyeardate}



|   |   |
|:--|:--------|
| **Description** |ISO Week (synonym). |
| **Example** | `weekofyear(DATE '1992-02-15')` |
| **Result** | `7` |

###### `year(date)` {#docs:current:sql:functions:datepart::yeardate}



|   |   |
|:--|:--------|
| **Description** |Year. |
| **Example** | `year(DATE '1992-02-15')` |
| **Result** | `1992` |

###### `yearweek(date)` {#docs:current:sql:functions:datepart::yearweekdate}



|   |   |
|:--|:--------|
| **Description** |`BIGINT` of combined ISO Year number and 2-digit version of ISO Week number. |
| **Example** | `yearweek(DATE '1992-02-15')` |
| **Result** | `199207` |

### Enum Functions {#docs:current:sql:functions:enum}



This section describes functions and operators for examining and manipulating [`ENUM` values](#docs:current:sql:data_types:enum).
The examples assume an enum type created as:

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy', 'anxious');
```

These functions can take `NULL` or a specific value of the type as argument(s).
With the exception of `enum_code` and `enum_range_boundary`, the result depends only on the type of the argument and not on its value.

| Name | Description |
|:--|:-------|
| [`enum_code(enum_value)`](#::enum_codeenum_value) | Returns the numeric value backing the given enum value. |
| [`enum_first(enum)`](#::enum_firstenum) | Returns the first value of the input enum type. |
| [`enum_last(enum)`](#::enum_lastenum) | Returns the last value of the input enum type. |
| [`enum_range(enum)`](#::enum_rangeenum) | Returns all values of the input enum type as an array. |
| [`enum_range_boundary(enum, enum)`](#::enum_range_boundaryenum-enum) | Returns the range between the two given enum values as an array. |

###### `enum_code(enum_value)` {#docs:current:sql:functions:enum::enum_codeenum_value}



|   |   |
|:--|:--------|
| **Description** |Returns the numeric value backing the given enum value. |
| **Example** | `enum_code('happy'::mood)` |
| **Result** | `2` |

###### `enum_first(enum)` {#docs:current:sql:functions:enum::enum_firstenum}



|   |   |
|:--|:--------|
| **Description** |Returns the first value of the input enum type. |
| **Example** | `enum_first(NULL::mood)` |
| **Result** | `sad` |

###### `enum_last(enum)` {#docs:current:sql:functions:enum::enum_lastenum}



|   |   |
|:--|:--------|
| **Description** |Returns the last value of the input enum type. |
| **Example** | `enum_last(NULL::mood)` |
| **Result** | `anxious` |

###### `enum_range(enum)` {#docs:current:sql:functions:enum::enum_rangeenum}



|   |   |
|:--|:--------|
| **Description** |Returns all values of the input enum type as an array. |
| **Example** | `enum_range(NULL::mood)` |
| **Result** | `[sad, ok, happy, anxious]` |

###### `enum_range_boundary(enum, enum)` {#docs:current:sql:functions:enum::enum_range_boundaryenum-enum}



|   |   |
|:--|:--------|
| **Description** |Returns the range between the two given enum values as an array. The values must be of the same enum type. When the first parameter is `NULL`, the result starts with the first value of the enum type. When the second parameter is `NULL`, the result ends with the last value of the enum type. |
| **Example** | `enum_range_boundary(NULL, 'happy'::mood)` |
| **Result** | `[sad, ok, happy]` |

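Putting the pieces together, the following sketch creates the `mood` type used throughout this section and calls several of the functions in one query. The column aliases are illustrative, and the `CREATE TYPE` statement fails if a type named `mood` already exists.

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy', 'anxious');

SELECT
    enum_code('happy'::mood) AS code,                         -- 2
    enum_first(NULL::mood) AS first_value,                    -- sad
    enum_last(NULL::mood) AS last_value,                      -- anxious
    enum_range_boundary(NULL, 'happy'::mood) AS up_to_happy;  -- [sad, ok, happy]
```
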
### Geometry Functions {#docs:current:sql:functions:geometry}

This section describes the functions for examining and manipulating [`GEOMETRY`](#docs:current:sql:data_types:geometry) values.

> **Note.** The `spatial` extension provides additional functions for working with `GEOMETRY` values, which are documented in the [Spatial Functions](#docs:current:core_extensions:spatial:functions) section.

#### Geometry Operators {#docs:current:sql:functions:geometry::geometry-operators}

The table below lists the operators that can be used with `GEOMETRY` values.

| Operator | Description | Example | Result |
|:-|:--|:---|:--|
| `&&` | Returns true if the geometries' bounding boxes intersect. Equivalent to `ST_Intersects_Extent`. | `'POINT(5 5)'::GEOMETRY && 'LINESTRING(0 0, 10 20)'::GEOMETRY` | `true` |

#### Built-in Geometry Functions {#docs:current:sql:functions:geometry::built-in-geometry-functions}

| Name | Description |
|:-----|:------------|
| [`ST_GeomFromWKB`](#::st_geomfromwkb-function) | Creates a geometry from Well-Known Binary (WKB) representation |
| [`ST_AsWKB`](#::st_aswkb-function) | Returns the Well-Known Binary (WKB) representation of the geometry |
| [`ST_AsWKT`](#::st_aswkt-function) | Returns the Well-Known Text (WKT) representation of the geometry |
| [`ST_Intersects_Extent`](#::st_intersects_extent-function) | Returns true if the geometries' bounding boxes intersect |
| [`ST_CRS`](#::st_crs-function) | Returns the Coordinate Reference System (CRS) identifier of the geometry |
| [`ST_SetCRS`](#::st_setcrs-function) | Sets the Coordinate Reference System (CRS) identifier of the geometry |

###### `ST_GeomFromWKB` function {#docs:current:sql:functions:geometry::st_geomfromwkb-function}



|   |   |
|:--|:--------|
| **Description** |Creates a geometry from Well-Known Binary (WKB) representation |
| **Example** | `ST_GeomFromWKB('\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\xF0?\x00\x00\x00\x00\x00\x00\x00@')` |
| **Result** | `POINT(1 2)` |

###### `ST_AsWKB` function {#docs:current:sql:functions:geometry::st_aswkb-function}



|   |   |
|:--|:--------|
| **Description** |Returns the Well-Known Binary (WKB) representation of the geometry |
| **Example** | `ST_AsWKB('POINT(1 2)'::GEOMETRY)` |
| **Result** | `\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\xF0?\x00\x00\x00\x00\x00\x00\x00@` |
| **Alias** | `ST_AsBinary` |

###### `ST_AsWKT` function {#docs:current:sql:functions:geometry::st_aswkt-function}



|   |   |
|:--|:--------|
| **Description** |Returns the Well-Known Text (WKT) representation of the geometry |
| **Example** | `ST_AsWKT('POINT(1 2)'::GEOMETRY)` |
| **Result** | `POINT (1 2)` |
| **Alias** | `ST_AsText` |

###### `ST_Intersects_Extent` function {#docs:current:sql:functions:geometry::st_intersects_extent-function}



|   |   |
|:--|:--------|
| **Description** |Returns true if the geometries' bounding boxes intersect |
| **Example** | `'POINT(5 5)'::GEOMETRY && 'LINESTRING(0 0, 10 20)'::GEOMETRY` |
| **Result** | `true` |
| **Alias** | `&&` |

###### `ST_CRS` function {#docs:current:sql:functions:geometry::st_crs-function}



|   |   |
|:--|:--------|
| **Description** |Returns the Coordinate Reference System (CRS) identifier of the geometry |
| **Example** | `ST_CRS('POINT(1 2)'::GEOMETRY('OGC:CRS84'))` |
| **Result** | `OGC:CRS84` |

###### `ST_SetCRS` function {#docs:current:sql:functions:geometry::st_setcrs-function}



|   |   |
|:--|:--------|
| **Description** |Sets the Coordinate Reference System (CRS) identifier of the geometry |
| **Example** | `typeof(ST_SetCRS('POINT(1 2)'::GEOMETRY, 'OGC:CRS84'))` |
| **Result** | `GEOMETRY('OGC:CRS84')` |

### Interval Functions {#docs:current:sql:functions:interval}



This section describes functions and operators for examining and manipulating [`INTERVAL`](#docs:current:sql:data_types:interval) values.

#### Interval Operators {#docs:current:sql:functions:interval::interval-operators}

The table below shows the available mathematical operators for `INTERVAL` types.

| Operator | Description | Example | Result |
|:-|:--|:----|:--|
| `+` | Addition of an `INTERVAL` | `INTERVAL 1 HOUR + INTERVAL 5 HOUR` | `INTERVAL 6 HOUR` |
| `+` | Addition to a `DATE` | `DATE '1992-03-22' + INTERVAL 5 DAY` | `1992-03-27 00:00:00` |
| `+` | Addition to a `TIMESTAMP` | `TIMESTAMP '1992-03-22 01:02:03' + INTERVAL 5 DAY` | `1992-03-27 01:02:03` |
| `+` | Addition to a `TIME` | `TIME '01:02:03' + INTERVAL 5 HOUR` | `06:02:03` |
| `-` | Subtraction of an `INTERVAL` | `INTERVAL 5 HOUR - INTERVAL 1 HOUR` | `INTERVAL 4 HOUR` |
| `-` | Subtraction from a `DATE` | `DATE '1992-03-27' - INTERVAL 5 DAY` | `1992-03-22 00:00:00` |
| `-` | Subtraction from a `TIMESTAMP` | `TIMESTAMP '1992-03-27 01:02:03' - INTERVAL 5 DAY` | `1992-03-22 01:02:03` |
| `-` | Subtraction from a `TIME` | `TIME '06:02:03' - INTERVAL 5 HOUR` | `01:02:03` |

#### Interval Functions {#docs:current:sql:functions:interval::interval-functions}

The table below shows the available scalar functions for `INTERVAL` types.

| Name | Description |
|:--|:-------|
| [`date_part(part, interval)`](#::date_partpart-interval) | Extract [datepart component](#docs:current:sql:functions:datepart) (equivalent to `extract`). See [`INTERVAL`](#docs:current:sql:data_types:interval) for the sometimes surprising rules governing this extraction. |
| [`datepart(part, interval)`](#::datepartpart-interval) | Alias of `date_part`. |
| [`extract(part FROM interval)`](#::extractpart-from-interval) | Alias of `date_part`. |
| [`epoch(interval)`](#::epochinterval) | Get total number of seconds, as double precision floating point number, in interval. |
| [`to_centuries(integer)`](#::to_centuriesinteger) | Construct a century interval. |
| [`to_days(integer)`](#::to_daysinteger) | Construct a day interval. |
| [`to_decades(integer)`](#::to_decadesinteger) | Construct a decade interval. |
| [`to_hours(integer)`](#::to_hoursinteger) | Construct an hour interval. |
| [`to_microseconds(integer)`](#::to_microsecondsinteger) | Construct a microsecond interval. |
| [`to_millennia(integer)`](#::to_millenniainteger) | Construct a millennium interval. |
| [`to_milliseconds(integer)`](#::to_millisecondsinteger) | Construct a millisecond interval. |
| [`to_minutes(integer)`](#::to_minutesinteger) | Construct a minute interval. |
| [`to_months(integer)`](#::to_monthsinteger) | Construct a month interval. |
| [`to_quarters(integer)`](#::to_quartersinteger) | Construct an interval of `integer` quarters. |
| [`to_seconds(integer)`](#::to_secondsinteger) | Construct a second interval. |
| [`to_weeks(integer)`](#::to_weeksinteger) | Construct a week interval. |
| [`to_years(integer)`](#::to_yearsinteger) | Construct a year interval. |

> Only the documented [date part components](#docs:current:sql:functions:datepart) are defined for intervals.

###### `date_part(part, interval)` {#docs:current:sql:functions:interval::date_partpart-interval}



|   |   |
|:--|:--------|
| **Description** |Extract [datepart component](#docs:current:sql:functions:datepart) (equivalent to `extract`). See [`INTERVAL`](#docs:current:sql:data_types:interval) for the sometimes surprising rules governing this extraction. |
| **Example** | `date_part('year', INTERVAL '14 months')` |
| **Result** | `1` |

###### `datepart(part, interval)` {#docs:current:sql:functions:interval::datepartpart-interval}



|   |   |
|:--|:--------|
| **Description** |Alias of `date_part`. |
| **Example** | `datepart('year', INTERVAL '14 months')` |
| **Result** | `1` |

###### `extract(part FROM interval)` {#docs:current:sql:functions:interval::extractpart-from-interval}



|   |   |
|:--|:--------|
| **Description** |Alias of `date_part`. |
| **Example** | `extract('month' FROM INTERVAL '14 months')` |
| **Result** | `2` |

###### `epoch(interval)` {#docs:current:sql:functions:interval::epochinterval}



|   |   |
|:--|:--------|
| **Description** |Get total number of seconds, as double precision floating point number, in interval. |
| **Example** | `epoch(INTERVAL 5 HOUR)` |
| **Result** | `18000.0` |

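The extraction functions above can be compared in a single query. This sketch reuses the documented examples; note that `date_part` returns one component of the interval, whereas `epoch` returns its total length. The column aliases are illustrative.

```sql
SELECT
    -- 14 months is stored as 1 year and 2 months
    date_part('year', INTERVAL '14 months') AS years,       -- 1
    extract('month' FROM INTERVAL '14 months') AS months,   -- 2
    -- Total length of the interval in seconds
    epoch(INTERVAL 5 HOUR) AS total_seconds;                -- 18000.0
```
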
###### `to_centuries(integer)` {#docs:current:sql:functions:interval::to_centuriesinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a century interval. |
| **Example** | `to_centuries(5)` |
| **Result** | `INTERVAL 500 YEAR` |

###### `to_days(integer)` {#docs:current:sql:functions:interval::to_daysinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a day interval. |
| **Example** | `to_days(5)` |
| **Result** | `INTERVAL 5 DAY` |

###### `to_decades(integer)` {#docs:current:sql:functions:interval::to_decadesinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a decade interval. |
| **Example** | `to_decades(5)` |
| **Result** | `INTERVAL 50 YEAR` |

###### `to_hours(integer)` {#docs:current:sql:functions:interval::to_hoursinteger}



|   |   |
|:--|:--------|
| **Description** |Construct an hour interval. |
| **Example** | `to_hours(5)` |
| **Result** | `INTERVAL 5 HOUR` |

###### `to_microseconds(integer)` {#docs:current:sql:functions:interval::to_microsecondsinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a microsecond interval. |
| **Example** | `to_microseconds(5)` |
| **Result** | `INTERVAL 5 MICROSECOND` |

###### `to_millennia(integer)` {#docs:current:sql:functions:interval::to_millenniainteger}



|   |   |
|:--|:--------|
| **Description** |Construct a millennium interval. |
| **Example** | `to_millennia(5)` |
| **Result** | `INTERVAL 5000 YEAR` |

###### `to_milliseconds(integer)` {#docs:current:sql:functions:interval::to_millisecondsinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a millisecond interval. |
| **Example** | `to_milliseconds(5)` |
| **Result** | `INTERVAL 5 MILLISECOND` |

###### `to_minutes(integer)` {#docs:current:sql:functions:interval::to_minutesinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a minute interval. |
| **Example** | `to_minutes(5)` |
| **Result** | `INTERVAL 5 MINUTE` |

###### `to_months(integer)` {#docs:current:sql:functions:interval::to_monthsinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a month interval. |
| **Example** | `to_months(5)` |
| **Result** | `INTERVAL 5 MONTH` |

###### `to_quarters(integer)` {#docs:current:sql:functions:interval::to_quartersinteger}



|   |   |
|:--|:--------|
| **Description** |Construct an interval of `integer` quarters. |
| **Example** | `to_quarters(5)` |
| **Result** | `INTERVAL 1 YEAR 3 MONTHS` |

###### `to_seconds(integer)` {#docs:current:sql:functions:interval::to_secondsinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a second interval. |
| **Example** | `to_seconds(5)` |
| **Result** | `INTERVAL 5 SECOND` |

###### `to_weeks(integer)` {#docs:current:sql:functions:interval::to_weeksinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a week interval. |
| **Example** | `to_weeks(5)` |
| **Result** | `INTERVAL 35 DAY` |

###### `to_years(integer)` {#docs:current:sql:functions:interval::to_yearsinteger}



|   |   |
|:--|:--------|
| **Description** |Construct a year interval. |
| **Example** | `to_years(5)` |
| **Result** | `INTERVAL 5 YEAR` |

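The constructor functions produce ordinary `INTERVAL` values, so they can be compared with interval literals and combined with the operators listed earlier. A minimal sketch with illustrative column aliases:

```sql
SELECT
    -- Weeks are expressed in days
    to_weeks(5) = INTERVAL 35 DAY AS weeks_as_days,            -- true
    -- Quarters are expressed in months
    to_quarters(5) = INTERVAL 15 MONTH AS quarters_as_months,  -- true
    -- A constructed interval can be added to a timestamp
    TIMESTAMP '1992-03-22 01:02:03' + to_days(5) AS shifted;   -- 1992-03-27 01:02:03
```
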
### Lambda Functions {#docs:current:sql:functions:lambda}

> **Deprecated.** DuckDB v1.3 deprecated the old lambda single arrow syntax (` x -> x + 1`)
> in favor of the Python-style syntax (` lambda x : x + 1`).
>
> DuckDB v1.3 also introduces a new setting to configure the lambda syntax.
>
> ```sql
> SET lambda_syntax = 'DEFAULT';
> SET lambda_syntax = 'ENABLE_SINGLE_ARROW';
> SET lambda_syntax = 'DISABLE_SINGLE_ARROW';
> ```
>
> Currently, `DEFAULT` enables both syntax styles, i.e.,
> the old single arrow syntax and the Python-style syntax.
>
> DuckDB v1.5 is the last release supporting the single arrow syntax without explicitly enabling it.
>
> DuckDB v2.0 will disable the single arrow syntax by default.
>
> DuckDB v2.1 will remove the `lambda_syntax` flag and fully deprecate the single arrow syntax,
> so the old behavior will no longer be possible.

Lambda functions enable the use of more complex and flexible expressions in queries.
DuckDB supports several scalar functions that operate on [`LIST`s](#docs:current:sql:data_types:list) and
accept lambda functions as parameters
in the form `lambda ⟨parameter1⟩, ⟨parameter2⟩, ... : ⟨expression⟩`{:.language-sql .highlight}.
If the lambda function has only one parameter, then the parentheses can be omitted.
The parameters can have any names.
For example, the following are all valid lambda functions:

* `lambda param : param > 1`{:.language-sql .highlight}
* `lambda s : contains(concat(s, 'DB'), 'duck')`{:.language-sql .highlight}
* `lambda acc, x : acc + x`{:.language-sql .highlight}

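As a brief sketch, the lambdas below are passed to the list functions described in the next section. It assumes DuckDB v1.3 or later, where the Python-style syntax is available; the column aliases are illustrative.

```sql
SELECT
    -- Apply the lambda to every element
    list_transform([1, 2, 3], lambda x : x + 1) AS incremented,  -- [2, 3, 4]
    -- Keep the elements for which the lambda returns true
    list_filter([1, 2, 3, 4], lambda x : x > 2) AS filtered,     -- [3, 4]
    -- Fold the list into a single value with a two-parameter lambda
    list_reduce([1, 2, 3], lambda acc, x : acc + x) AS total;    -- 6
```
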
#### Scalar Functions That Accept Lambda Functions {#docs:current:sql:functions:lambda::scalar-functions-that-accept-lambda-functions}






| Function | Description |
|:--|:-------|
| [`apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_filter(list, lambda(x))`](#::list_filterlist-lambdax) | Alias for `list_filter`. |
| [`array_reduce(list, lambda(x, y)[, initial_value])`](#list_reducelist-lambdax-y-initial_value) | Alias for `list_reduce`. |
| [`array_transform(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`filter(list, lambda(x))`](#::list_filterlist-lambdax) | Alias for `list_filter`. |
| [`list_apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`list_filter(list, lambda(x))`](#::list_filterlist-lambdax) | Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's. See [`list_filter` examples](#docs:current:sql:functions:lambda::list_filter-examples). |
| [`list_reduce(list, lambda(x, y)[, initial_value])`](#list_reducelist-lambdax-y-initial_value) | Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The optional `initial_value` argument seeds the running result. See [`list_reduce` examples](#docs:current:sql:functions:lambda::list_reduce-examples). |
| [`list_transform(list, lambda(x))`](#::list_transformlist-lambdax) | Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function. See [`list_transform` examples](#docs:current:sql:functions:lambda::list_transform-examples). |
| [`reduce(list, lambda(x, y)[, initial_value])`](#list_reducelist-lambdax-y-initial_value) | Alias for `list_reduce`. |



###### `list_filter(list, lambda(x))` {#docs:current:sql:functions:lambda::list_filterlist-lambdax}



|   |   |
|:--|:--------|
| **Description** |Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's. See [`list_filter` examples](#docs:current:sql:functions:lambda::list_filter-examples). |
| **Example** | `list_filter([3, 4, 5], lambda x : x > 4)` |
| **Result** | `[5]` |
| **Aliases** | `array_filter`, `filter` |

###### `list_reduce(list, lambda(x, y)[, initial_value])` {#docs:current:sql:functions:lambda::list_reducelist-lambdax-y-initial_value}



|   |   |
|:--|:--------|
| **Description** |Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The optional `initial_value` argument seeds the running result. See [`list_reduce` examples](#docs:current:sql:functions:lambda::list_reduce-examples). |
| **Example** | `list_reduce([1, 2, 3], lambda x, y : x + y)` |
| **Result** | `6` |
| **Aliases** | `array_reduce`, `reduce` |

###### `list_transform(list, lambda(x))` {#docs:current:sql:functions:lambda::list_transformlist-lambdax}



|   |   |
|:--|:--------|
| **Description** |Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function. See [`list_transform` examples](#docs:current:sql:functions:lambda::list_transform-examples). |
| **Example** | `list_transform([1, 2, 3], lambda x : x + 1)` |
| **Result** | `[2, 3, 4]` |
| **Aliases** | `apply`, `array_apply`, `array_transform`, `list_apply` |





#### Nesting Lambda Functions {#docs:current:sql:functions:lambda::nesting-lambda-functions}

All scalar functions that accept lambda functions can be arbitrarily nested. For example, the following query nests two of them to compute the squares of all even list elements:

```sql
SELECT list_transform(
        list_filter([0, 1, 2, 3, 4, 5], lambda x: x % 2 = 0),
        lambda y: y * y
    );
```

```text
[0, 4, 16]
```

Nested lambda function to add each element of the first list to the sum of the second list:

```sql
SELECT list_transform(
        [1, 2, 3],
        lambda x :
            list_reduce([4, 5, 6], lambda a, b: a + b) + x
    );
```

```text
[16, 17, 18]
```

#### Scoping {#docs:current:sql:functions:lambda::scoping}

Names inside a lambda function are resolved according to the following scoping order:

* inner lambda parameters
* outer lambda parameters
* column names
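* macro parameters

In the following example, the inner lambda's `x` shadows the outer lambda's `x`, so the column is referenced by its qualified name `tbl.x`: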

```sql
CREATE TABLE tbl (x INTEGER);
INSERT INTO tbl VALUES (10);
SELECT list_apply(
            [1, 2],
            lambda x: list_apply([4], lambda x: x + tbl.x)[1] + x
    )
FROM tbl;
```

```text
[15, 16]
```
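
The precedence of lambda parameters over column names can be sketched in a similar way. The table `scoping_demo` and the column aliases are hypothetical names chosen only for this illustration.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE scoping_demo (x INTEGER);
INSERT INTO scoping_demo VALUES (10);

SELECT
    -- x binds to the lambda parameter, which takes precedence over the column.
    list_transform([1, 2], lambda x : x + 1) AS uses_parameter,
    -- No lambda parameter is named x here, so x binds to the column.
    list_transform([1, 2], lambda y : y + x) AS uses_column
FROM scoping_demo;
```

Following the scoping order above, `uses_parameter` should be `[2, 3]` and `uses_column` should be `[11, 12]`.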

#### Indexes as Parameters {#docs:current:sql:functions:lambda::indexes-as-parameters}

All lambda functions accept an optional extra parameter that represents the index of the current element.
This is always the last parameter of the lambda function (e.g., `i` in `lambda x, i : x + i`) and is 1-based (i.e., the first element has index 1).

Get all elements that are larger than their index:

```sql
SELECT list_filter([1, 3, 1, 5], lambda x, i: x > i);
```

```text
[3, 5]
```
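
The index parameter works the same way with `list_transform`. As a small sketch with made-up input values, the following multiplies each element by its 1-based position:

```sql
-- i is 1 for the first element, 2 for the second, and so on.
SELECT list_transform([10, 20, 30], lambda x, i : x * i);
-- [10, 40, 90]
```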

#### Examples {#docs:current:sql:functions:lambda::examples}

##### `list_transform` Examples {#docs:current:sql:functions:lambda::list_transform-examples}

Incrementing each list element by one:

```sql
SELECT list_transform([1, 2, NULL, 3], lambda x: x + 1);
```

```text
[2, 3, NULL, 4]
```

Transforming strings:

```sql
SELECT list_transform(['Duck', 'Goose', 'Sparrow'], lambda s: concat(s, 'DB'));
```

```text
[DuckDB, GooseDB, SparrowDB]
```

Combining lambda functions with other functions:

```sql
SELECT list_transform([5, NULL, 6], lambda x: coalesce(x, 0) + 1);
```

```text
[6, 1, 7]
```

##### `list_filter` Examples {#docs:current:sql:functions:lambda::list_filter-examples}

Filter out negative values (`NULL` elements are dropped as well, because the predicate evaluates to `NULL` for them):

```sql
SELECT list_filter([5, -6, NULL, 7], lambda x: x > 0);
```

```text
[5, 7]
```

Keep the elements that are divisible by both 2 and 5:

```sql
SELECT list_filter(
        list_filter([2, 4, 3, 1, 20, 10, 3, 30], lambda x: x % 2 = 0),
        lambda y: y % 5 = 0
    );
```

```text
[20, 10, 30]
```

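In combination with `range(...)` to construct one list per row, where `#1` refers to the first column of the `FROM` clause (the values produced by `range`):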

```sql
SELECT list_filter([1, 2, 3, 4], lambda x: x > #1) FROM range(4);
```

```text
[1, 2, 3, 4]
[2, 3, 4]
[3, 4]
[4]
```

##### `list_reduce` Examples {#docs:current:sql:functions:lambda::list_reduce-examples}

Sum of all list elements:

```sql
SELECT list_reduce([1, 2, 3, 4], lambda acc, x: acc + x);
```

```text
10
```

Only add up list elements if they are greater than 2:

```sql
SELECT list_reduce(
        list_filter([1, 2, 3, 4], lambda x: x > 2),
        lambda acc, x: acc + x
    );
```

```text
7
```

Concatenate all list elements:

```sql
SELECT list_reduce(['DuckDB', 'is', 'awesome'], lambda acc, x: concat(acc, ' ', x));
```

```text
DuckDB is awesome
```

Concatenate elements with the index without an initial value:

```sql
SELECT list_reduce(
        ['a', 'b', 'c', 'd'],
        lambda x, y, i: x || ' - ' || CAST(i AS VARCHAR) || ' - ' || y
    );
```

```text
a - 2 - b - 3 - c - 4 - d
```

Concatenate elements with the index with an initial value:

```sql
SELECT list_reduce(
        ['a', 'b', 'c', 'd'],
        lambda x, y, i: x || ' - ' || CAST(i AS VARCHAR) || ' - ' || y, 'INITIAL'
    );
```

```text
INITIAL - 1 - a - 2 - b - 3 - c - 4 - d
```
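
The effect of the initial value is perhaps easiest to see with plain numbers. The sketch below uses made-up inputs and assumes a DuckDB version that supports the optional `initial_value` argument described above:

```sql
-- Without an initial value, the running result starts at the first element.
SELECT list_reduce([1, 2, 3], lambda acc, x : acc + x);
-- 6

-- With an initial value, the running result starts at 100.
SELECT list_reduce([1, 2, 3], lambda acc, x : acc + x, 100);
-- 106
```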

#### Limitations {#docs:current:sql:functions:lambda::limitations}

Subqueries in lambda expressions are currently not supported.
For example:

```sql
SELECT list_apply([1, 2, 3], lambda x: (SELECT 42) + x);
```

```console
Binder Error:
subqueries in lambda expressions are not supported
```
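
One possible workaround, offered here as a sketch rather than an officially documented recipe, is to compute the value outside the lambda and reference it as a column. The alias `c` and the column name `v` are hypothetical.

```sql
-- The subquery moves to the FROM clause; the lambda only references its column.
SELECT list_apply([1, 2, 3], lambda x : c.v + x)
FROM (SELECT 42 AS v) AS c;
-- [43, 44, 45]
```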

### List Functions {#docs:current:sql:functions:list}






| Function | Description |
|:--|:-------|
| [`list[index]`](#listindex) | Extracts a single list element using a (1-based) `index`. |
| [`list[begin[:end][:step]]`](#listbeginendstep) | Extracts a sublist using [slice conventions](#docs:current:sql:functions:list::slicing). Negative values are accepted. |
| [`list1 && list2`](#::list_has_anylist1-list2) | Alias for `list_has_any`. |
| [`list1 <-> list2`](#::list_distancelist1-list2) | Alias for `list_distance`. |
| [`list1 <=> list2`](#::list_cosine_distancelist1-list2) | Alias for `list_cosine_distance`. |
| [`list1 <@ list2`](#::list_has_alllist1-list2) | Alias for `list_has_all`. |
| [`list1 @> list2`](#::list_has_alllist1-list2) | Alias for `list_has_all`. |
| [`arg1 || arg2`](#::arg1--arg2) | Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:current:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:current:sql:functions:list::list_concatlist_1--list_n). |
| [`aggregate(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Alias for `list_aggregate`. |
| [`apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_aggr(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Alias for `list_aggregate`. |
| [`array_aggregate(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Alias for `list_aggregate`. |
| [`array_append(list, element)`](#::list_appendlist-element) | Alias for `list_append`. |
| [`array_apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_cat(list_1, ..., list_n)`](#::list_concatlist_1--list_n) | Alias for `list_concat`. |
| [`array_concat(list_1, ..., list_n)`](#::list_concatlist_1--list_n) | Alias for `list_concat`. |
| [`array_contains(list, element)`](#::list_containslist-element) | Alias for `list_contains`. |
| [`array_distinct(list)`](#::list_distinctlist) | Alias for `list_distinct`. |
| [`array_extract(list, index)`](#::array_extractlist-index) | Extracts the `index`th (1-based) value from the `list`. |
| [`array_filter(list, lambda(x))`](#::list_filterlist-lambdax) | Alias for `list_filter`. |
| [`array_grade_up(list[, col1][, col2])`](#list_grade_uplist-col1-col2) | Alias for `list_grade_up`. |
| [`array_has(list, element)`](#::list_containslist-element) | Alias for `list_contains`. |
| [`array_has_all(list1, list2)`](#::list_has_alllist1-list2) | Alias for `list_has_all`. |
| [`array_has_any(list1, list2)`](#::list_has_anylist1-list2) | Alias for `list_has_any`. |
| [`array_indexof(list, element)`](#::list_positionlist-element) | Alias for `list_position`. |
| [`array_intersect(list1, list2)`](#::list_intersectlist1-list2) | Alias for `list_intersect`. |
| [`array_length(list)`](#::lengthlist) | Alias for `length`. |
| [`array_pop_back(list)`](#::array_pop_backlist) | Returns the `list` without the last element. |
| [`array_pop_front(list)`](#::array_pop_frontlist) | Returns the `list` without the first element. |
| [`array_position(list, element)`](#::list_positionlist-element) | Alias for `list_position`. |
| [`array_prepend(element, list)`](#::list_prependelement-list) | Alias for `list_prepend`. |
| [`array_push_back(list, element)`](#::list_appendlist-element) | Alias for `list_append`. |
| [`array_push_front(list, element)`](#::array_push_frontlist-element) | Prepends `element` to `list`. |
| [`array_reduce(list, lambda(x,y)[, initial_value])`](#list_reducelist-lambdaxy-initial_value) | Alias for `list_reduce`. |
| [`array_resize(list, size[[, value]])`](#list_resizelist-size-value) | Alias for `list_resize`. |
| [`array_reverse(list)`](#::list_reverselist) | Alias for `list_reverse`. |
| [`array_reverse_sort(list[, col1])`](#list_reverse_sortlist-col1) | Alias for `list_reverse_sort`. |
| [`array_select(value_list, index_list)`](#::list_selectvalue_list-index_list) | Alias for `list_select`. |
| [`array_slice(list, begin, end)`](#::list_slicelist-begin-end) | Alias for `list_slice`. |
| [`array_slice(list, begin, end, step)`](#::list_slicelist-begin-end-step) | Alias for `list_slice`. |
| [`array_sort(list[, col1][, col2])`](#list_sortlist-col1-col2) | Alias for `list_sort`. |
| [`array_to_string(list, delimiter)`](#::array_to_stringlist-delimiter) | Concatenates list/array elements using an optional `delimiter`. |
| [`array_to_string_comma_default(array)`](#::array_to_string_comma_defaultarray) | Concatenates list/array elements with a comma delimiter. |
| [`array_transform(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_unique(list)`](#::list_uniquelist) | Alias for `list_unique`. |
| [`array_where(value_list, mask_list)`](#::list_wherevalue_list-mask_list) | Alias for `list_where`. |
| [`array_zip(list_1, ..., list_n[, truncate])`](#list_ziplist_1--list_n-truncate) | Alias for `list_zip`. |
| [`char_length(list)`](#::lengthlist) | Alias for `length`. |
| [`character_length(list)`](#::lengthlist) | Alias for `length`. |
| [`concat(value, ...)`](#::concatvalue-) | Concatenates multiple strings or lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| [`contains(list, element)`](#::containslist-element) | Returns `true` if the `list` contains the `element`. |
| [`filter(list, lambda(x))`](#::list_filterlist-lambdax) | Alias for `list_filter`. |
| [`flatten(nested_list)`](#::flattennested_list) | [Flattens](#::flattening) a nested list by one level. |
| [`generate_series(start[, stop][, step])`](#generate_seriesstart-stop-step) | Creates a list of values between `start` and `stop`; the `stop` parameter is inclusive. |
| [`grade_up(list[, col1][, col2])`](#list_grade_uplist-col1-col2) | Alias for `list_grade_up`. |
| [`len(list)`](#::lengthlist) | Alias for `length`. |
| [`length(list)`](#::lengthlist) | Returns the length of the `list`. |
| [`list_aggr(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Alias for `list_aggregate`. |
| [`list_aggregate(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Executes the aggregate function `function_name` on the elements of `list`. See the [List Aggregates](#::list-aggregates) section for more details. |
| [`list_any_value(list)`](#::list_any_valuelist) | Applies aggregate function [`any_value`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_append(list, element)`](#::list_appendlist-element) | Appends `element` to `list`. |
| [`list_apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`list_approx_count_distinct(list)`](#::list_approx_count_distinctlist) | Applies aggregate function [`approx_count_distinct`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_avg(list)`](#::list_avglist) | Applies aggregate function [`avg`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bit_and(list)`](#::list_bit_andlist) | Applies aggregate function [`bit_and`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bit_or(list)`](#::list_bit_orlist) | Applies aggregate function [`bit_or`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bit_xor(list)`](#::list_bit_xorlist) | Applies aggregate function [`bit_xor`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bool_and(list)`](#::list_bool_andlist) | Applies aggregate function [`bool_and`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bool_or(list)`](#::list_bool_orlist) | Applies aggregate function [`bool_or`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_cat(list_1, ..., list_n)`](#::list_concatlist_1--list_n) | Alias for `list_concat`. |
| [`list_concat(list_1, ..., list_n)`](#::list_concatlist_1--list_n) | Concatenates lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| [`list_contains(list, element)`](#::list_containslist-element) | Returns `true` if the `list` contains the `element`. |
| [`list_cosine_distance(list1, list2)`](#::list_cosine_distancelist1-list2) | Computes the cosine distance between two same-sized lists. |
| [`list_cosine_similarity(list1, list2)`](#::list_cosine_similaritylist1-list2) | Computes the cosine similarity between two same-sized lists. |
| [`list_count(list)`](#::list_countlist) | Applies aggregate function [`count`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_distance(list1, list2)`](#::list_distancelist1-list2) | Calculates the Euclidean distance between two points with coordinates given in two input lists of equal length. |
| [`list_distinct(list)`](#::list_distinctlist) | Removes all duplicates and `NULL` values from a list. Does not preserve the original order. |
| [`list_dot_product(list1, list2)`](#::list_inner_productlist1-list2) | Alias for `list_inner_product`. |
| [`list_element(list, index)`](#::list_extractlist-index) | Alias for `list_extract`. |
| [`list_entropy(list)`](#::list_entropylist) | Applies aggregate function [`entropy`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_extract(list, index)`](#::list_extractlist-index) | Extracts the `index`th (1-based) value from the `list`. |
| [`list_filter(list, lambda(x))`](#::list_filterlist-lambdax) | Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's. See [`list_filter` examples](#docs:current:sql:functions:lambda::list_filter-examples). |
| [`list_first(list)`](#::list_firstlist) | Applies aggregate function [`first`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_grade_up(list[, col1][, col2])`](#list_grade_uplist-col1-col2) | Works like [`list_sort`](#::list_sortlist-col1-col2), but the results are the indexes that correspond to the position in the original list instead of the actual values. |
| [`list_has(list, element)`](#::list_containslist-element) | Alias for `list_contains`. |
| [`list_has_all(list1, list2)`](#::list_has_alllist1-list2) | Returns `true` if all elements of `list2` are in `list1`. `NULL`s are ignored. |
| [`list_has_any(list1, list2)`](#::list_has_anylist1-list2) | Returns `true` if the lists have any element in common. `NULL`s are ignored. |
| [`list_histogram(list)`](#::list_histogramlist) | Applies aggregate function [`histogram`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_indexof(list, element)`](#::list_positionlist-element) | Alias for `list_position`. |
| [`list_inner_product(list1, list2)`](#::list_inner_productlist1-list2) | Computes the inner product between two same-sized lists. |
| [`list_intersect(list1, list2)`](#::list_intersectlist1-list2) | Returns a list of all the elements that exist in both `list1` and `list2`, without duplicates. |
| [`list_kurtosis(list)`](#::list_kurtosislist) | Applies aggregate function [`kurtosis`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_kurtosis_pop(list)`](#::list_kurtosis_poplist) | Applies aggregate function [`kurtosis_pop`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_last(list)`](#::list_lastlist) | Applies aggregate function [`last`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_mad(list)`](#::list_madlist) | Applies aggregate function [`mad`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_max(list)`](#::list_maxlist) | Applies aggregate function [`max`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_median(list)`](#::list_medianlist) | Applies aggregate function [`median`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_min(list)`](#::list_minlist) | Applies aggregate function [`min`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_mode(list)`](#::list_modelist) | Applies aggregate function [`mode`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_negative_dot_product(list1, list2)`](#::list_negative_inner_productlist1-list2) | Alias for `list_negative_inner_product`. |
| [`list_negative_inner_product(list1, list2)`](#::list_negative_inner_productlist1-list2) | Computes the negative inner product between two same-sized lists. |
| [`list_pack(arg, ...)`](#::list_valuearg-) | Alias for `list_value`. |
| [`list_position(list, element)`](#::list_positionlist-element) | Returns the index of the `element` if the `list` contains the `element`. If the `element` is not found, it returns `NULL`. |
| [`list_prepend(element, list)`](#::list_prependelement-list) | Prepends `element` to `list`. |
| [`list_product(list)`](#::list_productlist) | Applies aggregate function [`product`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_reduce(list, lambda(x,y)[, initial_value])`](#list_reducelist-lambdaxy-initial_value) | Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The optional `initial_value` argument seeds the running result. See [`list_reduce` examples](#docs:current:sql:functions:lambda::list_reduce-examples). |
| [`list_resize(list, size[[, value]])`](#list_resizelist-size-value) | Resizes the `list` to contain `size` elements. Initializes new elements with `value` or `NULL` if `value` is not set. |
| [`list_reverse(list)`](#::list_reverselist) | Reverses the `list`. |
| [`list_reverse_sort(list[, col1])`](#list_reverse_sortlist-col1) | Sorts the elements of the list in reverse order. See the [Sorting Lists](#::sorting-lists) section for more details about sorting order and `NULL` values. |
| [`list_select(value_list, index_list)`](#::list_selectvalue_list-index_list) | Returns a list based on the elements selected by the `index_list`. |
| [`list_sem(list)`](#::list_semlist) | Applies aggregate function [`sem`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_skewness(list)`](#::list_skewnesslist) | Applies aggregate function [`skewness`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_slice(list, begin, end)`](#::list_slicelist-begin-end) | Extracts a sublist or substring using [slice conventions](#docs:current:sql:functions:list::slicing). Negative values are accepted. |
| [`list_slice(list, begin, end, step)`](#::list_slicelist-begin-end-step) | Like `list_slice(list, begin, end)`, with an additional `step` argument. |
| [`list_sort(list[, col1][, col2])`](#list_sortlist-col1-col2) | Sorts the elements of the list. See the [Sorting Lists](#::sorting-lists) section for more details about sorting order and `NULL` values. |
| [`list_stddev_pop(list)`](#::list_stddev_poplist) | Applies aggregate function [`stddev_pop`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_stddev_samp(list)`](#::list_stddev_samplist) | Applies aggregate function [`stddev_samp`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_string_agg(list)`](#::list_string_agglist) | Applies aggregate function [`string_agg`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_sum(list)`](#::list_sumlist) | Applies aggregate function [`sum`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_transform(list, lambda(x))`](#::list_transformlist-lambdax) | Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function. See [`list_transform` examples](#docs:current:sql:functions:lambda::list_transform-examples). |
| [`list_unique(list)`](#::list_uniquelist) | Counts the unique elements of a `list`. |
| [`list_value(arg, ...)`](#::list_valuearg-) | Creates a LIST containing the argument values. |
| [`list_var_pop(list)`](#::list_var_poplist) | Applies aggregate function [`var_pop`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_var_samp(list)`](#::list_var_samplist) | Applies aggregate function [`var_samp`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_where(value_list, mask_list)`](#::list_wherevalue_list-mask_list) | Returns a list with the `BOOLEAN`s in `mask_list` applied as a mask to the `value_list`. |
| [`list_zip(list_1, ..., list_n[, truncate])`](#list_ziplist_1--list_n-truncate) | Zips n `LIST`s to a new `LIST` whose length will be that of the longest list. Its elements are structs of n elements from each list `list_1`, …, `list_n`, missing elements are replaced with `NULL`. If `truncate` is set, all lists are truncated to the smallest list length. |
| [`range(start[, stop][, step])`](#rangestart-stop-step) | Creates a list of values between `start` and `stop`; the `stop` parameter is exclusive. |
| [`reduce(list, lambda(x,y)[, initial_value])`](#list_reducelist-lambdaxy-initial_value) | Alias for `list_reduce`. |
| [`repeat(list, count)`](#::repeatlist-count) | Repeats the `list` `count` number of times. |
| [`unnest(list)`](#::unnestlist) | Unnests a list by one level. Note that this is a special function that alters the cardinality of the result. See the [unnest page](#docs:current:sql:query_syntax:unnest) for more details. |
| [`unpivot_list(arg, ...)`](#::unpivot_listarg-) | Identical to `list_value`, but generated as part of `UNPIVOT` for better error messages. |



###### `list[index]` {#docs:current:sql:functions:list::listindex}



|   |   |
|:--|:--------|
| **Description** |Extracts a single list element using a (1-based) `index`. |
| **Example** | `[4, 5, 6][3]` |
| **Result** | `6` |
| **Alias** | `list_extract` |

###### `list[begin[:end][:step]]` {#docs:current:sql:functions:list::listbeginendstep}



|   |   |
|:--|:--------|
| **Description** |Extracts a sublist using [slice conventions](#docs:current:sql:functions:list::slicing). Negative values are accepted. |
| **Example** | `[4, 5, 6][2:3]` |
| **Result** | `[5, 6]` |
| **Alias** | `list_slice` |

###### `arg1 || arg2` {#docs:current:sql:functions:list::arg1--arg2}



|   |   |
|:--|:--------|
| **Description** |Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:current:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:current:sql:functions:list::list_concatlist_1--list_n). |
| **Example 1** | `'Duck' || 'DB'` |
| **Result** | `DuckDB` |
| **Example 2** | `[1, 2, 3] || [4, 5, 6]` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
| **Example 3** | `'\xAA'::BLOB || '\xBB'::BLOB` |
| **Result** | `\xAA\xBB` |

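The difference in `NULL` handling between the `||` operator and the `concat` and `list_concat` functions can be sketched as follows. The expected results in the comments follow from the descriptions in this section; the inputs are made up for the example.

```sql
-- The operator propagates NULL.
SELECT [1, 2] || NULL;
-- NULL

-- The functions skip NULL inputs.
SELECT list_concat([1, 2], NULL);
-- [1, 2]
SELECT concat('Duck', NULL, 'DB');
-- DuckDB
```
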
###### `array_extract(list, index)` {#docs:current:sql:functions:list::array_extractlist-index}



|   |   |
|:--|:--------|
| **Description** |Extracts the `index`th (1-based) value from the `list`. |
| **Example** | `array_extract([4, 5, 6], 3)` |
| **Result** | `6` |

###### `array_pop_back(list)` {#docs:current:sql:functions:list::array_pop_backlist}



|   |   |
|:--|:--------|
| **Description** |Returns the `list` without the last element. |
| **Example** | `array_pop_back([4, 5, 6])` |
| **Result** | `[4, 5]` |

###### `array_pop_front(list)` {#docs:current:sql:functions:list::array_pop_frontlist}



|   |   |
|:--|:--------|
| **Description** |Returns the `list` without the first element. |
| **Example** | `array_pop_front([4, 5, 6])` |
| **Result** | `[5, 6]` |

###### `array_push_front(list, element)` {#docs:current:sql:functions:list::array_push_frontlist-element}



|   |   |
|:--|:--------|
| **Description** |Prepends `element` to `list`. |
| **Example** | `array_push_front([4, 5, 6], 3)` |
| **Result** | `[3, 4, 5, 6]` |

###### `array_to_string(list, delimiter)` {#docs:current:sql:functions:list::array_to_stringlist-delimiter}



|   |   |
|:--|:--------|
| **Description** |Concatenates list/array elements using an optional `delimiter`. |
| **Example 1** | `array_to_string([1, 2, 3], '-')` |
| **Result** | `1-2-3` |
| **Example 2** | `array_to_string(['aa', 'bb', 'cc'], '')` |
| **Result** | `aabbcc` |

###### `array_to_string_comma_default(array)` {#docs:current:sql:functions:list::array_to_string_comma_defaultarray}



|   |   |
|:--|:--------|
| **Description** |Concatenates list/array elements with a comma delimiter. |
| **Example** | `array_to_string_comma_default(['Banana', 'Apple', 'Melon'])` |
| **Result** | `Banana,Apple,Melon` |

###### `concat(value, ...)` {#docs:current:sql:functions:list::concatvalue-}



|   |   |
|:--|:--------|
| **Description** |Concatenates multiple strings or lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| **Example 1** | `concat('Hello', ' ', 'World')` |
| **Result** | `Hello World` |
| **Example 2** | `concat([1, 2, 3], NULL, [4, 5, 6])` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |

###### `contains(list, element)` {#docs:current:sql:functions:list::containslist-element}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the `list` contains the `element`. |
| **Example** | `contains([1, 2, NULL], 1)` |
| **Result** | `true` |

###### `flatten(nested_list)` {#docs:current:sql:functions:list::flattennested_list}



|   |   |
|:--|:--------|
| **Description** |[Flattens](#::flattening) a nested list by one level. |
| **Example** | `flatten([[1, 2, 3], [4, 5]])` |
| **Result** | `[1, 2, 3, 4, 5]` |

###### `generate_series(start[, stop][, step])` {#docs:current:sql:functions:list::generate_seriesstart-stop-step}



|   |   |
|:--|:--------|
| **Description** |Creates a list of values between `start` and `stop`; the `stop` parameter is inclusive. |
| **Example** | `generate_series(2, 5, 3)` |
| **Result** | `[2, 5]` |

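Because `generate_series` treats `stop` as inclusive while `range` treats it as exclusive, the same arguments can produce different lists. A short side-by-side sketch:

```sql
SELECT generate_series(2, 5, 3);
-- [2, 5]
SELECT range(2, 5, 3);
-- [2]

SELECT generate_series(0, 3);
-- [0, 1, 2, 3]
SELECT range(0, 3);
-- [0, 1, 2]
```
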
###### `length(list)` {#docs:current:sql:functions:list::lengthlist}



|   |   |
|:--|:--------|
| **Description** |Returns the length of the `list`. |
| **Example** | `length([1,2,3])` |
| **Result** | `3` |
| **Aliases** | `char_length`, `character_length`, `len` |

###### `list_aggregate(list, function_name, ...)` {#docs:current:sql:functions:list::list_aggregatelist-function_name-}



|   |   |
|:--|:--------|
| **Description** |Executes the aggregate function `function_name` on the elements of `list`. See the [List Aggregates](#::list-aggregates) section for more details. |
| **Example** | `list_aggregate([1, 2, NULL], 'min')` |
| **Result** | `1` |
| **Aliases** | `aggregate`, `array_aggr`, `array_aggregate`, `list_aggr` |

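As a brief sketch of how `list_aggregate` relates to the dedicated `list_*` aggregate functions listed above, the first two queries below should give the same result. The third query passes an extra argument through to the aggregate function, here a separator for `string_agg`; the inputs are made up for the example.

```sql
-- Generic form and dedicated shorthand.
SELECT list_aggregate([3, 3, 9], 'sum');
-- 15
SELECT list_sum([3, 3, 9]);
-- 15

-- Additional arguments are forwarded to the aggregate function.
SELECT list_aggregate(['a', 'b', 'c'], 'string_agg', '-');
-- a-b-c
```
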
###### `list_any_value(list)` {#docs:current:sql:functions:list::list_any_valuelist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`any_value`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_any_value([3,3,9])` |
| **Result** | `3` |

###### `list_append(list, element)` {#docs:current:sql:functions:list::list_appendlist-element}



|   |   |
|:--|:--------|
| **Description** |Appends `element` to `list`. |
| **Example** | `list_append([2, 3], 4)` |
| **Result** | `[2, 3, 4]` |
| **Aliases** | `array_append`, `array_push_back` |

###### `list_approx_count_distinct(list)` {#docs:current:sql:functions:list::list_approx_count_distinctlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`approx_count_distinct`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_approx_count_distinct([3,3,9])` |
| **Result** | `2` |

###### `list_avg(list)` {#docs:current:sql:functions:list::list_avglist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`avg`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_avg([3,3,9])` |
| **Result** | `5.0` |

###### `list_bit_and(list)` {#docs:current:sql:functions:list::list_bit_andlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`bit_and`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bit_and([3,3,9])` |
| **Result** | `1` |

###### `list_bit_or(list)` {#docs:current:sql:functions:list::list_bit_orlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`bit_or`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bit_or([3,3,9])` |
| **Result** | `11` |

###### `list_bit_xor(list)` {#docs:current:sql:functions:list::list_bit_xorlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`bit_xor`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bit_xor([3,3,9])` |
| **Result** | `9` |

###### `list_bool_and(list)` {#docs:current:sql:functions:list::list_bool_andlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`bool_and`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bool_and([true, false])` |
| **Result** | `false` |

###### `list_bool_or(list)` {#docs:current:sql:functions:list::list_bool_orlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`bool_or`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bool_or([true, false])` |
| **Result** | `true` |

###### `list_concat(list_1, ..., list_n)` {#docs:current:sql:functions:list::list_concatlist_1--list_n}



|   |   |
|:--|:--------|
| **Description** |Concatenates lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| **Example** | `list_concat([2, 3], [4, 5, 6], [7])` |
| **Result** | `[2, 3, 4, 5, 6, 7]` |
| **Aliases** | `list_cat`, `array_concat`, `array_cat` |

###### `list_contains(list, element)` {#docs:current:sql:functions:list::list_containslist-element}



|   |   |
|:--|:--------|
| **Description** |Returns true if the list contains the element. |
| **Example** | `list_contains([1, 2, NULL], 1)` |
| **Result** | `true` |
| **Aliases** | `array_contains`, `array_has`, `list_has` |

###### `list_cosine_distance(list1, list2)` {#docs:current:sql:functions:list::list_cosine_distancelist1-list2}



|   |   |
|:--|:--------|
| **Description** |Computes the cosine distance between two same-sized lists. |
| **Example** | `list_cosine_distance([1, 2, 3], [1, 2, 3])` |
| **Result** | `0.0` |
| **Alias** | `<=>` |

###### `list_cosine_similarity(list1, list2)` {#docs:current:sql:functions:list::list_cosine_similaritylist1-list2}



|   |   |
|:--|:--------|
| **Description** |Computes the cosine similarity between two same-sized lists. |
| **Example** | `list_cosine_similarity([1, 2, 3], [1, 2, 3])` |
| **Result** | `1.0` |

###### `list_count(list)` {#docs:current:sql:functions:list::list_countlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`count`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_count([3,3,9])` |
| **Result** | `3` |

###### `list_distance(list1, list2)` {#docs:current:sql:functions:list::list_distancelist1-list2}



|   |   |
|:--|:--------|
| **Description** |Calculates the Euclidean distance between two points with coordinates given in two input lists of equal length. |
| **Example** | `list_distance([1, 2, 3], [1, 2, 5])` |
| **Result** | `2.0` |
| **Alias** | `<->` |

###### `list_distinct(list)` {#docs:current:sql:functions:list::list_distinctlist}



|   |   |
|:--|:--------|
| **Description** |Removes all duplicates and `NULL` values from a list. Does not preserve the original order. |
| **Example** | `list_distinct([1, 1, NULL, -3, 1, 5])` |
| **Result** | `[5, -3, 1]` |
| **Alias** | `array_distinct` |

###### `list_entropy(list)` {#docs:current:sql:functions:list::list_entropylist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`entropy`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_entropy([3,3,9])` |
| **Result** | `0.9182958340544893` |

###### `list_extract(list, index)` {#docs:current:sql:functions:list::list_extractlist-index}



|   |   |
|:--|:--------|
| **Description** |Extracts the `index`th (1-based) value from the list. |
| **Example** | `list_extract([4, 5, 6], 3)` |
| **Result** | `6` |
| **Alias** | `list_element` |

###### `list_filter(list, lambda(x))` {#docs:current:sql:functions:list::list_filterlist-lambdax}



|   |   |
|:--|:--------|
| **Description** |Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's. See [`list_filter` examples](#docs:current:sql:functions:lambda::list_filter-examples). |
| **Example** | `list_filter([3, 4, 5], lambda x : x > 4)` |
| **Result** | `[5]` |
| **Aliases** | `array_filter`, `filter` |

###### `list_first(list)` {#docs:current:sql:functions:list::list_firstlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`first`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_first([3,3,9])` |
| **Result** | `3` |

###### `list_grade_up(list[, col1][, col2])` {#docs:current:sql:functions:list::list_grade_uplist-col1-col2}



|   |   |
|:--|:--------|
| **Description** |Works like [`list_sort`](#::list_sortlist-col1-col2), but the results are the indexes that correspond to the position in the original list instead of the actual values. |
| **Example** | `list_grade_up([3, 6, 1, 2])` |
| **Result** | `[3, 4, 1, 2]` |
| **Aliases** | `array_grade_up`, `grade_up` |

###### `list_has_all(list1, list2)` {#docs:current:sql:functions:list::list_has_alllist1-list2}



|   |   |
|:--|:--------|
| **Description** |Returns true if all elements of `list2` are in `list1`. `NULL` values are ignored. |
| **Example** | `list_has_all([1, 2, 3], [2, 3])` |
| **Result** | `true` |
| **Aliases** | `<@`, `@>`, `array_has_all` |

###### `list_has_any(list1, list2)` {#docs:current:sql:functions:list::list_has_anylist1-list2}



|   |   |
|:--|:--------|
| **Description** |Returns true if the lists have any element in common. `NULL` values are ignored. |
| **Example** | `list_has_any([1, 2, 3], [2, 3, 4])` |
| **Result** | `true` |
| **Aliases** | `&&`, `array_has_any` |

###### `list_histogram(list)` {#docs:current:sql:functions:list::list_histogramlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`histogram`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_histogram([3,3,9])` |
| **Result** | `{3=2, 9=1}` |

###### `list_inner_product(list1, list2)` {#docs:current:sql:functions:list::list_inner_productlist1-list2}



|   |   |
|:--|:--------|
| **Description** |Computes the inner product between two same-sized lists. |
| **Example** | `list_inner_product([1, 2, 3], [1, 2, 3])` |
| **Result** | `14.0` |
| **Alias** | `list_dot_product` |

###### `list_intersect(list1, list2)` {#docs:current:sql:functions:list::list_intersectlist1-list2}



|   |   |
|:--|:--------|
| **Description** |Returns a list of all the elements that exist in both `list1` and `list2`, without duplicates. |
| **Example** | `list_intersect([1, 2, 3], [2, 3, 4])` |
| **Result** | `[3, 2]` |
| **Alias** | `array_intersect` |

###### `list_kurtosis(list)` {#docs:current:sql:functions:list::list_kurtosislist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`kurtosis`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_kurtosis([3,3,9])` |
| **Result** | `NULL` |

###### `list_kurtosis_pop(list)` {#docs:current:sql:functions:list::list_kurtosis_poplist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`kurtosis_pop`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_kurtosis_pop([3,3,9])` |
| **Result** | `-1.4999999999999978` |

###### `list_last(list)` {#docs:current:sql:functions:list::list_lastlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`last`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_last([3,3,9])` |
| **Result** | `9` |

###### `list_mad(list)` {#docs:current:sql:functions:list::list_madlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`mad`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_mad([3,3,9])` |
| **Result** | `0.0` |

###### `list_max(list)` {#docs:current:sql:functions:list::list_maxlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`max`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_max([3,3,9])` |
| **Result** | `9` |

###### `list_median(list)` {#docs:current:sql:functions:list::list_medianlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`median`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_median([3,3,9])` |
| **Result** | `3.0` |

###### `list_min(list)` {#docs:current:sql:functions:list::list_minlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`min`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_min([3,3,9])` |
| **Result** | `3` |

###### `list_mode(list)` {#docs:current:sql:functions:list::list_modelist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`mode`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_mode([3,3,9])` |
| **Result** | `3` |

###### `list_negative_inner_product(list1, list2)` {#docs:current:sql:functions:list::list_negative_inner_productlist1-list2}



|   |   |
|:--|:--------|
| **Description** |Computes the negative inner product between two same-sized lists. |
| **Example** | `list_negative_inner_product([1, 2, 3], [1, 2, 3])` |
| **Result** | `-14.0` |
| **Alias** | `list_negative_dot_product` |

###### `list_position(list, element)` {#docs:current:sql:functions:list::list_positionlist-element}



|   |   |
|:--|:--------|
| **Description** |Returns the index of the `element` if the `list` contains the `element`. If the `element` is not found, it returns `NULL`. |
| **Example** | `list_position([1, 2, NULL], 2)` |
| **Result** | `2` |
| **Aliases** | `array_indexof`, `array_position`, `list_indexof` |

###### `list_prepend(element, list)` {#docs:current:sql:functions:list::list_prependelement-list}



|   |   |
|:--|:--------|
| **Description** |Prepends `element` to `list`. |
| **Example** | `list_prepend(3, [4, 5, 6])` |
| **Result** | `[3, 4, 5, 6]` |
| **Alias** | `array_prepend` |

###### `list_product(list)` {#docs:current:sql:functions:list::list_productlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`product`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_product([3,3,9])` |
| **Result** | `81.0` |

###### `list_reduce(list, lambda(x,y)[, initial_value])` {#docs:current:sql:functions:list::list_reducelist-lambdaxy-initial_value}



|   |   |
|:--|:--------|
| **Description** |Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The `lambda` function has an optional `initial_value` argument. See [`list_reduce` examples](#docs:current:sql:functions:lambda::list_reduce-examples). |
| **Example** | `list_reduce([1, 2, 3], lambda x, y : x + y)` |
| **Result** | `6` |
| **Aliases** | `array_reduce`, `reduce` |

###### `list_resize(list, size[[, value]])` {#docs:current:sql:functions:list::list_resizelist-size-value}



|   |   |
|:--|:--------|
| **Description** |Resizes the `list` to contain `size` elements. Initializes new elements with `value` or `NULL` if `value` is not set. |
| **Example** | `list_resize([1, 2, 3], 5, 0)` |
| **Result** | `[1, 2, 3, 0, 0]` |
| **Alias** | `array_resize` |

###### `list_reverse(list)` {#docs:current:sql:functions:list::list_reverselist}



|   |   |
|:--|:--------|
| **Description** |Reverses the `list`. |
| **Example** | `list_reverse([3, 6, 1, 2])` |
| **Result** | `[2, 1, 6, 3]` |
| **Alias** | `array_reverse` |

###### `list_reverse_sort(list[, col1])` {#docs:current:sql:functions:list::list_reverse_sortlist-col1}



|   |   |
|:--|:--------|
| **Description** |Sorts the elements of the list in reverse order. See the [Sorting Lists](#::sorting-lists) section for more details about sorting order and `NULL` values. |
| **Example** | `list_reverse_sort([3, 6, 1, 2])` |
| **Result** | `[6, 3, 2, 1]` |
| **Alias** | `array_reverse_sort` |

###### `list_select(value_list, index_list)` {#docs:current:sql:functions:list::list_selectvalue_list-index_list}



|   |   |
|:--|:--------|
| **Description** |Returns a list based on the elements selected by the `index_list`. |
| **Example** | `list_select([10, 20, 30, 40], [1, 4])` |
| **Result** | `[10, 40]` |
| **Alias** | `array_select` |

###### `list_sem(list)` {#docs:current:sql:functions:list::list_semlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`sem`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_sem([3,3,9])` |
| **Result** | `1.6329931618554523` |

###### `list_skewness(list)` {#docs:current:sql:functions:list::list_skewnesslist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`skewness`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_skewness([3,3,9])` |
| **Result** | `1.7320508075688796` |

###### `list_slice(list, begin, end)` {#docs:current:sql:functions:list::list_slicelist-begin-end}



|   |   |
|:--|:--------|
| **Description** |Extracts a sublist or substring using [slice conventions](#docs:current:sql:functions:list::slicing). Negative values are accepted. |
| **Example** | `list_slice([4, 5, 6], 2, 3)` |
| **Result** | `[5, 6]` |
| **Alias** | `array_slice` |

###### `list_slice(list, begin, end, step)` {#docs:current:sql:functions:list::list_slicelist-begin-end-step}



|   |   |
|:--|:--------|
| **Description** |Variant of `list_slice` with an additional `step` argument. See [slice conventions](#docs:current:sql:functions:list::slicing). |
| **Example** | `list_slice([4, 5, 6], 1, 3, 2)` |
| **Result** | `[4, 6]` |
| **Alias** | `array_slice` |

###### `list_sort(list[, col1][, col2])` {#docs:current:sql:functions:list::list_sortlist-col1-col2}



|   |   |
|:--|:--------|
| **Description** |Sorts the elements of the list. See the [Sorting Lists](#::sorting-lists) section for more details about sorting order and `NULL` values. |
| **Example** | `list_sort([3, 6, 1, 2])` |
| **Result** | `[1, 2, 3, 6]` |
| **Alias** | `array_sort` |

###### `list_stddev_pop(list)` {#docs:current:sql:functions:list::list_stddev_poplist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`stddev_pop`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_stddev_pop([3,3,9])` |
| **Result** | `2.8284271247461903` |

###### `list_stddev_samp(list)` {#docs:current:sql:functions:list::list_stddev_samplist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`stddev_samp`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_stddev_samp([3,3,9])` |
| **Result** | `3.4641016151377544` |

###### `list_string_agg(list)` {#docs:current:sql:functions:list::list_string_agglist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`string_agg`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_string_agg([3,3,9])` |
| **Result** | `3,3,9` |

###### `list_sum(list)` {#docs:current:sql:functions:list::list_sumlist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`sum`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_sum([3,3,9])` |
| **Result** | `15` |

###### `list_transform(list, lambda(x))` {#docs:current:sql:functions:list::list_transformlist-lambdax}



|   |   |
|:--|:--------|
| **Description** |Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function. See [`list_transform` examples](#docs:current:sql:functions:lambda::list_transform-examples). |
| **Example** | `list_transform([1, 2, 3], lambda x : x + 1)` |
| **Result** | `[2, 3, 4]` |
| **Aliases** | `apply`, `array_apply`, `array_transform`, `list_apply` |

###### `list_unique(list)` {#docs:current:sql:functions:list::list_uniquelist}



|   |   |
|:--|:--------|
| **Description** |Counts the unique elements of a `list`. |
| **Example** | `list_unique([1, 1, NULL, -3, 1, 5])` |
| **Result** | `3` |
| **Alias** | `array_unique` |

###### `list_value(arg, ...)` {#docs:current:sql:functions:list::list_valuearg-}



|   |   |
|:--|:--------|
| **Description** |Creates a LIST containing the argument values. |
| **Example** | `list_value(4, 5, 6)` |
| **Result** | `[4, 5, 6]` |
| **Alias** | `list_pack` |

###### `list_var_pop(list)` {#docs:current:sql:functions:list::list_var_poplist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`var_pop`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_var_pop([3,3,9])` |
| **Result** | `8.0` |

###### `list_var_samp(list)` {#docs:current:sql:functions:list::list_var_samplist}



|   |   |
|:--|:--------|
| **Description** |Applies aggregate function [`var_samp`](#docs:current:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_var_samp([3,3,9])` |
| **Result** | `12.0` |

###### `list_where(value_list, mask_list)` {#docs:current:sql:functions:list::list_wherevalue_list-mask_list}



|   |   |
|:--|:--------|
| **Description** |Returns a list with the `BOOLEAN`s in `mask_list` applied as a mask to the `value_list`. |
| **Example** | `list_where([10, 20, 30, 40], [true, false, false, true])` |
| **Result** | `[10, 40]` |
| **Alias** | `array_where` |

###### `list_zip(list_1, ..., list_n[, truncate])` {#docs:current:sql:functions:list::list_ziplist_1--list_n-truncate}



|   |   |
|:--|:--------|
| **Description** |Zips n `LIST`s to a new `LIST` whose length will be that of the longest list. Its elements are structs of n elements from each list `list_1`, …, `list_n`, missing elements are replaced with `NULL`. If `truncate` is set, all lists are truncated to the smallest list length. |
| **Example 1** | `list_zip([1, 2], [3, 4], [5, 6])` |
| **Result** | `[(1, 3, 5), (2, 4, 6)]` |
| **Example 2** | `list_zip([1, 2], [3, 4], [5, 6, 7])` |
| **Result** | `[(1, 3, 5), (2, 4, 6), (NULL, NULL, 7)]` |
| **Example 3** | `list_zip([1, 2], [3, 4], [5, 6, 7], true)` |
| **Result** | `[(1, 3, 5), (2, 4, 6)]` |
| **Alias** | `array_zip` |

###### `range(start[, stop][, step])` {#docs:current:sql:functions:list::rangestart-stop-step}



|   |   |
|:--|:--------|
| **Description** |Creates a list of values between `start` and `stop` - the stop parameter is exclusive. |
| **Example** | `range(2, 5, 3)` |
| **Result** | `[2]` |

###### `repeat(list, count)` {#docs:current:sql:functions:list::repeatlist-count}



|   |   |
|:--|:--------|
| **Description** |Repeats the `list` `count` number of times. |
| **Example** | `repeat([1, 2, 3], 5)` |
| **Result** | `[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]` |

###### `unnest(list)` {#docs:current:sql:functions:list::unnestlist}



|   |   |
|:--|:--------|
| **Description** |Unnests a list by one level. Note that this is a special function that alters the cardinality of the result. See the [unnest page](#docs:current:sql:query_syntax:unnest) for more details. |
| **Example** | `unnest([1, 2, 3])` |
| **Result** | Multiple rows: `1`, `2`, `3` |

###### `unpivot_list(arg, ...)` {#docs:current:sql:functions:list::unpivot_listarg-}



|   |   |
|:--|:--------|
| **Description** |Identical to `list_value`, but generated as part of `UNPIVOT` for better error messages. |
| **Example** | `unpivot_list(4, 5, 6)` |
| **Result** | `[4, 5, 6]` |




#### List Operators {#docs:current:sql:functions:list::list-operators}

The following operators are supported for lists:



| Operator | Description | Example | Result |
|-|--|---|-|
| `&&`  | Alias for [`list_has_any`](#::list_has_anylist1-list2).                                                                   | `[1, 2, 3, 4, 5] && [2, 5, 5, 6]` | `true`               |
| `@>`  | Alias for [`list_has_all`](#::list_has_alllist1-list2), where the list on the **right** of the operator is the sublist. | `[1, 2, 3, 4] @> [3, 4, 3]`       | `true`               |
| `<@`  | Alias for [`list_has_all`](#::list_has_alllist1-list2), where the list on the **left** of the operator is the sublist.  | `[1, 4] <@ [1, 2, 3, 4]`          | `true`               |
| `||`  | Similar to [`list_concat`](#::list_concatlist_1--list_n), except any `NULL` input results in `NULL`.                        | `[1, 2, 3] || [4, 5, 6]`          | `[1, 2, 3, 4, 5, 6]` |
| `<=>` | Alias for [`list_cosine_distance`](#::list_cosine_distancelist1-list2).                                                   | `[1, 2, 3] <=> [1, 2, 5]`         | `0.0241` (approx.)   |
| `<->` | Alias for [`list_distance`](#::list_distancelist1-list2).                                                                 | `[1, 2, 3] <-> [1, 2, 5]`         | `2.0`                |
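
The difference in `NULL` handling between `||` and `list_concat` can be seen side by side. The following is a minimal sketch: the `NULL::INTEGER[]` cast is an assumption added here so that the `NULL` has an unambiguous list type, and the results noted in the comments follow from the descriptions in the table above.

```sql
SELECT
    -- list_concat skips NULL inputs, so [1, 2, 3] is expected
    list_concat([1, 2], NULL::INTEGER[], [3]) AS with_function,
    -- || propagates NULL, so NULL is expected
    [1, 2] || NULL::INTEGER[] AS with_operator;
```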



#### List Comprehension {#docs:current:sql:functions:list::list-comprehension}

Python-style list comprehension can be used to compute expressions over elements in a list. For example:

```sql
SELECT [lower(x) FOR x IN strings] AS strings
FROM (VALUES (['Hello', '', 'World'])) t(strings);
```



|     strings      |
|------------------|
| [hello, , world] |

```sql
SELECT [upper(x) FOR x IN strings IF len(x) > 0] AS strings
FROM (VALUES (['Hello', '', 'World'])) t(strings);
```



|    strings     |
|----------------|
| [HELLO, WORLD] |

List comprehensions can also use the position of the list elements by adding a second variable.
In the following example, we use `x, i`, where `x` is the value and `i` is the position:

```sql
SELECT [4, 5, 6] AS l, [x FOR x, i IN l IF i != 2] AS filtered;
```



|     l     | filtered |
|-----------|----------|
| [4, 5, 6] | [4, 6]   |

Under the hood, `[f(x) FOR x IN l IF g(x)]` is translated to:

```sql
l.list_apply(lambda x, i: {'filter': g(x, i), 'result': f(x, i)})
    .list_filter(lambda x: x.filter)
    .list_apply(lambda x: x.result)
```
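
To illustrate the translation, the filtered comprehension from the earlier example can be written next to a hand-written combination of `list_filter` and `list_transform`. This is a sketch of an equivalent query, not the exact plan DuckDB generates; the column aliases are chosen for illustration only. Both columns should contain `[HELLO, WORLD]`.

```sql
SELECT
    -- list comprehension syntax
    [upper(x) FOR x IN strings IF len(x) > 0] AS comprehension,
    -- explicit lambdas: filter first, then transform
    list_transform(
        list_filter(strings, lambda x: len(x) > 0),
        lambda x: upper(x)
    ) AS explicit_lambdas
FROM (VALUES (['Hello', '', 'World'])) t(strings);
```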

#### Range Functions {#docs:current:sql:functions:list::range-functions}

DuckDB offers two range functions, [`range(start, stop, step)`](#::range) and [`generate_series(start, stop, step)`](#::generate_series), and their variants with default arguments for `start` and `step`. The two functions differ in how they treat the `stop` argument, as documented below.

##### `range` {#docs:current:sql:functions:list::range}

The `range` function creates a list of values in the range between `start` and `stop`.
The `start` parameter is inclusive, while the `stop` parameter is exclusive.
The default value of `start` is 0 and the default value of `step` is 1.

Based on the number of arguments, the following variants of `range` exist.

###### `range(stop)` {#docs:current:sql:functions:list::rangestop}

```sql
SELECT range(5);
```

```text
[0, 1, 2, 3, 4]
```

###### `range(start, stop)` {#docs:current:sql:functions:list::rangestart-stop}

```sql
SELECT range(2, 5);
```

```text
[2, 3, 4]
```

###### `range(start, stop, step)` {#docs:current:sql:functions:list::rangestart-stop-step}

```sql
SELECT range(2, 5, 3);
```

```text
[2]
```

##### `generate_series` {#docs:current:sql:functions:list::generate_series}

The `generate_series` function creates a list of values in the range between `start` and `stop`.
Both the `start` and the `stop` parameters are inclusive.
The default value of `start` is 0 and the default value of `step` is 1.
Based on the number of arguments, the following variants of `generate_series` exist.

###### `generate_series(stop)` {#docs:current:sql:functions:list::generate_seriesstop}

```sql
SELECT generate_series(5);
```

```text
[0, 1, 2, 3, 4, 5]
```

###### `generate_series(start, stop)` {#docs:current:sql:functions:list::generate_seriesstart-stop}

```sql
SELECT generate_series(2, 5);
```

```text
[2, 3, 4, 5]
```

###### `generate_series(start, stop, step)` {#docs:current:sql:functions:list::generate_seriesstart-stop-step}

```sql
SELECT generate_series(2, 5, 3);
```

```text
[2, 5]
```
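
The two functions can also be compared in a single query. The sketch below reuses the arguments from the examples above; the column aliases are illustrative.

```sql
SELECT
    range(2, 5)           AS range_result,   -- stop is exclusive: [2, 3, 4]
    generate_series(2, 5) AS series_result,  -- stop is inclusive: [2, 3, 4, 5]
    -- both sides evaluate to [0, 1, 2, 3, 4], so this is true
    range(5) = generate_series(4) AS same_list;
```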

###### `generate_subscripts(arr, dim)` {#docs:current:sql:functions:list::generate_subscriptsarr-dim}

The `generate_subscripts(arr, dim)` function generates indexes along the `dim`th dimension of array `arr`.

```sql
SELECT generate_subscripts([4, 5, 6], 1) AS i;
```

| i |
|--:|
| 1 |
| 2 |
| 3 |

##### Date Ranges {#docs:current:sql:functions:list::date-ranges}

Date ranges are also supported for `TIMESTAMP` and `TIMESTAMP WITH TIME ZONE` values.
Note that for these types, the `stop` and `step` arguments have to be specified explicitly (a default value is not provided).

###### `range` for Date Ranges {#docs:current:sql:functions:list::range-for-date-ranges}

```sql
SELECT *
FROM range(DATE '1992-01-01', DATE '1992-03-01', INTERVAL '1' MONTH);
```

|        range        |
|---------------------|
| 1992-01-01 00:00:00 |
| 1992-02-01 00:00:00 |

###### `generate_series` for Date Ranges {#docs:current:sql:functions:list::generate_series-for-date-ranges}

```sql
SELECT *
FROM generate_series(DATE '1992-01-01', DATE '1992-03-01', INTERVAL '1' MONTH);
```

|   generate_series   |
|---------------------|
| 1992-01-01 00:00:00 |
| 1992-02-01 00:00:00 |
| 1992-03-01 00:00:00 |

#### Slicing {#docs:current:sql:functions:list::slicing}

The function [`list_slice`](#::list_slicelist-begin-end) can be used to extract a sublist from a list. The following variants exist:

* `list_slice(list, begin, end)`
* `list_slice(list, begin, end, step)`
* `array_slice(list, begin, end)`
* `array_slice(list, begin, end, step)`
* `list[begin:end]`
* `list[begin:end:step]`

The arguments are as follows:

* `list`
    * Is the list to be sliced
* `begin`
    * Is the index of the first element to be included in the slice
    * When `begin < 0` the index is counted from the end of the list
    * When `begin < 0` and `-begin > length`, `begin` is clamped to the beginning of the list
    * When `begin > length`, the result is an empty list
    * **Bracket Notation:** When `begin` is omitted, it defaults to the beginning of the list
* `end`
    * Is the index of the last element to be included in the slice
    * When `end < 0` the index is counted from the end of the list
    * When `end > length`, end is clamped to `length`
    * When `end < begin`, the result is an empty list
    * **Bracket Notation:** When `end` is omitted, it defaults to the end of the list. When `end` is omitted and a `step` is provided, `end` must be replaced with a `-`
* `step` *(optional)*
    * Is the step size between elements in the slice
    * When `step < 0` the slice is reversed, and `begin` and `end` are swapped
    * Must be non-zero

Examples:

```sql
SELECT list_slice([1, 2, 3, 4, 5], 2, 4);
```

```text
[2, 3, 4]
```

```sql
SELECT ([1, 2, 3, 4, 5])[2:4:2];
```

```text
[2, 4]
```

```sql
SELECT ([1, 2, 3, 4, 5])[4:2:-2];
```

```text
[4, 2]
```

```sql
SELECT ([1, 2, 3, 4, 5])[:];
```

```text
[1, 2, 3, 4, 5]
```

```sql
SELECT ([1, 2, 3, 4, 5])[:-:2];
```

```text
[1, 3, 5]
```

```sql
SELECT ([1, 2, 3, 4, 5])[:-:-2];
```

```text
[5, 3, 1]
```
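
The boundary rules listed above can be exercised with a few edge cases. This is a minimal sketch; the results noted in the comments are what the rules in the argument list imply, and the column aliases are illustrative.

```sql
SELECT
    -- end > length: end is clamped to the length, so [4, 5] is expected
    list_slice([1, 2, 3, 4, 5], 4, 10) AS end_clamped,
    -- begin > length: an empty list is expected
    list_slice([1, 2, 3, 4, 5], 6, 8)  AS begin_past_end,
    -- end < begin: an empty list is expected
    list_slice([1, 2, 3, 4, 5], 4, 2)  AS end_before_begin,
    -- begin < 0: the index is counted from the end of the list
    list_slice([1, 2, 3, 4, 5], -2, 5) AS negative_begin;
```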

#### List Aggregates {#docs:current:sql:functions:list::list-aggregates}

The function [`list_aggregate`](#::list_aggregatelist-function_name-) allows the execution of arbitrary existing aggregate functions on the elements of a list. Its first argument is the list (column), its second argument is the aggregate function name, e.g., `min`, `histogram` or `sum`.

`list_aggregate` accepts additional arguments after the aggregate function name. These extra arguments are passed directly to the aggregate function, which serves as the second argument of `list_aggregate`.

Order-sensitive aggregate functions are applied in the order of the list. The `ORDER BY`, `DISTINCT` and `FILTER` clauses are not supported by `list_aggregate`.
They may instead be emulated using `list_sort`, `list_grade_up`, `list_select`, `list_distinct` and `list_filter`.

```sql
SELECT list_aggregate([1, 2, -4, NULL], 'min');
```

```text
-4
```

```sql
SELECT list_aggregate([2, 4, 8, 42], 'sum');
```

```text
56
```

```sql
SELECT list_aggregate([[1, 2], [NULL], [2, 10, 3]], 'last');
```

```text
[2, 10, 3]
```

```sql
SELECT list_aggregate([2, 4, 8, 42], 'string_agg', '|');
```

```text
2|4|8|42
```
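
The emulation of `DISTINCT`, `FILTER` and `ORDER BY` mentioned above can be sketched by preprocessing the list before it is passed to `list_aggregate`. The input lists and column aliases below are made up for illustration.

```sql
SELECT
    -- DISTINCT: deduplicate first, then count (3 distinct values)
    list_aggregate(list_distinct([1, 1, 2, 3, 3]), 'count') AS distinct_count,
    -- FILTER: keep only elements greater than 2, then sum (3 + 4 = 7)
    list_aggregate(list_filter([1, 2, 3, 4], lambda x: x > 2), 'sum') AS filtered_sum,
    -- ORDER BY: sort before an order-sensitive aggregate (1|2|3)
    list_aggregate(list_sort([3, 1, 2]), 'string_agg', '|') AS ordered_concat,
    -- ORDER BY another list: list_grade_up([3, 1, 2]) yields the positions [2, 3, 1],
    -- which list_select uses to reorder the values to [a, b, c]
    list_aggregate(
        list_select(['c', 'a', 'b'], list_grade_up([3, 1, 2])),
        'string_agg', '|'
    ) AS ordered_by_key;
```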

##### `list_*` Rewrite Functions {#docs:current:sql:functions:list::list_-rewrite-functions}

Rewrites simplify the use of `list_aggregate` by taking only the list (column) as their argument. The following rewrites exist: `list_avg`, `list_var_samp`, `list_var_pop`, `list_stddev_pop`, `list_stddev_samp`, `list_sem`, `list_approx_count_distinct`, `list_bit_xor`, `list_bit_or`, `list_bit_and`, `list_bool_and`, `list_bool_or`, `list_count`, `list_entropy`, `list_last`, `list_first`, `list_kurtosis`, `list_kurtosis_pop`, `list_min`, `list_max`, `list_product`, `list_skewness`, `list_sum`, `list_string_agg`, `list_mode`, `list_median`, `list_mad` and `list_histogram`.

```sql
SELECT list_min([1, 2, -4, NULL]);
```

```text
-4
```

```sql
SELECT list_sum([2, 4, 8, 42]);
```

```text
56
```

```sql
SELECT list_last([[1, 2], [NULL], [2, 10, 3]]);
```

```text
[2, 10, 3]
```

###### `array_to_string` {#docs:current:sql:functions:list::array_to_string}

Concatenates list/array elements using an optional delimiter.

```sql
SELECT array_to_string([1, 2, 3], '-') AS str;
```

```text
1-2-3
```

This is equivalent to the following SQL:

```sql
SELECT list_aggr([1, 2, 3], 'string_agg', '-') AS str;
```

```text
1-2-3
```

#### Sorting Lists {#docs:current:sql:functions:list::sorting-lists}

The function `list_sort` sorts the elements of a list either in ascending or descending order.
In addition, it allows specifying whether `NULL` values should be moved to the beginning or to the end of the list.
It has the same sorting behavior as DuckDB's `ORDER BY` clause.
Therefore, (nested) values compare the same in `list_sort` as in `ORDER BY`.

By default, if no modifiers are provided, DuckDB sorts `ASC NULLS LAST`.
I.e., the values are sorted in ascending order and `NULL` values are placed last.
This is identical to the default sort order of PostgreSQL.
The default sort order can be changed using [`PRAGMA` statements](#..:query_syntax:orderby).

`list_sort` can use either the default sort order or a custom order.
To specify a custom order, it takes up to two additional optional parameters.
The second parameter provides the sort order and can be either `ASC` or `DESC`.
The third parameter provides the `NULL` order and can be either `NULLS FIRST` or `NULLS LAST`.

This query uses the default sort order and the default `NULL` order.

```sql
SELECT list_sort([1, 3, NULL, 5, NULL, -5]);
```

```text
[-5, 1, 3, 5, NULL, NULL]
```

This query provides the sort order.
The `NULL` order uses the configurable default value.

```sql
SELECT list_sort([1, 3, NULL, 2], 'ASC');
```

```text
[1, 2, 3, NULL]
```

This query provides both the sort order and the `NULL` order.

```sql
SELECT list_sort([1, 3, NULL, 2], 'DESC', 'NULLS FIRST');
```

```text
[NULL, 3, 2, 1]
```
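
Because `list_sort` follows the same comparison rules as `ORDER BY`, nested values can be sorted as well. The following sketch assumes that lists compare element by element, as they do in `ORDER BY`; the alias is illustrative only:

```sql
-- Sub-lists are compared element by element, so [1, 2] sorts before [1, 3]
SELECT list_sort([[2, 1], [1, 3], [1, 2]]) AS sorted_lists;
```

```text
[[1, 2], [1, 3], [2, 1]]
```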

`list_reverse_sort` has an optional second parameter providing the `NULL` sort order.
It can be either `NULLS FIRST` or `NULLS LAST`.

For comparison, this query uses `list_sort` with the default sort order and the default `NULL` sort order.

```sql
SELECT list_sort([1, 3, NULL, 5, NULL, -5]);
```

```text
[-5, 1, 3, 5, NULL, NULL]
```

This query provides the `NULL` sort order.

```sql
SELECT list_reverse_sort([1, 3, NULL, 2], 'NULLS LAST');
```

```text
[3, 2, 1, NULL]
```

#### Flattening {#docs:current:sql:functions:list::flattening}

The flatten function is a scalar function that converts a list of lists into a single list by concatenating each sub-list together.
Note that this only flattens one level at a time, not all levels of sub-lists.

Convert a list of lists into a single list:

```sql
SELECT
    flatten([
        [1, 2],
        [3, 4]
    ]);
```

```text
[1, 2, 3, 4]
```

If the list has multiple levels of lists, only the first level of sub-lists is concatenated into a single list:

```sql
SELECT
    flatten([
        [
            [1, 2],
            [3, 4],
        ],
        [
            [5, 6],
            [7, 8],
        ]
    ]);
```

```text
[[1, 2], [3, 4], [5, 6], [7, 8]]
```
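
To flatten more than one level, `flatten` can be applied repeatedly. A minimal sketch using the same input as above:

```sql
-- Each call removes one level of nesting
SELECT
    flatten(flatten([
        [[1, 2], [3, 4]],
        [[5, 6], [7, 8]]
    ])) AS fully_flattened;
```

```text
[1, 2, 3, 4, 5, 6, 7, 8]
```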

In general, the input to the flatten function should be a list of lists (not a single level list).
However, the flatten function has specific behavior when handling empty lists and `NULL` values.

If the input list is empty, return an empty list:

```sql
SELECT flatten([]);
```

```text
[]
```

If the entire input to flatten is `NULL`, return `NULL`:

```sql
SELECT flatten(NULL);
```

```text
NULL
```

If a list whose only entry is `NULL` is flattened, return an empty list:

```sql
SELECT flatten([NULL]);
```

```text
[]
```

If the sub-list in a list of lists only contains `NULL`, do not modify the sub-list:

```sql
-- (Note the extra set of brackets vs. the prior example)
SELECT flatten([[NULL]]);
```

```text
[NULL]
```

Even if each sub-list contains only `NULL`, the sub-lists are still concatenated. Note that no de-duplication occurs when flattening; see the `list_distinct` function for de-duplication:

```sql
SELECT flatten([[NULL], [NULL]]);
```

```text
[NULL, NULL]
```
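
As a sketch of combining the two functions, the query below flattens a list of lists and then removes duplicates. The `list_sort` call is added here only to make the output order deterministic, since this example does not assume that `list_distinct` returns elements in any particular order:

```sql
SELECT list_sort(list_distinct(flatten([[1, 2], [2, 3], [3]]))) AS unique_values;
```

```text
[1, 2, 3]
```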

#### Lambda Functions {#docs:current:sql:functions:list::lambda-functions}

DuckDB supports lambda functions in the form `lambda parameter1, parameter2, ...: expression`.
For details, see the [lambda functions page](#docs:current:sql:functions:lambda).
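
As a brief illustration, the sketch below passes lambdas to `list_transform` and `list_filter`. It assumes a DuckDB version that accepts the `lambda` keyword syntax shown above; the aliases are illustrative only:

```sql
-- Add 1 to every element
SELECT list_transform([1, 2, 3], lambda x: x + 1) AS incremented;   -- [2, 3, 4]

-- Keep only the even elements
SELECT list_filter([1, 2, 3, 4], lambda x: x % 2 = 0) AS evens;     -- [2, 4]
```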

#### Related Functions {#docs:current:sql:functions:list::related-functions}

* The [aggregate functions](#docs:current:sql:functions:aggregates) `list` and `histogram` produce lists and lists of structs.
* The [`unnest` function](#docs:current:sql:query_syntax:unnest) is used to unnest a list by one level.
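
A short sketch of both related functions; the table alias `t` and the column names are made up for this example:

```sql
-- unnest removes one level of nesting: this returns two rows, [1, 2] and [3]
SELECT unnest([[1, 2], [3]]) AS sub_list;

-- The list aggregate collects rows into a single list
SELECT list(x ORDER BY x) AS collected
FROM (VALUES (3), (1), (2)) t(x);   -- [1, 2, 3]
```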

### Map Functions {#docs:current:sql:functions:map}



| Name | Description |
|:--|:-------|
| [`cardinality(map)`](#::cardinalitymap) | Return the size of the map (or the number of entries in the map). |
| [`element_at(map, key)`](#::element_atmap-key) | Return the value for a given `key` as a list, or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| [`map_concat(maps...)`](#::map_concatmaps) | Returns a map created from merging the input `maps`. On key collision the value is taken from the last map with that key. |
| [`map_contains(map, key)`](#::map_containsmap-key) | Checks if a map contains a given key. |
| [`map_contains_entry(map, key, value)`](#::map_contains_entrymap-key-value) | Check if a map contains a given key-value pair. |
| [`map_contains_value(map, value)`](#::map_contains_valuemap-value) | Checks if a map contains a given value. |
| [`map_entries(map)`](#::map_entriesmap) | Return a list of struct(k, v) for each key-value pair in the map. |
| [`map_extract(map, key)`](#::map_extractmap-key) | Return the value for a given `key` as a list, or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| [`map_extract_value(map, key)`](#::map_extract_valuemap-key) | Returns the value for a given `key` or `NULL` if the `key` is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| [`map_from_entries(STRUCT(k, v)[])`](#::map_from_entriesstructk-v) | Returns a map created from the entries of the array. |
| [`map_keys(map)`](#::map_keysmap) | Return a list of all keys in the map. |
| [`map_values(map)`](#::map_valuesmap) | Return a list of all values in the map. |
| [`map()`](#::map) | Returns an empty map. |
| [`map[entry]`](#::mapentry) | Returns the value for a given `key` or `NULL` if the `key` is not contained in the map. The type of the key provided in the brackets must match the type of the map's keys; else, an error is thrown. |
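
The sketch below applies several of these functions to the same map. The subquery alias `t`, the column name `m`, and the result aliases are illustrative only; the commented results follow the descriptions in the table above:

```sql
SELECT
    cardinality(m)                 AS size,            -- 2
    m['key1']                      AS bracket_lookup,  -- 10
    map_extract_value(m, 'key1')   AS value_lookup,    -- 10
    map_extract(m, 'key1')         AS list_lookup,     -- [10]
    m['key3']                      AS missing_key,     -- NULL
    map_contains(m, 'key3')        AS has_key3,        -- false
    map_keys(m)                    AS all_keys,        -- [key1, key2]
    map_values(m)                  AS all_values       -- [10, 20]
FROM (SELECT MAP {'key1': 10, 'key2': 20} AS m) t;
```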

###### `cardinality(map)` {#docs:current:sql:functions:map::cardinalitymap}



|   |   |
|:--|:--------|
| **Description** |Return the size of the map (or the number of entries in the map). |
| **Example** | `cardinality(map([4, 2], ['a', 'b']))` |
| **Result** | `2` |

###### `element_at(map, key)` {#docs:current:sql:functions:map::element_atmap-key}



|   |   |
|:--|:--------|
| **Description** |Return the value for a given `key` as a list, or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| **Example** | `element_at(map([100, 5], [42, 43]), 100)` |
| **Result** | `[42]` |
| **Aliases** | `map_extract(map, key)` |

###### `map_concat(maps...)` {#docs:current:sql:functions:map::map_concatmaps}



|   |   |
|:--|:--------|
| **Description** |Returns a map created from merging the input `maps`. On key collision the value is taken from the last map with that key. |
| **Example** | `map_concat(MAP {'key1': 10, 'key2': 20}, MAP {'key3': 30}, MAP {'key2': 5})` |
| **Result** | `{key1=10, key2=5, key3=30}` |

###### `map_contains(map, key)` {#docs:current:sql:functions:map::map_containsmap-key}



|   |   |
|:--|:--------|
| **Description** |Checks if a map contains a given key. |
| **Example** | `map_contains(MAP {'key1': 10, 'key2': 20, 'key3': 30}, 'key2')` |
| **Result** | `true` |

###### `map_contains_entry(map, key, value)` {#docs:current:sql:functions:map::map_contains_entrymap-key-value}



|   |   |
|:--|:--------|
| **Description** |Check if a map contains a given key-value pair. |
| **Example** | `map_contains_entry(MAP {'key1': 10, 'key2': 20, 'key3': 30}, 'key2', 20)` |
| **Result** | `true` |

###### `map_contains_value(map, value)` {#docs:current:sql:functions:map::map_contains_valuemap-value}



|   |   |
|:--|:--------|
| **Description** |Checks if a map contains a given value. |
| **Example** | `map_contains_value(MAP {'key1': 10, 'key2': 20, 'key3': 30}, 20)` |
| **Result** | `true` |

###### `map_entries(map)` {#docs:current:sql:functions:map::map_entriesmap}



|   |   |
|:--|:--------|
| **Description** |Return a list of struct(k, v) for each key-value pair in the map. |
| **Example** | `map_entries(map([100, 5], [42, 43]))` |
| **Result** | `[{'key': 100, 'value': 42}, {'key': 5, 'value': 43}]` |

###### `map_extract(map, key)` {#docs:current:sql:functions:map::map_extractmap-key}



|   |   |
|:--|:--------|
| **Description** |Return the value for a given `key` as a list, or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| **Example** | `map_extract(map([100, 5], [42, 43]), 100)` |
| **Result** | `[42]` |
| **Aliases** | `element_at(map, key)` |

###### `map_extract_value(map, key)` {#docs:current:sql:functions:map::map_extract_valuemap-key}



|   |   |
|:--|:--------|
| **Description** |Returns the value for a given `key` or `NULL` if the `key` is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| **Example** | `map_extract_value(map([100, 5], [42, 43]), 100)` |
| **Result** | `42` |
| **Aliases** | `map[key]` |

###### `map_from_entries(STRUCT(k, v)[])` {#docs:current:sql:functions:map::map_from_entriesstructk-v}



|   |   |
|:--|:--------|
| **Description** |Returns a map created from the entries of the array. |
| **Example** | `map_from_entries([{k: 5, v: 'val1'}, {k: 3, v: 'val2'}])` |
| **Result** | `{5=val1, 3=val2}` |

###### `map_keys(map)` {#docs:current:sql:functions:map::map_keysmap}



|   |   |
|:--|:--------|
| **Description** |Return a list of all keys in the map. |
| **Example** | `map_keys(map([100, 5], [42, 43]))` |
| **Result** | `[100, 5]` |

###### `map_values(map)` {#docs:current:sql:functions:map::map_valuesmap}



|   |   |
|:--|:--------|
| **Description** |Return a list of all values in the map. |
| **Example** | `map_values(map([100, 5], [42, 43]))` |
| **Result** | `[42, 43]` |

###### `map()` {#docs:current:sql:functions:map::map}



|   |   |
|:--|:--------|
| **Description** |Returns an empty map. |
| **Example** | `map()` |
| **Result** | `{}` |

###### `map[entry]` {#docs:current:sql:functions:map::mapentry}



|   |   |
|:--|:--------|
| **Description** |Returns the value for a given `key` or `NULL` if the `key` is not contained in the map. The type of the key provided in the brackets must match the type of the map's keys; else, an error is thrown. |
| **Example** | `map([100, 5], ['a', 'b'])[100]` |
| **Result** | `a` |
| **Aliases** | `map_extract_value(map, key)` |

### Nested Functions {#docs:current:sql:functions:nested}

There are five [nested data types](#docs:current:sql:data_types:overview::nested--composite-types):

| Name | Type page | Functions page |
|--|---|---|
| `ARRAY`  | [`ARRAY` type](#docs:current:sql:data_types:array)   | [`ARRAY` functions](#docs:current:sql:functions:array)   |
| `LIST`   | [`LIST` type](#docs:current:sql:data_types:list)     | [`LIST` functions](#docs:current:sql:functions:list)     |
| `MAP`    | [`MAP` type](#docs:current:sql:data_types:map)       | [`MAP` functions](#docs:current:sql:functions:map)       |
| `STRUCT` | [`STRUCT` type](#docs:current:sql:data_types:struct) | [`STRUCT` functions](#docs:current:sql:functions:struct) |
| `UNION`  | [`UNION` type](#docs:current:sql:data_types:union)   | [`UNION` functions](#docs:current:sql:functions:union)   |

### Numeric Functions {#docs:current:sql:functions:numeric}



#### Numeric Operators {#docs:current:sql:functions:numeric::numeric-operators}

The table below shows the available mathematical operators for [numeric types](#docs:current:sql:data_types:numeric).



| Operator | Description | Example | Result |
|-|-----|--|-|
| `+`      | Addition                  | `2 + 3`   | `5`   |
| `-`      | Subtraction               | `2 - 3`   | `-1`  |
| `*`      | Multiplication            | `2 * 3`   | `6`   |
| `/`      | Float division            | `5 / 2`   | `2.5` |
| `//`     | Integer division          | `5 // 2`  | `2`   |
| `%`      | Modulo (remainder)        | `5 % 4`   | `1`   |
| `**`     | Exponent                  | `3 ** 4`  | `81`  |
| `^`      | Exponent (alias for `**`) | `3 ^ 4`   | `81`  |
| `&`      | Bitwise AND               | `91 & 15` | `11`  |
| `\|`     | Bitwise OR                | `32 \| 3` | `35`  |
| `<<`     | Bitwise shift left        | `1 << 4`  | `16`  |
| `>>`     | Bitwise shift right       | `8 >> 2`  | `2`   |
| `~`      | Bitwise negation          | `~15`     | `-16` |
| `!`      | Factorial of `x`          | `4!`      | `24`  |



##### Division and Modulo Operators {#docs:current:sql:functions:numeric::division-and-modulo-operators}

There are two division operators: `/` and `//`.
They are equivalent when at least one of the operands is a `FLOAT` or a `DOUBLE`.
When both operands are integers, `/` performs floating-point division (`5 / 2 = 2.5`) while `//` performs integer division (`5 // 2 = 2`).
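
The difference can be checked side by side. In this sketch the aliases are illustrative, and the `DOUBLE` cast is used only to force a floating-point operand:

```sql
SELECT
    5 / 2          AS int_slash,          -- 2.5 (floating-point division)
    5 // 2         AS int_double_slash,   -- 2   (integer division)
    5::DOUBLE / 2  AS dbl_slash,          -- 2.5
    5::DOUBLE // 2 AS dbl_double_slash;   -- 2.5, same as `/` for DOUBLE operands
```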

##### Supported Types {#docs:current:sql:functions:numeric::supported-types}

The modulo, bitwise, negation, and factorial operators work only on integral data types,
whereas the others are available for all numeric data types.

#### Numeric Functions {#docs:current:sql:functions:numeric::numeric-functions}

The table below shows the available mathematical functions.

| Name | Description |
|:--|:-------|
| [`@(x)`](#::x) | Absolute value. Parentheses are optional if `x` is a column name. |
| [`abs(x)`](#::absx) | Absolute value. |
| [`acos(x)`](#::acosx) | Computes the inverse cosine of `x`. |
| [`acosh(x)`](#::acoshx) | Computes the inverse hyperbolic cosine of `x`. |
| [`add(x, y)`](#::addx-y) | Alias for `x + y`. |
| [`asin(x)`](#::asinx) | Computes the inverse sine of `x`. |
| [`asinh(x)`](#::asinhx) | Computes the inverse hyperbolic sine of `x`. |
| [`atan(x)`](#::atanx) | Computes the inverse tangent of `x`. |
| [`atanh(x)`](#::atanhx) | Computes the inverse hyperbolic tangent of `x`. |
| [`atan2(y, x)`](#::atan2y-x) | Computes the inverse tangent of `(y, x)`. |
| [`bit_count(x)`](#::bit_countx) | Returns the number of bits that are set. |
| [`cbrt(x)`](#::cbrtx) | Returns the cube root of the number. |
| [`ceil(x)`](#::ceilx) | Rounds the number up. |
| [`ceiling(x)`](#::ceilingx) | Rounds the number up. Alias of `ceil`. |
| [`cos(x)`](#::cosx) | Computes the cosine of `x`. |
| [`cot(x)`](#::cotx) | Computes the cotangent of `x`. |
| [`degrees(x)`](#::degreesx) | Converts radians to degrees. |
| [`divide(x, y)`](#::dividex-y) | Alias for `x // y`. |
| [`even(x)`](#::evenx) | Round to next even number by rounding away from zero. |
| [`exp(x)`](#::expx) | Computes `e ** x`. |
| [`factorial(x)`](#::factorialx) | See the `!` operator. Computes the product of the current integer and all integers below it. |
| [`fdiv(x, y)`](#::fdivx-y) | Performs integer division (`x // y`) but returns a `DOUBLE` value. |
| [`floor(x)`](#::floorx) | Rounds the number down. |
| [`fmod(x, y)`](#::fmodx-y) | Calculates the modulo value. Always returns a `DOUBLE` value. |
| [`gamma(x)`](#::gammax) | Interpolation of the factorial of `x - 1`. Fractional inputs are allowed. |
| [`gcd(x, y)`](#::gcdx-y) | Computes the greatest common divisor of `x` and `y`. |
| [`greatest_common_divisor(x, y)`](#::greatest_common_divisorx-y) | Computes the greatest common divisor of `x` and `y`. |
| [`greatest(x1, x2, ...)`](#::greatestx1-x2-) | Selects the largest value. |
| [`isfinite(x)`](#::isfinitex) | Returns true if the floating point value is finite, false otherwise. |
| [`isinf(x)`](#::isinfx) | Returns true if the floating point value is infinite, false otherwise. |
| [`isnan(x)`](#::isnanx) | Returns true if the floating point value is not a number, false otherwise. |
| [`lcm(x, y)`](#::lcmx-y) | Computes the least common multiple of `x` and `y`. |
| [`least_common_multiple(x, y)`](#::least_common_multiplex-y) | Computes the least common multiple of `x` and `y`. |
| [`least(x1, x2, ...)`](#::leastx1-x2-) | Selects the smallest value. |
| [`lgamma(x)`](#::lgammax) | Computes the log of the `gamma` function. |
| [`ln(x)`](#::lnx) | Computes the natural logarithm of `x`. |
| [`log(x)`](#::logx) | Computes the base-10 logarithm of `x`. |
| [`log10(x)`](#::log10x) | Alias of `log`. Computes the base-10 logarithm of `x`. |
| [`log2(x)`](#::log2x) | Computes the base-2 log of `x`. |
| [`multiply(x, y)`](#::multiplyx-y) | Alias for `x * y`. |
| [`nextafter(x, y)`](#::nextafterx-y) | Return the next floating point value after `x` in the direction of `y`. |
| [`pi()`](#::pi) | Returns the value of pi. |
| [`pow(x, y)`](#::powx-y) | Computes `x` to the power of `y`. |
| [`power(x, y)`](#::powerx-y) | Alias of `pow`. Computes `x` to the power of `y`. |
| [`radians(x)`](#::radiansx) | Converts degrees to radians. |
| [`random()`](#::random) | Returns a random number `x` in the range `0.0 <= x < 1.0`. |
| [`round_even(v NUMERIC, s INTEGER)`](#::round_evenv-numeric-s-integer) | Alias of `roundbankers(v, s)`. Round to `s` decimal places using the [_rounding half to even_ rule](https://en.wikipedia.org/wiki/Rounding#Rounding_half_to_even). Values `s < 0` are allowed. |
| [`roundbankers(v NUMERIC, s INTEGER)`](#::roundbankersv-numeric-s-integer) | Alias of `round_even(v, s)`. Round to `s` decimal places using the [_rounding half to even_ rule](https://en.wikipedia.org/wiki/Rounding#Rounding_half_to_even). Values `s < 0` are allowed. |
| [`round(v NUMERIC, s INTEGER)`](#::roundv-numeric-s-integer) | Round to `s` decimal places. Values `s < 0` are allowed. |
| [`setseed(x)`](#::setseedx) | Sets the seed to be used for the random function. |
| [`sign(x)`](#::signx) | Returns the sign of `x` as -1, 0 or 1. |
| [`signbit(x)`](#::signbitx) | Returns whether the signbit is set or not. |
| [`sin(x)`](#::sinx) | Computes the sine of `x`. |
| [`sqrt(x)`](#::sqrtx) | Returns the square root of the number. |
| [`subtract(x, y)`](#::subtractx-y) | Alias for `x - y`. |
| [`tan(x)`](#::tanx) | Computes the tangent of `x`. |
| [`trunc(x)`](#::truncx) | Truncates the number. |
| [`xor(x, y)`](#::xorx-y) | Bitwise XOR. |

###### `@(x)` {#docs:current:sql:functions:numeric::x}



|   |   |
|:--|:--------|
| **Description** |Absolute value. Parentheses are optional if `x` is a column name. |
| **Example** | `@(-17.4)` |
| **Result** | `17.4` |
| **Alias** | `abs` |

###### `abs(x)` {#docs:current:sql:functions:numeric::absx}



|   |   |
|:--|:--------|
| **Description** |Absolute value. |
| **Example** | `abs(-17.4)` |
| **Result** | `17.4` |
| **Alias** | `@` |

###### `acos(x)` {#docs:current:sql:functions:numeric::acosx}



|   |   |
|:--|:--------|
| **Description** |Computes the inverse cosine of `x`. |
| **Example** | `acos(0.5)` |
| **Result** | `1.0471975511965976` |

###### `acosh(x)` {#docs:current:sql:functions:numeric::acoshx}



|   |   |
|:--|:--------|
| **Description** |Computes the inverse hyperbolic cosine of `x`. |
| **Example** | `acosh(1.5)` |
| **Result** | `0.9624236501192069` |

###### `add(x, y)` {#docs:current:sql:functions:numeric::addx-y}



|   |   |
|:--|:--------|
| **Description** |Alias for `x + y`. |
| **Example** | `add(2, 3)` |
| **Result** | `5` |

###### `asin(x)` {#docs:current:sql:functions:numeric::asinx}



|   |   |
|:--|:--------|
| **Description** |Computes the inverse sine of `x`. |
| **Example** | `asin(0.5)` |
| **Result** | `0.5235987755982989` |

###### `asinh(x)` {#docs:current:sql:functions:numeric::asinhx}



|   |   |
|:--|:--------|
| **Description** |Computes the inverse hyperbolic sine of `x`. |
| **Example** | `asinh(0.5)` |
| **Result** | `0.48121182505960347` |

###### `atan(x)` {#docs:current:sql:functions:numeric::atanx}



|   |   |
|:--|:--------|
| **Description** |Computes the inverse tangent of `x`. |
| **Example** | `atan(0.5)` |
| **Result** | `0.4636476090008061` |

###### `atanh(x)` {#docs:current:sql:functions:numeric::atanhx}



|   |   |
|:--|:--------|
| **Description** |Computes the inverse hyperbolic tangent of `x`. |
| **Example** | `atanh(0.5)` |
| **Result** | `0.5493061443340549` |

###### `atan2(y, x)` {#docs:current:sql:functions:numeric::atan2y-x}



|   |   |
|:--|:--------|
| **Description** |Computes the inverse tangent of `(y, x)`. |
| **Example** | `atan2(0.5, 0.5)` |
| **Result** | `0.7853981633974483` |

###### `bit_count(x)` {#docs:current:sql:functions:numeric::bit_countx}



|   |   |
|:--|:--------|
| **Description** |Returns the number of bits that are set. |
| **Example** | `bit_count(31)` |
| **Result** | `5` |

###### `cbrt(x)` {#docs:current:sql:functions:numeric::cbrtx}



|   |   |
|:--|:--------|
| **Description** |Returns the cube root of the number. |
| **Example** | `cbrt(8)` |
| **Result** | `2` |

###### `ceil(x)` {#docs:current:sql:functions:numeric::ceilx}



|   |   |
|:--|:--------|
| **Description** |Rounds the number up. |
| **Example** | `ceil(17.4)` |
| **Result** | `18` |

###### `ceiling(x)` {#docs:current:sql:functions:numeric::ceilingx}



|   |   |
|:--|:--------|
| **Description** |Rounds the number up. Alias of `ceil`. |
| **Example** | `ceiling(17.4)` |
| **Result** | `18` |

###### `cos(x)` {#docs:current:sql:functions:numeric::cosx}



|   |   |
|:--|:--------|
| **Description** |Computes the cosine of `x`. |
| **Example** | `cos(pi() / 3)` |
| **Result** | `0.5000000000000001` |

###### `cot(x)` {#docs:current:sql:functions:numeric::cotx}



|   |   |
|:--|:--------|
| **Description** |Computes the cotangent of `x`. |
| **Example** | `cot(0.5)` |
| **Result** | `1.830487721712452` |

###### `degrees(x)` {#docs:current:sql:functions:numeric::degreesx}



|   |   |
|:--|:--------|
| **Description** |Converts radians to degrees. |
| **Example** | `degrees(pi())` |
| **Result** | `180` |

###### `divide(x, y)` {#docs:current:sql:functions:numeric::dividex-y}



|   |   |
|:--|:--------|
| **Description** |Alias for `x // y`. |
| **Example** | `divide(5, 2)` |
| **Result** | `2` |

###### `even(x)` {#docs:current:sql:functions:numeric::evenx}



|   |   |
|:--|:--------|
| **Description** |Round to next even number by rounding away from zero. |
| **Example** | `even(2.9)` |
| **Result** | `4` |

###### `exp(x)` {#docs:current:sql:functions:numeric::expx}



|   |   |
|:--|:--------|
| **Description** |Computes `e ** x`. |
| **Example** | `exp(0.693)` |
| **Result** | `1.9997` (approximately) |

###### `factorial(x)` {#docs:current:sql:functions:numeric::factorialx}



|   |   |
|:--|:--------|
| **Description** |See the `!` operator. Computes the product of the current integer and all integers below it. |
| **Example** | `factorial(4)` |
| **Result** | `24` |

###### `fdiv(x, y)` {#docs:current:sql:functions:numeric::fdivx-y}



|   |   |
|:--|:--------|
| **Description** |Performs integer division (`x // y`) but returns a `DOUBLE` value. |
| **Example** | `fdiv(5, 2)` |
| **Result** | `2.0` |

###### `floor(x)` {#docs:current:sql:functions:numeric::floorx}



|   |   |
|:--|:--------|
| **Description** |Rounds the number down. |
| **Example** | `floor(17.4)` |
| **Result** | `17` |

###### `fmod(x, y)` {#docs:current:sql:functions:numeric::fmodx-y}



|   |   |
|:--|:--------|
| **Description** |Calculates the modulo value. Always returns a `DOUBLE` value. |
| **Example** | `fmod(5, 2)` |
| **Result** | `1.0` |

###### `gamma(x)` {#docs:current:sql:functions:numeric::gammax}



|   |   |
|:--|:--------|
| **Description** |Interpolation of the factorial of `x - 1`. Fractional inputs are allowed. |
| **Example** | `gamma(5.5)` |
| **Result** | `52.34277778455352` |
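
For integer inputs, the relationship to the factorial can be checked directly. A minimal sketch (the aliases are illustrative only):

```sql
-- gamma(x) interpolates the factorial of x - 1, so gamma(5) matches factorial(4)
SELECT
    gamma(5)     AS gamma_of_5,       -- evaluates to 24
    factorial(4) AS factorial_of_4;   -- evaluates to 24
```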

###### `gcd(x, y)` {#docs:current:sql:functions:numeric::gcdx-y}



|   |   |
|:--|:--------|
| **Description** |Computes the greatest common divisor of `x` and `y`. |
| **Example** | `gcd(42, 57)` |
| **Result** | `3` |

###### `greatest_common_divisor(x, y)` {#docs:current:sql:functions:numeric::greatest_common_divisorx-y}



|   |   |
|:--|:--------|
| **Description** |Computes the greatest common divisor of `x` and `y`. |
| **Example** | `greatest_common_divisor(42, 57)` |
| **Result** | `3` |

###### `greatest(x1, x2, ...)` {#docs:current:sql:functions:numeric::greatestx1-x2-}



|   |   |
|:--|:--------|
| **Description** |Selects the largest value. |
| **Example** | `greatest(3, 2, 4, 4)` |
| **Result** | `4` |

###### `isfinite(x)` {#docs:current:sql:functions:numeric::isfinitex}



|   |   |
|:--|:--------|
| **Description** |Returns true if the floating point value is finite, false otherwise. |
| **Example** | `isfinite(5.5)` |
| **Result** | `true` |

###### `isinf(x)` {#docs:current:sql:functions:numeric::isinfx}



|   |   |
|:--|:--------|
| **Description** |Returns true if the floating point value is infinite, false otherwise. |
| **Example** | `isinf('Infinity'::float)` |
| **Result** | `true` |

###### `isnan(x)` {#docs:current:sql:functions:numeric::isnanx}



|   |   |
|:--|:--------|
| **Description** |Returns true if the floating point value is not a number, false otherwise. |
| **Example** | `isnan('NaN'::float)` |
| **Result** | `true` |

###### `lcm(x, y)` {#docs:current:sql:functions:numeric::lcmx-y}



|   |   |
|:--|:--------|
| **Description** |Computes the least common multiple of `x` and `y`. |
| **Example** | `lcm(42, 57)` |
| **Result** | `798` |

###### `least_common_multiple(x, y)` {#docs:current:sql:functions:numeric::least_common_multiplex-y}



|   |   |
|:--|:--------|
| **Description** |Computes the least common multiple of `x` and `y`. |
| **Example** | `least_common_multiple(42, 57)` |
| **Result** | `798` |
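
The two functions are linked by the identity `gcd(x, y) * lcm(x, y) = x * y` for positive integers. The sketch below checks it for the example values used above (3 * 798 = 2394 = 42 * 57):

```sql
SELECT gcd(42, 57) * lcm(42, 57) = 42 * 57 AS identity_holds;   -- true
```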

###### `least(x1, x2, ...)` {#docs:current:sql:functions:numeric::leastx1-x2-}



|   |   |
|:--|:--------|
| **Description** |Selects the smallest value. |
| **Example** | `least(3, 2, 4, 4)` |
| **Result** | `2` |

###### `lgamma(x)` {#docs:current:sql:functions:numeric::lgammax}



|   |   |
|:--|:--------|
| **Description** |Computes the log of the `gamma` function. |
| **Example** | `lgamma(2)` |
| **Result** | `0` |

###### `ln(x)` {#docs:current:sql:functions:numeric::lnx}



|   |   |
|:--|:--------|
| **Description** |Computes the natural logarithm of `x`. |
| **Example** | `ln(2)` |
| **Result** | `0.6931471805599453` |

###### `log(x)` {#docs:current:sql:functions:numeric::logx}



|   |   |
|:--|:--------|
| **Description** |Computes the base-10 log of `x`. |
| **Example** | `log(100)` |
| **Result** | `2` |

###### `log10(x)` {#docs:current:sql:functions:numeric::log10x}



|   |   |
|:--|:--------|
| **Description** |Alias of `log`. Computes the base-10 log of `x`. |
| **Example** | `log10(1000)` |
| **Result** | `3` |

###### `log2(x)` {#docs:current:sql:functions:numeric::log2x}



|   |   |
|:--|:--------|
| **Description** |Computes the base-2 log of `x`. |
| **Example** | `log2(8)` |
| **Result** | `3` |

###### `multiply(x, y)` {#docs:current:sql:functions:numeric::multiplyx-y}



|   |   |
|:--|:--------|
| **Description** |Alias for `x * y`. |
| **Example** | `multiply(2, 3)` |
| **Result** | `6` |

###### `nextafter(x, y)` {#docs:current:sql:functions:numeric::nextafterx-y}



|   |   |
|:--|:--------|
| **Description** |Return the next floating point value after `x` in the direction of `y`. |
| **Example** | `nextafter(1::float, 2::float)` |
| **Result** | `1.0000001` |

###### `pi()` {#docs:current:sql:functions:numeric::pi}



|   |   |
|:--|:--------|
| **Description** |Returns the value of pi. |
| **Example** | `pi()` |
| **Result** | `3.141592653589793` |

###### `pow(x, y)` {#docs:current:sql:functions:numeric::powx-y}



|   |   |
|:--|:--------|
| **Description** |Computes `x` to the power of `y`. |
| **Example** | `pow(2, 3)` |
| **Result** | `8` |

###### `power(x, y)` {#docs:current:sql:functions:numeric::powerx-y}



|   |   |
|:--|:--------|
| **Description** |Alias of `pow`. Computes `x` to the power of `y`. |
| **Example** | `power(2, 3)` |
| **Result** | `8` |

###### `radians(x)` {#docs:current:sql:functions:numeric::radiansx}



|   |   |
|:--|:--------|
| **Description** |Converts degrees to radians. |
| **Example** | `radians(90)` |
| **Result** | `1.5707963267948966` |

###### `random()` {#docs:current:sql:functions:numeric::random}



|   |   |
|:--|:--------|
| **Description** |Returns a random number `x` in the range `0.0 <= x < 1.0`. |
| **Example** | `random()` |
| **Result** | various |

###### `round_even(v NUMERIC, s INTEGER)` {#docs:current:sql:functions:numeric::round_evenv-numeric-s-integer}



|   |   |
|:--|:--------|
| **Description** |Alias of `roundbankers(v, s)`. Round to `s` decimal places using the [_rounding half to even_ rule](https://en.wikipedia.org/wiki/Rounding#Rounding_half_to_even). Values `s < 0` are allowed. |
| **Example** | `round_even(24.5, 0)` |
| **Result** | `24.0` |

###### `roundbankers(v NUMERIC, s INTEGER)` {#docs:current:sql:functions:numeric::roundbankersv-numeric-s-integer}



|   |   |
|:--|:--------|
| **Description** |Alias of `round_even(v, s)`. Round to `s` decimal places using the [_rounding half to even_ rule](https://en.wikipedia.org/wiki/Rounding#Rounding_half_to_even). Values `s < 0` are allowed. |
| **Example** | `roundbankers(24.5, 0)` |
| **Result** | `24.0` |

###### `round(v NUMERIC, s INTEGER)` {#docs:current:sql:functions:numeric::roundv-numeric-s-integer}



|   |   |
|:--|:--------|
| **Description** |Round to `s` decimal places. Values `s < 0` are allowed. |
| **Example** | `round(42.4332, 2)` |
| **Result** | `42.43` |
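
The difference between `round` and `round_even` only shows up for values that lie exactly halfway between two candidates. The following sketch compares them; the comments give the numeric values, and the exact output formatting (e.g., `25` vs. `25.0`) may depend on the input type:

```sql
SELECT
    round(24.5, 0)      AS half_away_1,   -- 25: the halfway case rounds away from zero
    round_even(24.5, 0) AS half_even_1,   -- 24: the halfway case rounds to the even neighbor
    round(25.5, 0)      AS half_away_2,   -- 26
    round_even(25.5, 0) AS half_even_2,   -- 26
    round(1234.5, -2)   AS hundreds;      -- 1200: negative s rounds to the left of the decimal point
```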

###### `setseed(x)` {#docs:current:sql:functions:numeric::setseedx}



|   |   |
|:--|:--------|
| **Description** |Sets the seed to be used for the random function. |
| **Example** | `setseed(0.42)` |
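
A typical use of `setseed` is to make `random()` reproducible. The sketch below assumes that all statements run in the same connection, in which case the two draws are expected to be equal; no specific values are shown because they are not documented here:

```sql
SELECT setseed(0.42);
SELECT random() AS first_draw;

-- Resetting the seed restarts the sequence
SELECT setseed(0.42);
SELECT random() AS repeated_draw;   -- expected to equal first_draw
```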

###### `sign(x)` {#docs:current:sql:functions:numeric::signx}



|   |   |
|:--|:--------|
| **Description** |Returns the sign of `x` as -1, 0 or 1. |
| **Example** | `sign(-349)` |
| **Result** | `-1` |

###### `signbit(x)` {#docs:current:sql:functions:numeric::signbitx}



|   |   |
|:--|:--------|
| **Description** |Returns whether the signbit is set or not. |
| **Example** | `signbit(-1.0)` |
| **Result** | `true` |

###### `sin(x)` {#docs:current:sql:functions:numeric::sinx}



|   |   |
|:--|:--------|
| **Description** |Computes the sine of `x`. |
| **Example** | `sin(pi() / 6)` |
| **Result** | `0.49999999999999994` |

###### `sqrt(x)` {#docs:current:sql:functions:numeric::sqrtx}



|   |   |
|:--|:--------|
| **Description** |Returns the square root of the number. |
| **Example** | `sqrt(9)` |
| **Result** | `3` |

###### `subtract(x, y)` {#docs:current:sql:functions:numeric::subtractx-y}



|   |   |
|:--|:--------|
| **Description** |Alias for `x - y`. |
| **Example** | `subtract(2, 3)` |
| **Result** | `-1` |

###### `tan(x)` {#docs:current:sql:functions:numeric::tanx}



|   |   |
|:--|:--------|
| **Description** |Computes the tangent of `x`. |
| **Example** | `tan(pi() / 4)` |
| **Result** | `0.9999999999999999` |

###### `trunc(x)` {#docs:current:sql:functions:numeric::truncx}



|   |   |
|:--|:--------|
| **Description** |Truncates the number. |
| **Example** | `trunc(17.4)` |
| **Result** | `17` |

###### `xor(x, y)` {#docs:current:sql:functions:numeric::xorx-y}



|   |   |
|:--|:--------|
| **Description** |Bitwise XOR. |
| **Example** | `xor(17, 5)` |
| **Result** | `20` |

### Pattern Matching {#docs:current:sql:functions:pattern_matching}

There are four separate approaches to pattern matching provided by DuckDB:
the traditional SQL [`LIKE` operator](#::like),
the more recent [`SIMILAR TO` operator](#::similar-to) (added in SQL:1999),
a [`GLOB` operator](#::glob),
and POSIX-style [regular expressions](#::regular-expressions).

#### `LIKE` {#docs:current:sql:functions:pattern_matching::like}



The `LIKE` expression returns `true` if the string matches the supplied pattern. (As expected, the `NOT LIKE` expression returns `false` if `LIKE` returns `true`, and vice versa. An equivalent expression is `NOT (string LIKE pattern)`.)

If the pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case `LIKE` acts like the equals operator. An underscore (`_`) in the pattern stands for (matches) any single character; a percent sign (`%`) matches any sequence of zero or more characters.

`LIKE` pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.

Some examples:

```sql
SELECT 'abc' LIKE 'abc';    -- true
SELECT 'abc' LIKE 'a%' ;    -- true
SELECT 'abc' LIKE '_b_';    -- true
SELECT 'abc' LIKE 'c';      -- false
SELECT 'abc' LIKE 'c%' ;    -- false
SELECT 'abc' LIKE '%c';     -- true
SELECT 'abc' NOT LIKE '%c'; -- false
```
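
As a short illustration of the whole-string rule described above, the following sketch contrasts a bare pattern with one wrapped in percent signs. The string `'abcdef'` is only an illustrative value:

```sql
SELECT 'abcdef' LIKE 'cd';   -- false: the pattern must cover the entire string
SELECT 'abcdef' LIKE '%cd%'; -- true: 'cd' may appear anywhere in the string
SELECT 'abcdef' LIKE '%cd';  -- false: the string does not end with 'cd'
SELECT 'abcdef' LIKE 'ab%';  -- true: the string starts with 'ab'
```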

The keyword `ILIKE` can be used instead of `LIKE` to make the match case-insensitive according to the active locale:

```sql
SELECT 'abc' ILIKE '%C'; -- true
```

```sql
SELECT 'abc' NOT ILIKE '%C'; -- false
```

To search within a string for a character that is a wildcard (`%` or `_`), the pattern must use an `ESCAPE` clause and an escape character to indicate the wildcard should be treated as a literal character instead of a wildcard. See an example below.

Additionally, the function `like_escape` has the same functionality as a `LIKE` expression with an `ESCAPE` clause, but using function syntax. See the [Text Functions Docs](#docs:current:sql:functions:text) for details.

Search for strings with 'a' then a literal percent sign then 'c':

```sql
SELECT 'a%c' LIKE 'a$%c' ESCAPE '$'; -- true
SELECT 'azc' LIKE 'a$%c' ESCAPE '$'; -- false
```

Case-insensitive `ILIKE` with `ESCAPE`:

```sql
SELECT 'A%c' ILIKE 'a$%c' ESCAPE '$'; -- true
```
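
The function syntax mentioned above can express the same checks. The sketch below reuses the strings from the `ESCAPE` examples and assumes the `like_escape` and `ilike_escape` signatures listed on the Text Functions page:

```sql
SELECT like_escape('a%c', 'a$%c', '$');  -- true
SELECT like_escape('azc', 'a$%c', '$');  -- false
SELECT ilike_escape('A%c', 'a$%c', '$'); -- true
```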

DuckDB also supports symbolic operators that can be used in place of the `LIKE` keywords. These improve PostgreSQL compatibility.



| PostgreSQL-style | `LIKE`-style |
| :--------------- | :----------- |
| `~~`             | `LIKE`       |
| `!~~`            | `NOT LIKE`   |
| `~~*`            | `ILIKE`      |
| `!~~*`           | `NOT ILIKE`  |
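
For illustration, the symbolic operators from the table can be used like this, mirroring the earlier `LIKE` and `ILIKE` examples:

```sql
SELECT 'abc' ~~ 'a%';   -- true,  same as 'abc' LIKE 'a%'
SELECT 'abc' !~~ 'a%';  -- false, same as 'abc' NOT LIKE 'a%'
SELECT 'abc' ~~* 'A%';  -- true,  same as 'abc' ILIKE 'A%'
SELECT 'abc' !~~* 'A%'; -- false, same as 'abc' NOT ILIKE 'A%'
```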

#### `SIMILAR TO` {#docs:current:sql:functions:pattern_matching::similar-to}



The `SIMILAR TO` operator returns true or false depending on whether its pattern matches the given string. It is similar to `LIKE`, except that it interprets the pattern using a [regular expression](#docs:current:sql:functions:regular_expressions). Like `LIKE`, the `SIMILAR TO` operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string.

A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with `LIKE`, pattern characters match string characters exactly unless they are special characters in the regular expression language — but regular expressions use different special characters than `LIKE` does.

Some examples:

```sql
SELECT 'abc' SIMILAR TO 'abc';       -- true
SELECT 'abc' SIMILAR TO 'a';         -- false
SELECT 'abc' SIMILAR TO '.*(b|d).*'; -- true
SELECT 'abc' SIMILAR TO '(b|c).*';   -- false
SELECT 'abc' NOT SIMILAR TO 'abc';   -- false
```

> In PostgreSQL, `~` is equivalent to `SIMILAR TO`
> and `!~` is equivalent to `NOT SIMILAR TO`.
> In DuckDB, these equivalences do not hold currently,
> see the [PostgreSQL compatibility page](#docs:current:sql:dialect:postgresql_compatibility).

#### Globbing {#docs:current:sql:functions:pattern_matching::globbing}

DuckDB supports file name expansion, also known as globbing, for discovering files.
DuckDB's glob syntax uses the question mark (`?`) wildcard to match any single character and the asterisk (`*`) to match zero or more characters.
In addition, you can use the bracket syntax (`[...]`) to match any single character contained within the brackets, or within the character range specified by the brackets. An exclamation mark (`!`) may be used inside the first bracket to search for a character that is not contained within the brackets.
To learn more, visit the [“glob (programming)” Wikipedia page](https://en.wikipedia.org/wiki/Glob_(programming)).

##### `GLOB` {#docs:current:sql:functions:pattern_matching::glob}



The `GLOB` operator returns `true` if the string matches the `GLOB` pattern and `false` otherwise. The `GLOB` operator is most commonly used when searching for filenames that follow a specific pattern (for example a specific file extension).

Some examples:

```sql
SELECT 'best.txt' GLOB '*.txt';            -- true
SELECT 'best.txt' GLOB '????.txt';         -- true
SELECT 'best.txt' GLOB '?.txt';            -- false
SELECT 'best.txt' GLOB '[abc]est.txt';     -- true
SELECT 'best.txt' GLOB '[a-z]est.txt';     -- true
```

The bracket syntax is case-sensitive:

```sql
SELECT 'Best.txt' GLOB '[a-z]est.txt';     -- false
SELECT 'Best.txt' GLOB '[a-zA-Z]est.txt';  -- true
```

The `!` applies to all characters within the brackets:

```sql
SELECT 'Best.txt' GLOB '[!a-zA-Z]est.txt'; -- false
```

To negate a GLOB operator, negate the entire expression:

```sql
SELECT NOT 'best.txt' GLOB '*.txt';        -- false
```

Three tildes (`~~~`) may also be used in place of the `GLOB` keyword.

| GLOB-style | Symbolic-style |
| :--------- | :------------- |
| `GLOB`     | `~~~`          |
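
A minimal sketch of the symbolic form, reusing the file name from the `GLOB` examples:

```sql
SELECT 'best.txt' ~~~ '*.txt'; -- true,  same as 'best.txt' GLOB '*.txt'
SELECT 'best.txt' ~~~ '?.txt'; -- false, same as 'best.txt' GLOB '?.txt'
```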

##### Glob Function to Find Filenames {#docs:current:sql:functions:pattern_matching::glob-function-to-find-filenames}

The glob pattern matching syntax can also be used to search for filenames using the `glob` table function.
It accepts one parameter: the path to search (which may include glob patterns).

Search the current directory for all files:

```sql
SELECT * FROM glob('*');
```



| file          |
| ------------- |
| duckdb.exe    |
| test.csv      |
| test.json     |
| test.parquet  |
| test2.csv     |
| test2.parquet |
| todos.json    |
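
Because the path may contain glob patterns, the search can be narrowed. The sketch below assumes the same example directory as in the listing above; results naturally depend on the files actually present:

```sql
-- Only files with the .csv extension
-- (test.csv and test2.csv in the example directory)
SELECT * FROM glob('*.csv');

-- Count the Parquet files in the current directory
SELECT count(*) AS parquet_files FROM glob('*.parquet');
```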

##### Globbing Semantics {#docs:current:sql:functions:pattern_matching::globbing-semantics}

DuckDB's globbing implementation follows the semantics of [Python's `glob`](https://docs.python.org/3/library/glob.html) and not the `glob` used in the shell.
A notable difference is the behavior of the `**/` construct: `**/⟨filename⟩` will not return a file named `⟨filename⟩` in the top-level directory.
For example, with a `README.md` file present in the directory, the following query finds it:

```sql
SELECT * FROM glob('README.md');
```



| file      |
| --------- |
| README.md |

However, the following query returns an empty result:

```sql
SELECT * FROM glob('**/README.md');
```

Meanwhile, the globbing of Bash, Zsh, etc. finds the file using the same syntax:

```batch
ls **/README.md
```

```text
README.md
```
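
One possible way to cover both the top-level directory and its subdirectories, given the behavior described above, is to combine the two queries. This is only a sketch and not an official recommendation; `UNION` is used so that a file is listed once even if both patterns happen to match it:

```sql
SELECT * FROM glob('README.md')
UNION
SELECT * FROM glob('**/README.md');
```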

#### Regular Expressions {#docs:current:sql:functions:pattern_matching::regular-expressions}

DuckDB's regular expression support is documented on the [Regular Expressions page](#docs:current:sql:functions:regular_expressions).
DuckDB supports some PostgreSQL-style operators for regular expression matching:

| PostgreSQL-style | Equivalent expression                                                                                    |
| :--------------- | :------------------------------------------------------------------------------------------------------- |
| `~`              | [`regexp_full_match`](#docs:current:sql:functions:text::regexp_full_matchstring-regex)       |
| `!~`             | `NOT` [`regexp_full_match`](#docs:current:sql:functions:text::regexp_full_matchstring-regex) |
| `~*`             | (not supported)                                                                                          |
| `!~*`            | (not supported)                                                                                          |
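
As a brief sketch of the equivalences in the table, note that `~` requires the whole string to match because it corresponds to `regexp_full_match`:

```sql
SELECT 'abc' ~ 'a.c';                   -- true
SELECT regexp_full_match('abc', 'a.c'); -- true (equivalent expression)
SELECT 'abc' ~ 'b';                     -- false: 'b' does not match the entire string
SELECT 'abc' !~ 'b';                    -- true
```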

### Regular Expressions {#docs:current:sql:functions:regular_expressions}



DuckDB offers [pattern matching operators](#docs:current:sql:functions:pattern_matching)
([`LIKE`](#docs:current:sql:functions:pattern_matching::like),
[`SIMILAR TO`](#docs:current:sql:functions:pattern_matching::similar-to),
[`GLOB`](#docs:current:sql:functions:pattern_matching::glob)),
as well as support for regular expressions via functions.

#### Regular Expression Syntax {#docs:current:sql:functions:regular_expressions::regular-expression-syntax}

DuckDB uses the [RE2 library](https://github.com/google/re2) as its regular expression engine. For the regular expression syntax, see the [RE2 docs](https://github.com/google/re2/wiki/Syntax).

#### Functions {#docs:current:sql:functions:regular_expressions::functions}

All functions accept an optional set of [options](#::options-for-regular-expression-functions).

| Name | Description |
|:--|:-------|
| [`regexp_extract(string, pattern[, group = 0][, options])`](#regexp_extractstring-pattern-group--0-options) | If `string` contains the regexp `pattern`, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| [`regexp_extract(string, pattern, name_list[, options])`](#regexp_extractstring-pattern-name_list-options) | If `string` contains the regexp `pattern`, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. |
| [`regexp_extract_all(string, regex[, group = 0][, options])`](#regexp_extract_allstring-regex-group--0-options) | Finds non-overlapping occurrences of `regex` in `string` and returns the corresponding values of `group`. |
| [`regexp_extract_all(string, regex, name_list[, options])`](#regexp_extract_allstring-regex-name_list-options) | Finds non-overlapping occurrences of `regex` in `string` and returns the capturing groups as a list of structs with corresponding names from `name_list`. |
| [`regexp_full_match(string, regex[, options])`](#regexp_full_matchstring-regex-options) | Returns `true` if the entire `string` matches the `regex`. |
| [`regexp_matches(string, pattern[, options])`](#regexp_matchesstring-pattern-options) | Returns `true` if `string` contains the regexp `pattern`, `false` otherwise. |
| [`regexp_replace(string, pattern, replacement[, options])`](#regexp_replacestring-pattern-replacement-options) | If `string` contains the regexp `pattern`, replaces the matching part with `replacement`. By default, only the first occurrence is replaced. A set of optional [`options`](#::options-for-regular-expression-functions), including the global flag `g`, can be set. |
| [`regexp_split_to_array(string, regex[, options])`](#regexp_split_to_arraystring-regex-options) | Alias of `string_split_regex`. Splits the `string` along the `regex`. |
| [`regexp_split_to_table(string, regex[, options])`](#regexp_split_to_tablestring-regex-options) | Splits the `string` along the `regex` and returns a row for each part. |

###### `regexp_extract(string, pattern[, group = 0][, options])` {#docs:current:sql:functions:regular_expressions::regexp_extractstring-pattern-group--0-options}



|   |   |
|:--|:--------|
| **Description** |If `string` contains the regexp `pattern`, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract('abc', '([a-z])(b)', 1)` |
| **Result** | `a` |

###### `regexp_extract(string, pattern, name_list[, options])` {#docs:current:sql:functions:regular_expressions::regexp_extractstring-pattern-name_list-options}



|   |   |
|:--|:--------|
| **Description** |If `string` contains the regexp `pattern`, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract('2023-04-15', '(\d+)-(\d+)-(\d+)', ['y', 'm', 'd'])` |
| **Result** | `{'y':'2023', 'm':'04', 'd':'15'}` |

###### `regexp_extract_all(string, regex[, group = 0][, options])` {#docs:current:sql:functions:regular_expressions::regexp_extract_allstring-regex-group--0-options}



|   |   |
|:--|:--------|
| **Description** |Finds non-overlapping occurrences of `regex` in `string` and returns the corresponding values of `group`. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2)` |
| **Result** | `[33, 14]` |

###### `regexp_extract_all(string, regex, name_list[, options])` {#docs:current:sql:functions:regular_expressions::regexp_extract_allstring-regex-name_list-options}



|   |   |
|:--|:--------|
| **Description** |Finds non-overlapping occurrences of `regex` in `string` and returns the capturing groups as a list of structs with corresponding names from `name_list`. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract_all('Peter: 33, Paul: 14', '(\w+):\s*(\d+)', ['name', 'age'])` |
| **Result** | `[{'name': Peter, 'age': 33}, {'name': Paul, 'age': 14}]` |

###### `regexp_full_match(string, regex[, options])` {#docs:current:sql:functions:regular_expressions::regexp_full_matchstring-regex-options}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the entire `string` matches the `regex`. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_full_match('anabanana', '(an)*')` |
| **Result** | `false` |

###### `regexp_matches(string, pattern[, options])` {#docs:current:sql:functions:regular_expressions::regexp_matchesstring-pattern-options}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if `string` contains the regexp `pattern`, `false` otherwise. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_matches('anabanana', '(an)*')` |
| **Result** | `true` |

###### `regexp_replace(string, pattern, replacement[, options])` {#docs:current:sql:functions:regular_expressions::regexp_replacestring-pattern-replacement-options}



|   |   |
|:--|:--------|
| **Description** |If `string` contains the regexp `pattern`, replaces the matching part with `replacement`. By default, only the first occurrence is replaced. A set of optional [`options`](#::options-for-regular-expression-functions), including the global flag `g`, can be set. |
| **Example** | `regexp_replace('hello', '[lo]', '-')` |
| **Result** | `he-lo` |

###### `regexp_split_to_array(string, regex[, options])` {#docs:current:sql:functions:regular_expressions::regexp_split_to_arraystring-regex-options}



|   |   |
|:--|:--------|
| **Description** |Alias of `string_split_regex`. Splits the `string` along the `regex`. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_split_to_array('hello world; 42', ';? ')` |
| **Result** | `['hello', 'world', '42']` |

###### `regexp_split_to_table(string, regex[, options])` {#docs:current:sql:functions:regular_expressions::regexp_split_to_tablestring-regex-options}



|   |   |
|:--|:--------|
| **Description** |Splits the `string` along the `regex` and returns a row for each part. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_split_to_table('hello world; 42', ';? ')` |
| **Result** | Three rows: `'hello'`, `'world'`, `'42'` |

The `regexp_matches` function is similar to the `SIMILAR TO` operator, but it does not require the entire string to match. Instead, `regexp_matches` returns `true` if the string merely contains the pattern (unless the special tokens `^` and `$` are used to anchor the regular expression to the start and end of the string). Below are some examples:

```sql
SELECT regexp_matches('abc', 'abc');       -- true
SELECT regexp_matches('abc', '^abc$');     -- true
SELECT regexp_matches('abc', 'a');         -- true
SELECT regexp_matches('abc', '^a$');       -- false
SELECT regexp_matches('abc', '.*(b|d).*'); -- true
SELECT regexp_matches('abc', '(b|c).*');   -- true
SELECT regexp_matches('abc', '^(b|c).*');  -- false
SELECT regexp_matches('abc', '(?i)A');     -- true
SELECT regexp_matches('abc', 'A', 'i');    -- true
```
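
To make the difference between partial and full matching concrete, the following sketch runs the same pattern through `regexp_matches`, `regexp_full_match`, and `SIMILAR TO`:

```sql
SELECT regexp_matches('abc', 'b');      -- true:  the string contains 'b'
SELECT regexp_full_match('abc', 'b');   -- false: 'b' does not match the entire string
SELECT 'abc' SIMILAR TO 'b';            -- false: SIMILAR TO also matches the entire string
SELECT regexp_full_match('abc', '.b.'); -- true
```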

#### Options for Regular Expression Functions {#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions}

The regex functions support the following `options`.

| Option | Description |
|:---|:---|
| `'c'`               | Case-sensitive matching                             |
| `'i'`               | Case-insensitive matching                           |
| `'l'`               | Match literals instead of regular expression tokens |
| `'m'`, `'n'`, `'p'` | Newline sensitive matching                          |
| `'g'`               | Global replace, only available for `regexp_replace` |
| `'s'`               | Non-newline sensitive matching                      |

For example:

```sql
SELECT regexp_matches('abcd', 'ABC', 'c'); -- false
SELECT regexp_matches('abcd', 'ABC', 'i'); -- true
SELECT regexp_matches('ab^/$cd', '^/$', 'l'); -- true
SELECT regexp_matches(E'hello\nworld', 'hello.world', 'p'); -- false
SELECT regexp_matches(E'hello\nworld', 'hello.world', 's'); -- true
```

##### Using `regexp_matches` {#docs:current:sql:functions:regular_expressions::using-regexp_matches}

The `regexp_matches` function will be optimized to the `LIKE` operator when possible. To achieve best performance, the `'c'` option (case-sensitive matching) should be passed if applicable. Note that by default the [`RE2` library](#::regular-expression-syntax) doesn't match the `.` character to newline.

| Original | Optimized equivalent |
|:---|:---|
| `regexp_matches('hello world', '^hello', 'c')`      | `prefix('hello world', 'hello')` |
| `regexp_matches('hello world', 'world$', 'c')`      | `suffix('hello world', 'world')` |
| `regexp_matches('hello world', 'hello.world', 'c')` | `'hello world' LIKE '%hello_world%'` |
| `regexp_matches('hello world', 'he.*rld', 'c')`     | `'hello world' LIKE '%he%rld%'`      |
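
As a quick sanity check of the first two rows, the regular expression and its simpler counterpart can be evaluated side by side. This sketch only compares results; it does not show what the optimizer actually produces:

```sql
SELECT
    regexp_matches('hello world', '^hello', 'c') AS regex_prefix, -- true
    prefix('hello world', 'hello')               AS plain_prefix, -- true
    regexp_matches('hello world', 'world$', 'c') AS regex_suffix, -- true
    suffix('hello world', 'world')               AS plain_suffix; -- true
```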

##### Using `regexp_replace` {#docs:current:sql:functions:regular_expressions::using-regexp_replace}

The `regexp_replace` function can be used to replace the part of a string that matches the regexp pattern with a replacement string. The notation `\d` (where `d` is a number indicating the group) can be used to refer to groups captured in the regular expression in the replacement string. Note that by default, `regexp_replace` only replaces the first occurrence of the regular expression. To replace all occurrences, use the global replace (`g`) flag.

Some examples for using `regexp_replace`:

```sql
SELECT regexp_replace('abc', '(b|c)', 'X');        -- aXc
SELECT regexp_replace('abc', '(b|c)', 'X', 'g');   -- aXX
SELECT regexp_replace('abc', '(b|c)', '\1\1\1\1'); -- abbbbc
SELECT regexp_replace('abc', '(.*)c', '\1e');      -- abe
SELECT regexp_replace('abc', '(a)(b)', '\2\1');    -- bac
```
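
A slightly more practical sketch combines group references and the global flag. The input strings are illustrative values only:

```sql
-- Reorder the captured groups to turn a year-month-day date into day/month/year
SELECT regexp_replace('2023-04-15', '(\d+)-(\d+)-(\d+)', '\3/\2/\1'); -- 15/04/2023

-- Without 'g' only the first run of digits is replaced; with 'g' all runs are
SELECT regexp_replace('a1b22c333', '\d+', '#');      -- a#b22c333
SELECT regexp_replace('a1b22c333', '\d+', '#', 'g'); -- a#b#c#
```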

##### Using `regexp_extract` {#docs:current:sql:functions:regular_expressions::using-regexp_extract}

The `regexp_extract` function is used to extract a part of a string that matches the regexp pattern.
A specific capturing group within the pattern can be extracted using the `group` parameter. If `group` is not specified, it defaults to 0, extracting the first match with the whole pattern.

```sql
SELECT regexp_extract('abc', '.b.');           -- abc
SELECT regexp_extract('abc', '.b.', 0);        -- abc
SELECT regexp_extract('abc', '.b.', 1);        -- (empty)
SELECT regexp_extract('abc', '([a-z])(b)', 1); -- a
SELECT regexp_extract('abc', '([a-z])(b)', 2); -- b
```

The `regexp_extract` function also supports a `name_list` argument, which is a `LIST` of strings. When `name_list` is given, `regexp_extract` returns the corresponding capture groups as fields of a `STRUCT`:

```sql
SELECT regexp_extract('2023-04-15', '(\d+)-(\d+)-(\d+)', ['y', 'm', 'd']);
```

```text
{'y': 2023, 'm': 04, 'd': 15}
```

```sql
SELECT regexp_extract('2023-04-15 07:59:56', '^(\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)', ['y', 'm', 'd']);
```

```text
{'y': 2023, 'm': 04, 'd': 15}
```

```sql
SELECT regexp_extract('duckdb_0_7_1', '^(\w+)_(\d+)_(\d+)', ['tool', 'major', 'minor', 'fix']);
```

```console
Binder Error:
Not enough group names in regexp_extract
```

If `name_list` contains fewer names than the pattern has capture groups, then only the first groups are returned.
If `name_list` contains more names than the pattern has capture groups, then an error is generated.
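
Since the result is a `STRUCT`, its fields can be accessed like those of any other struct. The sketch below uses a subquery with the illustrative alias `parts`; the extracted fields are strings, so a cast is needed to treat them as numbers:

```sql
SELECT
    parts.y,                  -- 2023 (as a string)
    parts.m,                  -- 04
    parts.d,                  -- 15
    parts.y::INTEGER AS year  -- 2023 (as an integer)
FROM (
    SELECT regexp_extract('2023-04-15', '(\d+)-(\d+)-(\d+)', ['y', 'm', 'd']) AS parts
);
```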

#### Limitations {#docs:current:sql:functions:regular_expressions::limitations}

Regular expressions only support 9 capture groups: `\1`, `\2`, `\3`, ..., `\9`.
Capture groups with two or more digits are not supported.

### Struct Functions {#docs:current:sql:functions:struct}



| Name | Description |
|:--|:-------|
| [`struct.entry`](#::structentry) | Dot notation that serves as an alias for `struct_extract` from named `STRUCT`s. |
| [`struct[entry]`](#structentry) | Bracket notation that serves as an alias for `struct_extract` from named `STRUCT`s. |
| [`struct[idx]`](#structidx) | Bracket notation that serves as an alias for `struct_extract` from unnamed `STRUCT`s (tuples), using an index (1-based). |
| [`row(any, ...)`](#::rowany-) | Create an unnamed `STRUCT` (tuple) containing the argument values. |
| [`struct_concat(structs...)`](#::struct_concatstructs) | Merge the multiple `structs` into a single `STRUCT`. |
| [`struct_contains(struct, entry)`](#::struct_containsstruct-entry) | Check if the `STRUCT` contains the specified entry. |
| [`struct_extract(struct, 'entry')`](#::struct_extractstruct-entry) | Extract the named entry from the `STRUCT`. |
| [`struct_extract(struct, idx)`](#::struct_extractstruct-idx) | Extract the entry from an unnamed `STRUCT` (tuple) using an index (1-based). |
| [`struct_extract_at(struct, idx)`](#::struct_extract_atstruct-idx) | Extract the entry from a `STRUCT` (tuple) using an index (1-based). |
| [`struct_insert(struct, name := any, ...)`](#::struct_insertstruct-name--any-) | Add field(s) to an existing `STRUCT`. |
| [`struct_pack(name := any, ...)`](#::struct_packname--any-) | Create a `STRUCT` containing the argument values. The entry name will be the bound variable name. |
| [`struct_position(struct, entry)`](#::struct_positionstruct-entry) | Return the index of the entry within the `STRUCT` (1-based), or `NULL` if not found. |
| [`struct_update(struct, name := any, ...)`](#::struct_updatestruct-name--any-) | Add or update field(s) of an existing `STRUCT`. |
| [`struct_values(struct)`](#::struct_valuesstruct) | Return the values of a `STRUCT` as an unnamed `STRUCT` (tuple). |

###### `struct.entry` {#docs:current:sql:functions:struct::structentry}



|   |   |
|:--|:--------|
| **Description** |Dot notation that serves as an alias for `struct_extract` from named `STRUCT`s. |
| **Example** | `({'i': 3, 's': 'string'}).i` |
| **Result** | `3` |

###### `struct[entry]` {#docs:current:sql:functions:struct::structentry}



|   |   |
|:--|:--------|
| **Description** |Bracket notation that serves as an alias for `struct_extract` from named `STRUCT`s. |
| **Example** | `({'i': 3, 's': 'string'})['i']` |
| **Result** | `3` |

###### `struct[idx]` {#docs:current:sql:functions:struct::structidx}



|   |   |
|:--|:--------|
| **Description** |Bracket notation that serves as an alias for `struct_extract` from unnamed `STRUCT`s (tuples), using an index (1-based). |
| **Example** | `(row(42, 84))[1]` |
| **Result** | `42` |

###### `row(any, ...)` {#docs:current:sql:functions:struct::rowany-}



|   |   |
|:--|:--------|
| **Description** |Create an unnamed `STRUCT` (tuple) containing the argument values. |
| **Example** | `row(i, i % 4, i / 4)` |
| **Result** | `(10, 2, 2.5)` |

###### `struct_concat(structs...)` {#docs:current:sql:functions:struct::struct_concatstructs}



|   |   |
|:--|:--------|
| **Description** |Merge the multiple `structs` into a single `STRUCT`. |
| **Example** | `struct_concat(struct_pack(i := 4), struct_pack(s := 'string'))` |
| **Result** | `{'i': 4, 's': string}` |

###### `struct_contains(struct, entry)` {#docs:current:sql:functions:struct::struct_containsstruct-entry}



|   |   |
|:--|:--------|
| **Description** |Check if the `STRUCT` contains the specified entry. |
| **Example** | `struct_contains(row(1, 2, 3), 2)` |
| **Result** | `true` |
| **Alias** | `struct_has` |

###### `struct_extract(struct, 'entry')` {#docs:current:sql:functions:struct::struct_extractstruct-entry}



|   |   |
|:--|:--------|
| **Description** |Extract the named entry from the `STRUCT`. |
| **Example** | `struct_extract({'i': 3, 'v2': 3, 'v3': 0}, 'i')` |
| **Result** | `3` |

###### `struct_extract(struct, idx)` {#docs:current:sql:functions:struct::struct_extractstruct-idx}



|   |   |
|:--|:--------|
| **Description** |Extract the entry from an unnamed `STRUCT` (tuple) using an index (1-based). |
| **Example** | `struct_extract(row(42, 84), 1)` |
| **Result** | `42` |

###### `struct_extract_at(struct, idx)` {#docs:current:sql:functions:struct::struct_extract_atstruct-idx}



|   |   |
|:--|:--------|
| **Description** |Extract the entry from a `STRUCT` (tuple) using an index (1-based). |
| **Example** | `struct_extract_at({'v1': 10, 'v2': 20, 'v3': 3}, 2)` |
| **Result** | `20` |

###### `struct_insert(struct, name := any, ...)` {#docs:current:sql:functions:struct::struct_insertstruct-name--any-}



|   |   |
|:--|:--------|
| **Description** |Add field(s) to an existing `STRUCT`. |
| **Example** | `struct_insert({'a': 1}, b := 2)` |
| **Result** | `{'a': 1, 'b': 2}` |

###### `struct_pack(name := any, ...)` {#docs:current:sql:functions:struct::struct_packname--any-}



|   |   |
|:--|:--------|
| **Description** |Create a `STRUCT` containing the argument values. The entry name will be the bound variable name. |
| **Example** | `struct_pack(i := 4, s := 'string')` |
| **Result** | `{'i': 4, 's': string}` |

###### `struct_position(struct, entry)` {#docs:current:sql:functions:struct::struct_positionstruct-entry}



|   |   |
|:--|:--------|
| **Description** |Return the index of the entry within the `STRUCT` (1-based), or `NULL` if not found. |
| **Example** | `struct_position(row(1, 2, 3), 2)` |
| **Result** | `2` |
| **Alias** | `struct_indexof` |

###### `struct_update(struct, name := any, ...)` {#docs:current:sql:functions:struct::struct_updatestruct-name--any-}



|   |   |
|:--|:--------|
| **Description** |Add or update field(s) of an existing `STRUCT`. |
| **Example** | `struct_update({'a': 1, 'b': 2}, b := 3, c := 4)` |
| **Result** | `{'a': 1, 'b': 3, 'c': 4}` |

###### `struct_values(struct)` {#docs:current:sql:functions:struct::struct_valuesstruct}



|   |   |
|:--|:--------|
| **Description** |Return the values of a `STRUCT` as an unnamed `STRUCT` (tuple). |
| **Example** | `struct_values({'a': 1, 'b': 2, 'c': 3})` |
| **Result** | `(1, 2, 3)` |
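
The following sketch combines several of the functions above in a single query. The subquery and the column alias `s` are illustrative only:

```sql
SELECT
    s.i                         AS dot_access,     -- 3
    s['s']                      AS bracket_access, -- string
    struct_extract(s, 'i')      AS by_name,        -- 3
    struct_extract_at(s, 2)     AS by_position,    -- string
    struct_insert(s, b := true) AS with_new_field  -- {'i': 3, 's': string, 'b': true}
FROM (SELECT {'i': 3, 's': 'string'} AS s);
```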

### Text Functions {#docs:current:sql:functions:text}



#### Text Functions and Operators {#docs:current:sql:functions:text::text-functions-and-operators}

This section describes functions and operators for examining and manipulating [`STRING` values](#docs:current:sql:data_types:text).




| Function | Description |
|:--|:-------|
| [`string[index]`](#stringindex) | Extracts a single character using a (1-based) `index`. |
| [`string[begin:end]`](#stringbeginend) | Extracts a string using [slice conventions](#docs:current:sql:functions:list::slicing) similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the string respectively. Negative values are accepted. |
| [`string LIKE target`](#::string-like-target) | Returns `true` if the `string` matches the like specifier (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)). |
| [`string SIMILAR TO regex`](#::string-similar-to-regex) | Returns `true` if the `string` matches the `regex` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)). |
| [`string ^@ search_string`](#::starts_withstring-search_string) | Alias for `starts_with`. |
| [`arg1 || arg2`](#::arg1--arg2) | Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:current:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:current:sql:functions:list::list_concatlist_1--list_n). |
| [`array_extract(string, index)`](#::array_extractstring-index) | Extracts a single character from a `string` using a (1-based) `index`. |
| [`array_slice(list, begin, end)`](#::array_slicelist-begin-end) | Extracts a sublist or substring using [slice conventions](#docs:current:sql:functions:list::slicing). Negative values are accepted. |
| [`ascii(string)`](#::asciistring) | Returns an integer that represents the Unicode code point of the first character of the `string`. |
| [`bar(x, min, max[, width])`](#barx-min-max-width) | Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80. |
| [`base64(blob)`](#::to_base64blob) | Alias for `to_base64`. |
| [`bin(string)`](#::binstring) | Converts the `string` to binary representation. |
| [`bit_length(string)`](#::bit_lengthstring) | Number of bits in a `string`. |
| [`char_length(string)`](#::lengthstring) | Alias for `length`. |
| [`character_length(string)`](#::lengthstring) | Alias for `length`. |
| [`chr(code_point)`](#::chrcode_point) | Returns the character corresponding to the ASCII code value or Unicode code point. |
| [`concat(value, ...)`](#::concatvalue-) | Concatenates multiple strings or lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| [`concat_ws(separator, string, ...)`](#::concat_wsseparator-string-) | Concatenates many strings, separated by `separator`. `NULL` inputs are skipped. |
| [`contains(string, search_string)`](#::containsstring-search_string) | Returns `true` if `search_string` is found within `string`. Note that [collations](#docs:current:sql:expressions:collations) are not supported. |
| [`ends_with(string, search_string)`](#::suffixstring-search_string) | Alias for `suffix`. |
| [`format(format, ...)`](#::formatformat-) | Formats a string using the [fmt syntax](#::fmt-syntax). |
| [`formatReadableDecimalSize(integer)`](#::formatreadabledecimalsizeinteger) | Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.). |
| [`format_bytes(integer)`](#::format_bytesinteger) | Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). |
| [`from_base64(string)`](#::from_base64string) | Converts a base64-encoded `string` to a character string (`BLOB`). |
| [`from_binary(value)`](#::unbinvalue) | Alias for `unbin`. |
| [`from_hex(value)`](#::unhexvalue) | Alias for `unhex`. |
| [`greatest(arg1, ...)`](#::greatestarg1-) | Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and [collations](#docs:current:sql:expressions:collations) are not supported. |
| [`hash(value, ...)`](#::hashvalue-) | Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash. |
| [`hex(string)`](#::hexstring) | Converts the `string` to hexadecimal representation. |
| [`ilike_escape(string, like_specifier, escape_character)`](#::ilike_escapestring-like_specifier-escape_character) | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| [`instr(string, search_string)`](#::instrstring-search_string) | Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. |
| [`lcase(string)`](#::lowerstring) | Alias for `lower`. |
| [`least(arg1, ...)`](#::leastarg1-) | Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and [collations](#docs:current:sql:expressions:collations) are not supported. |
| [`left(string, count)`](#::leftstring-count) | Extracts the left-most `count` characters. |
| [`left_grapheme(string, count)`](#::left_graphemestring-count) | Extracts the left-most `count` grapheme clusters. |
| [`len(string)`](#::lengthstring) | Alias for `length`. |
| [`length(string)`](#::lengthstring) | Number of characters in `string`. |
| [`length_grapheme(string)`](#::length_graphemestring) | Number of grapheme clusters in `string`. |
| [`like_escape(string, like_specifier, escape_character)`](#::like_escapestring-like_specifier-escape_character) | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| [`lower(string)`](#::lowerstring) | Converts `string` to lower case. |
| [`lpad(string, count, character)`](#::lpadstring-count-character) | Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
| [`ltrim(string[, characters])`](#ltrimstring-characters) | Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. |
| [`md5(string)`](#::md5string) | Returns the MD5 hash of the `string` as a `VARCHAR`. |
| [`md5_number(string)`](#::md5_numberstring) | Returns the MD5 hash of the `string` as a `HUGEINT`. |
| [`md5_number_lower(string)`](#::md5_number_lowerstring) | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| [`md5_number_upper(string)`](#::md5_number_upperstring) | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| [`nfc_normalize(string)`](#::nfc_normalizestring) | Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. |
| [`not_ilike_escape(string, like_specifier, escape_character)`](#::not_ilike_escapestring-like_specifier-escape_character) | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| [`not_like_escape(string, like_specifier, escape_character)`](#::not_like_escapestring-like_specifier-escape_character) | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| [`ord(string)`](#::unicodestring) | Alias for `unicode`. |
| [`parse_dirname(path[, separator])`](#parse_dirnamepath-separator) | Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| [`parse_dirpath(path[, separator])`](#parse_dirpathpath-separator) | Returns the head of the `path` (the pathname until the last slash) similarly to Python's [`os.path.dirname`](https://docs.python.org/3.7/library/os.path.html#os.path.dirname). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| [`parse_filename(string[, trim_extension][, separator])`](#parse_filenamestring-trim_extension-separator) | Returns the last component of the `path` similarly to Python's [`os.path.basename`](https://docs.python.org/3.7/library/os.path.html#os.path.basename) function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| [`parse_path(path[, separator])`](#parse_pathpath-separator) | Returns a list of the components (directories and filename) in the `path` similarly to Python's [`pathlib.PurePath.parts`](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parts) attribute. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| [`position(search_string IN string)`](#::positionsearch_string-in-string) | Returns the location of the first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match is found. |
| [`position(string, search_string)`](#::instrstring-search_string) | Alias for `instr`. |
| [`prefix(string, search_string)`](#::prefixstring-search_string) | Returns `true` if `string` starts with `search_string`. |
| [`printf(format, ...)`](#::printfformat-) | Formats a `string` using [printf syntax](#::printf-syntax). |
| [`read_text(source)`](#::read_textsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If the file contains invalid UTF-8, `read_text` throws an error that suggests using `read_blob` instead. See the [`read_text` guide](#docs:current:guides:file_formats:read_file::read_text) for more details. |
| [`regexp_escape(string)`](#::regexp_escapestring) | Escapes special patterns to turn `string` into a regular expression similarly to Python's [`re.escape` function](https://docs.python.org/3/library/re.html#re.escape). |
| [`regexp_extract(string, regex[, group][, options])`](#regexp_extractstring-regex-group-options) | If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_extract(string, regex, name_list[, options])`](#regexp_extractstring-regex-name_list-options) | If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_extract_all(string, regex[, group][, options])`](#regexp_extract_allstring-regex-group-options) | Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_extract_all(string, regex, name_list[, options])`](#regexp_extract_allstring-regex-name_list-options) | Finds non-overlapping occurrences of `regex` in `string` and returns the capturing groups as a list of structs with corresponding names from `name_list`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_full_match(string, regex[, options])`](#regexp_full_matchstring-regex-col2) | Returns `true` if the entire `string` matches the `regex`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_matches(string, regex[, options])`](#regexp_matchesstring-regex-options) | Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_replace(string, regex, replacement[, options])`](#regexp_replacestring-regex-replacement-options) | If `string` contains the `regex`, replaces the matching part with `replacement`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_split_to_array(string, regex[, options])`](#string_split_regexstring-regex-options) | Alias for `string_split_regex`. |
| [`regexp_split_to_table(string, regex)`](#::regexp_split_to_tablestring-regex) | Splits the `string` along the `regex` and returns a row for each part. |
| [`repeat(string, count)`](#::repeatstring-count) | Repeats the `string` `count` number of times. |
| [`replace(string, source, target)`](#::replacestring-source-target) | Replaces any occurrences of the `source` with `target` in `string`. |
| [`reverse(string)`](#::reversestring) | Reverses the `string`. |
| [`right(string, count)`](#::rightstring-count) | Extracts the right-most `count` characters. |
| [`right_grapheme(string, count)`](#::right_graphemestring-count) | Extracts the right-most `count` grapheme clusters. |
| [`rpad(string, count, character)`](#::rpadstring-count-character) | Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
| [`rtrim(string[, characters])`](#rtrimstring-characters) | Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. |
| [`sha1(value)`](#::sha1value) | Returns a `VARCHAR` with the SHA-1 hash of the `value`. |
| [`sha256(value)`](#::sha256value) | Returns a `VARCHAR` with the SHA-256 hash of the `value`. |
| [`split(string, separator)`](#::string_splitstring-separator) | Alias for `string_split`. |
| [`split_part(string, separator, index)`](#::split_partstring-separator-index) | Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, returns an empty string (to match PostgreSQL's behavior). |
| [`starts_with(string, search_string)`](#::starts_withstring-search_string) | Returns `true` if `string` begins with `search_string`. |
| [`str_split(string, separator)`](#::string_splitstring-separator) | Alias for `string_split`. |
| [`str_split_regex(string, regex[, options])`](#string_split_regexstring-regex-options) | Alias for `string_split_regex`. |
| [`string_split(string, separator)`](#::string_splitstring-separator) | Splits the `string` along the `separator`. |
| [`string_split_regex(string, regex[, options])`](#string_split_regexstring-regex-options) | Splits the `string` along the `regex`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`string_to_array(string, separator)`](#::string_splitstring-separator) | Alias for `string_split`. |
| [`strip_accents(string)`](#::strip_accentsstring) | Strips accents from `string`. |
| [`strlen(string)`](#::strlenstring) | Number of bytes in `string`. |
| [`strpos(string, search_string)`](#::instrstring-search_string) | Alias for `instr`. |
| [`substr(string, start[, length])`](#substringstring-start-length) | Alias for `substring`. |
| [`substring(string, start[, length])`](#substringstring-start-length) | Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. |
| [`substring_grapheme(string, start[, length])`](#substring_graphemestring-start-length) | Extracts a substring starting from grapheme cluster `start` up to the end of the string. If the optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the first grapheme cluster of the `string`. |
| [`suffix(string, search_string)`](#::suffixstring-search_string) | Returns `true` if `string` ends with `search_string`. Note that [collations](#docs:current:sql:expressions:collations) are not supported. |
| [`to_base(number, radix[, min_length])`](#to_basenumber-radix-min_length) | Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. |
| [`to_base64(blob)`](#::to_base64blob) | Converts a `blob` to a base64 encoded string. |
| [`to_binary(string)`](#::binstring) | Alias for `bin`. |
| [`to_hex(string)`](#::hexstring) | Alias for `hex`. |
| [`translate(string, from, to)`](#::translatestring-from-to) | Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted. |
| [`trim(string[, characters])`](#trimstring-characters) | Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. |
| [`ucase(string)`](#::upperstring) | Alias for `upper`. |
| [`unbin(value)`](#::unbinvalue) | Converts a `value` from binary representation to a blob. |
| [`unhex(value)`](#::unhexvalue) | Converts a `value` from hexadecimal representation to a blob. |
| [`unicode(string)`](#::unicodestring) | Returns an `INTEGER` representing the Unicode code point of the first character in the `string`. |
| [`upper(string)`](#::upperstring) | Converts `string` to upper case. |
| [`url_decode(string)`](#::url_decodestring) | Decodes a URL from a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). |
| [`url_encode(string)`](#::url_encodestring) | Encodes a URL to a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). |



###### `string[index]` {#docs:current:sql:functions:text::stringindex}



|   |   |
|:--|:--------|
| **Description** |Extracts a single character using a (1-based) `index`. |
| **Example** | `'DuckDB'[4]` |
| **Result** | `k` |
| **Alias** | `array_extract` |

###### `string[begin:end]` {#docs:current:sql:functions:text::stringbeginend}



|   |   |
|:--|:--------|
| **Description** |Extracts a string using [slice conventions](#docs:current:sql:functions:list::slicing) similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the string, respectively. Negative values are accepted. |
| **Example** | `'DuckDB'[:4]` |
| **Result** | `Duck` |
| **Alias** | `array_slice` |

###### `string LIKE target` {#docs:current:sql:functions:text::string-like-target}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the `string` matches the like specifier (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)). |
| **Example** | `'hello' LIKE '%lo'` |
| **Result** | `true` |

###### `string SIMILAR TO regex` {#docs:current:sql:functions:text::string-similar-to-regex}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the `string` matches the `regex` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)). |
| **Example** | `'hello' SIMILAR TO 'l+'` |
| **Result** | `false` |
| **Alias** | `regexp_full_match` |
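
Because `SIMILAR TO` has to match the entire string, it behaves differently from `regexp_matches`, which only needs to find the pattern somewhere in the string. A minimal sketch of the difference (the column aliases are illustrative only):

```sql
SELECT
    'hello' SIMILAR TO 'l+'       AS full_match,    -- false: 'l+' does not cover the whole string
    'hello' SIMILAR TO '.*l+.*'   AS padded_match,  -- true: the pattern now spans the whole string
    regexp_matches('hello', 'l+') AS partial_match; -- true: a match anywhere is enough
```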

###### `arg1 || arg2` {#docs:current:sql:functions:text::arg1--arg2}



|   |   |
|:--|:--------|
| **Description** |Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:current:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:current:sql:functions:list::list_concatlist_1--list_n). |
| **Example 1** | `'Duck' || 'DB'` |
| **Result** | `DuckDB` |
| **Example 2** | `[1, 2, 3] || [4, 5, 6]` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
| **Example 3** | `'\xAA'::BLOB || '\xBB'::BLOB` |
| **Result** | `\xAA\xBB` |

###### `array_extract(string, index)` {#docs:current:sql:functions:text::array_extractstring-index}



|   |   |
|:--|:--------|
| **Description** |Extracts a single character from a `string` using a (1-based) `index`. |
| **Example** | `array_extract('DuckDB', 2)` |
| **Result** | `u` |

###### `array_slice(list, begin, end)` {#docs:current:sql:functions:text::array_slicelist-begin-end}



|   |   |
|:--|:--------|
| **Description** |Extracts a sublist or substring using [slice conventions](#docs:current:sql:functions:list::slicing). Negative values are accepted. |
| **Example 1** | `array_slice('DuckDB', 3, 4)` |
| **Result** | `ck` |
| **Example 2** | `array_slice('DuckDB', 3, NULL)` |
| **Result** | `NULL` |
| **Example 3** | `array_slice('DuckDB', 0, -3)` |
| **Result** | `Duck` |
| **Alias** | `list_slice` |
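
The bracket syntax and the `array_extract` / `array_slice` functions follow the same conventions, so the examples above can be compared side by side. A minimal sketch (the column aliases are illustrative only):

```sql
SELECT
    'DuckDB'[4]                  AS single_char,   -- k
    'DuckDB'[:4]                 AS bracket_slice, -- Duck
    array_slice('DuckDB', 3, 4)  AS middle,        -- ck
    array_slice('DuckDB', 0, -3) AS negative_end;  -- Duck
```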

###### `ascii(string)` {#docs:current:sql:functions:text::asciistring}



|   |   |
|:--|:--------|
| **Description** |Returns an integer that represents the Unicode code point of the first character of the `string`. |
| **Example** | `ascii('Ω')` |
| **Result** | `937` |

###### `bar(x, min, max[, width])` {#docs:current:sql:functions:text::barx-min-max-width}



|   |   |
|:--|:--------|
| **Description** |Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80. |
| **Example** | `bar(5, 0, 20, 10)` |
| **Result** | `██▌       ` |

###### `bin(string)` {#docs:current:sql:functions:text::binstring}



|   |   |
|:--|:--------|
| **Description** |Converts the `string` to binary representation. |
| **Example** | `bin('Aa')` |
| **Result** | `0100000101100001` |
| **Alias** | `to_binary` |

###### `bit_length(string)` {#docs:current:sql:functions:text::bit_lengthstring}



|   |   |
|:--|:--------|
| **Description** |Number of bits in a `string`. |
| **Example** | `bit_length('abc')` |
| **Result** | `24` |

###### `chr(code_point)` {#docs:current:sql:functions:text::chrcode_point}



|   |   |
|:--|:--------|
| **Description** |Returns the character corresponding to the given ASCII code value or Unicode code point. |
| **Example** | `chr(65)` |
| **Result** | `A` |

###### `concat(value, ...)` {#docs:current:sql:functions:text::concatvalue-}



|   |   |
|:--|:--------|
| **Description** |Concatenates multiple strings or lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| **Example 1** | `concat('Hello', ' ', 'World')` |
| **Result** | `Hello World` |
| **Example 2** | `concat([1, 2, 3], NULL, [4, 5, 6])` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
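
The main practical difference between `concat` and the `||` operator is how they treat `NULL`. A short sketch of that difference, using illustrative column aliases:

```sql
SELECT
    'Duck' || NULL || 'DB'     AS with_operator, -- NULL: any NULL input makes the result NULL
    concat('Duck', NULL, 'DB') AS with_concat;   -- DuckDB: NULL inputs are skipped
```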

###### `concat_ws(separator, string, ...)` {#docs:current:sql:functions:text::concat_wsseparator-string-}



|   |   |
|:--|:--------|
| **Description** |Concatenates many strings, separated by `separator`. `NULL` inputs are skipped. |
| **Example** | `concat_ws(', ', 'Banana', 'Apple', 'Melon')` |
| **Result** | `Banana, Apple, Melon` |

###### `contains(string, search_string)` {#docs:current:sql:functions:text::containsstring-search_string}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if `search_string` is found within `string`. |
| **Example** | `contains('abc', 'a')` |
| **Result** | `true` |

###### `format(format, ...)` {#docs:current:sql:functions:text::formatformat-}



|   |   |
|:--|:--------|
| **Description** |Formats a string using the [fmt syntax](#::fmt-syntax). |
| **Example** | `format('Benchmark "{}" took {} seconds', 'CSV', 42)` |
| **Result** | `Benchmark "CSV" took 42 seconds` |

###### `formatReadableDecimalSize(integer)` {#docs:current:sql:functions:text::formatreadabledecimalsizeinteger}



|   |   |
|:--|:--------|
| **Description** |Converts `integer` to a human-readable representation using units based on powers of 10 (kB, MB, GB, etc.). |
| **Example** | `formatReadableDecimalSize(16000)` |
| **Result** | `16.0 kB` |

###### `format_bytes(integer)` {#docs:current:sql:functions:text::format_bytesinteger}



|   |   |
|:--|:--------|
| **Description** |Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). |
| **Example** | `format_bytes(16_000)` |
| **Result** | `15.6 KiB` |
| **Alias** | `formatReadableSize`, `pg_size_pretty` |
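
The two formatting functions differ only in the base of their units. As a sketch, formatting the same value with both makes the arithmetic visible (the column aliases are illustrative):

```sql
SELECT
    formatReadableDecimalSize(16000) AS decimal_units, -- 16.0 kB  (16000 / 1000)
    format_bytes(16000)              AS binary_units;  -- 15.6 KiB (16000 / 1024 = 15.625)
```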

###### `from_base64(string)` {#docs:current:sql:functions:text::from_base64string}



|   |   |
|:--|:--------|
| **Description** |Converts a base64-encoded `string` to a character string (`BLOB`). |
| **Example** | `from_base64('QQ==')` |
| **Result** | `A` |

###### `greatest(arg1, ...)` {#docs:current:sql:functions:text::greatestarg1-}



|   |   |
|:--|:--------|
| **Description** |Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and [collations](#docs:current:sql:expressions:collations) are not supported. |
| **Example 1** | `greatest(42, 84)` |
| **Result** | `84` |
| **Example 2** | `greatest('abc', 'bcd', 'cde', 'EFG')` |
| **Result** | `cde` |

###### `hash(value, ...)` {#docs:current:sql:functions:text::hashvalue-}



|   |   |
|:--|:--------|
| **Description** |Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash. |
| **Example** | `hash('🦆')` |
| **Result** | `4164431626903154684` |

###### `hex(string)` {#docs:current:sql:functions:text::hexstring}



|   |   |
|:--|:--------|
| **Description** |Converts the `string` to hexadecimal representation. |
| **Example** | `hex('Hello')` |
| **Result** | `48656C6C6F` |
| **Alias** | `to_hex` |

###### `ilike_escape(string, like_specifier, escape_character)` {#docs:current:sql:functions:text::ilike_escapestring-like_specifier-escape_character}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| **Example** | `ilike_escape('A%c', 'a$%C', '$')` |
| **Result** | `true` |

###### `instr(string, search_string)` {#docs:current:sql:functions:text::instrstring-search_string}



|   |   |
|:--|:--------|
| **Description** |Returns the location of the first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match is found. |
| **Example** | `instr('test test', 'es')` |
| **Result** | `2` |
| **Aliases** | `position`, `strpos` |

###### `least(arg1, ...)` {#docs:current:sql:functions:text::leastarg1-}



|   |   |
|:--|:--------|
| **Description** |Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and [collations](#docs:current:sql:expressions:collations) are not supported. |
| **Example 1** | `least(42, 84)` |
| **Result** | `42` |
| **Example 2** | `least('abc', 'bcd', 'cde', 'EFG')` |
| **Result** | `EFG` |
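
The case-sensitive ordering used by `least` and `greatest` is consistent with comparing code points, where uppercase ASCII letters come before lowercase ones. A small sketch that shows the code points next to the results (the column aliases are illustrative):

```sql
SELECT
    ascii('E')                           AS upper_e,  -- 69
    ascii('a')                           AS lower_a,  -- 97
    least('abc', 'bcd', 'cde', 'EFG')    AS smallest, -- EFG
    greatest('abc', 'bcd', 'cde', 'EFG') AS largest;  -- cde
```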

###### `left(string, count)` {#docs:current:sql:functions:text::leftstring-count}



|   |   |
|:--|:--------|
| **Description** |Extracts the left-most `count` characters. |
| **Example** | `left('Hello🦆', 2)` |
| **Result** | `He` |

###### `left_grapheme(string, count)` {#docs:current:sql:functions:text::left_graphemestring-count}



|   |   |
|:--|:--------|
| **Description** |Extracts the left-most `count` grapheme clusters. |
| **Example** | `left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` |
| **Result** | `🤦🏼‍♂️` |

###### `length(string)` {#docs:current:sql:functions:text::lengthstring}



|   |   |
|:--|:--------|
| **Description** |Number of characters in `string`. |
| **Example** | `length('Hello🦆')` |
| **Result** | `6` |
| **Aliases** | `char_length`, `character_length`, `len` |

###### `length_grapheme(string)` {#docs:current:sql:functions:text::length_graphemestring}



|   |   |
|:--|:--------|
| **Description** |Number of grapheme clusters in `string`. |
| **Example** | `length_grapheme('🤦🏼‍♂️🤦🏽‍♀️')` |
| **Result** | `2` |
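
The different length functions count different units, which matters as soon as a string contains non-ASCII characters. A minimal sketch, assuming the duck emoji is encoded as a single 4-byte UTF-8 code point (the column aliases are illustrative):

```sql
SELECT
    length('Hello🦆')          AS characters, -- 6: five letters plus one emoji
    strlen('Hello🦆')          AS bytes,      -- 9: five 1-byte letters plus one 4-byte emoji
    length_grapheme('Hello🦆') AS graphemes;  -- 6: each character here is its own grapheme cluster
```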

###### `like_escape(string, like_specifier, escape_character)` {#docs:current:sql:functions:text::like_escapestring-like_specifier-escape_character}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| **Example** | `like_escape('a%c', 'a$%c', '$')` |
| **Result** | `true` |
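
The escape character turns the wildcard that follows it into a literal character. A short sketch contrasting the escaped and unescaped `%` (the column aliases are illustrative):

```sql
SELECT
    like_escape('a%c', 'a$%c', '$') AS literal_percent, -- true: the string contains a literal %
    like_escape('abc', 'a$%c', '$') AS no_wildcard,     -- false: the escaped % only matches a literal %
    'abc' LIKE 'a%c'                AS with_wildcard;   -- true: the unescaped % matches any sequence
```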

###### `lower(string)` {#docs:current:sql:functions:text::lowerstring}



|   |   |
|:--|:--------|
| **Description** |Converts `string` to lower case. |
| **Example** | `lower('Hello')` |
| **Result** | `hello` |
| **Alias** | `lcase` |

###### `lpad(string, count, character)` {#docs:current:sql:functions:text::lpadstring-count-character}



|   |   |
|:--|:--------|
| **Description** |Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
| **Example** | `lpad('hello', 8, '>')` |
| **Result** | `>>>hello` |
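
`lpad` both pads and truncates, depending on whether `count` is larger or smaller than the length of the input. A minimal sketch of the two cases (the column aliases are illustrative):

```sql
SELECT
    lpad('hello', 8, '>') AS padded,    -- >>>hello
    lpad('hello', 3, '>') AS truncated; -- hel: characters are dropped from the right
```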

###### `ltrim(string[, characters])` {#docs:current:sql:functions:text::ltrimstring-characters}



|   |   |
|:--|:--------|
| **Description** |Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. |
| **Example 1** | `ltrim('    test  ')` |
| **Result** | `test  ` |
| **Example 2** | `ltrim('>>>>test<<', '><')` |
| **Result** | `test<<` |

###### `md5(string)` {#docs:current:sql:functions:text::md5string}



|   |   |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `string` as a `VARCHAR`. |
| **Example** | `md5('abc')` |
| **Result** | `900150983cd24fb0d6963f7d28e17f72` |

###### `md5_number(string)` {#docs:current:sql:functions:text::md5_numberstring}



|   |   |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `string` as a `HUGEINT`. |
| **Example** | `md5_number('abc')` |
| **Result** | `152195979970564155685860391459828531600` |

###### `md5_number_lower(string)` {#docs:current:sql:functions:text::md5_number_lowerstring}



|   |   |
|:--|:--------|
| **Description** |Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| **Example** | `md5_number_lower('abc')` |
| **Result** | `8250560606382298838` |

###### `md5_number_upper(string)` {#docs:current:sql:functions:text::md5_number_upperstring}



|   |   |
|:--|:--------|
| **Description** |Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| **Example** | `md5_number_upper('abc')` |
| **Result** | `12704604231530709392` |

###### `nfc_normalize(string)` {#docs:current:sql:functions:text::nfc_normalizestring}



|   |   |
|:--|:--------|
| **Description** |Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. |
| **Example** | `nfc_normalize('ardèch')` |
| **Result** | `ardèch` |

###### `not_ilike_escape(string, like_specifier, escape_character)` {#docs:current:sql:functions:text::not_ilike_escapestring-like_specifier-escape_character}



|   |   |
|:--|:--------|
| **Description** |Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| **Example** | `not_ilike_escape('A%c', 'a$%C', '$')` |
| **Result** | `false` |

###### `not_like_escape(string, like_specifier, escape_character)` {#docs:current:sql:functions:text::not_like_escapestring-like_specifier-escape_character}



|   |   |
|:--|:--------|
| **Description** |Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:current:sql:functions:pattern_matching)) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| **Example** | `not_like_escape('a%c', 'a$%c', '$')` |
| **Result** | `false` |

###### `parse_dirname(path[, separator])` {#docs:current:sql:functions:text::parse_dirnamepath-separator}



|   |   |
|:--|:--------|
| **Description** |Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| **Example** | `parse_dirname('path/to/file.csv', 'system')` |
| **Result** | `path` |

###### `parse_dirpath(path[, separator])` {#docs:current:sql:functions:text::parse_dirpathpath-separator}



|   |   |
|:--|:--------|
| **Description** |Returns the head of the `path` (the pathname until the last slash) similarly to Python's [`os.path.dirname`](https://docs.python.org/3.7/library/os.path.html#os.path.dirname). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| **Example** | `parse_dirpath('path/to/file.csv', 'forward_slash')` |
| **Result** | `path/to` |

###### `parse_filename(string[, trim_extension][, separator])` {#docs:current:sql:functions:text::parse_filenamestring-trim_extension-separator}



|   |   |
|:--|:--------|
| **Description** |Returns the last component of the `path` similarly to Python's [`os.path.basename`](https://docs.python.org/3.7/library/os.path.html#os.path.basename) function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| **Example** | `parse_filename('path/to/file.csv', true, 'forward_slash')` |
| **Result** | `file` |

###### `parse_path(path[, separator])` {#docs:current:sql:functions:text::parse_pathpath-separator}



|   |   |
|:--|:--------|
| **Description** |Returns a list of the components (directories and filename) in the `path` similarly to Python's [`pathlib.PurePath.parts`](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parts) attribute. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| **Example** | `parse_path('path/to/file.csv', 'system')` |
| **Result** | `[path, to, file.csv]` |
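
The four `parse_*` functions each return a different part of the same path. A sketch that applies all of them to one input, relying on the default `both_slash` separator (the column aliases are illustrative):

```sql
SELECT
    parse_dirname('path/to/file.csv')        AS top_level,  -- path
    parse_dirpath('path/to/file.csv')        AS head,       -- path/to
    parse_filename('path/to/file.csv')       AS filename,   -- file.csv
    parse_filename('path/to/file.csv', true) AS stem,       -- file
    parse_path('path/to/file.csv')           AS components; -- [path, to, file.csv]
```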

###### `position(search_string IN string)` {#docs:current:sql:functions:text::positionsearch_string-in-string}



|   |   |
|:--|:--------|
| **Description** |Returns the location of the first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match is found. |
| **Example** | `position('b' IN 'abc')` |
| **Result** | `2` |
| **Aliases** | `instr`, `strpos` |
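
Note that the `IN` form puts the search string first, while `instr` and `strpos` take the string to search as the first argument. A minimal sketch (the column aliases are illustrative):

```sql
SELECT
    position('b' IN 'abc') AS sql_syntax, -- 2
    instr('abc', 'b')      AS instr_call, -- 2
    strpos('abc', 'z')     AS not_found;  -- 0: no match
```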

###### `prefix(string, search_string)` {#docs:current:sql:functions:text::prefixstring-search_string}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if `string` starts with `search_string`. |
| **Example** | `prefix('abc', 'ab')` |
| **Result** | `true` |

###### `printf(format, ...)` {#docs:current:sql:functions:text::printfformat-}



|   |   |
|:--|:--------|
| **Description** |Formats a `string` using [printf syntax](#::printf-syntax). |
| **Example** | `printf('Benchmark "%s" took %d seconds', 'CSV', 42)` |
| **Result** | `Benchmark "CSV" took 42 seconds` |
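
`format` and `printf` can produce the same output from different placeholder styles. A side-by-side sketch using the examples from this page (the column aliases are illustrative):

```sql
SELECT
    format('Benchmark "{}" took {} seconds', 'CSV', 42)  AS with_fmt,
    printf('Benchmark "%s" took %d seconds', 'CSV', 42) AS with_printf;
-- Both columns contain: Benchmark "CSV" took 42 seconds
```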

###### `read_text(source)` {#docs:current:sql:functions:text::read_textsource}



|   |   |
|:--|:--------|
| **Description** |Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If the file contains invalid UTF-8, `read_text` throws an error that suggests using `read_blob` instead. See the [`read_text` guide](#docs:current:guides:file_formats:read_file::read_text) for more details. |
| **Example** | `read_text('hello.txt')` |
| **Result** | `hello\n` |

###### `regexp_escape(string)` {#docs:current:sql:functions:text::regexp_escapestring}



|   |   |
|:--|:--------|
| **Description** |Escapes special patterns to turn `string` into a regular expression similarly to Python's [`re.escape` function](https://docs.python.org/3/library/re.html#re.escape). |
| **Example** | `regexp_escape('https://duckdb.org')` |
| **Result** | `https\:\/\/duckdb\.org` |

###### `regexp_extract(string, regex[, group][, options])` {#docs:current:sql:functions:text::regexp_extractstring-regex-group-options}



|   |   |
|:--|:--------|
| **Description** |If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract('ABC', '([a-z])(b)', 1, 'i')` |
| **Result** | `A` |
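
The effect of the `group` parameter and of the no-match case can be sketched with the same input as above. The extra calls are illustrative variations, not part of the original example:

```sql
-- Group 0 (the default) is the whole match
SELECT regexp_extract('ABC', '([a-z])(b)', 0, 'i') AS whole_match;  -- AB
-- Group 2 is the second capturing group
SELECT regexp_extract('ABC', '([a-z])(b)', 2, 'i') AS second_group; -- B
-- No match yields the empty string, not NULL
SELECT regexp_extract('ABC', 'x') = '' AS empty_on_no_match;        -- true
```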

###### `regexp_extract(string, regex, name_list[, options])` {#docs:current:sql:functions:text::regexp_extractstring-regex-name_list-options}



|   |   |
|:--|:--------|
| **Description** |If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract('John Doe', '([a-z]+) ([a-z]+)', ['first_name', 'last_name'], 'i')` |
| **Result** | `{'first_name': John, 'last_name': Doe}` |

###### `regexp_extract_all(string, regex[, group][, options])` {#docs:current:sql:functions:text::regexp_extract_allstring-regex-group-options}



|   |   |
|:--|:--------|
| **Description** |Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2)` |
| **Result** | `[33, 14]` |

###### `regexp_extract_all(string, regex, name_list[, options])` {#docs:current:sql:functions:text::regexp_extract_allstring-regex-name_list-options}



|   |   |
|:--|:--------|
| **Description** |Finds non-overlapping occurrences of `regex` in `string` and returns the capturing groups as a list of structs with corresponding names from `name_list`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract_all('Peter: 33, Paul: 14', '(\w+):\s*(\d+)', ['name', 'age'])` |
| **Result** | `[{'name': Peter, 'age': 33}, {'name': Paul, 'age': 14}]` |

###### `regexp_full_match(string, regex[, options])` {#docs:current:sql:functions:text::regexp_full_matchstring-regex-col2}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if the entire `string` matches the `regex`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_full_match('anabanana', '(an)*')` |
| **Result** | `false` |

###### `regexp_matches(string, regex[, options])` {#docs:current:sql:functions:text::regexp_matchesstring-regex-options}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_matches('anabanana', '(an)*')` |
| **Result** | `true` |
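
The difference between `regexp_matches` (the pattern may match anywhere in the string) and `regexp_full_match` (the pattern must cover the entire string) can be sketched as follows. The input `'anan'` is an illustrative value chosen for this comparison:

```sql
-- The pattern matches a part of the string, which is enough for regexp_matches
SELECT regexp_matches('anabanana', '(an)*') AS partial_ok;     -- true
-- The same pattern does not cover the whole string
SELECT regexp_full_match('anabanana', '(an)*') AS full_fails;  -- false
-- A string consisting only of repetitions of 'an' is covered completely
SELECT regexp_full_match('anan', '(an)*') AS full_ok;          -- true
```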

###### `regexp_replace(string, regex, replacement[, options])` {#docs:current:sql:functions:text::regexp_replacestring-regex-replacement-options}



|   |   |
|:--|:--------|
| **Description** |If `string` contains the `regex`, replaces the first matching part with `replacement`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set, e.g., `g` to replace all matches. |
| **Example** | `regexp_replace('hello', '[lo]', '-')` |
| **Result** | `he-lo` |
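
By default only the first match is replaced, as the result above shows. A minimal sketch of replacing every match, assuming the `g` (global replace) option described in the regex options section:

```sql
-- Default: only the first character matching [lo] is replaced
SELECT regexp_replace('hello', '[lo]', '-') AS first_only;        -- he-lo
-- With the 'g' option: all matching characters are replaced
SELECT regexp_replace('hello', '[lo]', '-', 'g') AS all_matches;  -- he---
```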

###### `regexp_split_to_table(string, regex)` {#docs:current:sql:functions:text::regexp_split_to_tablestring-regex}



|   |   |
|:--|:--------|
| **Description** |Splits the `string` along the `regex` and returns a row for each part. |
| **Example** | `regexp_split_to_table('hello world; 42', ';? ')` |
| **Result** | Multiple rows: `'hello'`, `'world'`, `'42'` |

###### `repeat(string, count)` {#docs:current:sql:functions:text::repeatstring-count}



|   |   |
|:--|:--------|
| **Description** |Repeats the `string` `count` number of times. |
| **Example** | `repeat('A', 5)` |
| **Result** | `AAAAA` |

###### `replace(string, source, target)` {#docs:current:sql:functions:text::replacestring-source-target}



|   |   |
|:--|:--------|
| **Description** |Replaces any occurrences of the `source` with `target` in `string`. |
| **Example** | `replace('hello', 'l', '-')` |
| **Result** | `he--o` |

###### `reverse(string)` {#docs:current:sql:functions:text::reversestring}



|   |   |
|:--|:--------|
| **Description** |Reverses the `string`. |
| **Example** | `reverse('hello')` |
| **Result** | `olleh` |

###### `right(string, count)` {#docs:current:sql:functions:text::rightstring-count}



|   |   |
|:--|:--------|
| **Description** |Extracts the right-most `count` characters. |
| **Example** | `right('Hello🦆', 3)` |
| **Result** | `lo🦆` |

###### `right_grapheme(string, count)` {#docs:current:sql:functions:text::right_graphemestring-count}



|   |   |
|:--|:--------|
| **Description** |Extracts the right-most `count` grapheme clusters. |
| **Example** | `right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` |
| **Result** | `🤦🏽‍♀️` |

###### `rpad(string, count, character)` {#docs:current:sql:functions:text::rpadstring-count-character}



|   |   |
|:--|:--------|
| **Description** |Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
| **Example** | `rpad('hello', 10, '<')` |
| **Result** | `hello<<<<<` |

###### `rtrim(string[, characters])` {#docs:current:sql:functions:text::rtrimstring-characters}



|   |   |
|:--|:--------|
| **Description** |Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. |
| **Example 1** | <code class="language-plaintext highlighter-rouge">rtrim('&nbsp;&nbsp;&nbsp;&nbsp;test&nbsp;&nbsp;')</code> |
| **Result** | <code class="language-plaintext highlighter-rouge">&nbsp;&nbsp;&nbsp;&nbsp;test</code> |
| **Example 2** | `rtrim('>>>>test<<', '><')` |
| **Result** | `>>>>test` |

###### `sha1(value)` {#docs:current:sql:functions:text::sha1value}



|   |   |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-1 hash of the `value`. |
| **Example** | `sha1('🦆')` |
| **Result** | `949bf843dc338be348fb9525d1eb535d31241d76` |

###### `sha256(value)` {#docs:current:sql:functions:text::sha256value}



|   |   |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-256 hash of the `value`. |
| **Example** | `sha256('🦆')` |
| **Result** | `d7a5c5e0d1d94c32218539e7e47d4ba9c3c7b77d61332fb60d633dde89e473fb` |

###### `split_part(string, separator, index)` {#docs:current:sql:functions:text::split_partstring-separator-index}



|   |   |
|:--|:--------|
| **Description** |Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, returns an empty string (to match PostgreSQL's behavior). |
| **Example** | `split_part('a;b;c', ';', 2)` |
| **Result** | `b` |
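
The out-of-bounds behavior described above can be sketched as follows. The index `4` is an illustrative value beyond the three parts of the input:

```sql
-- In-range index: returns the second part
SELECT split_part('a;b;c', ';', 2) AS second_part;        -- b
-- Index past the end of the list: returns an empty string instead of an error
SELECT split_part('a;b;c', ';', 4) = '' AS out_of_range;  -- true
```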

###### `starts_with(string, search_string)` {#docs:current:sql:functions:text::starts_withstring-search_string}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if `string` begins with `search_string`. |
| **Example** | `starts_with('abc', 'a')` |
| **Result** | `true` |
| **Alias** | `^@` |

###### `string_split(string, separator)` {#docs:current:sql:functions:text::string_splitstring-separator}



|   |   |
|:--|:--------|
| **Description** |Splits the `string` along the `separator`. |
| **Example** | `string_split('hello-world', '-')` |
| **Result** | `[hello, world]` |
| **Aliases** | `split`, `str_split`, `string_to_array` |

###### `string_split_regex(string, regex[, options])` {#docs:current:sql:functions:text::string_split_regexstring-regex-options}



|   |   |
|:--|:--------|
| **Description** |Splits the `string` along the `regex`. A set of optional [regex `options`](#docs:current:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `string_split_regex('hello world; 42', ';? ')` |
| **Result** | `[hello, world, 42]` |
| **Aliases** | `regexp_split_to_array`, `str_split_regex` |

###### `strip_accents(string)` {#docs:current:sql:functions:text::strip_accentsstring}



|   |   |
|:--|:--------|
| **Description** |Strips accents from `string`. |
| **Example** | `strip_accents('mühleisen')` |
| **Result** | `muhleisen` |

###### `strlen(string)` {#docs:current:sql:functions:text::strlenstring}



|   |   |
|:--|:--------|
| **Description** |Number of bytes in `string`. |
| **Example** | `strlen('🦆')` |
| **Result** | `4` |
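
Because `strlen` counts bytes, its result differs from the character count for non-ASCII input. A short sketch, assuming the `length` function (documented elsewhere in this section) for the character count:

```sql
-- The duck emoji is a single character encoded as 4 bytes in UTF-8
SELECT strlen('🦆') AS num_bytes,       -- 4
       length('🦆') AS num_characters;  -- 1
```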

###### `substring(string, start[, length])` {#docs:current:sql:functions:text::substringstring-start-length}



|   |   |
|:--|:--------|
| **Description** |Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. |
| **Example 1** | `substring('Hello', 2)` |
| **Result** | `ello` |
| **Example 2** | `substring('Hello', 2, 2)` |
| **Result** | `el` |
| **Alias** | `substr` |

###### `substring_grapheme(string, start[, length])` {#docs:current:sql:functions:text::substring_graphemestring-start-length}



|   |   |
|:--|:--------|
| **Description** |Extracts substring starting from grapheme cluster `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the first grapheme cluster of the `string`. |
| **Example 1** | `substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3)` |
| **Result** | `🤦🏽‍♀️🦆` |
| **Example 2** | `substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3, 2)` |
| **Result** | `🤦🏽‍♀️🦆` |

###### `suffix(string, search_string)` {#docs:current:sql:functions:text::suffixstring-search_string}



|   |   |
|:--|:--------|
| **Description** |Returns `true` if `string` ends with `search_string`. Note that [collations](#docs:current:sql:expressions:collations) are not supported. |
| **Example** | `suffix('abc', 'bc')` |
| **Result** | `true` |
| **Alias** | `ends_with` |

###### `to_base(number, radix[, min_length])` {#docs:current:sql:functions:text::to_basenumber-radix-min_length}



|   |   |
|:--|:--------|
| **Description** |Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. |
| **Example** | `to_base(42, 16, 5)` |
| **Result** | `0002A` |

###### `to_base64(blob)` {#docs:current:sql:functions:text::to_base64blob}



|   |   |
|:--|:--------|
| **Description** |Converts a `blob` to a base64 encoded string. |
| **Example** | `to_base64('A'::BLOB)` |
| **Result** | `QQ==` |
| **Alias** | `base64` |

###### `translate(string, from, to)` {#docs:current:sql:functions:text::translatestring-from-to}



|   |   |
|:--|:--------|
| **Description** |Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted. |
| **Example** | `translate('12345', '143', 'ax')` |
| **Result** | `a2x5` |

###### `trim(string[, characters])` {#docs:current:sql:functions:text::trimstring-characters}



|   |   |
|:--|:--------|
| **Description** |Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. |
| **Example 1** | <code class="language-plaintext highlighter-rouge">trim('&nbsp;&nbsp;&nbsp;&nbsp;test&nbsp;&nbsp;')</code> |
| **Result** | `test` |
| **Example 2** | `trim('>>>>test<<', '><')` |
| **Result** | `test` |

###### `unbin(value)` {#docs:current:sql:functions:text::unbinvalue}



|   |   |
|:--|:--------|
| **Description** |Converts a `value` from binary representation to a blob. |
| **Example** | `unbin('0110')` |
| **Result** | `\x06` |
| **Alias** | `from_binary` |

###### `unhex(value)` {#docs:current:sql:functions:text::unhexvalue}



|   |   |
|:--|:--------|
| **Description** |Converts a `value` from hexadecimal representation to a blob. |
| **Example** | `unhex('2A')` |
| **Result** | `*` |
| **Alias** | `from_hex` |

###### `unicode(string)` {#docs:current:sql:functions:text::unicodestring}



|   |   |
|:--|:--------|
| **Description** |Returns an `INTEGER` representing the Unicode code point of the first character in the `string`. |
| **Example** | `[unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]` |
| **Result** | `[226, 226, -1, NULL]` |
| **Alias** | `ord` |

###### `upper(string)` {#docs:current:sql:functions:text::upperstring}



|   |   |
|:--|:--------|
| **Description** |Converts `string` to upper case. |
| **Example** | `upper('Hello')` |
| **Result** | `HELLO` |
| **Alias** | `ucase` |

###### `url_decode(string)` {#docs:current:sql:functions:text::url_decodestring}



|   |   |
|:--|:--------|
| **Description** |Decodes a URL from a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). |
| **Example** | `url_decode('https%3A%2F%2Fduckdb.org%2Fwhy_duckdb%23portable')` |
| **Result** | `https://duckdb.org/why_duckdb#portable` |

###### `url_encode(string)` {#docs:current:sql:functions:text::url_encodestring}



|   |   |
|:--|:--------|
| **Description** |Encodes a URL to a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). |
| **Example** | `url_encode('this string has/ special+ characters>')` |
| **Result** | `this%20string%20has%2F%20special%2B%20characters%3E` |
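
Since `url_decode` reverses `url_encode`, a round trip is expected to return the original string. A minimal check using the example input from above:

```sql
-- Encoding followed by decoding should give back the input unchanged
SELECT url_decode(url_encode('this string has/ special+ characters>'))
       = 'this string has/ special+ characters>' AS round_trips;  -- true
```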



#### Text Similarity Functions {#docs:current:sql:functions:text::text-similarity-functions}

These functions are used to measure the similarity of two strings using various [similarity measures](https://en.wikipedia.org/wiki/Similarity_measure).




| Function | Description |
|:--|:-------|
| [`damerau_levenshtein(s1, s2)`](#::damerau_levenshteins1-s2) | Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different. |
| [`editdist3(s1, s2)`](#::levenshteins1-s2) | Alias for `levenshtein`. |
| [`hamming(s1, s2)`](#::hammings1-s2) | The Hamming distance between two strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different. |
| [`jaccard(s1, s2)`](#::jaccards1-s2) | The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. |
| [`jaro_similarity(s1, s2[, score_cutoff])`](#::jaro_similaritys1-s2-score_cutoff) | The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. |
| [`jaro_winkler_similarity(s1, s2[, score_cutoff])`](#::jaro_winkler_similaritys1-s2-score_cutoff) | The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. |
| [`levenshtein(s1, s2)`](#::levenshteins1-s2) | The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different. |
| [`mismatches(s1, s2)`](#::hammings1-s2) | Alias for `hamming`. |



###### `damerau_levenshtein(s1, s2)` {#docs:current:sql:functions:text::damerau_levenshteins1-s2}



|   |   |
|:--|:--------|
| **Description** |Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different. |
| **Example** | `damerau_levenshtein('duckdb', 'udckbd')` |
| **Result** | `2` |
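
To see what counting transpositions changes, the same pair of strings can be passed to both distance functions. Plain Levenshtein distance has to express each swap of adjacent characters as two single-character edits:

```sql
-- 'du' -> 'ud' and 'db' -> 'bd' are two adjacent swaps
SELECT damerau_levenshtein('duckdb', 'udckbd') AS with_transpositions,  -- 2
       levenshtein('duckdb', 'udckbd') AS without_transpositions;       -- 4
```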

###### `hamming(s1, s2)` {#docs:current:sql:functions:text::hammings1-s2}



|   |   |
|:--|:--------|
| **Description** |The Hamming distance between two strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different. |
| **Example** | `hamming('duck', 'luck')` |
| **Result** | `1` |
| **Alias** | `mismatches` |

###### `jaccard(s1, s2)` {#docs:current:sql:functions:text::jaccards1-s2}



|   |   |
|:--|:--------|
| **Description** |The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. |
| **Example** | `jaccard('duck', 'luck')` |
| **Result** | `0.6` |

###### `jaro_similarity(s1, s2[, score_cutoff])` {#docs:current:sql:functions:text::jaro_similaritys1-s2-score_cutoff}



|   |   |
|:--|:--------|
| **Description** |The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. |
| **Example** | `jaro_similarity('duck', 'duckdb')` |
| **Result** | `0.8888888888888888` |

###### `jaro_winkler_similarity(s1, s2[, score_cutoff])` {#docs:current:sql:functions:text::jaro_winkler_similaritys1-s2-score_cutoff}



|   |   |
|:--|:--------|
| **Description** |The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. |
| **Example** | `jaro_winkler_similarity('duck', 'duckdb')` |
| **Result** | `0.9333333333333333` |
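
The optional `score_cutoff` parameter can be sketched with the two examples above. The cutoff value `0.9` is an illustrative choice that lies between the two similarity scores:

```sql
-- Jaro similarity is about 0.889, which is below the cutoff, so 0 is returned
SELECT jaro_similarity('duck', 'duckdb', 0.9) AS below_cutoff;         -- 0
-- Jaro-Winkler similarity is about 0.933, which passes the cutoff unchanged
SELECT jaro_winkler_similarity('duck', 'duckdb', 0.9) AS above_cutoff; -- 0.9333333333333333
```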

###### `levenshtein(s1, s2)` {#docs:current:sql:functions:text::levenshteins1-s2}



|   |   |
|:--|:--------|
| **Description** |The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different. |
| **Example** | `levenshtein('duck', 'db')` |
| **Result** | `3` |
| **Alias** | `editdist3` |



#### Formatters {#docs:current:sql:functions:text::formatters}

##### `fmt` Syntax {#docs:current:sql:functions:text::fmt-syntax}

The `format(format, parameters...)` function formats strings, loosely following the syntax of the [{fmt} open-source formatting library](https://fmt.dev/latest/syntax/).

Format without additional parameters:

```sql
SELECT format('Hello world'); -- Hello world
```

Format a string using `{}` placeholders:

```sql
SELECT format('The answer is {}', 42); -- The answer is 42
```

Format a string using positional arguments:

```sql
SELECT format('I''d rather be {1} than {0}.', 'right', 'happy'); -- I'd rather be happy than right.
```

###### Format Specifiers {#docs:current:sql:functions:text::format-specifiers}

| Specifier | Description | Example |
|:-|:------|:---|
| `{:d}`   | integer                                | `654321`       |
| `{:E}`   | scientific notation                    | `3.141593E+00` |
| `{:f}`   | float                                  | `4.560000`     |
| `{:o}`   | octal                                  | `2375761`      |
| `{:s}`   | string                                 | `asd`          |
| `{:x}`   | hexadecimal                            | `9fbf1`        |
| `{:tX}`  | integer, `X` is the thousand separator | `654 321`      |
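
The specifiers in the table can be tried out as follows. The input values are assumptions chosen so that the outputs match the example column above:

```sql
SELECT format('{:d}', 654321);    -- 654321
SELECT format('{:E}', 3.141593);  -- 3.141593E+00
SELECT format('{:f}', 4.56);      -- 4.560000
SELECT format('{:o}', 654321);    -- 2375761
SELECT format('{:s}', 'asd');     -- asd
SELECT format('{:x}', 654321);    -- 9fbf1
-- 't' followed by a space selects the space as thousand separator
SELECT format('{:t }', 654321);   -- 654 321
```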

###### Formatting Types {#docs:current:sql:functions:text::formatting-types}

Integers:

```sql
SELECT format('{} + {} = {}', 3, 5, 3 + 5); -- 3 + 5 = 8
```

Booleans:

```sql
SELECT format('{} != {}', true, false); -- true != false
```

Format datetime values:

```sql
SELECT format('{}', DATE '1992-01-01'); -- 1992-01-01
SELECT format('{}', TIME '12:01:00'); -- 12:01:00
SELECT format('{}', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00
```

Format BLOB:

```sql
SELECT format('{}', BLOB '\x00hello'); -- \x00hello
```

Pad integers with 0s:

```sql
SELECT format('{:04d}', 33); -- 0033
```

> Padding cannot currently be combined with the specification of a thousands separator.

Build a time string from integers:

```sql
SELECT format('{:02d}:{:02d}:{:02d} {}', 12, 3, 16, 'AM'); -- 12:03:16 AM
```

Convert to hexadecimal:

```sql
SELECT format('{:x}', 123_456_789); -- 75bcd15
```

Convert to binary:

```sql
SELECT format('{:b}', 123_456_789); -- 111010110111100110100010101
```

###### Print Numbers with Thousand Separators {#docs:current:sql:functions:text::print-numbers-with-thousand-separators}

Integers:

```sql
SELECT format('{:,}',  123_456_789); -- 123,456,789
SELECT format('{:t.}', 123_456_789); -- 123.456.789
SELECT format('{:''}', 123_456_789); -- 123'456'789
SELECT format('{:_}',  123_456_789); -- 123_456_789
SELECT format('{:t }', 123_456_789); -- 123 456 789
SELECT format('{:tX}', 123_456_789); -- 123X456X789
```

Float, double and decimal:

```sql
SELECT format('{:,f}',    123456.789); -- 123,456.789000
SELECT format('{:,.2f}',  123456.789); -- 123,456.79
SELECT format('{:t..2f}', 123456.789); -- 123.456,79
```

##### `printf` Syntax {#docs:current:sql:functions:text::printf-syntax}

The `printf(format, parameters...)` function formats strings using the [`printf` syntax](https://cplusplus.com/reference/cstdio/printf/).

Format without additional parameters:

```sql
SELECT printf('Hello world');
```

```text
Hello world
```

Format a string using arguments in a given order:

```sql
SELECT printf('The answer to %s is %d', 'life', 42);
```

```text
The answer to life is 42
```

Format a string using positional arguments `%position$formatter`, e.g., the second parameter as a string is encoded as `%2$s`:

```sql
SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy');
```

```text
I'd rather be happy than right.
```

###### Format Specifiers {#docs:current:sql:functions:text::format-specifiers}

| Specifier | Description | Example |
|:-|:------|:---|
| `%c`   | character code to character                                    | `a`            |
| `%d`   | integer                                                        | `654321`       |
| `%Xd`  | integer with thousand separator `X` from `,`, `.`, `''`, `_` | `654_321`      |
| `%E`   | scientific notation                                            | `3.141593E+00` |
| `%f`   | float                                                          | `4.560000`     |
| `%hd`  | integer                                                        | `654321`       |
| `%hhd` | integer                                                        | `654321`       |
| `%lld` | integer                                                        | `654321`       |
| `%o`   | octal                                                          | `2375761`      |
| `%s`   | string                                                         | `asd`          |
| `%x`   | hexadecimal                                                    | `9fbf1`        |
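
A selection of the specifiers in the table can be tried out as follows. The input values are assumptions chosen so that the outputs match the example column above:

```sql
SELECT printf('%d', 654321);    -- 654321
-- '_' between '%' and 'd' selects the underscore as thousand separator
SELECT printf('%_d', 654321);   -- 654_321
SELECT printf('%E', 3.141593);  -- 3.141593E+00
SELECT printf('%f', 4.56);      -- 4.560000
SELECT printf('%o', 654321);    -- 2375761
SELECT printf('%s', 'asd');     -- asd
SELECT printf('%x', 654321);    -- 9fbf1
```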

###### Formatting Types {#docs:current:sql:functions:text::formatting-types}

Integers:

```sql
SELECT printf('%d + %d = %d', 3, 5, 3 + 5); -- 3 + 5 = 8
```

Booleans:

```sql
SELECT printf('%s != %s', true, false); -- true != false
```

Format datetime values:

```sql
SELECT printf('%s', DATE '1992-01-01'); -- 1992-01-01
SELECT printf('%s', TIME '12:01:00'); -- 12:01:00
SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00
```

Format BLOB:

```sql
SELECT printf('%s', BLOB '\x00hello'); -- \x00hello
```

Pad integers with 0s:

```sql
SELECT printf('%04d', 33); -- 0033
```

Build a time string from integers:

```sql
SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM'); -- 12:03:16 AM
```

Convert to hexadecimal:

```sql
SELECT printf('%x', 123_456_789); -- 75bcd15
```

Convert to binary:

```sql
SELECT printf('%b', 123_456_789); -- 111010110111100110100010101
```

###### Thousand Separators {#docs:current:sql:functions:text::thousand-separators}

Integers:

```sql
SELECT printf('%,d',  123_456_789); -- 123,456,789
SELECT printf('%.d',  123_456_789); -- 123.456.789
SELECT printf('%''d', 123_456_789); -- 123'456'789
SELECT printf('%_d',  123_456_789); -- 123_456_789
```

Float, double and decimal:

```sql
SELECT printf('%,f',   123456.789); -- 123,456.789000
SELECT printf('%,.2f', 123456.789); -- 123,456.79
```

### Time Functions {#docs:current:sql:functions:time}



This section describes functions and operators for examining and manipulating [`TIME` values](#docs:current:sql:data_types:time).

#### Time Operators {#docs:current:sql:functions:time::time-operators}

The table below shows the available mathematical operators for `TIME` types.

| Operator | Description | Example | Result |
|:-|:---|:----|:--|
| `+` | addition of an `INTERVAL` | `TIME '01:02:03' + INTERVAL 5 HOUR` | `06:02:03` |
| `-` | subtraction of an `INTERVAL` | `TIME '06:02:03' - INTERVAL 5 HOUR` | `01:02:03` |

#### Time Functions {#docs:current:sql:functions:time::time-functions}

The table below shows the available scalar functions for `TIME` types.

| Name | Description |
|:--|:-------|
| [`date_diff(part, starttime, endtime)`](#::date_diffpart-starttime-endtime) | The number of [`part`](#docs:current:sql:functions:datepart) boundaries between `starttime` and `endtime`, inclusive of the larger time and exclusive of the smaller time. |
| [`date_part(part, time)`](#::date_partpart-time) | Get [subfield](#docs:current:sql:functions:datepart) (equivalent to `extract`). |
| [`date_sub(part, starttime, endtime)`](#::date_subpart-starttime-endtime) | The signed length of the interval between `starttime` and `endtime`, truncated to whole multiples of [`part`](#docs:current:sql:functions:datepart). |
| [`extract(part FROM time)`](#::extractpart-from-time) | Get subfield from a time. |
| [`get_current_time()`](#::get_current_time) | Current time (start of current transaction). |
| [`make_time(bigint, bigint, double)`](#::make_timebigint-bigint-double) | The time for the given parts. |

The only [date parts](#docs:current:sql:functions:datepart) that are defined for times are `epoch`, `hours`, `minutes`, `seconds`, `milliseconds` and `microseconds`.

###### `date_diff(part, starttime, endtime)` {#docs:current:sql:functions:time::date_diffpart-starttime-endtime}



|   |   |
|:--|:--------|
| **Description** |The number of [`part`](#docs:current:sql:functions:datepart) boundaries between `starttime` and `endtime`, inclusive of the larger time and exclusive of the smaller time. |
| **Example** | `date_diff('hour', TIME '01:02:03', TIME '06:01:03')` |
| **Result** | `5` |
| **Alias** | `datediff` |

###### `date_part(part, time)` {#docs:current:sql:functions:time::date_partpart-time}



|   |   |
|:--|:--------|
| **Description** |Get [subfield](#docs:current:sql:functions:datepart) (equivalent to `extract`). |
| **Example** | `date_part('minute', TIME '14:21:13')` |
| **Result** | `21` |
| **Alias** | `datepart` |

###### `date_sub(part, starttime, endtime)` {#docs:current:sql:functions:time::date_subpart-starttime-endtime}



|   |   |
|:--|:--------|
| **Description** |The signed length of the interval between `starttime` and `endtime`, truncated to whole multiples of [`part`](#docs:current:sql:functions:datepart). |
| **Example** | `date_sub('hour', TIME '01:02:03', TIME '06:01:03')` |
| **Result** | `4` |
| **Alias** | `datesub` |
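
The difference between `date_diff` and `date_sub` is easiest to see when both are applied to the same pair of times. Between `01:02:03` and `06:01:03`, five hour boundaries are crossed (02:00 through 06:00), but only 4 hours and 59 minutes have elapsed:

```sql
SELECT date_diff('hour', TIME '01:02:03', TIME '06:01:03') AS boundaries_crossed,  -- 5
       date_sub('hour',  TIME '01:02:03', TIME '06:01:03') AS whole_hours_elapsed; -- 4
```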

###### `extract(part FROM time)` {#docs:current:sql:functions:time::extractpart-from-time}



|   |   |
|:--|:--------|
| **Description** |Get subfield from a time. |
| **Example** | `extract('hour' FROM TIME '14:21:13')` |
| **Result** | `14` |

###### `get_current_time()` {#docs:current:sql:functions:time::get_current_time}



|   |   |
|:--|:--------|
| **Description** |Current time (start of current transaction) in the local time zone as `TIMETZ`. |
| **Example** | `get_current_time()` |
| **Result** | `06:09:59.988+2` |
| **Alias** | `current_time` (no parentheses necessary) |

###### `make_time(bigint, bigint, double)` {#docs:current:sql:functions:time::make_timebigint-bigint-double}



|   |   |
|:--|:--------|
| **Description** |The time for the given parts. |
| **Example** | `make_time(13, 34, 27.123456)` |
| **Result** | `13:34:27.123456` |

### Timestamp Functions {#docs:current:sql:functions:timestamp}



This section describes functions and operators for examining and manipulating [`TIMESTAMP` values](#docs:current:sql:data_types:timestamp).
See also the related [`TIMESTAMPTZ` functions](#docs:current:sql:functions:timestamptz).

#### Timestamp Operators {#docs:current:sql:functions:timestamp::timestamp-operators}

The table below shows the available mathematical operators for `TIMESTAMP` types.

| Operator | Description | Example | Result |
|:-|:--|:----|:--|
| `+` | addition of an `INTERVAL` | `TIMESTAMP '1992-03-22 01:02:03' + INTERVAL 5 DAY` | `1992-03-27 01:02:03` |
| `-` | subtraction of `TIMESTAMP`s | `TIMESTAMP '1992-03-27' - TIMESTAMP '1992-03-22'` | `5 days` |
| `-` | subtraction of an `INTERVAL` | `TIMESTAMP '1992-03-27 01:02:03' - INTERVAL 5 DAY` | `1992-03-22 01:02:03` |

Adding to or subtracting from [infinite values](#docs:current:sql:data_types:timestamp::special-values) produces the same infinite value.
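
A minimal sketch of this rule, assuming the `infinity` and `-infinity` special values described in the linked section:

```sql
-- Arithmetic on an infinite timestamp leaves it unchanged
SELECT TIMESTAMP 'infinity' + INTERVAL 5 DAY AS still_infinite;            -- infinity
SELECT TIMESTAMP '-infinity' - INTERVAL 5 DAY AS still_negative_infinite;  -- -infinity
```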

#### Scalar Timestamp Functions {#docs:current:sql:functions:timestamp::scalar-timestamp-functions}

The table below shows the available scalar functions for `TIMESTAMP` values.

| Name | Description |
|:--|:-------|
| [`age(timestamp, timestamp)`](#::agetimestamp-timestamp) | Subtract arguments, resulting in the time difference between the two timestamps. |
| [`age(timestamp)`](#::agetimestamp) | Subtract from current_date. |
| [`ago(interval)`](#::agointerval) | Subtracts an interval from the current timestamp. |
| [`century(timestamp)`](#::centurytimestamp) | Extracts the century of a timestamp. |
| [`current_localtimestamp()`](#::current_localtimestamp) | Returns the current timestamp (at the start of the transaction). |
| [`date_diff(part, starttimestamp, endtimestamp)`](#::date_diffpart-starttimestamp-endtimestamp) | The number of [`part`](#docs:current:sql:functions:datepart) boundaries between `starttimestamp` and `endtimestamp`, inclusive of the larger timestamp and exclusive of the smaller timestamp. |
| [`date_part([part, ...], timestamp)`](#::date_partpart--timestamp) | Get the listed [subfields](#docs:current:sql:functions:datepart) as a `struct`. The list must be constant. |
| [`date_part(part, timestamp)`](#::date_partpart-timestamp) | Get [subfield](#docs:current:sql:functions:datepart) (equivalent to `extract`). |
| [`date_sub(part, starttimestamp, endtimestamp)`](#::date_subpart-starttimestamp-endtimestamp) | The signed length of the interval between `starttimestamp` and `endtimestamp`, truncated to whole multiples of [`part`](#docs:current:sql:functions:datepart). |
| [`date_trunc(part, timestamp)`](#::date_truncpart-timestamp) | Truncate to specified [precision](#docs:current:sql:functions:datepart). |
| [`dayname(timestamp)`](#::daynametimestamp) | The (English) name of the weekday. |
| [`epoch_ms(timestamp)`](#::epoch_mstimestamp) | Returns the total number of milliseconds since the epoch. |
| [`epoch_ns(timestamp)`](#::epoch_nstimestamp) | Returns the total number of nanoseconds since the epoch. |
| [`epoch_us(timestamp)`](#::epoch_ustimestamp) | Returns the total number of microseconds since the epoch. |
| [`epoch(timestamp)`](#::epochtimestamp) | Returns the total number of seconds since the epoch. |
| [`extract(field FROM timestamp)`](#::extractfield-from-timestamp) | Get [subfield](#docs:current:sql:functions:datepart) from a timestamp. |
| [`greatest(timestamp, timestamp)`](#::greatesttimestamp-timestamp) | The later of two timestamps. |
| [`isfinite(timestamp)`](#::isfinitetimestamp) | Returns true if the timestamp is finite, false otherwise. |
| [`isinf(timestamp)`](#::isinftimestamp) | Returns true if the timestamp is infinite, false otherwise. |
| [`julian(timestamp)`](#::juliantimestamp) | Extract the Julian Day number from a timestamp. |
| [`last_day(timestamp)`](#::last_daytimestamp) | The last day of the month. |
| [`least(timestamp, timestamp)`](#::leasttimestamp-timestamp) | The earlier of two timestamps. |
| [`make_timestamp(bigint, bigint, bigint, bigint, bigint, double)`](#::make_timestampbigint-bigint-bigint-bigint-bigint-double) | The timestamp for the given parts. |
| [`make_timestamp(microseconds)`](#::make_timestampmicroseconds) | Converts microseconds since the epoch to a timestamp. |
| [`make_timestamp_ms(milliseconds)`](#::make_timestamp_msmilliseconds) | Converts milliseconds since the epoch to a timestamp. |
| [`make_timestamp_ns(nanoseconds)`](#::make_timestamp_nsnanoseconds) | Converts nanoseconds since the epoch to a timestamp. |
| [`monthname(timestamp)`](#::monthnametimestamp) | The (English) name of the month. |
| [`strftime(timestamp, format)`](#::strftimetimestamp-format) | Converts timestamp to string according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers). |
| [`strptime(text, format-list)`](#::strptimetext-format-list) | Converts the string `text` to timestamp applying the [format strings](#docs:current:sql:functions:dateformat) in the list until one succeeds. Throws an error on failure. To return `NULL` on failure, use [`try_strptime`](#::try_strptimetext-format-list). |
| [`strptime(text, format)`](#::strptimetext-format) | Converts the string `text` to timestamp according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers). Throws an error on failure. To return `NULL` on failure, use [`try_strptime`](#::try_strptimetext-format). |
| [`time_bucket(bucket_width, timestamp[, offset])`](#time_bucketbucket_width-timestamp-offset) | Truncate `timestamp` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01 00:00:00[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00[ + offset]`. Note that `2000-01-03` is a Monday. |
| [`time_bucket(bucket_width, timestamp[, origin])`](#time_bucketbucket_width-timestamp-origin) | Truncate `timestamp` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01 00:00:00` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00`. Note that `2000-01-03` is a Monday. |
| [`try_strptime(text, format-list)`](#::try_strptimetext-format-list) | Converts the string `text` to timestamp applying the [format strings](#docs:current:sql:functions:dateformat) in the list until one succeeds. Returns `NULL` on failure. |
| [`try_strptime(text, format)`](#::try_strptimetext-format) | Converts the string `text` to timestamp according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers). Returns `NULL` on failure. |

There are also dedicated extraction functions to get the [subfields](#docs:current:sql:functions:datepart).

Functions applied to infinite timestamps will either return the same infinite timestamp
(e.g., `greatest`) or `NULL` (e.g., `date_part`) depending on what “makes sense”.
In general, if the function needs to examine the parts of the infinite timestamp, the result will be `NULL`.
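
A minimal sketch of the two cases named here, using the `greatest` and `date_part` functions from the table. The aliases are illustrative, and the comments restate the rule described in the text rather than captured output.

```sql
SELECT
    -- Comparison does not need to look inside the value, so infinity is returned
    greatest(TIMESTAMP 'infinity', TIMESTAMP '1992-03-22 01:02:03') AS later_value,
    -- Extracting a part would require examining the value, so the result is NULL
    date_part('year', TIMESTAMP 'infinity') AS year_part;
```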

###### `age(timestamp, timestamp)` {#docs:current:sql:functions:timestamp::agetimestamp-timestamp}



|   |   |
|:--|:--------|
| **Description** |Subtract arguments, resulting in the time difference between the two timestamps. |
| **Example** | `age(TIMESTAMP '2001-04-10', TIMESTAMP '1992-09-20')` |
| **Result** | `8 years 6 months 20 days` |

###### `age(timestamp)` {#docs:current:sql:functions:timestamp::agetimestamp}



|   |   |
|:--|:--------|
| **Description** |Subtract from current_date. |
| **Example** | `age(TIMESTAMP '1992-09-20')` |
| **Result** | `29 years 1 month 27 days 12:39:00.844` |

###### `ago(interval)` {#docs:current:sql:functions:timestamp::agointerval}



|   |   |
|:--|:--------|
| **Description** |Subtracts an interval from the current timestamp, returning a timestamp in the past. Equivalent to `current_timestamp - interval`. |
| **Example** | `ago(INTERVAL 1 HOUR)` |
| **Result** | `2024-11-30 12:28:48.895` (if current time is `2024-11-30 13:28:48.895`) |

###### `century(timestamp)` {#docs:current:sql:functions:timestamp::centurytimestamp}



|   |   |
|:--|:--------|
| **Description** |Extracts the century of a timestamp. |
| **Example** | `century(TIMESTAMP '1992-03-22')` |
| **Result** | `20` |

###### `current_localtimestamp()` {#docs:current:sql:functions:timestamp::current_localtimestamp}



|   |   |
|:--|:--------|
| **Description** |Returns the current timestamp (at the start of the transaction). |
| **Example** | `current_localtimestamp()` |
| **Result** | `2024-11-30 13:28:48.895` |

###### `date_diff(part, starttimestamp, endtimestamp)` {#docs:current:sql:functions:timestamp::date_diffpart-starttimestamp-endtimestamp}



|   |   |
|:--|:--------|
| **Description** |The signed number of [`part`](#docs:current:sql:functions:datepart) boundaries between `starttimestamp` and `endtimestamp`, inclusive of the larger timestamp and exclusive of the smaller timestamp. |
| **Example** | `date_diff('hour', TIMESTAMP '1992-09-30 23:59:59', TIMESTAMP '1992-10-01 01:58:00')` |
| **Result** | `2` |

###### `date_part([part, ...], timestamp)` {#docs:current:sql:functions:timestamp::date_partpart--timestamp}



|   |   |
|:--|:--------|
| **Description** |Get the listed [subfields](#docs:current:sql:functions:datepart) as a `struct`. The list must be constant. |
| **Example** | `date_part(['year', 'month', 'day'], TIMESTAMP '1992-09-20 20:38:40')` |
| **Result** | `{year: 1992, month: 9, day: 20}` |

###### `date_part(part, timestamp)` {#docs:current:sql:functions:timestamp::date_partpart-timestamp}



|   |   |
|:--|:--------|
| **Description** |Get [subfield](#docs:current:sql:functions:datepart) (equivalent to `extract`). |
| **Example** | `date_part('minute', TIMESTAMP '1992-09-20 20:38:40')` |
| **Result** | `38` |

###### `date_sub(part, starttimestamp, endtimestamp)` {#docs:current:sql:functions:timestamp::date_subpart-starttimestamp-endtimestamp}



|   |   |
|:--|:--------|
| **Description** |The signed length of the interval between `starttimestamp` and `endtimestamp`, truncated to whole multiples of [`part`](#docs:current:sql:functions:datepart). |
| **Example** | `date_sub('hour', TIMESTAMP '1992-09-30 23:59:59', TIMESTAMP '1992-10-01 01:58:00')` |
| **Result** | `1` |
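
The difference between `date_diff` and `date_sub` is easiest to see when both are run on the same pair of timestamps. The sketch below reuses the two documented examples side by side: the elapsed time is 1 hour 58 minutes 1 second, which crosses two hour boundaries (`00:00` and `01:00`) but contains only one whole hour. The aliases are illustrative.

```sql
SELECT
    -- Counts the hour boundaries crossed: 2
    date_diff('hour', TIMESTAMP '1992-09-30 23:59:59', TIMESTAMP '1992-10-01 01:58:00') AS boundaries_crossed,
    -- Counts the complete hours elapsed: 1
    date_sub('hour', TIMESTAMP '1992-09-30 23:59:59', TIMESTAMP '1992-10-01 01:58:00') AS whole_hours;
```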

###### `date_trunc(part, timestamp)` {#docs:current:sql:functions:timestamp::date_truncpart-timestamp}



|   |   |
|:--|:--------|
| **Description** |Truncate to specified [precision](#docs:current:sql:functions:datepart). |
| **Example** | `date_trunc('hour', TIMESTAMP '1992-09-20 20:38:40')` |
| **Result** | `1992-09-20 20:00:00` |

###### `dayname(timestamp)` {#docs:current:sql:functions:timestamp::daynametimestamp}



|   |   |
|:--|:--------|
| **Description** |The (English) name of the weekday. |
| **Example** | `dayname(TIMESTAMP '1992-03-22')` |
| **Result** | `Sunday` |

###### `epoch_ms(timestamp)` {#docs:current:sql:functions:timestamp::epoch_mstimestamp}



|   |   |
|:--|:--------|
| **Description** |Returns the total number of milliseconds since the epoch. |
| **Example** | `epoch_ms(TIMESTAMP '2021-08-03 11:59:44.123456')` |
| **Result** | `1627991984123` |

###### `epoch_ns(timestamp)` {#docs:current:sql:functions:timestamp::epoch_nstimestamp}



|   |   |
|:--|:--------|
| **Description** |Returns the total number of nanoseconds since the epoch. |
| **Example** | `epoch_ns(TIMESTAMP '2021-08-03 11:59:44.123456')` |
| **Result** | `1627991984123456000` |

###### `epoch_us(timestamp)` {#docs:current:sql:functions:timestamp::epoch_ustimestamp}



|   |   |
|:--|:--------|
| **Description** |Returns the total number of microseconds since the epoch. |
| **Example** | `epoch_us(TIMESTAMP '2021-08-03 11:59:44.123456')` |
| **Result** | `1627991984123456` |

###### `epoch(timestamp)` {#docs:current:sql:functions:timestamp::epochtimestamp}



|   |   |
|:--|:--------|
| **Description** |Returns the total number of seconds since the epoch. |
| **Example** | `epoch('2022-11-07 08:43:04'::TIMESTAMP)` |
| **Result** | `1667810584` |

###### `extract(field FROM timestamp)` {#docs:current:sql:functions:timestamp::extractfield-from-timestamp}



|   |   |
|:--|:--------|
| **Description** |Get [subfield](#docs:current:sql:functions:datepart) from a timestamp. |
| **Example** | `extract('hour' FROM TIMESTAMP '1992-09-20 20:38:48')` |
| **Result** | `20` |

###### `greatest(timestamp, timestamp)` {#docs:current:sql:functions:timestamp::greatesttimestamp-timestamp}



|   |   |
|:--|:--------|
| **Description** |The later of two timestamps. |
| **Example** | `greatest(TIMESTAMP '1992-09-20 20:38:48', TIMESTAMP '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-09-20 20:38:48` |

###### `isfinite(timestamp)` {#docs:current:sql:functions:timestamp::isfinitetimestamp}



|   |   |
|:--|:--------|
| **Description** |Returns true if the timestamp is finite, false otherwise. |
| **Example** | `isfinite(TIMESTAMP '1992-03-07')` |
| **Result** | `true` |

###### `isinf(timestamp)` {#docs:current:sql:functions:timestamp::isinftimestamp}



|   |   |
|:--|:--------|
| **Description** |Returns true if the timestamp is infinite, false otherwise. |
| **Example** | `isinf(TIMESTAMP '-infinity')` |
| **Result** | `true` |

###### `julian(timestamp)` {#docs:current:sql:functions:timestamp::juliantimestamp}



|   |   |
|:--|:--------|
| **Description** |Extract the Julian Day number from a timestamp. |
| **Example** | `julian(TIMESTAMP '1992-03-22 01:02:03.1234')` |
| **Result** | `2448704.043091706` |

###### `last_day(timestamp)` {#docs:current:sql:functions:timestamp::last_daytimestamp}



|   |   |
|:--|:--------|
| **Description** |The last day of the month. |
| **Example** | `last_day(TIMESTAMP '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-03-31` |

###### `least(timestamp, timestamp)` {#docs:current:sql:functions:timestamp::leasttimestamp-timestamp}



|   |   |
|:--|:--------|
| **Description** |The earlier of two timestamps. |
| **Example** | `least(TIMESTAMP '1992-09-20 20:38:48', TIMESTAMP '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-03-22 01:02:03.1234` |

###### `make_timestamp(bigint, bigint, bigint, bigint, bigint, double)` {#docs:current:sql:functions:timestamp::make_timestampbigint-bigint-bigint-bigint-bigint-double}



|   |   |
|:--|:--------|
| **Description** |The timestamp for the given parts. |
| **Example** | `make_timestamp(1992, 9, 20, 13, 34, 27.123456)` |
| **Result** | `1992-09-20 13:34:27.123456` |

###### `make_timestamp(microseconds)` {#docs:current:sql:functions:timestamp::make_timestampmicroseconds}



|   |   |
|:--|:--------|
| **Description** |Converts microseconds since the epoch to a timestamp. |
| **Example** | `make_timestamp(1667810584123456)` |
| **Result** | `2022-11-07 08:43:04.123456` |

###### `make_timestamp_ms(milliseconds)` {#docs:current:sql:functions:timestamp::make_timestamp_msmilliseconds}



|   |   |
|:--|:--------|
| **Description** |Converts milliseconds since the epoch to a timestamp. |
| **Example** | `make_timestamp_ms(1667810584123)` |
| **Result** | `2022-11-07 08:43:04.123` |

###### `make_timestamp_ns(nanoseconds)` {#docs:current:sql:functions:timestamp::make_timestamp_nsnanoseconds}



|   |   |
|:--|:--------|
| **Description** |Converts nanoseconds since the epoch to a timestamp. |
| **Example** | `make_timestamp_ns(1667810584123456789)` |
| **Result** | `2022-11-07 08:43:04.123456789` |
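
The `make_timestamp*` functions are the inverses of the corresponding `epoch_*` functions. The following sketch round-trips one value at two precisions; going through milliseconds is expected to drop the sub-millisecond digits, since `epoch_ms` returns a whole number of milliseconds. The aliases are illustrative.

```sql
SELECT
    -- Microsecond round trip preserves the full value
    make_timestamp(epoch_us(TIMESTAMP '2021-08-03 11:59:44.123456')) AS via_microseconds,
    -- Millisecond round trip keeps only three fractional digits
    make_timestamp_ms(epoch_ms(TIMESTAMP '2021-08-03 11:59:44.123456')) AS via_milliseconds;
```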

###### `monthname(timestamp)` {#docs:current:sql:functions:timestamp::monthnametimestamp}



|   |   |
|:--|:--------|
| **Description** |The (English) name of the month. |
| **Example** | `monthname(TIMESTAMP '1992-09-20')` |
| **Result** | `September` |

###### `strftime(timestamp, format)` {#docs:current:sql:functions:timestamp::strftimetimestamp-format}



|   |   |
|:--|:--------|
| **Description** |Converts timestamp to string according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers). |
| **Example** | `strftime(TIMESTAMP '1992-01-01 20:38:40', '%a, %-d %B %Y - %I:%M:%S %p')` |
| **Result** | `Wed, 1 January 1992 - 08:38:40 PM` |

###### `strptime(text, format-list)` {#docs:current:sql:functions:timestamp::strptimetext-format-list}



|   |   |
|:--|:--------|
| **Description** |Converts the string `text` to timestamp applying the [format strings](#docs:current:sql:functions:dateformat) in the list until one succeeds. Throws an error on failure. To return `NULL` on failure, use [`try_strptime`](#::try_strptimetext-format-list). |
| **Example** | `strptime('4/15/2023 10:56:00', ['%d/%m/%Y %H:%M:%S', '%m/%d/%Y %H:%M:%S'])` |
| **Result** | `2023-04-15 10:56:00` |

###### `strptime(text, format)` {#docs:current:sql:functions:timestamp::strptimetext-format}



|   |   |
|:--|:--------|
| **Description** |Converts the string `text` to timestamp according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers). Throws an error on failure. To return `NULL` on failure, use [`try_strptime`](#::try_strptimetext-format). |
| **Example** | `strptime('Wed, 1 January 1992 - 08:38:40 PM', '%a, %-d %B %Y - %I:%M:%S %p')` |
| **Result** | `1992-01-01 20:38:40` |

###### `time_bucket(bucket_width, timestamp[, offset])` {#docs:current:sql:functions:timestamp::time_bucketbucket_width-timestamp-offset}



|   |   |
|:--|:--------|
| **Description** |Truncate `timestamp` to a grid of width `bucket_width`. The grid includes `2000-01-01 00:00:00[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00[ + offset]`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '10 minutes', TIMESTAMP '1992-04-20 15:26:00', INTERVAL '5 minutes')` |
| **Result** | `1992-04-20 15:25:00` |

###### `time_bucket(bucket_width, timestamp[, origin])` {#docs:current:sql:functions:timestamp::time_bucketbucket_width-timestamp-origin}



|   |   |
|:--|:--------|
| **Description** |Truncate `timestamp` to a grid of width `bucket_width`. The grid includes the `origin` timestamp, which defaults to `2000-01-01 00:00:00` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 weeks', TIMESTAMP '1992-04-20 15:26:00', TIMESTAMP '1992-04-01 00:00:00')` |
| **Result** | `1992-04-15 00:00:00` |
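
A common use of `time_bucket` is grouping rows into fixed-width bins. The sketch below is one possible way to do this: it generates twelve timestamps five minutes apart with the `range` table function and counts them per 15-minute bucket. The table alias `t`, the column name `ts` and the output aliases are assumptions made for the illustration.

```sql
SELECT
    time_bucket(INTERVAL '15 minutes', ts) AS bucket,
    count(*) AS readings
FROM range(
    TIMESTAMP '2001-04-10 00:00:00',
    TIMESTAMP '2001-04-10 01:00:00',
    INTERVAL 5 MINUTE
) AS t(ts)
GROUP BY bucket
ORDER BY bucket;
```

With the default origin, the 15-minute grid falls on the quarter hours, so this should yield four buckets of three rows each.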

###### `try_strptime(text, format-list)` {#docs:current:sql:functions:timestamp::try_strptimetext-format-list}



|   |   |
|:--|:--------|
| **Description** |Converts the string `text` to timestamp applying the [format strings](#docs:current:sql:functions:dateformat) in the list until one succeeds. Returns `NULL` on failure. |
| **Example** | `try_strptime('4/15/2023 10:56:00', ['%d/%m/%Y %H:%M:%S', '%m/%d/%Y %H:%M:%S'])` |
| **Result** | `2023-04-15 10:56:00` |

###### `try_strptime(text, format)` {#docs:current:sql:functions:timestamp::try_strptimetext-format}



|   |   |
|:--|:--------|
| **Description** |Converts the string `text` to timestamp according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers). Returns `NULL` on failure. |
| **Example** | `try_strptime('Wed, 1 January 1992 - 08:38:40 PM', '%a, %-d %B %Y - %I:%M:%S %p')` |
| **Result** | `1992-01-01 20:38:40` |
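
The practical difference between `strptime` and `try_strptime` shows up only when parsing fails. In the sketch below the input uses dashes while both formats expect slashes, so no format matches. The input string is made up for the illustration.

```sql
-- Returns NULL because the text matches neither format
SELECT try_strptime('15-04-2023', ['%d/%m/%Y', '%m/%d/%Y']) AS parsed;

-- The same call with strptime would throw an error instead:
-- SELECT strptime('15-04-2023', ['%d/%m/%Y', '%m/%d/%Y']);
```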

#### Timestamp Table Functions {#docs:current:sql:functions:timestamp::timestamp-table-functions}

The table below shows the available table functions for `TIMESTAMP` types.

| Name | Description |
|:--|:-------|
| [`generate_series(timestamp, timestamp, interval)`](#::generate_seriestimestamp-timestamp-interval) | Generate a table of timestamps in the closed range, stepping by the interval. |
| [`range(timestamp, timestamp, interval)`](#::rangetimestamp-timestamp-interval) | Generate a table of timestamps in the half-open range, stepping by the interval. |

> Infinite values are not allowed as table function bounds.

###### `generate_series(timestamp, timestamp, interval)` {#docs:current:sql:functions:timestamp::generate_seriestimestamp-timestamp-interval}



|   |   |
|:--|:--------|
| **Description** |Generate a table of timestamps in the closed range, stepping by the interval. |
| **Example** | `generate_series(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE)` |

###### `range(timestamp, timestamp, interval)` {#docs:current:sql:functions:timestamp::rangetimestamp-timestamp-interval}



|   |   |
|:--|:--------|
| **Description** |Generate a table of timestamps in the half-open range, stepping by the interval. |
| **Example** | `range(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE)` |
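
The two table functions differ only in whether the upper bound is included. As a quick check, the following sketch counts the rows each one produces for the same arguments: one day in 30-minute steps gives 48 steps, so the closed range has one more row than the half-open range. The aliases are illustrative.

```sql
SELECT
    -- Closed range: includes 2001-04-11 00:00:00, so 49 rows
    (SELECT count(*)
     FROM generate_series(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE)) AS closed_rows,
    -- Half-open range: excludes the upper bound, so 48 rows
    (SELECT count(*)
     FROM range(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE)) AS half_open_rows;
```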

### Timestamp with Time Zone Functions {#docs:current:sql:functions:timestamptz}



This section describes functions and operators for examining and manipulating [`TIMESTAMP WITH TIME ZONE`
(or `TIMESTAMPTZ`) values](#docs:current:sql:data_types:timestamp). See also the related [`TIMESTAMP` functions](#docs:current:sql:functions:timestamp).

Time zone support is provided by the built-in [ICU extension](#docs:current:core_extensions:icu).

In the examples below, the current time zone is presumed to be `America/Los_Angeles`
using the Gregorian calendar.
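
To reproduce the results shown in this section, the session can be configured as follows. This is a sketch that assumes the ICU extension is loaded, which provides the `TimeZone` and `Calendar` settings.

```sql
-- Match the time zone and calendar presumed by the examples
SET TimeZone = 'America/Los_Angeles';
SET Calendar = 'gregorian';

-- Confirm the active time zone
SELECT current_setting('TimeZone') AS active_time_zone;
```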

#### Built-In Timestamp with Time Zone Functions {#docs:current:sql:functions:timestamptz::built-in-timestamp-with-time-zone-functions}

The table below shows the available scalar functions for `TIMESTAMPTZ` values.
Since these functions do not involve binning or display,
they are always available.

| Name | Description |
|:--|:-------|
| [`current_timestamp`](#::current_timestamp) | Current date and time (start of current transaction). |
| [`get_current_timestamp()`](#::get_current_timestamp) | Current date and time (start of current transaction). |
| [`greatest(timestamptz, timestamptz)`](#::greatesttimestamptz-timestamptz) | The later of two timestamps. |
| [`isfinite(timestamptz)`](#::isfinitetimestamptz) | Returns true if the timestamp with time zone is finite, false otherwise. |
| [`isinf(timestamptz)`](#::isinftimestamptz) | Returns true if the timestamp with time zone is infinite, false otherwise. |
| [`least(timestamptz, timestamptz)`](#::leasttimestamptz-timestamptz) | The earlier of two timestamps. |
| [`now()`](#::now) | Current date and time (start of current transaction). |
| [`timetz_byte_comparable(timetz)`](#::timetz_byte_comparabletimetz) | Converts a `TIME WITH TIME ZONE` to a `UBIGINT` sort key. |
| [`to_timestamp(double)`](#::to_timestampdouble) | Converts seconds since the epoch to a timestamp with time zone. |
| [`transaction_timestamp()`](#::transaction_timestamp) | Current date and time (start of current transaction). |

###### `current_timestamp` {#docs:current:sql:functions:timestamptz::current_timestamp}



|   |   |
|:--|:--------|
| **Description** |Current date and time (start of current transaction). |
| **Example** | `current_timestamp` |
| **Result** | `2022-10-08 12:44:46.122-07` |

###### `get_current_timestamp()` {#docs:current:sql:functions:timestamptz::get_current_timestamp}



|   |   |
|:--|:--------|
| **Description** |Current date and time (start of current transaction). |
| **Example** | `get_current_timestamp()` |
| **Result** | `2022-10-08 12:44:46.122-07` |

###### `greatest(timestamptz, timestamptz)` {#docs:current:sql:functions:timestamptz::greatesttimestamptz-timestamptz}



|   |   |
|:--|:--------|
| **Description** |The later of two timestamps. |
| **Example** | `greatest(TIMESTAMPTZ '1992-09-20 20:38:48', TIMESTAMPTZ '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-09-20 20:38:48-07` |

###### `isfinite(timestamptz)` {#docs:current:sql:functions:timestamptz::isfinitetimestamptz}



|   |   |
|:--|:--------|
| **Description** |Returns true if the timestamp with time zone is finite, false otherwise. |
| **Example** | `isfinite(TIMESTAMPTZ '1992-03-07')` |
| **Result** | `true` |

###### `isinf(timestamptz)` {#docs:current:sql:functions:timestamptz::isinftimestamptz}



|   |   |
|:--|:--------|
| **Description** |Returns true if the timestamp with time zone is infinite, false otherwise. |
| **Example** | `isinf(TIMESTAMPTZ '-infinity')` |
| **Result** | `true` |

###### `least(timestamptz, timestamptz)` {#docs:current:sql:functions:timestamptz::leasttimestamptz-timestamptz}



|   |   |
|:--|:--------|
| **Description** |The earlier of two timestamps. |
| **Example** | `least(TIMESTAMPTZ '1992-09-20 20:38:48', TIMESTAMPTZ '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-03-22 01:02:03.1234-08` |

###### `now()` {#docs:current:sql:functions:timestamptz::now}



|   |   |
|:--|:--------|
| **Description** |Current date and time (start of current transaction). |
| **Example** | `now()` |
| **Result** | `2022-10-08 12:44:46.122-07` |

###### `timetz_byte_comparable(timetz)` {#docs:current:sql:functions:timestamptz::timetz_byte_comparabletimetz}



|   |   |
|:--|:--------|
| **Description** |Converts a `TIME WITH TIME ZONE` to a `UBIGINT` sort key. |
| **Example** | `timetz_byte_comparable('18:18:16.21-07:00'::TIMETZ)` |
| **Result** | `2494691656335442799` |

###### `to_timestamp(double)` {#docs:current:sql:functions:timestamptz::to_timestampdouble}



|   |   |
|:--|:--------|
| **Description** |Converts seconds since the epoch to a timestamp with time zone. |
| **Example** | `to_timestamp(1284352323.5)` |
| **Result** | `2010-09-13 04:32:03.5+00` |

###### `transaction_timestamp()` {#docs:current:sql:functions:timestamptz::transaction_timestamp}



|   |   |
|:--|:--------|
| **Description** |Current date and time (start of current transaction). |
| **Example** | `transaction_timestamp()` |
| **Result** | `2022-10-08 12:44:46.122-07` |

#### Timestamp with Time Zone Strings {#docs:current:sql:functions:timestamptz::timestamp-with-time-zone-strings}

With no time zone extension loaded, `TIMESTAMPTZ` values will be cast to and from strings
using offset notation.
This will let you specify an instant correctly without access to time zone information.
For portability, `TIMESTAMPTZ` values will always be displayed using GMT offsets:

```sql
SELECT '2022-10-08 13:13:34-07'::TIMESTAMPTZ;
```

```text
2022-10-08 20:13:34+00
```

If a time zone extension such as ICU is loaded, then a time zone can be parsed from a string
and cast to a representation in the local time zone:

```sql
SELECT '2022-10-08 13:13:34 Europe/Amsterdam'::TIMESTAMPTZ::VARCHAR;
```

```text
2022-10-08 04:13:34-07 -- the offset will differ based on your local time zone
```

#### ICU Timestamp with Time Zone Operators {#docs:current:sql:functions:timestamptz::icu-timestamp-with-time-zone-operators}

The table below shows the available mathematical operators for `TIMESTAMP WITH TIME ZONE` values
provided by the ICU extension.

| Operator | Description | Example | Result |
|:-|:--|:----|:--|
| `+` | addition of an `INTERVAL` | `TIMESTAMPTZ '1992-03-22 01:02:03' + INTERVAL 5 DAY` | `1992-03-27 01:02:03-08` |
| `-` | subtraction of `TIMESTAMPTZ`s | `TIMESTAMPTZ '1992-03-27' - TIMESTAMPTZ '1992-03-22'` | `5 days` |
| `-` | subtraction of an `INTERVAL` | `TIMESTAMPTZ '1992-03-27 01:02:03' - INTERVAL 5 DAY` | `1992-03-22 01:02:03-08` |

Adding to or subtracting from [infinite values](#docs:current:sql:data_types:timestamp::special-values) produces the same infinite value.

Addition and subtraction of intervals uses the [ICU Calendar add function](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1Calendar.html#aa6e19a88ca2225eddcbbe82313c9c095).
For positive intervals (forwards in time) the fields are incremented from least to most significant.
For negative intervals (backwards in time) the fields are decremented from most to least significant.
This produces the same results as PostgreSQL, but does not match some [more recent calendar RFCs](https://www.rfc-editor.org/rfc/rfc5545).
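
The field ordering matters most for intervals that mix units and start near the end of a month, where applying the day component before or after the month component can land on different dates. The query below is a sketch for experimenting with this; the start date and interval are arbitrary choices, and no expected output is given because the result depends on the ordering rules described here and on the session time zone.

```sql
-- Assumes the ICU extension is loaded
SET TimeZone = 'America/Los_Angeles';

SELECT
    -- Positive interval: fields are applied from least to most significant
    TIMESTAMPTZ '1992-01-30 12:00:00' + INTERVAL '1 month 1 day' AS moved_forward,
    -- Negative interval: fields are applied from most to least significant
    TIMESTAMPTZ '1992-03-31 12:00:00' - INTERVAL '1 month 1 day' AS moved_backward;
```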

#### ICU Timestamp with Time Zone Functions {#docs:current:sql:functions:timestamptz::icu-timestamp-with-time-zone-functions}

The table below shows the ICU-provided scalar functions for `TIMESTAMP WITH TIME ZONE` values.

| Name | Description |
|:--|:-------|
| [`age(timestamptz, timestamptz)`](#::agetimestamptz-timestamptz) | Subtract arguments, resulting in the time difference between the two timestamps. |
| [`age(timestamptz)`](#::agetimestamptz) | Subtract from current_date. |
| [`date_diff(part, starttimestamptz, endtimestamptz)`](#::date_diffpart-starttimestamptz-endtimestamptz) | The signed number of [`part`](#docs:current:sql:functions:datepart) boundaries between `starttimestamptz` and `endtimestamptz`, inclusive of the larger timestamp and exclusive of the smaller timestamp. |
| [`date_part([part, ...], timestamptz)`](#date_partpart--timestamptz) | Get the listed [subfields](#docs:current:sql:functions:datepart) as a `struct`. The list must be constant. |
| [`date_part(part, timestamptz)`](#::date_partpart-timestamptz) | Get [subfield](#docs:current:sql:functions:datepart) (equivalent to `extract`). |
| [`date_sub(part, starttimestamptz, endtimestamptz)`](#::date_subpart-starttimestamptz-endtimestamptz) | The signed length of the interval between `starttimestamptz` and `endtimestamptz`, truncated to whole multiples of [`part`](#docs:current:sql:functions:datepart). |
| [`date_trunc(part, timestamptz)`](#::date_truncpart-timestamptz) | Truncate to specified [precision](#docs:current:sql:functions:datepart). |
| [`epoch_ns(timestamptz)`](#::epoch_nstimestamptz) | Converts a timestamptz to nanoseconds since the epoch. |
| [`epoch_us(timestamptz)`](#::epoch_ustimestamptz) | Converts a timestamptz to microseconds since the epoch. |
| [`extract(field FROM timestamptz)`](#::extractfield-from-timestamptz) | Get [subfield](#docs:current:sql:functions:datepart) from a `TIMESTAMP WITH TIME ZONE`. |
| [`last_day(timestamptz)`](#::last_daytimestamptz) | The last day of the month. |
| [`make_timestamptz(bigint, bigint, bigint, bigint, bigint, double, string)`](#::make_timestamptzbigint-bigint-bigint-bigint-bigint-double-string) | The `TIMESTAMP WITH TIME ZONE` for the given parts and time zone. |
| [`make_timestamptz(bigint, bigint, bigint, bigint, bigint, double)`](#::make_timestamptzbigint-bigint-bigint-bigint-bigint-double) | The `TIMESTAMP WITH TIME ZONE` for the given parts in the current time zone. |
| [`make_timestamptz(microseconds)`](#::make_timestamptzmicroseconds) | The `TIMESTAMP WITH TIME ZONE` for the given µs since the epoch. |
| [`strftime(timestamptz, format)`](#::strftimetimestamptz-format) | Converts a `TIMESTAMP WITH TIME ZONE` value to string according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers). |
| [`strptime(text, format)`](#::strptimetext-format) | Converts string to `TIMESTAMP WITH TIME ZONE` according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers) if `%Z` is specified. |
| [`time_bucket(bucket_width, timestamptz[, offset])`](#time_bucketbucket_width-timestamptz-offset) | Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01 00:00:00+00:00[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00+00:00[ + offset]`. Note that `2000-01-03` is a Monday. |
| [`time_bucket(bucket_width, timestamptz[, origin])`](#time_bucketbucket_width-timestamptz-origin) | Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01 00:00:00+00:00` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00+00:00`. Note that `2000-01-03` is a Monday. |
| [`time_bucket(bucket_width, timestamptz[, timezone])`](#time_bucketbucket_width-timestamptz-origin) | Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01 00:00:00` in the provided `timezone` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00` in the provided `timezone`. The default timezone is `'UTC'`. Note that `2000-01-03` is a Monday. |



###### `age(timestamptz, timestamptz)` {#docs:current:sql:functions:timestamptz::agetimestamptz-timestamptz}



|   |   |
|:--|:--------|
| **Description** |Subtract arguments, resulting in the time difference between the two timestamps. |
| **Example** | `age(TIMESTAMPTZ '2001-04-10', TIMESTAMPTZ '1992-09-20')` |
| **Result** | `8 years 6 months 20 days` |

###### `age(timestamptz)` {#docs:current:sql:functions:timestamptz::agetimestamptz}



|   |   |
|:--|:--------|
| **Description** |Subtract from current_date. |
| **Example** | `age(TIMESTAMPTZ '1992-09-20')` |
| **Result** | `29 years 1 month 27 days 12:39:00.844` |

###### `date_diff(part, starttimestamptz, endtimestamptz)` {#docs:current:sql:functions:timestamptz::date_diffpart-starttimestamptz-endtimestamptz}



|   |   |
|:--|:--------|
| **Description** |The signed number of [`part`](#docs:current:sql:functions:datepart) boundaries between `starttimestamptz` and `endtimestamptz`, inclusive of the larger timestamp and exclusive of the smaller timestamp. |
| **Example** | `date_diff('hour', TIMESTAMPTZ '1992-09-30 23:59:59', TIMESTAMPTZ '1992-10-01 01:58:00')` |
| **Result** | `2` |

###### `date_part([part, ...], timestamptz)` {#docs:current:sql:functions:timestamptz::date_partpart--timestamptz}



|   |   |
|:--|:--------|
| **Description** |Get the listed [subfields](#docs:current:sql:functions:datepart) as a `struct`. The list must be constant. |
| **Example** | `date_part(['year', 'month', 'day'], TIMESTAMPTZ '1992-09-20 20:38:40-07')` |
| **Result** | `{year: 1992, month: 9, day: 20}` |

###### `date_part(part, timestamptz)` {#docs:current:sql:functions:timestamptz::date_partpart-timestamptz}



|   |   |
|:--|:--------|
| **Description** |Get [subfield](#docs:current:sql:functions:datepart) (equivalent to `extract`). |
| **Example** | `date_part('minute', TIMESTAMPTZ '1992-09-20 20:38:40')` |
| **Result** | `38` |

###### `date_sub(part, starttimestamptz, endtimestamptz)` {#docs:current:sql:functions:timestamptz::date_subpart-starttimestamptz-endtimestamptz}



|   |   |
|:--|:--------|
| **Description** |The signed length of the interval between `starttimestamptz` and `endtimestamptz`, truncated to whole multiples of [`part`](#docs:current:sql:functions:datepart). |
| **Example** | `date_sub('hour', TIMESTAMPTZ '1992-09-30 23:59:59', TIMESTAMPTZ '1992-10-01 01:58:00')` |
| **Result** | `1` |

###### `date_trunc(part, timestamptz)` {#docs:current:sql:functions:timestamptz::date_truncpart-timestamptz}



|   |   |
|:--|:--------|
| **Description** |Truncate to specified [precision](#docs:current:sql:functions:datepart). |
| **Example** | `date_trunc('hour', TIMESTAMPTZ '1992-09-20 20:38:40')` |
| **Result** | `1992-09-20 20:00:00` |

###### `epoch_ns(timestamptz)` {#docs:current:sql:functions:timestamptz::epoch_nstimestamptz}



|   |   |
|:--|:--------|
| **Description** |Converts a timestamptz to nanoseconds since the epoch. |
| **Example** | `epoch_ns('2022-11-07 08:43:04.123456+00'::TIMESTAMPTZ)` |
| **Result** | `1667810584123456000` |

###### `epoch_us(timestamptz)` {#docs:current:sql:functions:timestamptz::epoch_ustimestamptz}



|   |   |
|:--|:--------|
| **Description** |Converts a timestamptz to microseconds since the epoch. |
| **Example** | `epoch_us('2022-11-07 08:43:04.123456+00'::TIMESTAMPTZ)` |
| **Result** | `1667810584123456` |

###### `extract(field FROM timestamptz)` {#docs:current:sql:functions:timestamptz::extractfield-from-timestamptz}



|   |   |
|:--|:--------|
| **Description** |Get [subfield](#docs:current:sql:functions:datepart) from a `TIMESTAMP WITH TIME ZONE`. |
| **Example** | `extract('hour' FROM TIMESTAMPTZ '1992-09-20 20:38:48')` |
| **Result** | `20` |

###### `last_day(timestamptz)` {#docs:current:sql:functions:timestamptz::last_daytimestamptz}



|   |   |
|:--|:--------|
| **Description** |The last day of the month. |
| **Example** | `last_day(TIMESTAMPTZ '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-03-31` |

###### `make_timestamptz(bigint, bigint, bigint, bigint, bigint, double, string)` {#docs:current:sql:functions:timestamptz::make_timestamptzbigint-bigint-bigint-bigint-bigint-double-string}



|   |   |
|:--|:--------|
| **Description** |The `TIMESTAMP WITH TIME ZONE` for the given parts and time zone. |
| **Example** | `make_timestamptz(1992, 9, 20, 15, 34, 27.123456, 'CET')` |
| **Result** | `1992-09-20 06:34:27.123456-07` |

###### `make_timestamptz(bigint, bigint, bigint, bigint, bigint, double)` {#docs:current:sql:functions:timestamptz::make_timestamptzbigint-bigint-bigint-bigint-bigint-double}



|   |   |
|:--|:--------|
| **Description** |The `TIMESTAMP WITH TIME ZONE` for the given parts in the current time zone. |
| **Example** | `make_timestamptz(1992, 9, 20, 13, 34, 27.123456)` |
| **Result** | `1992-09-20 13:34:27.123456-07` |

###### `make_timestamptz(microseconds)` {#docs:current:sql:functions:timestamptz::make_timestamptzmicroseconds}



|   |   |
|:--|:--------|
| **Description** |The `TIMESTAMP WITH TIME ZONE` for the given µs since the epoch. |
| **Example** | `make_timestamptz(1667810584123456)` |
| **Result** | `2022-11-07 00:43:04.123456-08` |

###### `strftime(timestamptz, format)` {#docs:current:sql:functions:timestamptz::strftimetimestamptz-format}



|   |   |
|:--|:--------|
| **Description** |Converts a `TIMESTAMP WITH TIME ZONE` value to string according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers). |
| **Example** | `strftime(timestamptz '1992-01-01 20:38:40', '%a, %-d %B %Y - %I:%M:%S %p')` |
| **Result** | `Wed, 1 January 1992 - 08:38:40 PM` |

###### `strptime(text, format)` {#docs:current:sql:functions:timestamptz::strptimetext-format}



|   |   |
|:--|:--------|
| **Description** |Converts string to `TIMESTAMP WITH TIME ZONE` according to the [format string](#docs:current:sql:functions:dateformat::format-specifiers) if `%Z` is specified. |
| **Example** | `strptime('Wed, 1 January 1992 - 08:38:40 PST', '%a, %-d %B %Y - %H:%M:%S %Z')` |
| **Result** | `1992-01-01 08:38:40-08` |

###### `time_bucket(bucket_width, timestamptz[, offset])` {#docs:current:sql:functions:timestamptz::time_bucketbucket_width-timestamptz-offset}



|   |   |
|:--|:--------|
| **Description** |Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01 00:00:00+00:00[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00+00:00[ + offset]`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '10 minutes', TIMESTAMPTZ '1992-04-20 15:26:00-07', INTERVAL '5 minutes')` |
| **Result** | `1992-04-20 15:25:00-07` |

###### `time_bucket(bucket_width, timestamptz[, origin])` {#docs:current:sql:functions:timestamptz::time_bucketbucket_width-timestamptz-origin}



|   |   |
|:--|:--------|
| **Description** |Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01 00:00:00+00:00` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00+00:00`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 weeks', TIMESTAMPTZ '1992-04-20 15:26:00-07', TIMESTAMPTZ '1992-04-01 00:00:00-07')` |
| **Result** | `1992-04-15 00:00:00-07` |

###### `time_bucket(bucket_width, timestamptz[, timezone])` {#docs:current:sql:functions:timestamptz::time_bucketbucket_width-timestamptz-timezone}



|   |   |
|:--|:--------|
| **Description** |Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01 00:00:00` in the provided `timezone` when `bucket_width` is a number of months or coarser units, else at `2000-01-03 00:00:00` in the provided `timezone`. The default time zone is `'UTC'`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 days', TIMESTAMPTZ '1992-04-20 15:26:00-07', 'Europe/Berlin')` |
| **Result** | `1992-04-19 15:00:00-07` (=`1992-04-20 00:00:00 Europe/Berlin`) |
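
A common use of `time_bucket` is grouping rows onto a regular grid. The following is a minimal sketch, assuming the ICU extension is loaded; the session time zone is set to UTC only so that the bucket labels are easy to read, and the aliases `t`, `ts`, `bucket` and `n` are illustrative. With these inputs, the 48 half-hourly timestamps should fall into four 6-hour buckets of 12 rows each.

```sql
-- Assumption: UTC is chosen for readability; any valid time zone name works.
SET TimeZone = 'UTC';

SELECT
    time_bucket(INTERVAL '6 hours', ts) AS bucket,  -- default origin, no offset
    count(*) AS n
FROM range(TIMESTAMPTZ '2001-04-10',
           TIMESTAMPTZ '2001-04-11',
           INTERVAL 30 MINUTE) AS t(ts)
GROUP BY bucket
ORDER BY bucket;
```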

There are also dedicated extraction functions to get the [subfields](#docs:current:sql:functions:datepart).

#### ICU Timestamp Table Functions {#docs:current:sql:functions:timestamptz::icu-timestamp-table-functions}

The table below shows the available table functions for `TIMESTAMP WITH TIME ZONE` types.

| Name | Description |
|:--|:-------|
| [`generate_series(timestamptz, timestamptz, interval)`](#::generate_seriestimestamptz-timestamptz-interval) | Generate a table of timestamps in the closed range (including both the starting timestamp and the ending timestamp), stepping by the interval. |
| [`range(timestamptz, timestamptz, interval)`](#::rangetimestamptz-timestamptz-interval) | Generate a table of timestamps in the half open range (including the starting timestamp, but stopping before the ending timestamp), stepping by the interval. |

> Infinite values are not allowed as table function bounds.

###### `generate_series(timestamptz, timestamptz, interval)` {#docs:current:sql:functions:timestamptz::generate_seriestimestamptz-timestamptz-interval}



|   |   |
|:--|:--------|
| **Description** |Generate a table of timestamps in the closed range (including both the starting timestamp and the ending timestamp), stepping by the interval. |
| **Example** | `generate_series(TIMESTAMPTZ '2001-04-10', TIMESTAMPTZ '2001-04-11', INTERVAL 30 MINUTE)` |

###### `range(timestamptz, timestamptz, interval)` {#docs:current:sql:functions:timestamptz::rangetimestamptz-timestamptz-interval}



|   |   |
|:--|:--------|
| **Description** |Generate a table of timestamps in the half open range (including the starting timestamp, but stopping before the ending timestamp), stepping by the interval. |
| **Example** | `range(TIMESTAMPTZ '2001-04-10', TIMESTAMPTZ '2001-04-11', INTERVAL 30 MINUTE)` |
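
To make the closed versus half-open distinction concrete, the sketch below counts the rows produced by both table functions for the documented bounds. The column aliases are illustrative. Assuming the day is 24 hours long in the session time zone (no DST transition), `generate_series` yields 49 rows because it includes the end bound, while `range` yields 48.

```sql
SELECT
    -- Closed range: includes both 2001-04-10 00:00 and 2001-04-11 00:00.
    (SELECT count(*)
     FROM generate_series(TIMESTAMPTZ '2001-04-10',
                          TIMESTAMPTZ '2001-04-11',
                          INTERVAL 30 MINUTE)) AS closed_count,
    -- Half-open range: stops before 2001-04-11 00:00.
    (SELECT count(*)
     FROM range(TIMESTAMPTZ '2001-04-10',
                TIMESTAMPTZ '2001-04-11',
                INTERVAL 30 MINUTE)) AS half_open_count;
```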

#### ICU Timestamp Without Time Zone Functions {#docs:current:sql:functions:timestamptz::icu-timestamp-without-time-zone-functions}

The table below shows the ICU provided scalar functions that operate on plain `TIMESTAMP` values.
These functions assume that the `TIMESTAMP` is a “local timestamp”.

A local timestamp effectively encodes the part values from a time zone into a single value.
Local timestamps should be used with caution because the produced values can contain gaps and ambiguities caused by daylight saving time.
Often the same functionality can be implemented more reliably using the `struct` variant of the `date_part` function.
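
As a sketch of the `struct` alternative mentioned above, the query below extracts several parts at once and then reads individual fields from the resulting struct. The aliases `parts` and `s` are illustrative, and the values returned depend on the session time zone.

```sql
SELECT
    parts,                      -- the full struct, e.g., {year: ..., month: ..., ...}
    parts['year'] AS year_part, -- individual fields via bracket notation
    parts['hour'] AS hour_part
FROM (
    SELECT date_part(['year', 'month', 'day', 'hour'],
                     TIMESTAMPTZ '1992-09-20 20:38:40-07') AS parts
) AS s;
```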

| Name | Description |
|:--|:-------|
| [`current_localtime()`](#::current_localtime) | Returns a `TIME` whose GMT bin values correspond to local time in the current time zone. |
| [`current_localtimestamp()`](#::current_localtimestamp) | Returns a `TIMESTAMP` whose GMT bin values correspond to local date and time in the current time zone. |
| [`localtime`](#::localtime) | Synonym for the `current_localtime()` function call. |
| [`localtimestamp`](#::localtimestamp) | Synonym for the `current_localtimestamp()` function call. |
| [`timezone(text, timestamp)`](#::timezonetext-timestamp) | Use the [date parts](#docs:current:sql:functions:datepart) of the timestamp in GMT to construct a timestamp in the given time zone. Effectively, the argument is a “local” time. |
| [`timezone(text, timestamptz)`](#::timezonetext-timestamptz) | Use the [date parts](#docs:current:sql:functions:datepart) of the timestamp in the given time zone to construct a timestamp. Effectively, the result is a “local” time. |

###### `current_localtime()` {#docs:current:sql:functions:timestamptz::current_localtime}



|   |   |
|:--|:--------|
| **Description** |Returns a `TIME` whose GMT bin values correspond to local time in the current time zone. |
| **Example** | `current_localtime()` |
| **Result** | `08:47:56.497` |

###### `current_localtimestamp()` {#docs:current:sql:functions:timestamptz::current_localtimestamp}



|   |   |
|:--|:--------|
| **Description** |Returns a `TIMESTAMP` whose GMT bin values correspond to local date and time in the current time zone. |
| **Example** | `current_localtimestamp()` |
| **Result** | `2022-12-17 08:47:56.497` |

###### `localtime` {#docs:current:sql:functions:timestamptz::localtime}



|   |   |
|:--|:--------|
| **Description** |Synonym for the `current_localtime()` function call. |
| **Example** | `localtime` |
| **Result** | `08:47:56.497` |

###### `localtimestamp` {#docs:current:sql:functions:timestamptz::localtimestamp}



|   |   |
|:--|:--------|
| **Description** |Synonym for the `current_localtimestamp()` function call. |
| **Example** | `localtimestamp` |
| **Result** | `2022-12-17 08:47:56.497` |

###### `timezone(text, timestamp)` {#docs:current:sql:functions:timestamptz::timezonetext-timestamp}



|   |   |
|:--|:--------|
| **Description** |Use the [date parts](#docs:current:sql:functions:datepart) of the timestamp in GMT to construct a timestamp in the given time zone. Effectively, the argument is a “local” time. |
| **Example** | `timezone('America/Denver', TIMESTAMP '2001-02-16 20:38:40')` |
| **Result** | `2001-02-16 19:38:40-08` |

###### `timezone(text, timestamptz)` {#docs:current:sql:functions:timestamptz::timezonetext-timestamptz}



|   |   |
|:--|:--------|
| **Description** |Use the [date parts](#docs:current:sql:functions:datepart) of the timestamp in the given time zone to construct a timestamp. Effectively, the result is a “local” time. |
| **Example** | `timezone('America/Denver', TIMESTAMPTZ '2001-02-16 20:38:40-05')` |
| **Result** | `2001-02-16 18:38:40` |

#### At Time Zone {#docs:current:sql:functions:timestamptz::at-time-zone}

The `AT TIME ZONE` syntax is syntactic sugar for the (two argument) `timezone` function listed above:

```sql
SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'America/Denver' AS ts;
```

```text
2001-02-16 19:38:40-08
```

```sql
SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/Denver' AS ts;
```

```text
2001-02-16 18:38:40
```

Note that numeric time zones are not allowed:

```sql
SELECT TIMESTAMP '2001-02-16 20:38:40-05' AT TIME ZONE '0200' AS ts;
```

```console
Not implemented Error: Unknown TimeZone '0200'
```

#### Infinities {#docs:current:sql:functions:timestamptz::infinities}

Functions applied to infinite dates will either return the same infinite dates
(e.g., `greatest`) or `NULL` (e.g., `date_part`) depending on what “makes sense”.
In general, if the function needs to examine the parts of the infinite temporal value,
the result will be `NULL`.
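
A minimal sketch of both behaviors, using the two functions named above (the column aliases are illustrative). Following the rule just described, the first column should remain infinite and the second should be `NULL`.

```sql
SELECT
    -- Comparison only: no parts are examined, so infinity is returned as-is.
    greatest(TIMESTAMPTZ 'infinity',
             TIMESTAMPTZ '1992-09-20 20:38:40-07') AS still_infinite,
    -- Needs to examine the parts of an infinite value, so the result is NULL.
    date_part('year', TIMESTAMPTZ 'infinity') AS needs_parts;
```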

#### Calendars {#docs:current:sql:functions:timestamptz::calendars}

The ICU extension also supports [non-Gregorian calendars](#docs:current:sql:data_types:timestamp::calendar-support).
If such a calendar is current, then the display and binning operations will use that calendar.
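
The following sketch shows one way to inspect and switch the current calendar, assuming the ICU extension is loaded. The calendar name `'japanese'` is used purely as an example; consult the output of the first query for the names available in your build.

```sql
-- List the calendars known to the ICU extension.
SELECT name FROM icu_calendar_names() ORDER BY name;

-- Make a non-Gregorian calendar current for this session.
SET Calendar = 'japanese';

-- Binning operations such as date_part now use that calendar.
SELECT date_part('year', TIMESTAMPTZ '2019-05-01 00:00:00+09') AS calendar_year;

-- Switch back to the Gregorian calendar.
SET Calendar = 'gregorian';
```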


##### Daylight Saving Time (DST) Transitions {#docs:current:sql:functions:timestamptz::daylight-saving-time-dst-transitions}

When adding calendar intervals such as `INTERVAL '1 day'` to a
`TIMESTAMPTZ`, the resulting local timestamp may fall on a
non-existent time during daylight saving time transitions.

DuckDB follows PostgreSQL behavior and adjusts the result forward
to the next valid timestamp.

Example:

```sql
SET timezone = 'Europe/Amsterdam';

SELECT TIMESTAMPTZ '2025-03-29 02:30:00+01' + INTERVAL '1 day';
```

```text
2025-03-30 03:30:00+02
```

### Union Functions {#docs:current:sql:functions:union}



| Name | Description |
|:--|:-------|
| [`union.tag`](#::uniontag) | Dot notation serves as an alias for `union_extract`. |
| [`union_extract(union, 'tag')`](#::union_extractunion-tag) | Extract the value with the named tag from the union. `NULL` if the tag is not currently selected. |
| [`union_value(tag := any)`](#::union_valuetag--any) | Create a single member `UNION` containing the argument value. The tag of the value will be the bound variable name. |
| [`union_tag(union)`](#::union_tagunion) | Retrieve the currently selected tag of the union as an [Enum](#docs:current:sql:data_types:enum). |

###### `union.tag` {#docs:current:sql:functions:union::uniontag}



|   |   |
|:--|:--------|
| **Description** |Dot notation serves as an alias for `union_extract`. |
| **Example** | `(union_value(k := 'hello')).k` |
| **Result** | `hello` |

###### `union_extract(union, 'tag')` {#docs:current:sql:functions:union::union_extractunion-tag}



|   |   |
|:--|:--------|
| **Description** |Extract the value with the named tag from the union. `NULL` if the tag is not currently selected. |
| **Example** | `union_extract(s, 'k')` |
| **Result** | `hello` |

###### `union_value(tag := any)` {#docs:current:sql:functions:union::union_valuetag--any}



|   |   |
|:--|:--------|
| **Description** |Create a single member `UNION` containing the argument value. The tag of the value will be the bound variable name. |
| **Example** | `union_value(k := 'hello')` |
| **Result** | `'hello'::UNION(k VARCHAR)` |

###### `union_tag(union)` {#docs:current:sql:functions:union::union_tagunion}



|   |   |
|:--|:--------|
| **Description** |Retrieve the currently selected tag of the union as an [Enum](#docs:current:sql:data_types:enum). |
| **Example** | `union_tag(union_value(k := 'foo'))` |
| **Result** | `'k'` |
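
The functions above can be combined on a table with a `UNION` column. The following is a minimal sketch; the table name `tbl`, the column `u` and the member tags `num` and `str` are hypothetical names chosen for illustration.

```sql
CREATE TABLE tbl (u UNION(num INTEGER, str VARCHAR));

-- Plain values are cast to the matching member; union_value selects a tag explicitly.
INSERT INTO tbl VALUES (1), ('two'), (union_value(str := 'three'));

SELECT
    u,
    union_tag(u)            AS tag,        -- currently selected tag
    union_extract(u, 'str') AS str_value,  -- NULL when 'str' is not selected
    u.num                   AS num_value   -- dot notation, same as union_extract(u, 'num')
FROM tbl;
```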

### Utility Functions {#docs:current:sql:functions:utility}



#### Scalar Utility Functions {#docs:current:sql:functions:utility::scalar-utility-functions}

The functions below are difficult to categorize into specific function types and are broadly useful.

| Name | Description |
|:--|:-------|
| [`alias(column)`](#::aliascolumn) | Return the name of the column. |
| [`can_cast_implicitly(source_value, target_value)`](#::can_cast_implicitlysource_value-target_value) | Whether or not we can implicitly cast from the types of the source value to the target value. |
| [`checkpoint(database)`](#::checkpointdatabase) | Synchronize WAL with file for (optional) database without interrupting transactions. |
| [`coalesce(expr, ...)`](#::coalesceexpr-) | Return the first expression that evaluates to a non-`NULL` value. Accepts 1 or more parameters. Each expression can be a column, literal value, function result, or many others. |
| [`constant_or_null(arg1, arg2)`](#::constant_or_nullarg1-arg2) | If `arg2` is `NULL`, return `NULL`. Otherwise, return `arg1`. |
| [`count_if(x)`](#::count_ifx) | Aggregate function; rows contribute 1 if `x` is `true` or a non-zero number, else 0. |
| [`create_sort_key(parameters...)`](#::create_sort_keyparameters) | Constructs a binary-comparable sort key based on a set of input parameters and sort qualifiers. |
| [`current_catalog()`](#::current_catalog) | Return the name of the currently active catalog. Default is `memory`. |
| [`current_database()`](#::current_database) | Return the name of the currently active database. |
| [`current_query()`](#::current_query) | Return the current query as a string. |
| [`current_schema()`](#::current_schema) | Return the name of the currently active schema. Default is `main`. |
| [`current_schemas(boolean)`](#::current_schemasboolean) | Return list of schemas. Pass a parameter of `true` to include implicit schemas. |
| [`current_setting('setting_name')`](#::current_settingsetting_name) | Return the current value of the configuration setting. |
| [`currval('sequence_name')`](#::currvalsequence_name) | Return the current value of the sequence. Note that `nextval` must be called at least once prior to calling `currval`. |
| [`error(message)`](#::errormessage) | Throws the given error `message`. |
| [`equi_width_bins(min, max, bincount, nice := false)`](#::equi_width_binsmin-max-bincount-nice--false) | Returns the upper boundaries of a partition of the interval `[min, max]` into `bincount` equal-sized subintervals (for use with, e.g., [`histogram`](#docs:current:sql:functions:aggregates::histogramargboundaries)). If `nice = true`, then `min`, `max` and `bincount` may be adjusted to produce more aesthetically pleasing results. |
| [`force_checkpoint(database)`](#::force_checkpointdatabase) | Synchronize WAL with file for (optional) database interrupting transactions. |
| [`gen_random_uuid()`](#::gen_random_uuid) | Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| [`getenv(var)`](#::getenvvar) | Returns the value of the environment variable `var`. Only available in the [command line client](#docs:current:clients:cli:overview). |
| [`hash(value)`](#::hashvalue) | Returns a `UBIGINT` with the hash of the `value`. The hash function used may change across DuckDB versions. |
| [`icu_sort_key(string, collator)`](#::icu_sort_keystring-collator) | Surrogate [sort key](https://unicode-org.github.io/icu/userguide/collation/architecture.html#sort-keys) used to sort special characters according to the specific locale. Collator parameter is optional. Only available when the ICU extension is installed. |
| [`if(a, b, c)`](#::ifa-b-c) | Ternary conditional operator; returns `b` if `a`, else returns `c`. Equivalent to `CASE WHEN a THEN b ELSE c END`. |
| [`ifnull(expr, other)`](#::ifnullexpr-other) | A two-argument version of coalesce. |
| [`is_histogram_other_bin(arg)`](#::is_histogram_other_binarg) | Returns `true` when `arg` is the "catch-all element" of its datatype for the purpose of the [`histogram_exact`](#docs:current:sql:functions:aggregates::histogram_exactargelements) function, which is equal to the "right-most boundary" of its datatype for the purpose of the [`histogram`](#docs:current:sql:functions:aggregates::histogramargboundaries) function. |
| [`md5(string)`](#::md5string) | Returns the MD5 hash of the `string` as a `VARCHAR`. |
| [`md5_number(string)`](#::md5_numberstring) | Returns the MD5 hash of the `string` as a `UHUGEINT`. |
| [`md5_number_lower(string)`](#::md5_number_lowerstring) | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| [`md5_number_upper(string)`](#::md5_number_upperstring) | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| [`nextval('sequence_name')`](#::nextvalsequence_name) | Return the next value of the sequence. |
| [`nullif(a, b)`](#::nullifa-b) | Return `NULL` if `a = b`, else return `a`. Equivalent to `CASE WHEN a = b THEN NULL ELSE a END`. |
| [`parse_formatted_bytes(string)`](#::parse_formatted_bytesstring) | Parse a human-readable byte size string (e.g., `'16 KiB'`) into a `UBIGINT` number of bytes. Throws an error on invalid input. |
| [`pg_typeof(expression)`](#::pg_typeofexpression) | Returns the lower case name of the data type of the result of the expression. For PostgreSQL compatibility. |
| [`query(` *`query_string`*`)`](#::queryquery_string) | Table function that parses and executes the query defined in *`query_string`*. Only constant strings are allowed. Warning: this function allows invoking arbitrary queries, potentially altering the database state. |
| [`query_table(` *`tbl_name`*`)`](#::query_tabletbl_name) | Table function that returns the table given in *`tbl_name`*. |
| [`query_table(` *`tbl_names`*`, [`*`by_name`*`])`](#::query_tabletbl_names-by_name) | Table function that returns the union of tables given in *`tbl_names`*. If the optional *`by_name`* parameter is set to `true`, it uses [`UNION ALL BY NAME`](#docs:current:sql:query_syntax:setops::union-all-by-name) semantics. |
| [`read_blob(source)`](#::read_blobsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `BLOB`. See the [`read_blob` guide](#docs:current:guides:file_formats:read_file::read_blob) for more details. |
| [`read_text(source)`](#::read_textsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the [`read_text` guide](#docs:current:guides:file_formats:read_file::read_text) for more details. |
| [`sha1(string)`](#::sha1string) | Returns a `VARCHAR` with the SHA-1 hash of the `string`. |
| [`sha256(string)`](#::sha256string) | Returns a `VARCHAR` with the SHA-256 hash of the `string`. |
| [`sleep_ms(milliseconds)`](#::sleep_msmilliseconds) | Pause execution for the specified number of milliseconds. Returns `NULL`. |
| [`stats(expression)`](#::statsexpression) | Returns a string with statistics about the expression. Expression can be a column, constant, or SQL expression. |
| [`txid_current()`](#::txid_current) | Returns the current transaction's identifier, a `BIGINT` value. It will assign a new one if the current transaction does not have one already. |
| [`typeof(expression)`](#::typeofexpression) | Returns the name of the data type of the result of the expression. |
| [`uuid()`](#::uuid) | Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| [`uuidv4()`](#::uuidv4) | Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| [`uuidv7()`](#::uuidv7) | Return a random UUIDv7 similar to this: `81964ebe-00b1-7e1d-b0f9-43c29b6fb8f5`. |
| [`uuid_extract_timestamp(uuidv7)`](#::uuid_extract_timestampuuidv7) | Extracts `TIMESTAMP WITH TIME ZONE` from a UUIDv7 value. |
| [`uuid_extract_version(uuid)`](#::uuid_extract_versionuuid) | Extracts UUID version (`4` or `7`). |
| [`version()`](#::version) | Return the currently active version of DuckDB. |

###### `alias(column)` {#docs:current:sql:functions:utility::aliascolumn}



|   |   |
|:--|:--------|
| **Description** |Return the name of the column. |
| **Example** | `alias(column1)` |
| **Result** | `column1` |

###### `can_cast_implicitly(source_value, target_value)` {#docs:current:sql:functions:utility::can_cast_implicitlysource_value-target_value}



|   |   |
|:--|:--------|
| **Description** |Whether or not we can implicitly cast from the types of the source value to the target value. |
| **Example** | `can_cast_implicitly(1::BIGINT, 1::SMALLINT)` |
| **Result** | `false` |

###### `checkpoint(database)` {#docs:current:sql:functions:utility::checkpointdatabase}



|   |   |
|:--|:--------|
| **Description** |Synchronize WAL with file for (optional) database without interrupting transactions. |
| **Example** | `checkpoint(my_db)` |
| **Result** | success Boolean |

###### `coalesce(expr, ...)` {#docs:current:sql:functions:utility::coalesceexpr-}



|   |   |
|:--|:--------|
| **Description** |Return the first expression that evaluates to a non-`NULL` value. Accepts 1 or more parameters. Each expression can be a column, literal value, function result, or many others. |
| **Example** | `coalesce(NULL, NULL, 'default_string')` |
| **Result** | `default_string` |

###### `constant_or_null(arg1, arg2)` {#docs:current:sql:functions:utility::constant_or_nullarg1-arg2}



|   |   |
|:--|:--------|
| **Description** |If `arg2` is `NULL`, return `NULL`. Otherwise, return `arg1`. |
| **Example** | `constant_or_null(42, NULL)` |
| **Result** | `NULL` |

###### `count_if(x)` {#docs:current:sql:functions:utility::count_ifx}



|   |   |
|:--|:--------|
| **Description** |Aggregate function; rows contribute 1 if `x` is `true` or a non-zero number, else 0. |
| **Example** | `count_if(42)` |
| **Result** | `1` |
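
Because `count_if` is an aggregate, it is usually applied to a condition over many rows. A minimal sketch, using the built-in `range` table function as sample input (the aliases `t`, `i` and `even_count` are illustrative): of the integers 0 through 9, five are even, so the query returns `5`.

```sql
-- Count the rows for which the condition holds.
SELECT count_if(i % 2 = 0) AS even_count
FROM range(10) AS t(i);
```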

###### `create_sort_key(parameters...)` {#docs:current:sql:functions:utility::create_sort_keyparameters}



|   |   |
|:--|:--------|
| **Description** |Constructs a binary-comparable sort key based on a set of input parameters and sort qualifiers. |
| **Example** | `create_sort_key('abc', 'ASC NULLS FIRST')` |
| **Result** | `\x02bcd\x00` |

###### `current_catalog()` {#docs:current:sql:functions:utility::current_catalog}



|   |   |
|:--|:--------|
| **Description** |Return the name of the currently active catalog. Default is `memory`. |
| **Example** | `current_catalog()` |
| **Result** | `memory` |

###### `current_database()` {#docs:current:sql:functions:utility::current_database}



|   |   |
|:--|:--------|
| **Description** |Return the name of the currently active database. |
| **Example** | `current_database()` |
| **Result** | `memory` |

###### `current_query()` {#docs:current:sql:functions:utility::current_query}



|   |   |
|:--|:--------|
| **Description** |Return the current query as a string. |
| **Example** | `current_query()` |
| **Result** | `SELECT current_query();` |

###### `current_schema()` {#docs:current:sql:functions:utility::current_schema}



|   |   |
|:--|:--------|
| **Description** |Return the name of the currently active schema. Default is `main`. |
| **Example** | `current_schema()` |
| **Result** | `main` |

###### `current_schemas(boolean)` {#docs:current:sql:functions:utility::current_schemasboolean}



|   |   |
|:--|:--------|
| **Description** |Return list of schemas. Pass a parameter of `true` to include implicit schemas. |
| **Example** | `current_schemas(true)` |
| **Result** | `['temp', 'main', 'pg_catalog']` |

###### `current_setting('setting_name')` {#docs:current:sql:functions:utility::current_settingsetting_name}



|   |   |
|:--|:--------|
| **Description** |Return the current value of the configuration setting. |
| **Example** | `current_setting('access_mode')` |
| **Result** | `automatic` |

###### `currval('sequence_name')` {#docs:current:sql:functions:utility::currvalsequence_name}



|   |   |
|:--|:--------|
| **Description** |Return the current value of the sequence. Note that `nextval` must be called at least once prior to calling `currval`. |
| **Example** | `currval('my_sequence_name')` |
| **Result** | `1` |
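
The ordering requirement between `nextval` and `currval` can be sketched as follows, assuming a newly created sequence with default settings (start at 1, increment by 1). The sequence name matches the one used in the examples and is otherwise arbitrary.

```sql
CREATE SEQUENCE my_sequence_name;

-- nextval must be called first; it advances the sequence and returns 1.
SELECT nextval('my_sequence_name') AS first_value;

-- currval returns the most recently produced value without advancing: 1.
SELECT currval('my_sequence_name') AS current_value;

-- A further nextval call advances the sequence to 2.
SELECT nextval('my_sequence_name') AS second_value;
```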

###### `error(message)` {#docs:current:sql:functions:utility::errormessage}



|   |   |
|:--|:--------|
| **Description** |Throws the given error `message`. |
| **Example** | `error('access_mode')` |

###### `equi_width_bins(min, max, bincount, nice := false)` {#docs:current:sql:functions:utility::equi_width_binsmin-max-bincount-nice--false}



|   |   |
|:--|:--------|
| **Description** |Returns the upper boundaries of a partition of the interval `[min, max]` into `bincount` equal-sized subintervals (for use with, e.g., [`histogram`](#docs:current:sql:functions:aggregates::histogramargboundaries)). If `nice = true`, then `min`, `max` and `bincount` may be adjusted to produce more aesthetically pleasing results. |
| **Example** | `equi_width_bins(0.1, 2.7, 4, true)` |
| **Result** | `[0.5, 1.0, 1.5, 2.0, 2.5, 3.0]` |

###### `force_checkpoint(database)` {#docs:current:sql:functions:utility::force_checkpointdatabase}



|   |   |
|:--|:--------|
| **Description** |Synchronize WAL with file for (optional) database interrupting transactions. |
| **Example** | `force_checkpoint(my_db)` |
| **Result** | success Boolean |

###### `gen_random_uuid()` {#docs:current:sql:functions:utility::gen_random_uuid}



|   |   |
|:--|:--------|
| **Description** |Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| **Example** | `gen_random_uuid()` |
| **Result** | various |

###### `getenv(var)` {#docs:current:sql:functions:utility::getenvvar}

|   |   |
|:--|:--------|
| **Description** |Returns the value of the environment variable `var`. Only available in the [command line client](#docs:current:clients:cli:overview). |
| **Example** | `getenv('HOME')` |
| **Result** | `/path/to/user/home` |

###### `hash(value)` {#docs:current:sql:functions:utility::hashvalue}



|   |   |
|:--|:--------|
| **Description** |Returns a `UBIGINT` with the hash of the `value`. The hash function used may change across DuckDB versions. |
| **Example** | `hash('🦆')` |
| **Result** | `2595805878642663834` |

###### `icu_sort_key(string, collator)` {#docs:current:sql:functions:utility::icu_sort_keystring-collator}



|   |   |
|:--|:--------|
| **Description** |Surrogate [sort key](https://unicode-org.github.io/icu/userguide/collation/architecture.html#sort-keys) used to sort special characters according to the specific locale. Collator parameter is optional. Only available when the ICU extension is installed. |
| **Example** | `icu_sort_key('ö', 'DE')` |
| **Result** | `460145960106` |

###### `if(a, b, c)` {#docs:current:sql:functions:utility::ifa-b-c}



|   |   |
|:--|:--------|
| **Description** |Ternary conditional operator; returns `b` if `a`, else returns `c`. Equivalent to `CASE WHEN a THEN b ELSE c END`. |
| **Example** | `if(2 > 1, 3, 4)` |
| **Result** | `3` |
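
The equivalence with `CASE` can be checked row by row. In this sketch, the sample rows come from the built-in `range` table function and the aliases are illustrative; both columns should contain the same labels.

```sql
SELECT
    i,
    if(i % 2 = 0, 'even', 'odd')                   AS parity_if,
    CASE WHEN i % 2 = 0 THEN 'even' ELSE 'odd' END AS parity_case
FROM range(4) AS t(i);
```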

###### `ifnull(expr, other)` {#docs:current:sql:functions:utility::ifnullexpr-other}



|   |   |
|:--|:--------|
| **Description** |A two-argument version of coalesce. |
| **Example** | `ifnull(NULL, 'default_string')` |
| **Result** | `default_string` |

###### `is_histogram_other_bin(arg)` {#docs:current:sql:functions:utility::is_histogram_other_binarg}



|   |   |
|:--|:--------|
| **Description** |Returns `true` when `arg` is the "catch-all element" of its datatype for the purpose of the [`histogram_exact`](#docs:current:sql:functions:aggregates::histogram_exactargelements) function, which is equal to the "right-most boundary" of its datatype for the purpose of the [`histogram`](#docs:current:sql:functions:aggregates::histogramargboundaries) function. |
| **Example** | `is_histogram_other_bin('')` |
| **Result** | `true` |

###### `md5(string)` {#docs:current:sql:functions:utility::md5string}



|   |   |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `string` as a `VARCHAR`. |
| **Example** | `md5('abc')` |
| **Result** | `900150983cd24fb0d6963f7d28e17f72` |

###### `md5_number(string)` {#docs:current:sql:functions:utility::md5_numberstring}



|   |   |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `string` as a `UHUGEINT`. |
| **Example** | `md5_number('abc')` |
| **Result** | `152195979970564155685860391459828531600` |

###### `md5_number_lower(string)` {#docs:current:sql:functions:utility::md5_number_lowerstring}



|   |   |
|:--|:--------|
| **Description** |Returns the lower 8 bytes of the MD5 hash of `string` as a `UBIGINT`. |
| **Example** | `md5_number_lower('abc')` |
| **Result** | `8250560606382298838` |

###### `md5_number_upper(string)` {#docs:current:sql:functions:utility::md5_number_upperstring}



|   |   |
|:--|:--------|
| **Description** |Returns the upper 8 bytes of the MD5 hash of `string` as a `UBIGINT`. |
| **Example** | `md5_number_upper('abc')` |
| **Result** | `12704604231530709392` |

###### `nextval('sequence_name')` {#docs:current:sql:functions:utility::nextvalsequence_name}



|   |   |
|:--|:--------|
| **Description** |Return the next value of the sequence. |
| **Example** | `nextval('my_sequence_name')` |
| **Result** | `2` |

###### `nullif(a, b)` {#docs:current:sql:functions:utility::nullifa-b}



|   |   |
|:--|:--------|
| **Description** |Returns `NULL` if `a = b`, else returns `a`. Equivalent to `CASE WHEN a = b THEN NULL ELSE a END`. |
| **Example** | `nullif(1+1, 2)` |
| **Result** | `NULL` |

###### `parse_formatted_bytes(string)` {#docs:current:sql:functions:utility::parse_formatted_bytesstring}



|   |   |
|:--|:--------|
| **Description** |Parse a human-readable byte size string (e.g., `'16 KiB'`) into a `UBIGINT` number of bytes. Throws an error on invalid input. |
| **Example** | `parse_formatted_bytes('1.5 GiB')` |
| **Result** | `1610612736` |

###### `pg_typeof(expression)` {#docs:current:sql:functions:utility::pg_typeofexpression}



|   |   |
|:--|:--------|
| **Description** |Returns the lower case name of the data type of the result of the expression. For PostgreSQL compatibility. |
| **Example** | `pg_typeof('abc')` |
| **Result** | `varchar` |

###### `query(query_string)` {#docs:current:sql:functions:utility::queryquery_string}



|   |   |
|:--|:--------|
| **Description** |Table function that parses and executes the query defined in `query_string`. Only constant strings are allowed. Warning: this function allows invoking arbitrary queries, potentially altering the database state. |
| **Example** | `query('SELECT 42 AS x')` |
| **Result** | `42` |

###### `query_table(tbl_name)` {#docs:current:sql:functions:utility::query_tabletbl_name}



|   |   |
|:--|:--------|
| **Description** |Table function that returns the table given in `tbl_name`. |
| **Example** | `query_table('t1')` |
| **Result** | (the rows of `t1`) |

###### `query_table(tbl_names, [by_name])` {#docs:current:sql:functions:utility::query_tabletbl_names-by_name}



|   |   |
|:--|:--------|
| **Description** |Table function that returns the union of tables given in `tbl_names`. If the optional `by_name` parameter is set to `true`, it uses [`UNION ALL BY NAME`](#docs:current:sql:query_syntax:setops::union-all-by-name) semantics. |
| **Example** | `query_table(['t1', 't2'])` |
| **Result** | (the union of the two tables) |
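
The following sketch shows both forms of `query_table` in a `FROM` clause. The tables `t1` and `t2` and their columns are assumptions made up for this illustration, and `by_name` is passed here as the second positional argument. The two tables deliberately list their columns in a different order, which is the case where `by_name` matters:

```sql
-- Hypothetical tables with the same columns in a different order
CREATE TABLE t1 (a INTEGER, b VARCHAR);
CREATE TABLE t2 (b VARCHAR, a INTEGER);
INSERT INTO t1 VALUES (1, 'one');
INSERT INTO t2 VALUES ('two', 2);

-- Return the rows of a single table
FROM query_table('t1');

-- Union both tables, matching columns by name rather than by position
FROM query_table(['t1', 't2'], true);
```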

###### `read_blob(source)` {#docs:current:sql:functions:utility::read_blobsource}



|   |   |
|:--|:--------|
| **Description** |Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `BLOB`. See the [`read_blob` guide](#docs:current:guides:file_formats:read_file::read_blob) for more details. |
| **Example** | `read_blob('hello.bin')` |
| **Result** | `hello\x0A` |

###### `read_text(source)` {#docs:current:sql:functions:utility::read_textsource}



|   |   |
|:--|:--------|
| **Description** |Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8, an error is thrown that suggests using `read_blob` instead. See the [`read_text` guide](#docs:current:guides:file_formats:read_file::read_text) for more details. |
| **Example** | `read_text('hello.txt')` |
| **Result** | `hello\n` |

###### `sha1(string)` {#docs:current:sql:functions:utility::sha1string}



|   |   |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-1 hash of the `string`. |
| **Example** | `sha1('🦆')` |
| **Result** | `949bf843dc338be348fb9525d1eb535d31241d76` |

###### `sha256(string)` {#docs:current:sql:functions:utility::sha256string}



|   |   |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-256 hash of the `string`. |
| **Example** | `sha256('🦆')` |
| **Result** | `d7a5c5e0d1d94c32218539e7e47d4ba9c3c7b77d61332fb60d633dde89e473fb` |

###### `sleep_ms(milliseconds)` {#docs:current:sql:functions:utility::sleep_msmilliseconds}



|   |   |
|:--|:--------|
| **Description** |Pause execution for the specified number of milliseconds. Returns `NULL`. |
| **Example** | `sleep_ms(500)` |
| **Result** | `NULL` |

###### `stats(expression)` {#docs:current:sql:functions:utility::statsexpression}



|   |   |
|:--|:--------|
| **Description** |Returns a string with statistics about the expression. Expression can be a column, constant, or SQL expression. |
| **Example** | `stats(5)` |
| **Result** | `'[Min: 5, Max: 5][Has Null: false]'` |

###### `txid_current()` {#docs:current:sql:functions:utility::txid_current}



|   |   |
|:--|:--------|
| **Description** |Returns the current transaction's identifier, a `BIGINT` value. It will assign a new one if the current transaction does not have one already. |
| **Example** | `txid_current()` |
| **Result** | various |

###### `typeof(expression)` {#docs:current:sql:functions:utility::typeofexpression}



|   |   |
|:--|:--------|
| **Description** |Returns the name of the data type of the result of the expression. |
| **Example** | `typeof('abc')` |
| **Result** | `VARCHAR` |

###### `uuid()` {#docs:current:sql:functions:utility::uuid}



|   |   |
|:--|:--------|
| **Description** |Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| **Example** | `uuid()` |
| **Result** | various |

###### `uuidv4()` {#docs:current:sql:functions:utility::uuidv4}

|   |   |
|:--|:--------|
| **Description** |Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| **Example** | `uuidv4()` |
| **Result** | various |

###### `uuidv7()` {#docs:current:sql:functions:utility::uuidv7}

|   |   |
|:--|:--------|
| **Description** |Return a random UUIDv7 similar to this: `81964ebe-00b1-7e1d-b0f9-43c29b6fb8f5`. |
| **Example** | `uuidv7()` |
| **Result** | various |

###### `uuid_extract_timestamp(uuidv7)` {#docs:current:sql:functions:utility::uuid_extract_timestampuuidv7}

|   |   |
|:--|:--------|
| **Description** |Extracts the timestamp from a UUIDv7 value as a `TIMESTAMP WITH TIME ZONE`. |
| **Example** | `uuid_extract_timestamp(uuidv7())` |
| **Result** | `2025-04-19 15:51:20.07+00` |

###### `uuid_extract_version(uuid)` {#docs:current:sql:functions:utility::uuid_extract_versionuuid}

|   |   |
|:--|:--------|
| **Description** |Extracts the UUID version (`4` or `7`). |
| **Example** | `uuid_extract_version(uuidv7())` |
| **Result** | `7` |
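
A short sketch combining the UUID functions above. The column aliases are illustrative, and the timestamp depends on when the query runs:

```sql
SELECT
    uuid_extract_version(uuidv4()) AS v4_version,    -- 4
    uuid_extract_version(uuidv7()) AS v7_version,    -- 7
    uuid_extract_timestamp(uuidv7()) AS created_at;  -- roughly the current time
```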

###### `version()` {#docs:current:sql:functions:utility::version}



|   |   |
|:--|:--------|
| **Description** |Returns the version of DuckDB that is currently running. |
| **Example** | `version()` |
| **Result** | various |

#### Utility Table Functions {#docs:current:sql:functions:utility::utility-table-functions}

A [table function](#docs:current:sql:query_syntax:from::table-functions) is used in place of a table in a `FROM` clause.

| Name | Description |
|:--|:-------|
| [`glob(search_path)`](#::globsearch_path) | Return filenames found at the location indicated by the *search_path* in a single column named `file`. The *search_path* may contain [glob pattern matching syntax](#docs:current:sql:functions:pattern_matching). |
| [`repeat_row(varargs, num_rows)`](#::repeat_rowvarargs-num_rows) | Returns a table with `num_rows` rows, each containing the fields defined in `varargs`. |

###### `glob(search_path)` {#docs:current:sql:functions:utility::globsearch_path}



|   |   |
|:--|:--------|
| **Description** |Return filenames found at the location indicated by the *search_path* in a single column named `file`. The *search_path* may contain [glob pattern matching syntax](#docs:current:sql:functions:pattern_matching). |
| **Example** | `glob('*')` |
| **Result** | (table of filenames) |

###### `repeat_row(varargs, num_rows)` {#docs:current:sql:functions:utility::repeat_rowvarargs-num_rows}



|   |   |
|:--|:--------|
| **Description** |Returns a table with `num_rows` rows, each containing the fields defined in `varargs`. |
| **Example** | `repeat_row(1, 2, 'foo', num_rows = 3)` |
| **Result** | 3 rows of `1, 2, 'foo'` |
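
Because these are table functions, they appear in the `FROM` clause rather than in the select list. A minimal sketch (the `*.csv` pattern is only an illustration and matches whatever files exist in the working directory):

```sql
-- Three identical rows containing 1, 2, 'foo'
SELECT *
FROM repeat_row(1, 2, 'foo', num_rows = 3);

-- One row per matching file, in a single column named file
SELECT file
FROM glob('*.csv');
```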

### Window Functions {#docs:current:sql:functions:window_functions}



DuckDB supports [window functions](https://en.wikipedia.org/wiki/Window_function_(SQL)), which can use multiple rows to calculate a value for each row.
Window functions are [blocking operators](#docs:current:guides:performance:how_to_tune_workloads::blocking-operators), i.e., they require their entire input to be buffered, making them one of the most memory-intensive operators in SQL.

Window functions have been available in SQL since [SQL:2003](https://en.wikipedia.org/wiki/SQL:2003) and are supported by major SQL database systems.

#### Examples {#docs:current:sql:functions:window_functions::examples}

Generate a `row_number` column to enumerate rows:

```sql
SELECT row_number() OVER ()
FROM sales;
```

> **Tip.** If you only need a number for each row in a table, you can use the [`rowid` pseudocolumn](#docs:current:sql:statements:select::row-ids).

Generate a `row_number` column to enumerate rows, ordered by `time`:

```sql
SELECT row_number() OVER (ORDER BY time)
FROM sales;
```

Generate a `row_number` column to enumerate rows, ordered by `time` and partitioned by `region`:

```sql
SELECT row_number() OVER (PARTITION BY region ORDER BY time)
FROM sales;
```

Compute the difference between the current and the previous-by-`time` `amount`:

```sql
SELECT amount - lag(amount) OVER (ORDER BY time)
FROM sales;
```

Compute the percentage of the total `amount` of sales per `region` for each row:

```sql
SELECT amount / sum(amount) OVER (PARTITION BY region)
FROM sales;
```

#### Syntax {#docs:current:sql:functions:window_functions::syntax}



Window functions can only be used in the `SELECT` clause. To share `OVER` specifications between functions, use the statement's [`WINDOW` clause](#docs:current:sql:query_syntax:window) and the `OVER ⟨window_name⟩` syntax.
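
As a sketch of the named-window syntax, the query below reuses the `sales` table from the examples above (assumed to have `region`, `time` and `amount` columns) and shares one window definition between two functions. The window name `w` and the column aliases are illustrative:

```sql
SELECT
    region,
    time,
    amount,
    row_number() OVER w AS sale_number,
    amount - lag(amount) OVER w AS change_from_previous
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY time)
ORDER BY region, time;
```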

#### General-Purpose Window Functions {#docs:current:sql:functions:window_functions::general-purpose-window-functions}

The table below shows the available general window functions.

| Name | Description |
|:--|:-------|
| [`cume_dist([ORDER BY ordering])`](#cume_distorder-by-ordering) | The cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows. |
| [`dense_rank()`](#::dense_rank) | The rank of the current row *without gaps;* this function counts peer groups. |
| [`fill(expr [ ORDER BY ordering])`](#fillexpr-order-by-ordering) | Fill in missing values using linear interpolation with `ORDER BY` as the X-axis. |
| [`first_value(expr[ ORDER BY ordering][ IGNORE NULLS])`](#first_valueexpr-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is the first row (with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. |
| [`lag(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])`](#lagexpr-offset-default-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is `offset` rows (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) before the current row within the window frame; if there is no such row, instead return `default` (which must be of the same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and `default` to `NULL`. |
| [`last_value(expr[ ORDER BY ordering][ IGNORE NULLS])`](#last_valueexpr-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is the last row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. |
| [`lead(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])`](#leadexpr-offset-default-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is `offset` rows after the current row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) within the window frame; if there is no such row, instead return `default` (which must be of the same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and `default` to `NULL`. |
| [`nth_value(expr, nth[ ORDER BY ordering][ IGNORE NULLS])`](#nth_valueexpr-nth-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the nth row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame (counting from 1); `NULL` if no such row. |
| [`ntile(num_buckets[ ORDER BY ordering])`](#ntilenum_buckets-order-by-ordering) | An integer ranging from 1 to `num_buckets`, dividing the partition as equally as possible. |
| [`percent_rank([ORDER BY ordering])`](#percent_rankorder-by-ordering) | The relative rank of the current row: `(rank() - 1) / (total partition rows - 1)`. |
| [`rank([ORDER BY ordering])`](#rankorder-by-ordering) | The rank of the current row *with gaps;* same as `row_number` of its first peer. |
| [`row_number([ORDER BY ordering])`](#row_numberorder-by-ordering) | The number of the current row within the partition, counting from 1. |

###### `cume_dist([ORDER BY ordering])` {#docs:current:sql:functions:window_functions::cume_distorder-by-ordering}



|   |   |
|:--|:--------|
| **Description** |The cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows. If an `ORDER BY` clause is specified, the distribution is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `DOUBLE` |
| **Example** | `cume_dist()` |

###### `dense_rank()` {#docs:current:sql:functions:window_functions::dense_rank}



|   |   |
|:--|:--------|
| **Description** |The rank of the current row *without gaps;* this function counts peer groups. |
| **Return type** | `BIGINT` |
| **Example** | `dense_rank()` |
| **Aliases** | `rank_dense()` |

###### `fill(expr[ ORDER BY ordering])` {#docs:current:sql:functions:window_functions::fillexpr-order-by-ordering}



|   |   |
|:--|:--------|
| **Description** |Replaces `NULL` values of `expr` with a linear interpolation based on the closest non-`NULL` values and the sort values. Both values must support arithmetic and there must be only one ordering key. For missing values at the ends, linear extrapolation is used. Failure to interpolate results in the `NULL` value being retained. |
| **Return type** | Same type as `expr` |
| **Example** | `fill(column)` |
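
A minimal sketch of `fill`, assuming a DuckDB version that includes this function. The inline values and the alias `y_filled` are made up for illustration. The single `ORDER BY` key `x` serves as the X-axis, so the `NULL` at `x = 2` is replaced by a value interpolated between its two neighbors:

```sql
SELECT
    x,
    y,
    fill(y) OVER (ORDER BY x) AS y_filled
FROM (VALUES (1, 10.0::DOUBLE), (2, NULL), (3, 30.0::DOUBLE)) t(x, y)
ORDER BY x;
```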

###### `first_value(expr[ ORDER BY ordering][ IGNORE NULLS])` {#docs:current:sql:functions:window_functions::first_valueexpr-order-by-ordering-ignore-nulls}



|   |   |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the row that is the first row (with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. If an `ORDER BY` clause is specified, the first row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `first_value(column)` |

###### `lag(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])` {#docs:current:sql:functions:window_functions::lagexpr-offset-default-order-by-ordering-ignore-nulls}



|   |   |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the row that is `offset` rows (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) before the current row within the window frame; if there is no such row, instead return `default` (which must be of the same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and `default` to `NULL`. If an `ORDER BY` clause is specified, the lagged row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `lag(column, 3, 0)` |

###### `last_value(expr[ ORDER BY ordering][ IGNORE NULLS])` {#docs:current:sql:functions:window_functions::last_valueexpr-order-by-ordering-ignore-nulls}



|   |   |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the row that is the last row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. If an `ORDER BY` clause is specified, the last row is determined within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `last_value(column)` |

###### `lead(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])` {#docs:current:sql:functions:window_functions::leadexpr-offset-default-order-by-ordering-ignore-nulls}



|   |   |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the row that is `offset` rows after the current row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) within the window frame; if there is no such row, instead return `default` (which must be of the same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and `default` to `NULL`. If an `ORDER BY` clause is specified, the leading row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `lead(column, 3, 0)` |
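
The following sketch contrasts `lag` and `lead` on a small inline table (the values and aliases are illustrative). With an `offset` of `1` and a `default` of `0`, the first row has no predecessor and the last row has no successor, so both fall back to the default:

```sql
SELECT
    x,
    lag(x, 1, 0) OVER (ORDER BY x) AS prev_x,
    lead(x, 1, 0) OVER (ORDER BY x) AS next_x
FROM (VALUES (1), (2), (3)) t(x)
ORDER BY x;
-- x = 1: prev_x = 0, next_x = 2
-- x = 2: prev_x = 1, next_x = 3
-- x = 3: prev_x = 2, next_x = 0
```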

###### `nth_value(expr, nth[ ORDER BY ordering][ IGNORE NULLS])` {#docs:current:sql:functions:window_functions::nth_valueexpr-nth-order-by-ordering-ignore-nulls}



|   |   |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the nth row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame (counting from 1); `NULL` if no such row. If an `ORDER BY` clause is specified, the nth row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `nth_value(column, 2)` |

###### `ntile(num_buckets[ ORDER BY ordering])` {#docs:current:sql:functions:window_functions::ntilenum_buckets-order-by-ordering}



|   |   |
|:--|:--------|
| **Description** |An integer ranging from 1 to `num_buckets`, dividing the partition as equally as possible. If an `ORDER BY` clause is specified, the ntile is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `BIGINT` |
| **Example** | `ntile(4)` |

###### `percent_rank([ORDER BY ordering])` {#docs:current:sql:functions:window_functions::percent_rankorder-by-ordering}



|   |   |
|:--|:--------|
| **Description** |The relative rank of the current row: `(rank() - 1) / (total partition rows - 1)`. If an `ORDER BY` clause is specified, the relative rank is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `DOUBLE` |
| **Example** | `percent_rank()` |

###### `rank([ORDER BY ordering])` {#docs:current:sql:functions:window_functions::rankorder-by-ordering}



|   |   |
|:--|:--------|
| **Description** |The rank of the current row *with gaps*; same as `row_number` of its first peer. If an `ORDER BY` clause is specified, the rank is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `BIGINT` |
| **Example** | `rank()` |

###### `row_number([ORDER BY ordering])` {#docs:current:sql:functions:window_functions::row_numberorder-by-ordering}



|   |   |
|:--|:--------|
| **Description** |The number of the current row within the partition, counting from 1. If an `ORDER BY` clause is specified, the row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `BIGINT` |
| **Example** | `row_number()` |
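
To see how the numbering and ranking functions above differ, it can help to run them side by side on data that contains ties. This sketch uses made-up inline scores and illustrative aliases:

```sql
SELECT
    score,
    row_number() OVER w AS row_num,
    rank() OVER w AS rnk,
    dense_rank() OVER w AS dense_rnk,
    ntile(2) OVER w AS half
FROM (VALUES (10), (20), (20), (30)) t(score)
WINDOW w AS (ORDER BY score)
ORDER BY score, row_num;
-- score = 10: row_num 1, rnk 1, dense_rnk 1, half 1
-- score = 20: row_num 2, rnk 2, dense_rnk 2, half 1
-- score = 20: row_num 3, rnk 2, dense_rnk 2, half 2
-- score = 30: row_num 4, rnk 4, dense_rnk 3, half 2
```

The two tied rows share a `rank` and a `dense_rank`, but `rank` leaves a gap afterwards (the next row is ranked 4) while `dense_rank` does not.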

#### Aggregate Window Functions {#docs:current:sql:functions:window_functions::aggregate-window-functions}

All [aggregate functions](#docs:current:sql:functions:aggregates) can be used in a windowing context, including the optional [`FILTER` clause](#docs:current:sql:query_syntax:filter).
The `first` and `last` aggregate functions are shadowed by the respective general-purpose window functions, with the minor consequence that the `FILTER` clause is not available for these but `IGNORE NULLS` is.
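
As a sketch of an aggregate used as a window function together with `FILTER`, the query below counts, for every row, how many sales in the same region exceed a threshold. It assumes the `sales` table from the examples above; the threshold of `100` and the alias are arbitrary choices for illustration:

```sql
SELECT
    region,
    amount,
    count(*) FILTER (WHERE amount > 100)
        OVER (PARTITION BY region) AS large_sales_in_region
FROM sales;
```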

#### DISTINCT Arguments {#docs:current:sql:functions:window_functions::distinct-arguments}

All aggregate window functions support using a `DISTINCT` clause for the arguments. When the `DISTINCT` clause is
provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination
with the `COUNT` aggregate to get the number of distinct elements; but it can be used together with any aggregate
function in the system. There are some aggregates that are insensitive to duplicate values (e.g., `min`, `max`) and for
them this clause is parsed and ignored.

```sql
-- Count the number of distinct users at a given point in time
SELECT count(DISTINCT name) OVER (ORDER BY time) FROM sales;
-- Concatenate those distinct users into a list
SELECT list(DISTINCT name) OVER (ORDER BY time) FROM sales;
```

#### ORDER BY Arguments {#docs:current:sql:functions:window_functions::order-by-arguments}

All aggregate window functions support using an `ORDER BY` argument clause that is *different* from the window ordering.
When the `ORDER BY` argument clause is provided, the values being aggregated are sorted before applying the function.
Usually this is not important, but there are some order-sensitive aggregates that can have indeterminate results (e.g.,
`mode`, `list` and `string_agg`). These can be made deterministic by ordering the arguments. For order-insensitive
aggregates, this clause is parsed and ignored.

```sql
-- Compute the modal value up to each time, breaking ties in favor of the most recent value.
SELECT mode(value ORDER BY time DESC) OVER (ORDER BY time) FROM sales;
```

The SQL standard does not provide for using `ORDER BY` with general-purpose window functions, but we have extended all
of these functions (except `dense_rank`) to accept this syntax and use framing to restrict the range that the secondary
ordering applies to.

```sql
-- Compare each athlete's time in an event with the best time to date
SELECT event, date, athlete, time,
    first_value(time ORDER BY time DESC) OVER w AS record_time,
    first_value(athlete ORDER BY time DESC) OVER w AS record_athlete,
FROM meet_results
WINDOW w AS (PARTITION BY event ORDER BY date)
ORDER BY ALL;
```

Note that there is no comma separating the arguments from the `ORDER BY` clause.

#### Nulls {#docs:current:sql:functions:window_functions::nulls}

All [general-purpose window functions](#::general-purpose-window-functions) that accept `IGNORE NULLS` respect nulls by default. This default behavior can optionally be made explicit via `RESPECT NULLS`.

In contrast, all [aggregate window functions](#::aggregate-window-functions) (except for `list` and its aliases, which can be made to ignore nulls via a `FILTER`) ignore nulls and do not accept `RESPECT NULLS`. For example, `sum(column) OVER (ORDER BY time) AS cumulativeColumn` computes a cumulative sum where rows with a `NULL` value of `column` have the same value of `cumulativeColumn` as the row that precedes them.
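
A small sketch of both behaviors on made-up inline data: `last_value` with `IGNORE NULLS` skips the `NULL` row and carries the previous value forward, while the cumulative `sum` ignores the `NULL` without any modifier:

```sql
SELECT
    t,
    v,
    last_value(v IGNORE NULLS) OVER (
        ORDER BY t
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_non_null,
    sum(v) OVER (ORDER BY t) AS running_sum
FROM (VALUES (1, 10), (2, NULL), (3, 30)) tbl(t, v)
ORDER BY t;
-- t = 1: v = 10,   last_non_null = 10, running_sum = 10
-- t = 2: v = NULL, last_non_null = 10, running_sum = 10
-- t = 3: v = 30,   last_non_null = 30, running_sum = 40
```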

#### Evaluation {#docs:current:sql:functions:window_functions::evaluation}

Windowing works by breaking a relation up into independent *partitions*,
*ordering* those partitions,
and then computing a new column for each row as a function of the nearby values.
Some window functions depend only on the partition boundary and the ordering,
but a few (including all the aggregates) also use a *frame*.
Frames are specified as a number of rows on either side (*preceding* or *following*) of the *current row*.
The distance can be specified as a number of *rows*,
as a *range* of values using the partition's ordering value and a distance,
or as a number of *groups* (sets of rows with the same sort value).

The full syntax is shown in the diagram at the top of the page,
and this diagram visually illustrates the computation environment:

![](../images/framing-light.png)



##### Partition and Ordering {#docs:current:sql:functions:window_functions::partition-and-ordering}

Partitioning breaks the relation up into independent, unrelated pieces.
Partitioning is optional, and if none is specified then the entire relation is treated as a single partition.
Window functions cannot access values outside of the partition containing the row they are being evaluated at.

Ordering is also optional, but without it the results of [general-purpose window functions](#::general-purpose-window-functions) and [order-sensitive aggregate functions](#docs:current:sql:functions:aggregates::order-by-clause-in-aggregate-functions), and the order of [framing](#::framing) are not well-defined.
Each partition is ordered using the same ordering clause.

Here is a table of power generation data, available as a CSV file ([`power-plant-generation-history.csv`](https://duckdb.org/data/power-plant-generation-history.csv)). To load the data, run:

```sql
CREATE TABLE "Generation History" AS
    FROM 'power-plant-generation-history.csv';
```

After partitioning by plant and ordering by date, it will have this layout:

| Plant | Date | MWh |
|:---|:---|---:|
| Boston | 2019-01-02 | 564337 |
| Boston | 2019-01-03 | 507405 |
| Boston | 2019-01-04 | 528523 |
| Boston | 2019-01-05 | 469538 |
| Boston | 2019-01-06 | 474163 |
| Boston | 2019-01-07 | 507213 |
| Boston | 2019-01-08 | 613040 |
| Boston | 2019-01-09 | 582588 |
| Boston | 2019-01-10 | 499506 |
| Boston | 2019-01-11 | 482014 |
| Boston | 2019-01-12 | 486134 |
| Boston | 2019-01-13 | 531518 |
| Worcester | 2019-01-02 | 118860 |
| Worcester | 2019-01-03 | 101977 |
| Worcester | 2019-01-04 | 106054 |
| Worcester | 2019-01-05 | 92182 |
| Worcester | 2019-01-06 | 94492 |
| Worcester | 2019-01-07 | 99932 |
| Worcester | 2019-01-08 | 118854 |
| Worcester | 2019-01-09 | 113506 |
| Worcester | 2019-01-10 | 96644 |
| Worcester | 2019-01-11 | 93806 |
| Worcester | 2019-01-12 | 98963 |
| Worcester | 2019-01-13 | 107170 |

In what follows,
we shall use this table (or small sections of it) to illustrate various pieces of window function evaluation.

The simplest window function is `row_number()`.
This function just computes the 1-based row number within the partition using the query:

```sql
SELECT
    "Plant",
    "Date",
    row_number() OVER (PARTITION BY "Plant" ORDER BY "Date") AS "Row"
FROM "Generation History"
ORDER BY 1, 2;
```

The result will be the following:

| Plant | Date | Row |
|:---|:---|---:|
| Boston | 2019-01-02 | 1 |
| Boston | 2019-01-03 | 2 |
| Boston | 2019-01-04 | 3 |
| ... | ... | ... |
| Worcester | 2019-01-02 | 1 |
| Worcester | 2019-01-03 | 2 |
| Worcester | 2019-01-04 | 3 |
| ... | ... | ... |

Note that even though the function is computed with an `ORDER BY` clause,
the result is not guaranteed to be sorted,
so the `SELECT` also needs to be explicitly sorted if that is desired.

##### Framing {#docs:current:sql:functions:window_functions::framing}

Framing specifies a set of rows relative to each row where the function is evaluated.
The distance from the current row is given as an expression either `PRECEDING` or `FOLLOWING` the current row in the order specified by the `ORDER BY` clause in the `OVER` specification.
This distance can either be specified as an integral number of `ROWS` or `GROUPS`,
or as a `RANGE` delta expression. It is invalid for a frame to start after it ends.
For a `RANGE` specification, there must be only one ordering expression and it must support subtraction unless only the sentinel boundary values `UNBOUNDED PRECEDING` / `UNBOUNDED FOLLOWING` / `CURRENT ROW` are used.
Using the [`EXCLUDE` clause](#::exclude-clause), rows comparing equal to the current row in the specified ordering expression (so-called peers) can be excluded from the frame.

The default frame is unbounded (i.e., the entire partition) when no `ORDER BY` clause is present and `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` when an `ORDER BY` clause is present. By default, the `CURRENT ROW` boundary value (but not the `CURRENT ROW` in the `EXCLUDE` clause) means the current row and all its peers when `RANGE` or `GROUPS` framing is used, but it means only the current row when `ROWS` framing is used.
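
The difference between the default `RANGE` frame and an explicit `ROWS` frame only becomes visible when the ordering contains peers. A minimal sketch on made-up inline values:

```sql
SELECT
    x,
    -- Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    sum(x) OVER (ORDER BY x) AS range_sum,
    sum(x) OVER (
        ORDER BY x
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
FROM (VALUES (1), (2), (2), (3)) t(x)
ORDER BY x;
```

Both rows with `x = 2` get a `range_sum` of `5`, because each one's frame includes its peer. With `ROWS` framing, one of them gets `3` and the other `5`, depending on which happens to be processed first.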

###### `ROWS` Framing {#docs:current:sql:functions:window_functions::rows-framing}

Here is a simple `ROWS` frame query, using an aggregate function:

```sql
SELECT points,
    sum(points) OVER (
        ROWS BETWEEN 1 PRECEDING
                 AND 1 FOLLOWING) AS we
FROM results;
```

This query computes the `sum` of each point and the points on either side of it:

![](../images/blog/windowing/moving-sum.jpg)


Notice that at the edge of the partition, there are only two values added together.
This is because frames are cropped to the edge of the partition.
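
The cropping can be reproduced with a few inline values. In this sketch the data is made up, and an `ORDER BY` is added inside the window so that the neighbors of each row are well-defined:

```sql
SELECT
    points,
    sum(points) OVER (
        ORDER BY points
        ROWS BETWEEN 1 PRECEDING
                 AND 1 FOLLOWING) AS we
FROM (VALUES (1), (2), (3), (4)) results(points)
ORDER BY points;
-- points = 1: we = 3  (1 + 2, no preceding row)
-- points = 2: we = 6  (1 + 2 + 3)
-- points = 3: we = 9  (2 + 3 + 4)
-- points = 4: we = 7  (3 + 4, no following row)
```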

###### `RANGE` Framing {#docs:current:sql:functions:window_functions::range-framing}

Returning to the power data, suppose the data is noisy.
We might want to compute a 7-day moving average for each plant to smooth out the noise.
To do this, we can use this window query:

```sql
SELECT "Plant", "Date",
    avg("MWh") OVER (
        PARTITION BY "Plant"
        ORDER BY "Date" ASC
        RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
                  AND INTERVAL 3 DAYS FOLLOWING)
        AS "MWh 7-day Moving Average"
FROM "Generation History"
ORDER BY 1, 2;
```

This query partitions the data by `Plant` (to keep the different power plants' data separate),
orders each plant's partition by `Date` (to put the energy measurements next to each other),
and uses a `RANGE` frame of three days on either side of each day for the `avg`
(to handle any missing days).
This is the result:

| Plant | Date | MWh 7-day Moving Average |
|:---|:---|---:|
| Boston | 2019-01-02 | 517450.75 |
| Boston | 2019-01-03 | 508793.20 |
| Boston | 2019-01-04 | 508529.83 |
| ... | ... | ... |
| Boston | 2019-01-13 | 499793.00 |
| Worcester | 2019-01-02 | 104768.25 |
| Worcester | 2019-01-03 | 102713.00 |
| Worcester | 2019-01-04 | 102249.50 |
| ... | ... | ... |

###### `GROUPS` Framing {#docs:current:sql:functions:window_functions::groups-framing}

The third type of framing counts *groups* of rows relative to the current row.
A *group* in this framing is a set of rows with identical `ORDER BY` values.
If we assume that power is being generated on every day,
we can use `GROUPS` framing to compute the moving average of all power generated in the system
without having to resort to date arithmetic:

```sql
SELECT "Date", "Plant",
    avg("MWh") OVER (
        ORDER BY "Date" ASC
        GROUPS BETWEEN 3 PRECEDING
                   AND 3 FOLLOWING)
        AS "MWh 7-day Moving Average"
FROM "Generation History"
ORDER BY 1, 2;
```

|    Date    |   Plant   | MWh 7-day Moving Average |
|------------|-----------|-------------------------:|
| 2019-01-02 | Boston    | 311109.500               |
| 2019-01-02 | Worcester | 311109.500               |
| 2019-01-03 | Boston    | 305753.100               |
| 2019-01-03 | Worcester | 305753.100               |
| 2019-01-04 | Boston    | 305389.667               |
| 2019-01-04 | Worcester | 305389.667               |
| ... | ... | ... |
| 2019-01-12 | Boston    | 309184.900               |
| 2019-01-12 | Worcester | 309184.900               |
| 2019-01-13 | Boston    | 299469.375               |
| 2019-01-13 | Worcester | 299469.375               |

Notice how the values for each date are the same.

###### `EXCLUDE` Clause {#docs:current:sql:functions:window_functions::exclude-clause}

`EXCLUDE` is an optional modifier to the frame clause for excluding rows around the `CURRENT ROW`.
This is useful when you want to compute some aggregate value of nearby rows
to see how the current row compares to it.

In the following example, we want to know how an athlete's time in an event compares to
the average of all the times recorded for their event within ±10 days:

```sql
SELECT
    event,
    date,
    athlete,
    avg(time) OVER w AS recent,
FROM results
WINDOW w AS (
    PARTITION BY event
    ORDER BY date
    RANGE BETWEEN INTERVAL 10 DAYS PRECEDING AND INTERVAL 10 DAYS FOLLOWING
        EXCLUDE CURRENT ROW
)
ORDER BY event, date, athlete;
```

There are four options for `EXCLUDE` that specify how to treat the current row:

* `CURRENT ROW` – exclude just the current row
* `GROUP` – exclude the current row and all its “peers” (rows that have the same `ORDER BY` value)
* `TIES` – exclude all peer rows, but _not_ the current row (this makes a hole on either side)
* `NO OTHERS` – don't exclude anything (the default)

Exclusion is implemented for windowed aggregates as well as for the `first`, `last` and `nth_value` functions.
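
The four options can be compared side by side on a small inline data set. This is only a sketch: the alias `t` and the columns `d` and `v` are hypothetical. Each frame spans the whole input, so the columns differ only in what is excluded. For the row `(1, 10)`, whose only peer is `(1, 20)`, the sums should be `60`, `50`, `30` and `40` respectively:

```sql
-- Hypothetical inline data: the two rows with d = 1 are peers of each other
SELECT d, v,
    sum(v) OVER (ORDER BY d
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE NO OTHERS) AS no_others,
    sum(v) OVER (ORDER BY d
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE CURRENT ROW) AS excl_current_row,
    sum(v) OVER (ORDER BY d
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE GROUP) AS excl_group,
    sum(v) OVER (ORDER BY d
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE TIES) AS excl_ties
FROM (VALUES (1, 10), (1, 20), (2, 30)) t(d, v)
ORDER BY d, v;
```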

##### `WINDOW` Clauses {#docs:current:sql:functions:window_functions::window-clauses}

Multiple different `OVER` clauses can be specified in the same `SELECT`, and each will be computed separately.
Often, however, we want to use the same layout for multiple window functions.
The `WINDOW` clause can be used to define a *named* window that can be shared between multiple window functions:

```sql
SELECT "Plant", "Date",
    min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
    avg("MWh") OVER seven AS "MWh 7-day Moving Average",
    max("MWh") OVER seven AS "MWh 7-day Moving Maximum"
FROM "Generation History"
WINDOW seven AS (
    PARTITION BY "Plant"
    ORDER BY "Date" ASC
    RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
              AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;
```

The three window functions will also share the data layout, which will improve performance.

Multiple windows can be defined in the same `WINDOW` clause by comma-separating them:

```sql
SELECT "Plant", "Date",
    min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
    avg("MWh") OVER seven AS "MWh 7-day Moving Average",
    max("MWh") OVER seven AS "MWh 7-day Moving Maximum",
    min("MWh") OVER three AS "MWh 3-day Moving Minimum",
    avg("MWh") OVER three AS "MWh 3-day Moving Average",
    max("MWh") OVER three AS "MWh 3-day Moving Maximum"
FROM "Generation History"
WINDOW
    seven AS (
        PARTITION BY "Plant"
        ORDER BY "Date" ASC
        RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
                  AND INTERVAL 3 DAYS FOLLOWING),
    three AS (
        PARTITION BY "Plant"
        ORDER BY "Date" ASC
        RANGE BETWEEN INTERVAL 1 DAYS PRECEDING
                  AND INTERVAL 1 DAYS FOLLOWING)
ORDER BY 1, 2;
```

The queries above do not use several clauses commonly found in `SELECT` statements, such as
`WHERE` and `GROUP BY`. For more complex queries, see where the `WINDOW` clause falls in
the canonical order of the [`SELECT` statement](#docs:current:sql:statements:select).

##### Filtering the Results of Window Functions Using `QUALIFY` {#docs:current:sql:functions:window_functions::filtering-the-results-of-window-functions-using-qualify}

Window functions are executed after the [`WHERE`](#docs:current:sql:query_syntax:where) and [`HAVING`](#docs:current:sql:query_syntax:having) clauses have already been evaluated, so it's not possible to use these clauses to filter the results of window functions.
The [`QUALIFY` clause](#docs:current:sql:query_syntax:qualify) avoids the need for a subquery or [`WITH` clause](#docs:current:sql:query_syntax:with) to perform this filtering.
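
As a brief sketch, assuming the `"Generation History"` table used in the examples above, the following query keeps only the day with the highest generation for each plant, filtering directly on the window function result:

```sql
-- Keep one row per plant: the day with the highest "MWh" value
SELECT "Plant", "Date", "MWh"
FROM "Generation History"
QUALIFY row_number() OVER (
    PARTITION BY "Plant"
    ORDER BY "MWh" DESC) = 1
ORDER BY "Plant";
```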

##### Box and Whisker Queries {#docs:current:sql:functions:window_functions::box-and-whisker-queries}

All aggregates can be used as windowing functions, including the complex statistical functions.
These function implementations have been optimized for windowing,
and we can use the window syntax to write queries that generate the data for moving box-and-whisker plots:

```sql
SELECT "Plant", "Date",
    min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
    quantile_cont("MWh", [0.25, 0.5, 0.75]) OVER seven
        AS "MWh 7-day Moving IQR",
    max("MWh") OVER seven AS "MWh 7-day Moving Maximum",
FROM "Generation History"
WINDOW seven AS (
    PARTITION BY "Plant"
    ORDER BY "Date" ASC
    RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
              AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;
```

## Constraints {#docs:current:sql:constraints}

In SQL, constraints can be specified for tables. Constraints enforce certain properties over data that is inserted into a table. Constraints can be specified along with the schema of the table as part of the [`CREATE TABLE` statement](#docs:current:sql:statements:create_table). In certain cases, constraints can also be added to a table using the [`ALTER TABLE` statement](#docs:current:sql:statements:alter_table), but this is not currently supported for all constraints.

> **Warning.** Constraints have a strong impact on performance: they slow down loading and updates but speed up certain queries. Please consult the [Performance Guide](#docs:current:guides:performance:schema::constraints) for details.

#### Syntax {#docs:current:sql:constraints::syntax}



#### Check Constraint {#docs:current:sql:constraints::check-constraint}

Check constraints allow you to specify an arbitrary Boolean expression. Any row whose values *do not* satisfy this expression violates the constraint. For example, we could enforce that the `name` column does not contain spaces using the following `CHECK` constraint.

```sql
CREATE TABLE students (name VARCHAR CHECK (NOT contains(name, ' ')));
INSERT INTO students VALUES ('this name contains spaces');
```

```console
Constraint Error:
CHECK constraint failed on table students with expression CHECK((NOT contains("name", ' ')))
```
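
A `CHECK` constraint can also be written at the table level so that its expression refers to more than one column. The sketch below uses a hypothetical `enrollments` table, which is not part of the examples above. The final `INSERT` is expected to be rejected with a constraint error because the start date is later than the end date:

```sql
-- Hypothetical table: the CHECK expression compares two columns of the same row
CREATE TABLE enrollments (
    student_name VARCHAR,
    start_date DATE,
    end_date DATE,
    CHECK (start_date <= end_date)
);
-- Satisfies the constraint
INSERT INTO enrollments VALUES ('Student1', DATE '2024-09-01', DATE '2025-06-30');
-- Violates the constraint: start_date is after end_date
INSERT INTO enrollments VALUES ('Student2', DATE '2024-09-01', DATE '2024-06-30');
```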

#### Not Null Constraint {#docs:current:sql:constraints::not-null-constraint}

A not-null constraint specifies that the column cannot contain any `NULL` values. By default, all columns in tables are nullable. Adding `NOT NULL` to a column definition enforces that a column cannot contain `NULL` values.

```sql
CREATE TABLE students (name VARCHAR NOT NULL);
INSERT INTO students VALUES (NULL);
```

```console
Constraint Error:
NOT NULL constraint failed: students.name
```

#### Primary Key and Unique Constraint {#docs:current:sql:constraints::primary-key-and-unique-constraint}

Primary key or unique constraints define a column, or set of columns, that are a unique identifier for a row in the table. The constraint enforces that the specified columns are *unique* within a table, i.e., that at most one row contains the given values for the set of columns.

```sql
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO students VALUES (1, 'Student 2');
```

```console
Constraint Error:
Duplicate key "id: 1" violates primary key constraint
```

```sql
CREATE TABLE students (id INTEGER, name VARCHAR, PRIMARY KEY (id, name));
INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO students VALUES (1, 'Student 2');
INSERT INTO students VALUES (1, 'Student 1');
```

```console
Constraint Error:
Duplicate key "id: 1, name: Student 1" violates primary key constraint
```

To enforce this property efficiently, an [ART index is automatically created](#docs:current:sql:indexes) for every primary key or unique constraint that is defined in the table.

Primary key constraints and unique constraints are identical except for two points:

* A table can only have one primary key constraint defined, but many unique constraints.
* A primary key constraint also enforces the keys to not be `NULL`.

```sql
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR, email VARCHAR UNIQUE);
INSERT INTO students VALUES (1, 'Student 1', 'student1@uni.com');
INSERT INTO students VALUES (2, 'Student 2', 'student1@uni.com');
```

```console
Constraint Error:
Duplicate key "email: student1@uni.com" violates unique constraint.
```

```sql
INSERT INTO students (id, name) VALUES (3, 'Student 3');
INSERT INTO students (name, email) VALUES ('Student 3', 'student3@uni.com');
```

```console
Constraint Error:
NOT NULL constraint failed: students.id
```

> **Warning.** Indexes have certain limitations that might result in constraints being evaluated too eagerly, leading to constraint errors such as `violates primary key constraint` and `violates unique constraint`. See the [Indexes page](#docs:current:sql:indexes::limitations-of-art-indexes) for more details.

You can also define a uniqueness constraint on multiple columns:

```sql
CREATE TABLE integers (i INTEGER, j INTEGER, k INTEGER, UNIQUE (i, j));
INSERT INTO integers VALUES (1, 2, 3);
INSERT INTO integers VALUES (1, 4, 5);
INSERT INTO integers VALUES (1, 2, 5);
```

```console
Constraint Error:
Duplicate key "i: 1, j: 2" violates unique constraint.
```

#### Foreign Keys {#docs:current:sql:constraints::foreign-keys}

Foreign keys define a column, or set of columns, that refer to a primary key or unique constraint from *another* table. The constraint enforces that the key exists in the other table.

```sql
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);
CREATE TABLE subjects (id INTEGER PRIMARY KEY, name VARCHAR);
CREATE TABLE exams (
    exam_id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES subjects(id),
    student_id INTEGER REFERENCES students(id),
    grade INTEGER
);
INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO subjects VALUES (1, 'CS 101');
INSERT INTO exams VALUES (1, 1, 1, 10);
INSERT INTO exams VALUES (2, 1, 2, 10);
```

```console
Constraint Error:
Violates foreign key constraint because key "id: 2" does not exist in the referenced table
```

To enforce this property efficiently, an [ART index is automatically created](#docs:current:sql:indexes) for every foreign key constraint that is defined in the table.

> **Warning.** Indexes have certain limitations that might result in constraints being evaluated too eagerly, leading to constraint errors such as `violates primary key constraint` and `violates unique constraint`. See the [Indexes page](#docs:current:sql:indexes::limitations-of-art-indexes) for more details.

## Indexes {#docs:current:sql:indexes}

#### Index Types {#docs:current:sql:indexes::index-types}

DuckDB has two built-in index types. Indexes can also be defined via [extensions](#docs:current:extensions:overview).

##### Min-Max Index (Zonemap) {#docs:current:sql:indexes::min-max-index-zonemap}

A [min-max index](https://en.wikipedia.org/wiki/Block_Range_Index) (also known as zonemap or block range index) is _automatically created_ for columns of all [general-purpose data types](#docs:current:sql:data_types:overview).

##### Adaptive Radix Tree (ART) {#docs:current:sql:indexes::adaptive-radix-tree-art}

An [Adaptive Radix Tree (ART)](https://db.in.tum.de/~leis/papers/ART.pdf) is mainly used to enforce primary key constraints and to speed up point queries and very highly selective (i.e., < 0.1%) queries. ART indexes can be created manually using the `CREATE INDEX` statement, and they are created automatically for columns with a `UNIQUE` or `PRIMARY KEY` constraint.

> **Warning.** ART indexes must currently fit in memory during index creation. Avoid creating an ART index if it would not fit in memory.

##### Indexes Defined by Extensions {#docs:current:sql:indexes::indexes-defined-by-extensions}

DuckDB supports [R-trees for spatial indexing](#docs:current:core_extensions:spatial:r-tree_indexes) via the `spatial` extension.

#### Persistence {#docs:current:sql:indexes::persistence}

Both min-max indexes and ART indexes are persisted on disk.

#### `CREATE INDEX` and `DROP INDEX` Statements {#docs:current:sql:indexes::create-index-and-drop-index-statements}

To create an [ART index](#docs:current:sql:indexes::adaptive-radix-tree-art), use the [`CREATE INDEX` statement](#docs:current:sql:statements:create_index::create-index).
To drop an [ART index](#docs:current:sql:indexes::adaptive-radix-tree-art), use the [`DROP INDEX` statement](#docs:current:sql:statements:create_index::drop-index).
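
A minimal sketch of both statements follows. The table `films` and the index name `films_title_idx` are hypothetical and chosen only for illustration:

```sql
-- Hypothetical table and index names
CREATE TABLE films (id INTEGER, title VARCHAR);
-- Create an ART index on the title column
CREATE INDEX films_title_idx ON films (title);
-- Remove the index again
DROP INDEX films_title_idx;
```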

#### Limitations of ART Indexes {#docs:current:sql:indexes::limitations-of-art-indexes}

ART indexes create a secondary copy of the data in a second location.
Maintaining that second copy complicates processing.
Thus, certain limitations currently apply when it comes to modifying data that is also stored in secondary indexes.

> As expected, indexes have a strong effect on performance, slowing down loading and updates, but speeding up certain queries. Please consult the [Performance Guide](#docs:current:guides:performance:indexing) for details.

##### Constraint Checking in `UPDATE` Statements {#docs:current:sql:indexes::constraint-checking-in-update-statements}

`UPDATE` statements on indexed columns and columns that cannot be updated in place are transformed into a `DELETE` of the original row followed by an `INSERT` of the updated row.
This rewrite has performance implications, particularly for wide tables, as entire rows are rewritten instead of only the affected columns.

Additionally, it causes the following constraint-checking limitation of `UPDATE` statements.
The same limitation exists in other DBMSs, like PostgreSQL.

In the example below, note how the number of rows exceeds DuckDB's standard vector size, which is 2048 by default.
The `UPDATE` statement is rewritten into a `DELETE`, followed by an `INSERT`.
This rewrite happens per chunk of data (2048 rows) moving through DuckDB's processing pipeline.
When updating `i = 2047` to `i = 2048`, we do not yet know that 2048 becomes 2049, and so forth.
That is because we have not yet seen that chunk.
Thus, we throw a constraint violation.

```sql
CREATE TABLE my_table (i INTEGER PRIMARY KEY);
INSERT INTO my_table SELECT range FROM range(3_000);
UPDATE my_table SET i = i + 1;
```

```console
Constraint Error:
Duplicate key "i: 2048" violates primary key constraint.
```

A workaround is to split the `UPDATE` into a `DELETE ... RETURNING ...` followed by an `INSERT`,
with some additional logic to (temporarily) store the result of the `DELETE`.
All statements should be run inside a transaction via `BEGIN`, and eventually `COMMIT`.

Here's an example of what that could look like in the command line client.

```sql
CREATE TABLE my_table (i INTEGER PRIMARY KEY);
INSERT INTO my_table SELECT range FROM range(3_000);

BEGIN;
-- Stash the current keys, then reinsert them with the update (i + 1) applied
CREATE TEMP TABLE tmp AS SELECT i FROM my_table;
DELETE FROM my_table;
INSERT INTO my_table SELECT i + 1 FROM tmp;
DROP TABLE tmp;
COMMIT;
```

In other clients, you might be able to fetch the result of `DELETE ... RETURNING ...`.
Then, you can use that result in a subsequent `INSERT ...` statement, or potentially make use of DuckDB's `Appender` (if available in the client).

##### Over-Eager Constraint Checking in Foreign Keys {#docs:current:sql:indexes::over-eager-constraint-checking-in-foreign-keys}

This limitation occurs if you meet the following conditions:

* A table has a `FOREIGN KEY` constraint.
* There is an `UPDATE` on the corresponding `PRIMARY KEY` table (e.g., on a composite payload column such as a `LIST` or a `STRUCT`), which DuckDB rewrites into a `DELETE` followed by an `INSERT`.
* The to-be-deleted row exists in the foreign key table.

If these hold, you'll encounter an unexpected constraint violation:

```sql
CREATE TABLE pk_table (id INTEGER PRIMARY KEY, payload VARCHAR[]);
INSERT INTO pk_table VALUES (1, ['hello']);
CREATE TABLE fk_table (id INTEGER REFERENCES pk_table(id));
INSERT INTO fk_table VALUES (1);
UPDATE pk_table SET payload = ['world'] WHERE id = 1;
```

```console
Constraint Error:
Violates foreign key constraint because key "id: 1" is still referenced by a foreign key in a different table. If this is an unexpected constraint violation, please refer to our foreign key limitations in the documentation
```

The reason for this is that DuckDB does not yet support “looking ahead”.
During the `DELETE`, it is unaware that it will reinsert the foreign key value as part of the `UPDATE` rewrite.

## Meta Queries {#sql:meta}

### Information Schema {#docs:current:sql:meta:information_schema}

The views in the `information_schema` are SQL-standard views that describe the catalog entries of the database. These views can be filtered to obtain information about a specific column or table.
DuckDB's implementation is based on [PostgreSQL's information schema](https://www.postgresql.org/docs/16/infoschema-columns.html).

#### Tables {#docs:current:sql:meta:information_schema::tables}

##### `character_sets`: Character Sets {#docs:current:sql:meta:information_schema::character_sets-character-sets}

| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `character_set_catalog` | Currently not implemented – always `NULL`. | `VARCHAR` | `NULL` |
| `character_set_schema` | Currently not implemented – always `NULL`. | `VARCHAR` | `NULL` |
| `character_set_name` | Name of the character set, currently implemented as showing the name of the database encoding. | `VARCHAR` | `'UTF8'` |
| `character_repertoire` | Character repertoire, showing `UCS` if the encoding is `UTF8`, else just the encoding name. | `VARCHAR` | `'UCS'` |
| `form_of_use` | Character encoding form, same as the database encoding. | `VARCHAR` | `'UTF8'` |
| `default_collate_catalog`| Name of the database containing the default collation (always the current database). | `VARCHAR` | `'my_db'` |
| `default_collate_schema` | Name of the schema containing the default collation. | `VARCHAR` | `'pg_catalog'` |
| `default_collate_name` | Name of the default collation. | `VARCHAR` | `'ucs_basic'` |

##### `columns`: Columns {#docs:current:sql:meta:information_schema::columns-columns}

The view that describes the catalog information for columns is `information_schema.columns`. It lists the columns present in the database and has the following layout:

| Column | Description | Type | Example |
|:--|:---|:-|:-|
| `table_catalog` | Name of the database containing the table (always the current database). | `VARCHAR` | `'my_db'` |
| `table_schema` | Name of the schema containing the table. | `VARCHAR` | `'main'` |
| `table_name` | Name of the table. | `VARCHAR` | `'widgets'` |
| `column_name` | Name of the column. | `VARCHAR` | `'price'` |
| `ordinal_position` | Ordinal position of the column within the table (count starts at 1). | `INTEGER` | `5` |
| `column_default` | Default expression of the column. |`VARCHAR`| `'1.99'` |
| `is_nullable` | `YES` if the column is possibly nullable, `NO` if it is known not nullable. |`VARCHAR`| `'YES'` |
| `data_type` | Data type of the column. |`VARCHAR`| `'DECIMAL(18, 2)'` |
| `character_maximum_length` | If `data_type` identifies a character or bit string type, the declared maximum length; `NULL` for all other data types or if no maximum length was declared. |`INTEGER`| `255` |
| `character_octet_length` | If `data_type` identifies a character type, the maximum possible length in octets (bytes) of a datum; `NULL` for all other data types. The maximum octet length depends on the declared character maximum length (see above) and the character encoding. |`INTEGER`| `1073741824` |
| `numeric_precision` | If `data_type` identifies a numeric type, this column contains the (declared or implicit) precision of the type for this column. The precision indicates the number of significant digits. For all other data types, this column is `NULL`. |`INTEGER`| `18` |
| `numeric_scale` | If `data_type` identifies a numeric type, this column contains the (declared or implicit) scale of the type for this column. The scale indicates the number of digits after the decimal point. For all other data types, this column is `NULL`. |`INTEGER`| `2` |
| `datetime_precision` | If `data_type` identifies a date, time, timestamp, or interval type, this column contains the (declared or implicit) fractional seconds precision of the type for this column, that is, the number of decimal digits maintained following the decimal point in the seconds value. No fractional seconds are currently supported in DuckDB. For all other data types, this column is `NULL`. |`INTEGER`| `0` |
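
As a short sketch, the view can be filtered like any other table. The table name `widgets` is taken from the example column above and is assumed to exist in the `main` schema:

```sql
-- List the columns of one table in their declared order
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_schema = 'main'
  AND table_name = 'widgets'
ORDER BY ordinal_position;
```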

##### `constraint_column_usage`: Constraint Column Usage {#docs:current:sql:meta:information_schema::constraint_column_usage-constraint-column-usage}

This view describes all columns in the current database that are used by some constraint. For a check constraint, this view identifies the columns that are used in the check expression. For a not-null constraint, this view identifies the column that the constraint is defined on. For a foreign key constraint, this view identifies the columns that the foreign key references. For a unique or primary key constraint, this view identifies the constrained columns.

| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `table_catalog` | Name of the database that contains the table that contains the column that is used by some constraint (always the current database) |`VARCHAR`| `'my_db'` |
| `table_schema` | Name of the schema that contains the table that contains the column that is used by some constraint |`VARCHAR`| `'main'` |
| `table_name` | Name of the table that contains the column that is used by some constraint |`VARCHAR`| `'widgets'` |
| `column_name` | Name of the column that is used by some constraint |`VARCHAR`| `'price'` |
| `constraint_catalog` | Name of the database that contains the constraint (always the current database) |`VARCHAR`| `'my_db'` |
| `constraint_schema` | Name of the schema that contains the constraint |`VARCHAR`| `'main'` |
| `constraint_name` | Name of the constraint |`VARCHAR`| `'exam_id_students_id_fkey'` |

##### `key_column_usage`: Key Column Usage {#docs:current:sql:meta:information_schema::key_column_usage-key-column-usage}

| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `constraint_catalog` | Name of the database that contains the constraint (always the current database). | `VARCHAR` | `'my_db'` |
| `constraint_schema` | Name of the schema that contains the constraint. | `VARCHAR` | `'main'` |
| `constraint_name` | Name of the constraint. | `VARCHAR` | `'exams_exam_id_fkey'` |
| `table_catalog` | Name of the database that contains the table that contains the column that is restricted by this constraint (always the current database). | `VARCHAR` | `'my_db'` |
| `table_schema` | Name of the schema that contains the table that contains the column that is restricted by this constraint. | `VARCHAR` | `'main'` |
| `table_name` | Name of the table that contains the column that is restricted by this constraint. | `VARCHAR` | `'exams'` |
| `column_name` | Name of the column that is restricted by this constraint. | `VARCHAR` | `'exam_id'` |
| `ordinal_position` | Ordinal position of the column within the constraint key (count starts at 1). | `INTEGER` | `1` |
| `position_in_unique_constraint` | For a foreign-key constraint, ordinal position of the referenced column within its unique constraint (count starts at `1`); otherwise `NULL`. | `INTEGER` | `1` |

##### `referential_constraints`: Referential Constraints {#docs:current:sql:meta:information_schema::referential_constraints-referential-constraints}

| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `constraint_catalog` | Name of the database containing the constraint (always the current database). | `VARCHAR` | `'my_db'` |
| `constraint_schema` | Name of the schema containing the constraint. | `VARCHAR` | `'main'` |
| `constraint_name` | Name of the constraint. | `VARCHAR` | `'exam_id_students_id_fkey'` |
| `unique_constraint_catalog` | Name of the database that contains the unique or primary key constraint that the foreign key constraint references. | `VARCHAR` | `'my_db'` |
| `unique_constraint_schema` | Name of the schema that contains the unique or primary key constraint that the foreign key constraint references. | `VARCHAR` | `'main'` |
| `unique_constraint_name` | Name of the unique or primary key constraint that the foreign key constraint references. | `VARCHAR` | `'students_id_pkey'` |
| `match_option` | Match option of the foreign key constraint. Always `NONE`. | `VARCHAR` | `'NONE'` |
| `update_rule` | Update rule of the foreign key constraint. Always `NO ACTION`. | `VARCHAR` | `'NO ACTION'` |
| `delete_rule` | Delete rule of the foreign key constraint. Always `NO ACTION`. | `VARCHAR` | `'NO ACTION'` |

##### `schemata`: Database, Catalog and Schema {#docs:current:sql:meta:information_schema::schemata-database-catalog-and-schema}

The top level catalog view is `information_schema.schemata`. It lists the catalogs and the schemas present in the database and has the following layout:

| Column | Description | Type | Example |
|:--|:---|:-|:-|
| `catalog_name` | Name of the database that the schema is contained in. | `VARCHAR` | `'my_db'` |
| `schema_name` | Name of the schema. | `VARCHAR` | `'main'` |
| `schema_owner` | Name of the owner of the schema. Not yet implemented. | `VARCHAR` | `'duckdb'` |
| `default_character_set_catalog` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `default_character_set_schema` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `default_character_set_name` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `sql_path` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |

##### `tables`: Tables and Views {#docs:current:sql:meta:information_schema::tables-tables-and-views}

The view that describes the catalog information for tables and views is `information_schema.tables`. It lists the tables present in the database and has the following layout:

| Column | Description | Type | Example |
|:--|:---|:-|:-|
| `table_catalog` | The catalog the table or view belongs to. | `VARCHAR` | `'my_db'` |
| `table_schema` | The schema the table or view belongs to. | `VARCHAR` | `'main'` |
| `table_name` | The name of the table or view. | `VARCHAR` | `'widgets'` |
| `table_type` | The type of table. One of: `BASE TABLE`, `LOCAL TEMPORARY`, `VIEW`. | `VARCHAR` | `'BASE TABLE'` |
| `self_referencing_column_name` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `reference_generation` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `user_defined_type_catalog` | If the table is a typed table, the name of the database that contains the underlying data type (always the current database), else `NULL`. Currently unimplemented. | `VARCHAR` | `NULL` |
| `user_defined_type_schema` | If the table is a typed table, the name of the schema that contains the underlying data type, else `NULL`. Currently unimplemented. | `VARCHAR` | `NULL` |
| `user_defined_type_name` | If the table is a typed table, the name of the underlying data type, else `NULL`. Currently unimplemented. | `VARCHAR` | `NULL` |
| `is_insertable_into` | `YES` if the table is insertable into, `NO` if not (Base tables are always insertable into, views not necessarily.)| `VARCHAR` | `'YES'` |
| `is_typed` | `YES` if the table is a typed table, `NO` if not. | `VARCHAR` | `'NO'` |
| `commit_action` | Not yet implemented. | `VARCHAR` | `'NO'` |

##### `table_constraints`: Table Constraints {#docs:current:sql:meta:information_schema::table_constraints-table-constraints}

| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `constraint_catalog` | Name of the database that contains the constraint (always the current database). | `VARCHAR` | `'my_db'` |
| `constraint_schema` | Name of the schema that contains the constraint. | `VARCHAR` | `'main'` |
| `constraint_name` | Name of the constraint. | `VARCHAR` | `'exams_exam_id_fkey'` |
| `table_catalog` | Name of the database that contains the table (always the current database). | `VARCHAR` | `'my_db'` |
| `table_schema` | Name of the schema that contains the table. | `VARCHAR` | `'main'` |
| `table_name` | Name of the table. | `VARCHAR` | `'exams'` |
| `constraint_type` | Type of the constraint: `CHECK`, `FOREIGN KEY`, `PRIMARY KEY`, or `UNIQUE`. | `VARCHAR` | `'FOREIGN KEY'` |
| `is_deferrable` | `YES` if the constraint is deferrable, `NO` if not. | `VARCHAR` | `'NO'` |
| `initially_deferred` | `YES` if the constraint is deferrable and initially deferred, `NO` if not. | `VARCHAR` | `'NO'` |
| `enforced` | Always `YES`. | `VARCHAR` | `'YES'` |
| `nulls_distinct` | If the constraint is a unique constraint, then `YES` if the constraint treats `NULL`s as distinct or `NO` if it treats `NULL`s as not distinct, otherwise `NULL` for other types of constraints. | `VARCHAR` | `'YES'` |
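
The constraint views can be combined to list each key constraint together with its columns. The query below is a sketch that relies only on the columns documented above. The table name `exams` is assumed from the examples:

```sql
-- One row per constrained column of each key constraint on the exams table
SELECT
    tc.constraint_name,
    tc.constraint_type,
    kcu.column_name,
    kcu.ordinal_position
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
    USING (constraint_catalog, constraint_schema, constraint_name, table_name)
WHERE table_name = 'exams'
ORDER BY tc.constraint_name, kcu.ordinal_position;
```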

#### Catalog Functions {#docs:current:sql:meta:information_schema::catalog-functions}

Several functions are also provided to see details about the catalogs and schemas that are configured in the database.

| Function | Description | Example | Result |
|:--|:---|:--|:--|
| `current_catalog()` | Return the name of the currently active catalog. Default is `memory`. | `current_catalog()` | `'memory'` |
| `current_schema()` | Return the name of the currently active schema. Default is `main`. | `current_schema()` | `'main'` |
| `current_schemas(boolean)` | Return list of schemas. Pass a parameter of `true` to include implicit schemas. | `current_schemas(true)` | `['temp', 'main', 'pg_catalog']` |
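
These functions can be called together in a single query. The column aliases below are illustrative only:

```sql
-- Show the active catalog and schema, plus all schemas including implicit ones
SELECT
    current_catalog() AS active_catalog,
    current_schema() AS active_schema,
    current_schemas(true) AS all_schemas;
```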

### DuckDB_% Metadata Functions {#docs:current:sql:meta:duckdb_table_functions}

DuckDB offers a collection of table functions that provide metadata about the current database. These functions reside in the `main` schema and their names are prefixed with `duckdb_`.

The result set returned by a `duckdb_` table function may be used just like an ordinary table or view. For example, you can use a `duckdb_` function call in the `FROM` clause of a `SELECT` statement, and you may refer to the columns of its returned result set elsewhere in the statement, for example in the `WHERE` clause.

Table functions are still functions, so you must write parentheses after the function name to call it and obtain its result set:

```sql
SELECT * FROM duckdb_settings();
```

Alternatively, you may also execute table functions using the `CALL` syntax:

```sql
CALL duckdb_settings();
```

In this case too, the parentheses are mandatory.

> For some of the `duckdb_%` functions, there is also an identically named view available, which also resides in the `main` schema. Typically, these views do a `SELECT` on the `duckdb_` table function with the same name, while filtering out those objects that are marked as internal. We mention it here, because if you accidentally omit the parentheses in your `duckdb_` table function call, you might still get a result, but from the identically named view.

Example:

The `duckdb_views()` _table function_ returns all views, including those marked internal:

```sql
SELECT * FROM duckdb_views();
```

The `duckdb_views` _view_ returns views that are not marked as internal:

```sql
SELECT * FROM duckdb_views;
```

#### `duckdb_columns` {#docs:current:sql:meta:duckdb_table_functions::duckdb_columns}

The `duckdb_columns()` function provides metadata about the columns available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains the column object. | `VARCHAR` |
| `database_oid` | Internal identifier of the database that contains the column object. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the table object that defines this column. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the table of the column. | `BIGINT` |
| `table_name` | The SQL name of the table that defines the column. | `VARCHAR` |
| `table_oid` | Internal identifier of the table object that defines the column. | `BIGINT` |
| `column_name` | The SQL name of the column. | `VARCHAR` |
| `column_index` | The unique position of the column within its table. | `INTEGER` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `internal` | `true` if this column is built-in, `false` if it is user-defined. | `BOOLEAN` |
| `column_default` | The default value of the column (expressed in SQL)| `VARCHAR` |
| `is_nullable` | `true` if the column can hold `NULL` values; `false` if the column cannot hold `NULL`-values. | `BOOLEAN` |
| `data_type` | The name of the column data type. | `VARCHAR` |
| `data_type_id` | The internal identifier of the column data type. | `BIGINT` |
| `character_maximum_length` | Always `NULL`. DuckDB [text types](#docs:current:sql:data_types:text) do not enforce a value length restriction based on a length type parameter. | `INTEGER` |
| `numeric_precision` | The number of units (in the base indicated by `numeric_precision_radix`) used for storing column values. For integral and approximate numeric types, this is the number of bits. For decimal types, this is the number of digit positions. | `INTEGER` |
| `numeric_precision_radix` | The number-base of the units in the `numeric_precision` column. For integral and approximate numeric types, this is `2`, indicating the precision is expressed as a number of bits. For the `decimal` type this is `10`, indicating the precision is expressed as a number of decimal positions. | `INTEGER` |
| `numeric_scale` | Applicable to `decimal` type. Indicates the maximum number of fractional digits (i.e., the number of digits that may appear after the decimal separator). | `INTEGER` |

The [`information_schema.columns`](#docs:current:sql:meta:information_schema::columns-columns) system view provides a more standardized way to obtain metadata about database columns, but the `duckdb_columns` function also returns metadata about DuckDB internal objects. (In fact, `information_schema.columns` is implemented as a query on top of `duckdb_columns()`.)
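
As a minimal sketch, the following query lists the columns of a single table using only the columns documented above. The table name `tbl` is a hypothetical placeholder, not something defined in this documentation.

```sql
-- 'tbl' is a hypothetical table name; replace it with one of your own tables.
SELECT column_index, column_name, data_type, is_nullable
FROM duckdb_columns()
WHERE schema_name = 'main'
  AND table_name = 'tbl'
ORDER BY column_index;
```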

#### `duckdb_constraints` {#docs:current:sql:meta:duckdb_table_functions::duckdb_constraints}

The `duckdb_constraints()` function provides metadata about the constraints available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains the constraint. | `VARCHAR` |
| `database_oid` | Internal identifier of the database that contains the constraint. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the table on which the constraint is defined. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the table on which the constraint is defined. | `BIGINT` |
| `table_name` | The SQL name of the table on which the constraint is defined. | `VARCHAR` |
| `table_oid` | Internal identifier (name) of the table object on which the constraint is defined. | `BIGINT` |
| `constraint_index` | Indicates the position of the constraint as it appears in its table definition. | `BIGINT` |
| `constraint_type` | Indicates the type of constraint. Applicable values are `CHECK`, `FOREIGN KEY`, `PRIMARY KEY`, `NOT NULL`, `UNIQUE`. | `VARCHAR` |
| `constraint_text` | The definition of the constraint expressed as a SQL phrase. (Not necessarily a complete or syntactically valid DDL statement.) | `VARCHAR` |
| `expression` | If constraint is a check constraint, the definition of the condition being checked, otherwise `NULL`. | `VARCHAR` |
| `constraint_column_indexes` | An array of table column indexes referring to the columns that appear in the constraint definition. | `BIGINT[]` |
| `constraint_column_names` | An array of table column names appearing in the constraint definition. | `VARCHAR[]` |
| `constraint_name` | The name of the constraint. | `VARCHAR` |
| `referenced_table` | The table referenced by the constraint. | `VARCHAR` |
| `referenced_column_names` | The column names referenced by the constraint. | `VARCHAR[]` |

The [`information_schema.referential_constraints`](#docs:current:sql:meta:information_schema::referential_constraints-referential-constraints) and [`information_schema.table_constraints`](#docs:current:sql:meta:information_schema::table_constraints-table-constraints) system views provide a more standardized way to obtain metadata about constraints, but the `duckdb_constraints` function also returns metadata about DuckDB internal objects. (In fact, `information_schema.referential_constraints` and `information_schema.table_constraints` are implemented as queries on top of `duckdb_constraints()`.)
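
The following sketch shows one way to inspect the constraints of a table. The table `example_orders` and its columns are assumptions made up for illustration.

```sql
-- Hypothetical table with a primary key and a check constraint.
CREATE TABLE example_orders (
    id INTEGER PRIMARY KEY,
    amount DECIMAL(10, 2) CHECK (amount >= 0)
);

-- List the constraints defined on that table.
SELECT constraint_type, constraint_column_names, constraint_text
FROM duckdb_constraints()
WHERE table_name = 'example_orders';
```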

#### `duckdb_databases` {#docs:current:sql:meta:duckdb_table_functions::duckdb_databases}

The `duckdb_databases()` function lists the databases that are accessible from within the current DuckDB process.
Apart from the database associated at startup, the list also includes databases that were [attached](#docs:current:sql:statements:attach) later on to the DuckDB process.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database, or the alias if the database was attached using an ALIAS-clause. | `VARCHAR` |
| `database_oid` | The internal identifier of the database. | `BIGINT` |
| `path` | The file path associated with the database. | `VARCHAR` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key–value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | `true` indicates a system or built-in database. `false` indicates a user-defined database. | `BOOLEAN` |
| `type` | The type indicates the type of RDBMS implemented by the attached database. For DuckDB databases, that value is `duckdb`. | `VARCHAR` |
| `readonly` | Denotes whether the database is read-only. | `BOOLEAN` |
| `options` | The options used in the `ATTACH` statement, as a map of option names to their string values. | `MAP(VARCHAR, VARCHAR)` |
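
As a small illustration, the query below attaches an additional in-memory database and then lists the non-internal databases. The alias `scratch` is a hypothetical name chosen for this sketch.

```sql
-- Attach a second, in-memory database under a hypothetical alias.
ATTACH ':memory:' AS scratch;

-- Show user-visible databases, including the one attached above.
SELECT database_name, path, type, readonly
FROM duckdb_databases()
WHERE NOT internal;
```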

#### `duckdb_dependencies` {#docs:current:sql:meta:duckdb_table_functions::duckdb_dependencies}

The `duckdb_dependencies()` function provides metadata about the dependencies available in the DuckDB instance.

| Column | Description | Type |
|:--|:------|:-|
| `classid` | Always 0. | `BIGINT` |
| `objid` | The internal id of the object. | `BIGINT` |
| `objsubid` | Always 0. | `INTEGER` |
| `refclassid` | Always 0. | `BIGINT` |
| `refobjid` | The internal id of the dependent object. | `BIGINT` |
| `refobjsubid` | Always 0. | `INTEGER` |
| `deptype` | The type of dependency. Either regular (n) or automatic (a). | `VARCHAR` |

#### `duckdb_extensions` {#docs:current:sql:meta:duckdb_table_functions::duckdb_extensions}

The `duckdb_extensions()` function provides metadata about the extensions available in the DuckDB instance.

| Column | Description | Type |
|:--|:------|:-|
| `extension_name` | The name of the extension. | `VARCHAR` |
| `loaded` | `true` if the extension is loaded, `false` if it's not loaded. | `BOOLEAN` |
| `installed` | `true` if the extension is installed, `false` if it's not installed. | `BOOLEAN` |
| `install_path` | `(BUILT-IN)` if the extension is built-in, otherwise, the filesystem path where the binary that implements the extension resides. | `VARCHAR` |
| `description` | Human readable text that describes the extension's functionality. | `VARCHAR` |
| `aliases` | List of alternative names for this extension. | `VARCHAR[]` |
| `extension_version` | The version of the extension (`vX.Y.Z` for stable versions and a 6-character hash for unstable versions). | `VARCHAR` |
| `install_mode` | The installation mode that was used to install the extension: `UNKNOWN`, `REPOSITORY`, `CUSTOM_PATH`, `STATICALLY_LINKED`, `NOT_INSTALLED`, `NULL`. | `VARCHAR` |
| `installed_from` | Name of the repository the extension was installed from, e.g., `community` or `core_nightly`. The empty string denotes the `core` repository. | `VARCHAR` |
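
For example, a query along these lines lists the extensions that are currently loaded. It is a sketch that relies only on the columns described above.

```sql
-- Show loaded extensions together with their version and install mode.
SELECT extension_name, extension_version, install_mode
FROM duckdb_extensions()
WHERE loaded
ORDER BY extension_name;
```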

#### `duckdb_functions` {#docs:current:sql:meta:duckdb_table_functions::duckdb_functions}

The `duckdb_functions()` function provides metadata about the functions (including macros) available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this function. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the function. | `BIGINT` |
| `schema_name` | The SQL name of the schema where the function resides. | `VARCHAR` |
| `function_name` | The SQL name of the function. | `VARCHAR` |
| `function_type` | The function kind. Value is one of: `table`, `scalar`, `aggregate`, `pragma`, `macro`, `table_macro`. | `VARCHAR` |
| `description` | Description of this function (always `NULL`). | `VARCHAR` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key–value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `return_type` | The logical data type name of the returned value. Applicable for scalar and aggregate functions. | `VARCHAR` |
| `parameters` | If the function has parameters, the list of parameter names. | `VARCHAR[]` |
| `parameter_types` | If the function has parameters, a list of logical data type names corresponding to the parameter list. | `VARCHAR[]` |
| `varargs` | The name of the data type in case the function has a variable number of arguments, or `NULL` if the function does not have a variable number of arguments. | `VARCHAR` |
| `macro_definition` | If this is a [macro](#docs:current:sql:statements:create_macro), the SQL expression that defines it. | `VARCHAR` |
| `has_side_effects` | `false` if this is a pure function. `true` if this function changes the database state (like sequence functions `nextval()` and `currval()`). | `BOOLEAN` |
| `internal` | `true` if the function is built-in (defined by DuckDB or an extension), `false` if it was defined using the [`CREATE MACRO` statement](#docs:current:sql:statements:create_macro). | `BOOLEAN` |
| `function_oid` | The internal identifier for this function. | `BIGINT` |
| `examples` | Examples of using the function. Used to generate the documentation. | `VARCHAR[]` |
| `stability` | The stability of the function (`CONSISTENT`, `VOLATILE`, `CONSISTENT_WITHIN_QUERY` or `NULL`). | `VARCHAR` |
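
As a sketch of how this function can be used for discovery, the query below searches for aggregate functions by name. The `list%` pattern is an arbitrary choice for illustration.

```sql
-- Find aggregate functions whose name starts with 'list'.
SELECT function_name, return_type, parameter_types
FROM duckdb_functions()
WHERE function_type = 'aggregate'
  AND function_name LIKE 'list%'
ORDER BY function_name;
```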

#### `duckdb_indexes` {#docs:current:sql:meta:duckdb_table_functions::duckdb_indexes}

The `duckdb_indexes()` function provides metadata about secondary indexes available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this index. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the index. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the table with the secondary index. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object. | `BIGINT` |
| `index_name` | The SQL name of this secondary index. | `VARCHAR` |
| `index_oid` | The object identifier of this index. | `BIGINT` |
| `table_name` | The name of the table with the index. | `VARCHAR` |
| `table_oid` | Internal identifier (name) of the table object. | `BIGINT` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key–value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `is_unique` | `true` if the index was created with the `UNIQUE` modifier, `false` if it was not. | `BOOLEAN` |
| `is_primary` | Always `false`. | `BOOLEAN` |
| `expressions` | Always `NULL`. | `VARCHAR` |
| `sql` | The definition of the index, expressed as a `CREATE INDEX` SQL statement. | `VARCHAR` |

Note that `duckdb_indexes` only provides metadata about secondary indexes, i.e., those indexes created by explicit [`CREATE INDEX`](#docs:current:sql:indexes::create-index) statements. Primary keys, foreign keys, and `UNIQUE` constraints are maintained using indexes, but their details are included in the `duckdb_constraints()` function.
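
A minimal sketch of inspecting a secondary index follows. The table `example_items` and the index `example_items_id_idx` are hypothetical names introduced only for this example.

```sql
-- Hypothetical table and explicit secondary index.
CREATE TABLE example_items (id INTEGER, name VARCHAR);
CREATE UNIQUE INDEX example_items_id_idx ON example_items (id);

-- The index created above should be listed here.
SELECT index_name, table_name, is_unique, sql
FROM duckdb_indexes()
WHERE table_name = 'example_items';
```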

#### `duckdb_keywords` {#docs:current:sql:meta:duckdb_table_functions::duckdb_keywords}

The `duckdb_keywords()` function provides metadata about DuckDB's keywords and reserved words.

| Column | Description | Type |
|:-|:---|:-|
| `keyword_name` | The keyword. | `VARCHAR` |
| `keyword_category` | Indicates the category of the keyword. Values are `column_name`, `reserved`, `type_function` and `unreserved`. | `VARCHAR` |
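
For instance, the following sketch counts the keywords in each category:

```sql
-- Number of keywords per category.
SELECT keyword_category, count(*) AS keyword_count
FROM duckdb_keywords()
GROUP BY keyword_category
ORDER BY keyword_category;
```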

#### `duckdb_log_contexts` {#docs:current:sql:meta:duckdb_table_functions::duckdb_log_contexts}

The `duckdb_log_contexts()` function provides information on the contexts of DuckDB log entries.

| Column | Description | Type |
|:-|:---|:-|
| `context_id` | The identifier of the context. The `context_id` column in the [`duckdb_logs`](#docs:current:sql:meta:duckdb_table_functions::duckdb_logs) table is a foreign key that points to this column. | `UBIGINT` |
| `scope` | The scope of the context (`connection`, `database` or `file_opener`). | `VARCHAR` |
| `connection_id` | The identifier of the connection. | `UBIGINT` |
| `transaction_id` | The identifier of the transaction. | `UBIGINT` |
| `query_id` | The identifier of the query. | `UBIGINT` |
| `thread_id` | The identifier of the thread. | `UBIGINT` |

#### `duckdb_logs` {#docs:current:sql:meta:duckdb_table_functions::duckdb_logs}

The `duckdb_logs()` function returns a table of DuckDB log entries.

| Column | Description | Type |
|:-|:---|:-|
| `context_id` | The identifier of the context of the log entry. Foreign key to the [`duckdb_log_contexts`](#docs:current:sql:meta:duckdb_table_functions::duckdb_log_contexts) table. | `UBIGINT` |
| `timestamp` | The timestamp of the log entry. | `TIMESTAMP` |
| `type` | The type of the log entry. | `VARCHAR` |
| `log_level` | The level of the log entry (`TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR` or `FATAL`). | `VARCHAR` |
| `message` | The message of the log entry. | `VARCHAR` |

#### `duckdb_memory` {#docs:current:sql:meta:duckdb_table_functions::duckdb_memory}

The `duckdb_memory()` function provides metadata about DuckDB's buffer manager.

| Column | Description | Type |
|:-|:---|:-|
| `tag` | The memory tag. It has one of the following values: `BASE_TABLE`, `HASH_TABLE`, `PARQUET_READER`, `CSV_READER`, `ORDER_BY`, `ART_INDEX`, `COLUMN_DATA`, `METADATA`, `OVERFLOW_STRINGS`, `IN_MEMORY_TABLE`, `ALLOCATOR`, `EXTENSION`. | `VARCHAR` |
| `memory_usage_bytes` | The memory used (in bytes). | `BIGINT` |
| `temporary_storage_bytes` | The disk storage used (in bytes). | `BIGINT` |
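
As an illustration, a query like the following shows which components currently hold memory, largest first:

```sql
-- Memory usage per tag, skipping tags that currently use no memory.
SELECT tag, memory_usage_bytes, temporary_storage_bytes
FROM duckdb_memory()
WHERE memory_usage_bytes > 0
ORDER BY memory_usage_bytes DESC;
```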

#### `duckdb_optimizers` {#docs:current:sql:meta:duckdb_table_functions::duckdb_optimizers}

The `duckdb_optimizers()` function provides metadata about the optimization rules (e.g., `expression_rewriter`, `filter_pushdown`) available in the DuckDB instance.
These can be selectively turned off using [`PRAGMA disabled_optimizers`](#docs:current:configuration:pragmas::selectively-disabling-optimizers).

| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the optimization rule. | `VARCHAR` |

#### `duckdb_prepared_statements` {#docs:current:sql:meta:duckdb_table_functions::duckdb_prepared_statements}

The `duckdb_prepared_statements()` function provides metadata about the [prepared statements](#docs:current:sql:query_syntax:prepared_statements) that exist in the current DuckDB session.

| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the prepared statement. | `VARCHAR` |
| `statement` | The SQL statement. | `VARCHAR` |
| `parameter_types` | The expected parameter types for the statement's parameters. Currently returns `UNKNOWN` for all parameters. | `VARCHAR[]` |
| `result_types` | The types of the columns in the table returned by the prepared statement. | `VARCHAR[]` |
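
The sketch below prepares a statement and then looks it up. The statement name `example_stmt` is a hypothetical name used only here.

```sql
-- Prepare a statement with a single parameter.
PREPARE example_stmt AS SELECT ?::INTEGER + 1 AS result;

-- The prepared statement should now appear in the session metadata.
SELECT name, statement, result_types
FROM duckdb_prepared_statements();
```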

#### `duckdb_profiling_settings` {#docs:current:sql:meta:duckdb_table_functions::duckdb_profiling_settings}

The `duckdb_profiling_settings()` macro returns the current profiling-related settings from `duckdb_settings()`.

| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the profiling setting. | `VARCHAR` |
| `value` | The current value of the setting. | `VARCHAR` |
| `description` | A description of the setting. | `VARCHAR` |

#### `duckdb_schemas` {#docs:current:sql:meta:duckdb_table_functions::duckdb_schemas}

The `duckdb_schemas()` function provides metadata about the schemas available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `oid` | Internal identifier of the schema object. | `BIGINT` |
| `database_name` | The name of the database that contains this schema. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the schema. | `BIGINT` |
| `schema_name` | The SQL name of the schema. | `VARCHAR` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key–value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | `true` if this is an internal (built-in) schema, `false` if this is a user-defined schema. | `BOOLEAN` |
| `sql` | Always `NULL`. | `VARCHAR` |

The [`information_schema.schemata`](#docs:current:sql:meta:information_schema::schemata-database-catalog-and-schema) system view provides a more standardized way to obtain metadata about database schemas.

#### `duckdb_secret_types` {#docs:current:sql:meta:duckdb_table_functions::duckdb_secret_types}

The `duckdb_secret_types()` function lists the secret types that are supported in the current DuckDB session.

| Column | Description | Type |
|:-|:---|:-|
| `type` | The name of the secret type, e.g., `s3`. | `VARCHAR` |
| `default_provider` | The default secret provider, e.g., `config`. | `VARCHAR` |
| `extension` | The extension that registered the secret type, e.g., `aws`. | `VARCHAR` |

#### `duckdb_secrets` {#docs:current:sql:meta:duckdb_table_functions::duckdb_secrets}

The `duckdb_secrets()` function provides metadata about the secrets available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the secret. | `VARCHAR` |
| `type` | The type of the secret, e.g., `S3`, `GCS`, `R2`, `AZURE`. | `VARCHAR` |
| `provider` | The provider of the secret. | `VARCHAR` |
| `persistent` | Denotes whether the secret is persistent. | `BOOLEAN` |
| `storage` | The backend for storing the secret. | `VARCHAR` |
| `scope` | The scope of the secret. | `VARCHAR[]` |
| `secret_string` | Returns the content of the secret as a string. Sensitive pieces of information, e.g., the access key, are redacted. | `VARCHAR` |

#### `duckdb_sequences` {#docs:current:sql:meta:duckdb_table_functions::duckdb_sequences}

The `duckdb_sequences()` function provides metadata about the sequences available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this sequence. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the sequence. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the sequence object. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the sequence object. | `BIGINT` |
| `sequence_name` | The SQL name that identifies the sequence within the schema. | `VARCHAR` |
| `sequence_oid` | The internal identifier of this sequence object. | `BIGINT` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key–value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `temporary` | Whether this sequence is temporary. Temporary sequences are transient and only visible within the current connection. | `BOOLEAN` |
| `start_value` | The initial value of the sequence. This value will be returned when `nextval()` is called for the very first time on this sequence. | `BIGINT` |
| `min_value` | The minimum value of the sequence. | `BIGINT` |
| `max_value` | The maximum value of the sequence. | `BIGINT` |
| `increment_by` | The value that is added to the current value of the sequence to draw the next value from the sequence. | `BIGINT` |
| `cycle` | Whether the sequence should start over when drawing the next value would result in a value outside the range. | `BOOLEAN` |
| `last_value` | `NULL` if no value was ever drawn from the sequence using `nextval(...)`. `1` if a value was drawn. | `BIGINT` |
| `sql` | The definition of this object, expressed as a SQL DDL statement. | `VARCHAR` |

Attributes such as `temporary` and `start_value` correspond to the options available in the [`CREATE SEQUENCE`](#docs:current:sql:statements:create_sequence) statement and are documented there in full. Note that these attributes are always filled in the `duckdb_sequences` result set, even if they were not explicitly specified in the `CREATE SEQUENCE` statement.

> 1. The column name `last_value` suggests that it contains the last value that was drawn from the sequence, but that is not the case. It's either `NULL` if a value was never drawn from the sequence, or `1` (when there was a value drawn, ever, from the sequence).
>
> 2. If the sequence cycles, then the sequence will start over from the boundary of its range, not necessarily from the value specified as start value.
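
To see these attributes in practice, one can create a sequence and read its metadata back. The sequence name `example_seq` and its options are assumptions chosen for this sketch.

```sql
-- Hypothetical sequence that specifies only a start value and an increment.
CREATE SEQUENCE example_seq START WITH 10 INCREMENT BY 5;

-- Options that were not specified (e.g., min_value) are still filled in.
SELECT sequence_name, start_value, increment_by, min_value, max_value, last_value
FROM duckdb_sequences()
WHERE sequence_name = 'example_seq';
```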

#### `duckdb_settings` {#docs:current:sql:meta:duckdb_table_functions::duckdb_settings}

The `duckdb_settings()` function provides metadata about the settings available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `name` | Name of the setting. | `VARCHAR` |
| `value` | Current value of the setting. | `VARCHAR` |
| `description` | A description of the setting. | `VARCHAR` |
| `input_type` | The logical data type of the setting's value. | `VARCHAR` |
| `scope` | The scope of the setting (`LOCAL` or `GLOBAL`). | `VARCHAR` |

The various settings are described in the [configuration page](#docs:current:configuration:overview).
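
For example, the following sketch looks up two commonly used settings, `threads` and `memory_limit`:

```sql
-- Inspect the current values of selected settings.
SELECT name, value, input_type, scope
FROM duckdb_settings()
WHERE name IN ('threads', 'memory_limit');
```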

#### `duckdb_tables` {#docs:current:sql:meta:duckdb_table_functions::duckdb_tables}

The `duckdb_tables()` function provides metadata about the base tables available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this table. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the table. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the base table. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the base table. | `BIGINT` |
| `table_name` | The SQL name of the base table. | `VARCHAR` |
| `table_oid` | Internal identifier of the base table object. | `BIGINT` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key–value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | `true` if this is an internal (built-in) table, `false` if this is a user-defined table. | `BOOLEAN` |
| `temporary` | Whether this is a temporary table. Temporary tables are not persisted and only visible within the current connection. | `BOOLEAN` |
| `has_primary_key` | `true` if this table object defines a `PRIMARY KEY`. | `BOOLEAN` |
| `estimated_size` | The estimated number of rows in the table. | `BIGINT` |
| `column_count` | The number of columns defined by this object. | `BIGINT` |
| `index_count` | The number of indexes associated with this table. This number includes all secondary indexes, as well as internal indexes generated to maintain `PRIMARY KEY` and/or `UNIQUE` constraints. | `BIGINT` |
| `check_constraint_count` | The number of check constraints active on columns within the table. | `BIGINT` |
| `sql` | The definition of this object, expressed as a SQL [`CREATE TABLE` statement](#docs:current:sql:statements:create_table). | `VARCHAR` |

The [`information_schema.tables`](#docs:current:sql:meta:information_schema::tables-tables-and-views) system view provides a more standardized way to obtain metadata about database tables and also includes views. However, the result set returned by `duckdb_tables` contains a few columns that are not included in `information_schema.tables`.
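
As a brief sketch, the following query gives an overview of the user-defined tables using columns that are specific to `duckdb_tables()`:

```sql
-- Overview of user-defined tables with their size estimates.
SELECT schema_name, table_name, column_count, estimated_size, has_primary_key
FROM duckdb_tables()
WHERE NOT internal
ORDER BY schema_name, table_name;
```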

#### `duckdb_temporary_files` {#docs:current:sql:meta:duckdb_table_functions::duckdb_temporary_files}

The `duckdb_temporary_files()` function provides metadata about the temporary files DuckDB has written to disk, to offload data from memory. This function mostly exists for debugging and testing purposes.

| Column | Description | Type |
|:-|:---|:-|
| `path` | The name of the temporary file. | `VARCHAR` |
| `size` | The size in bytes of the temporary file. | `BIGINT` |

#### `duckdb_types` {#docs:current:sql:meta:duckdb_table_functions::duckdb_types}

The `duckdb_types()` function provides metadata about the data types available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this data type. | `VARCHAR` |
| `database_oid` | Internal identifier of the database that contains the data type. | `BIGINT` |
| `schema_name` | The SQL name of the schema containing the type definition. Always `main`. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object. | `BIGINT` |
| `type_name` | The name or alias of this data type. | `VARCHAR` |
| `type_oid` | The internal identifier of the data type object. If `NULL`, then this is an alias of the type (as identified by the value in the `logical_type` column). | `BIGINT` |
| `type_size` | The number of bytes required to represent a value of this type in memory. | `BIGINT` |
| `logical_type` | The 'canonical' name of this data type. The same `logical_type` may be referenced by several types having different `type_name`s. | `VARCHAR` |
| `type_category` | The category to which this type belongs. Data types within the same category generally expose similar behavior when values of this type are used in expressions. For example, the `NUMERIC` type_category includes integers, decimals and floating point numbers. | `VARCHAR` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key–value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | Whether this is an internal (built-in) or a user object. | `BOOLEAN` |
| `labels` | Labels for categorizing types. Used for generating the documentation. | `VARCHAR[]` |

#### `duckdb_variables` {#docs:current:sql:meta:duckdb_table_functions::duckdb_variables}

The `duckdb_variables()` function provides metadata about the variables available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the variable, e.g., `x`. | `VARCHAR` |
| `value` | The value of the variable, e.g., `12`. | `VARCHAR` |
| `type` | The type of the variable, e.g., `INTEGER`. | `VARCHAR` |
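
The sketch below defines a variable that matches the example values in the table above and then lists the session's variables:

```sql
-- Define a SQL-level variable named x.
SET VARIABLE x = 12;

-- List the variables defined in the current session.
SELECT name, value, type
FROM duckdb_variables();
```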

#### `duckdb_views` {#docs:current:sql:meta:duckdb_table_functions::duckdb_views}

The `duckdb_views()` function provides metadata about the views available in the DuckDB instance.

| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this view. | `VARCHAR` |
| `database_oid` | Internal identifier of the database that contains this view. | `BIGINT` |
| `schema_name` | The SQL name of the schema where the view resides. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the view. | `BIGINT` |
| `view_name` | The SQL name of the view object. | `VARCHAR` |
| `view_oid` | The internal identifier of this view object. | `BIGINT` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:current:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key–value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | `true` if this is an internal (built-in) view, `false` if this is a user-defined view. | `BOOLEAN` |
| `temporary` | `true` if this is a temporary view. Temporary views are not persistent and are only visible within the current connection. | `BOOLEAN` |
| `column_count` | The number of columns defined by this view object. | `BIGINT` |
| `sql` | The definition of this object, expressed as a SQL DDL statement. | `VARCHAR` |

The [`information_schema.tables`](#docs:current:sql:meta:information_schema::tables-tables-and-views) system view provides a more standardized way to obtain metadata about database views and also includes base tables. However, the result set returned by `duckdb_views` also contains definitions of internal view objects as well as a few columns that are not included in `information_schema.tables`.
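
A minimal sketch of reading back a view definition follows. The view name `example_view` is hypothetical.

```sql
-- Hypothetical user-defined view.
CREATE VIEW example_view AS SELECT 42 AS answer;

-- Show user-defined views together with their definitions.
SELECT view_name, column_count, sql
FROM duckdb_views()
WHERE NOT internal;
```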

## DuckDB's SQL Dialect {#sql:dialect}

### Overview {#docs:current:sql:dialect:overview}

DuckDB's SQL dialect is based on PostgreSQL.
DuckDB tries to closely match PostgreSQL's semantics; however, some use cases require slightly different behavior.
For example, interchangeability with data frame libraries necessitates [order preservation of inserts](#docs:current:sql:dialect:order_preservation) to be supported by default.
These differences are documented in the pages below.

### Indexing {#docs:current:sql:dialect:indexing}

DuckDB uses 1-based indexing except for [JSON objects](#docs:current:data:json:overview), which use 0-based indexing.

#### Examples {#docs:current:sql:dialect:indexing::examples}

The index origin is 1 for strings, lists, etc.

```sql
SELECT list[1] AS element
FROM (SELECT ['first', 'second', 'third'] AS list);
```

```text
┌─────────┐
│ element │
│ varchar │
├─────────┤
│ first   │
└─────────┘
```

The index origin is 0 for JSON objects.

```sql
SELECT json[1] AS element
FROM (SELECT '["first", "second", "third"]'::JSON AS json);
```

```text
┌──────────┐
│ element  │
│   json   │
├──────────┤
│ "second" │
└──────────┘
```

### Friendly SQL {#docs:current:sql:dialect:friendly_sql}

DuckDB offers several advanced SQL features and syntactic sugar to make SQL queries more concise. We refer to these colloquially as “friendly SQL”.

> Several of these features were first introduced by DuckDB, while some are inspired by other systems.
> Many of the features originally introduced by DuckDB (e.g., [`GROUP BY ALL`](#docs:current:sql:query_syntax:groupby::group-by-all)) have since been adopted by other systems.

> **Tip.** We have a [Friendly SQL 2026 Calendar](https://blobs.duckdb.org/merch/duckdb-friendly-sql-calendar-2026.pdf) with a short explanation, an example, and an abstract illustration for 12 friendly SQL features.

#### Clauses {#docs:current:sql:dialect:friendly_sql::clauses}

* Creating tables and inserting data:
    * [`CREATE OR REPLACE TABLE`](#docs:current:sql:statements:create_table::create-or-replace): avoid `DROP TABLE IF EXISTS` statements in scripts.
    * [`CREATE TABLE ... AS SELECT` (CTAS)](#docs:current:sql:statements:create_table::create-table--as-select-ctas): create a new table from the output of a query without manually defining a schema.
    * [`INSERT INTO ... BY NAME`](#docs:current:sql:statements:insert::insert-into--by-name): this variant of the `INSERT` statement allows using column names instead of positions.
    * [`INSERT OR IGNORE INTO ...`](#docs:current:sql:statements:insert::insert-or-ignore-into): insert the rows that do not result in a conflict due to `UNIQUE` or `PRIMARY KEY` constraints.
    * [`INSERT OR REPLACE INTO ...`](#docs:current:sql:statements:insert::insert-or-replace-into): insert the rows that do not result in a conflict due to `UNIQUE` or `PRIMARY KEY` constraints. For those that result in a conflict, replace the columns of the existing row with the new values of the to-be-inserted row.
* Describing tables and computing statistics:
    * [`DESCRIBE`](#docs:current:guides:meta:describe): provides a succinct summary of the schema of a table or query.
    * [`SUMMARIZE`](#docs:current:guides:meta:summarize): returns summary statistics for a table or query.
* Making SQL clauses more compact and readable:
    * [`FROM`-first syntax with an optional `SELECT` clause](#docs:current:sql:query_syntax:from::from-first-syntax): DuckDB allows queries in the form of `FROM tbl` which selects all columns (performing a `SELECT *` statement).
    * [`GROUP BY ALL`](#docs:current:sql:query_syntax:groupby::group-by-all): omit the group-by columns by inferring them from the list of attributes in the `SELECT` clause.
    * [`ORDER BY ALL`](#docs:current:sql:query_syntax:orderby::order-by-all): shorthand to order on all columns (e.g., to ensure deterministic results).
    * [`SELECT * EXCLUDE`](#docs:current:sql:expressions:star::exclude-clause): the `EXCLUDE` option allows excluding specific columns from the `*` expression.
    * [`SELECT * REPLACE`](#docs:current:sql:expressions:star::replace-clause): the `REPLACE` option allows replacing specific columns with different expressions in a `*` expression.
    * [`UNION BY NAME`](#docs:current:sql:query_syntax:setops::union-all-by-name): perform the `UNION` operation along the names of columns (instead of relying on positions).
    * [Prefix aliases in the `SELECT` and `FROM` clauses](#docs:current:sql:query_syntax:select): write `x: 42` instead of `42 AS x` for improved readability.
    * [Specifying a percentage of the table size for the `LIMIT` clause](#docs:current:sql:query_syntax:limit): write `LIMIT 10%` to return 10% of the query results.
* Transforming tables:
    * [`PIVOT`](#docs:current:sql:statements:pivot) to turn long tables into wide tables.
    * [`UNPIVOT`](#docs:current:sql:statements:unpivot) to turn wide tables into long tables.
* Defining SQL-level variables:
    * [`SET VARIABLE`](#docs:current:sql:statements:set_variable::set-variable)
    * [`RESET VARIABLE`](#docs:current:sql:statements:set_variable::reset-variable)
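
Several of the clauses listed above can be combined in a short session. The following is a minimal sketch rather than a reference: the table `sales` and its columns are hypothetical names chosen for illustration.

```sql
-- Hypothetical table used only for illustration.
CREATE OR REPLACE TABLE sales (id INTEGER PRIMARY KEY, region VARCHAR, amount INTEGER);

-- BY NAME matches the query's columns to the table's columns by name, not by position.
INSERT INTO sales BY NAME (SELECT 10 AS amount, 'north' AS region, 1 AS id);

-- The row with id = 1 already exists: OR IGNORE skips it, OR REPLACE overwrites it.
INSERT OR IGNORE INTO sales VALUES (1, 'south', 99);
INSERT OR REPLACE INTO sales VALUES (1, 'north', 20), (2, 'south', 30);

-- FROM-first syntax combined with GROUP BY ALL and ORDER BY ALL.
FROM sales
SELECT region, sum(amount) AS total
GROUP BY ALL
ORDER BY ALL;

-- Star expression modifiers.
SELECT * EXCLUDE (id) FROM sales;
SELECT * REPLACE (amount * 2 AS amount) FROM sales;
```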

#### Query Features {#docs:current:sql:dialect:friendly_sql::query-features}

* [Column aliases in `WHERE`, `GROUP BY`, and `HAVING`](https://duckdb.org/2022/05/04/friendlier-sql#column-aliases-in-where--group-by--having). (Note that column aliases cannot be used in the `ON` clause of [`JOIN` clauses](#docs:current:sql:query_syntax:from::joins).)
* [`COLUMNS()` expression](#docs:current:sql:expressions:star::columns-expression) can be used to execute the same expression on multiple columns:
    * [with regular expressions](https://duckdb.org/2023/08/23/even-friendlier-sql#columns-with-regular-expressions)
    * [with `EXCLUDE` and `REPLACE`](https://duckdb.org/2023/08/23/even-friendlier-sql#columns-with-exclude-and-replace)
    * [with lambda functions](https://duckdb.org/2023/08/23/even-friendlier-sql#columns-with-lambda-functions)
* Reusable column aliases (also known as “lateral column aliases”), e.g.: `SELECT i + 1 AS j, j + 2 AS k FROM range(0, 3) t(i)`
* Advanced aggregation features for analytical (OLAP) queries:
    * [`FILTER` clause](#docs:current:sql:query_syntax:filter)
    * [`GROUPING SETS`, `GROUP BY CUBE`, `GROUP BY ROLLUP` clauses](#docs:current:sql:query_syntax:grouping_sets)
* [`count()` shorthand](#docs:current:sql:functions:aggregates) for `count(*)`
* [`IN` operator for lists and maps](#docs:current:sql:expressions:in)
* [Specifying column names for common table expressions (`WITH`)](#docs:current:sql:query_syntax:with::basic-cte-examples)
* [Specifying column names in the `JOIN` clause](#docs:current:sql:query_syntax:from::shorthands-in-the-join-clause)
* [Using `VALUES` in the `JOIN` clause](#docs:current:sql:query_syntax:from::shorthands-in-the-join-clause)
* [Using `VALUES` in the anchor part of common table expressions](#docs:current:sql:query_syntax:with::using-values)
* [`SWITCH` statements as syntactic sugar for the `CASE` expression](#docs:current:sql:expressions:case::switch-expression)
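
As a rough illustration of some of the query features above, the following sketch uses only generated data (`range` and `VALUES`), so no tables need to exist. The aliases `tens`, `n_rows`, `n_even`, `v1` and `v2` are arbitrary names.

```sql
-- Reusable (lateral) column aliases: j is defined and reused in the same SELECT list.
SELECT i + 1 AS j, j + 2 AS k
FROM range(0, 3) t(i);

-- A column alias referenced in the WHERE clause.
SELECT i * 10 AS tens
FROM range(0, 5) t(i)
WHERE tens >= 20;

-- The count() shorthand and the FILTER clause.
SELECT
    count() AS n_rows,
    count(*) FILTER (WHERE i % 2 = 0) AS n_even
FROM range(0, 10) t(i);

-- COLUMNS() applies the same expression to every column matching the regular expression.
SELECT min(COLUMNS('^v'))
FROM (VALUES (1, 4), (2, 3)) t(v1, v2);
```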

#### Literals and Identifiers {#docs:current:sql:dialect:friendly_sql::literals-and-identifiers}

* [Case-insensitivity while maintaining case of entities in the catalog](#docs:current:sql:dialect:keywords_and_identifiers::case-sensitivity-of-identifiers)
* [Deduplicating identifiers](#docs:current:sql:dialect:keywords_and_identifiers::deduplicating-identifiers)
* [Underscores as digit separators in numeric literals](#docs:current:sql:data_types:literal_types::underscores-in-numeric-literals)
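
A brief sketch of two of these features, assuming a hypothetical table named `CamelCase`:

```sql
-- Underscores act as digit separators in numeric literals.
SELECT 1_000_000 AS one_million;

-- Identifiers match case-insensitively, while the catalog keeps the original spelling.
CREATE OR REPLACE TABLE CamelCase (Amount INTEGER);
SELECT amount FROM camelcase;
```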

#### Data Types {#docs:current:sql:dialect:friendly_sql::data-types}

* [`MAP` data type](#docs:current:sql:data_types:map)
* [`UNION` data type](#docs:current:sql:data_types:union)

#### Data Import {#docs:current:sql:dialect:friendly_sql::data-import}

* [Auto-detecting the headers and schema of CSV files](#docs:current:data:csv:auto_detection)
* Directly querying [CSV files](#docs:current:data:csv:overview) and [Parquet files](#docs:current:data:parquet:overview)
* [Replacement scans](#docs:current:guides:glossary):
    * You can load from files using the syntax `FROM 'my.csv'`, `FROM 'my.csv.gz'`, `FROM 'my.parquet'`, etc.
    * In Python, you can [access Pandas data frames using `FROM df`](#docs:current:guides:python:export_pandas).
* [Filename expansion (globbing)](#docs:current:sql:functions:pattern_matching::globbing), e.g.: `FROM 'my-data/part-*.parquet'`

#### Functions and Expressions {#docs:current:sql:dialect:friendly_sql::functions-and-expressions}

* [Dot operator for function chaining](#docs:current:sql:functions:overview::function-chaining-via-the-dot-operator): `SELECT ('hello').upper()`
* String formatters:
    the [`format()` function with the `fmt` syntax](#docs:current:sql:functions:text::fmt-syntax) and
    the [`printf()` function](#docs:current:sql:functions:text::printf-syntax)
* [List comprehensions](https://duckdb.org/2023/08/23/even-friendlier-sql#list-comprehensions)
* [List slicing](https://duckdb.org/2022/05/04/friendlier-sql#string-slicing) and indexing from the back (`[-1]`)
* [String slicing](https://duckdb.org/2022/05/04/friendlier-sql#string-slicing)
* [`STRUCT.*` notation](https://duckdb.org/2022/05/04/friendlier-sql#struct-dot-notation)
* [Creating `LIST` using square brackets](#docs:current:sql:data_types:list::creating-lists)
* [Simple `LIST` and `STRUCT` creation](https://duckdb.org/2022/05/04/friendlier-sql#simple-list-and-struct-creation)
* [Updating the schema of `STRUCT`s](#docs:current:sql:data_types:struct::updating-the-schema)
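
The following single query is a sketch that exercises several of the items above. The literal values and column aliases are arbitrary.

```sql
SELECT
    ('hello').upper() AS chained,                          -- dot operator for function chaining
    format('{} has {} legs', 'duck', 2) AS fmt_style,      -- fmt syntax
    printf('%s has %d legs', 'duck', 2) AS printf_style,   -- printf syntax
    [x * 2 FOR x IN [1, 2, 3] IF x > 1] AS comprehension,  -- list comprehension
    [10, 20, 30][-1] AS last_item,                         -- indexing from the back
    [10, 20, 30][1:2] AS list_slice,
    'DuckDB'[1:4] AS string_slice,
    {'a': 1, 'b': 'x'} AS simple_struct;                   -- STRUCT creation with curly braces
```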

#### Join Types {#docs:current:sql:dialect:friendly_sql::join-types}

* [`ASOF` joins](#docs:current:sql:query_syntax:from::as-of-joins)
* [`LATERAL` joins](#docs:current:sql:query_syntax:from::lateral-joins)
* [`POSITIONAL` joins](#docs:current:sql:query_syntax:from::positional-joins)
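
The three join types can be tried out on inline data. This is a sketch under the assumption that inline `VALUES` lists behave like small tables here; all table and column aliases are made up for the example.

```sql
-- ASOF join: for each trade, pick the most recent price at or before its timestamp.
SELECT t.ts, t.qty, p.price
FROM (VALUES (TIMESTAMP '2024-01-01 10:30:00', 5)) t(ts, qty)
ASOF JOIN (VALUES
    (TIMESTAMP '2024-01-01 10:00:00', 100),
    (TIMESTAMP '2024-01-01 11:00:00', 110)) p(ts, price)
ON t.ts >= p.ts;

-- LATERAL join: the subquery can refer to columns of the preceding table.
SELECT *
FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);

-- POSITIONAL join: rows are paired by their position in each input.
SELECT *
FROM (VALUES (1), (2), (3)) a(x)
POSITIONAL JOIN (VALUES ('a'), ('b'), ('c')) b(y);
```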

#### Trailing Commas {#docs:current:sql:dialect:friendly_sql::trailing-commas}

DuckDB allows [trailing commas](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Trailing_commas),
both when listing entities (e.g., column and table names) and when constructing [`LIST` items](#docs:current:sql:data_types:list::creating-lists).
For example, the following query works:

```sql
SELECT
    42 AS x,
    ['a', 'b', 'c',] AS y,
    'hello world' AS z,
;
```

#### "Top-N in Group" Queries {#docs:current:sql:dialect:friendly_sql::top-n-in-group-queries}

Computing the "top-N rows in a group" ordered by some criteria is a common task in SQL that unfortunately often requires a complex query involving window functions and/or subqueries.

To aid in this, DuckDB provides the aggregate functions [`max(arg, n)`](#docs:current:sql:functions:aggregates::maxarg-n), [`min(arg, n)`](#docs:current:sql:functions:aggregates::minarg-n), [`arg_max(arg, val, n)`](#docs:current:sql:functions:aggregates::arg_maxarg-val-n), [`arg_min(arg, val, n)`](#docs:current:sql:functions:aggregates::arg_minarg-val-n), [`max_by(arg, val, n)`](#docs:current:sql:functions:aggregates::max_byarg-val-n) and [`min_by(arg, val, n)`](#docs:current:sql:functions:aggregates::min_byarg-val-n) to efficiently return the "top" `n` rows in a group based on a specific column in either ascending or descending order.

For example, let's use the following table:

```sql
SELECT * FROM t1;
```

```text
┌─────────┬───────┐
│   grp   │  val  │
│ varchar │ int32 │
├─────────┼───────┤
│ a       │     2 │
│ a       │     1 │
│ b       │     5 │
│ b       │     4 │
│ a       │     3 │
│ b       │     6 │
└─────────┴───────┘
```

We want to get a list of the top-3 `val` values in each group `grp`. The conventional way to do this is to use a window function in a subquery:

```sql
SELECT array_agg(rs.val), rs.grp
FROM
    (SELECT val, grp, row_number() OVER (PARTITION BY grp ORDER BY val DESC) AS rid
    FROM t1 ORDER BY val DESC) AS rs
WHERE rid < 4
GROUP BY rs.grp;
```

```text
┌───────────────────┬─────────┐
│ array_agg(rs.val) │   grp   │
│      int32[]      │ varchar │
├───────────────────┼─────────┤
│ [3, 2, 1]         │ a       │
│ [6, 5, 4]         │ b       │
└───────────────────┴─────────┘
```

But in DuckDB, we can do this much more concisely (and efficiently!):

```sql
SELECT max(val, 3) FROM t1 GROUP BY grp;
```

```text
┌─────────────┐
│ max(val, 3) │
│   int32[]   │
├─────────────┤
│ [3, 2, 1]   │
│ [6, 5, 4]   │
└─────────────┘
```
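
To try this locally, the example table can be recreated as shown below. This is a sketch: the column types may differ slightly from the listing above, and the `top2` and `bottom2` aliases are arbitrary. Selecting `grp` alongside the aggregate makes it clear which list belongs to which group.

```sql
CREATE OR REPLACE TABLE t1 AS
    SELECT *
    FROM (VALUES ('a', 2), ('a', 1), ('b', 5), ('b', 4), ('a', 3), ('b', 6)) t(grp, val);

-- Top-2 and bottom-2 values per group.
SELECT grp, max(val, 2) AS top2, min(val, 2) AS bottom2
FROM t1
GROUP BY grp
ORDER BY grp;
```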

#### Related Blog Posts {#docs:current:sql:dialect:friendly_sql::related-blog-posts}

* [“Friendlier SQL with DuckDB”](https://duckdb.org/2022/05/04/friendlier-sql) blog post
* [“Even Friendlier SQL with DuckDB”](https://duckdb.org/2023/08/23/even-friendlier-sql) blog post
* [“SQL Gymnastics: Bending SQL into Flexible New Shapes”](https://duckdb.org/2024/03/01/sql-gymnastics) blog post

### Keywords and Identifiers {#docs:current:sql:dialect:keywords_and_identifiers}

#### Identifiers {#docs:current:sql:dialect:keywords_and_identifiers::identifiers}

Similarly to other SQL dialects and programming languages, identifiers in DuckDB's SQL are subject to several rules.

* Unquoted identifiers need to conform to a number of rules:
    * They must not be a reserved keyword (see [`duckdb_keywords()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_keywords)), e.g., `SELECT 123 AS SELECT` will fail.
    * They must not start with a number or special character, e.g., `SELECT 123 AS 1col` is invalid.
    * They cannot contain whitespace (including tabs and newline characters).
* Identifiers can be quoted using double-quote characters (`"`). Quoted identifiers can use any keyword, whitespace or special character, e.g., `"SELECT"` and `" § 🦆 ¶ "` are valid identifiers.
* Double quotes can be escaped by repeating the quote character, e.g., to create an identifier named `IDENTIFIER "X"`, use `"IDENTIFIER ""X"""`.
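
The quoting rules above can be checked with a few short queries. This is an illustrative sketch; the column name `my column` is arbitrary.

```sql
-- A reserved keyword and an identifier containing a space, both made valid by quoting.
SELECT 123 AS "SELECT", 456 AS "my column";

-- An embedded double quote is escaped by doubling it.
-- The resulting column is named: IDENTIFIER "X"
SELECT 1 AS "IDENTIFIER ""X""";

-- List a few reserved keywords.
SELECT keyword_name
FROM duckdb_keywords()
WHERE keyword_category = 'reserved'
LIMIT 5;
```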

##### Deduplicating Identifiers {#docs:current:sql:dialect:keywords_and_identifiers::deduplicating-identifiers}

In some cases, duplicate identifiers can occur, e.g., column names may conflict when unnesting a nested data structure.
In these cases, DuckDB automatically deduplicates column names by renaming them according to the following rules:

* For a column named `⟨name⟩`, the first instance is not renamed.
* Subsequent instances are renamed to `⟨name⟩_⟨count⟩`, where `⟨count⟩` starts at 1.

For example:

```sql
SELECT *
FROM (SELECT unnest({'a': 42, 'b': {'a': 88, 'b': 99}}, recursive := true));
```

| a  | a_1 | b  |
|---:|----:|---:|
| 42 | 88  | 99 |

#### Database Names {#docs:current:sql:dialect:keywords_and_identifiers::database-names}

Database names are subject to the rules for [identifiers](#docs:current:sql:dialect:keywords_and_identifiers::identifiers).

Additionally, it is best practice to avoid DuckDB's two internal [database schema names](#docs:current:sql:meta:duckdb_table_functions::duckdb_databases), `system` and `temp`.
By default, persistent databases are named after their filename without the extension.
Therefore, the filenames `system.db` and `temp.db` (as well as `system.duckdb` and `temp.duckdb`) result in the database names `system` and `temp`, respectively.
If you need to attach to a database that has one of these names, use an alias, e.g.:

```sql
ATTACH 'temp.db' AS temp2;
USE temp2;
```

#### Rules for Case-Sensitivity {#docs:current:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity}

##### Keywords and Function Names {#docs:current:sql:dialect:keywords_and_identifiers::keywords-and-function-names}

SQL keywords and function names are case-insensitive in DuckDB.

For example, the following two queries are equivalent:

```sql
select COS(Pi()) as CosineOfPi;
SELECT cos(pi()) AS CosineOfPi;
```

| CosineOfPi |
|-----------:|
| -1.0       |

##### Case-Sensitivity of Identifiers {#docs:current:sql:dialect:keywords_and_identifiers::case-sensitivity-of-identifiers}

Identifiers in DuckDB are always case-insensitive, similarly to PostgreSQL.
However, unlike PostgreSQL (and some other major SQL implementations), DuckDB also treats quoted identifiers as case-insensitive.

**Comparison of identifiers:**
Case-insensitivity is implemented using an ASCII-based comparison:
`col_A` and `col_a` are equal but `col_á` is not equal to them.

```sql
SELECT col_A FROM (SELECT 'x' AS col_a); -- succeeds
SELECT col_á FROM (SELECT 'x' AS col_a); -- fails
```

**Preserving cases:**
While DuckDB treats identifiers in a case-insensitive manner, it preserves the cases of these identifiers.
That is, each character's case (uppercase/lowercase) is maintained as originally specified by the user even if a query uses different cases when referring to the identifier.
For example:

```sql
CREATE TABLE tbl AS SELECT cos(pi()) AS CosineOfPi;
SELECT cosineofpi FROM tbl;
```

| CosineOfPi |
|-----------:|
| -1.0       |

To change this behavior, set the `preserve_identifier_case` [configuration option](#docs:current:configuration:overview::configuration-reference) to `false`.

##### Case-Sensitivity of Keys in Nested Data Structures {#docs:current:sql:dialect:keywords_and_identifiers::case-sensitivity-of-keys-in-nested-data-structures}

The keys of `MAP`s are case-sensitive:

```sql
SELECT MAP(['key1'], [1]) = MAP(['KEY1'], [1]) AS equal;
```

```text
false
```

The keys of `UNION`s and `STRUCT`s are case-insensitive:

```sql
SELECT {'key1': 1} = {'KEY1': 1} AS equal;
```

```text
true
```

```sql
SELECT union_value(key1 := 1) = union_value(KEY1 := 1) AS equal;
```

```text
true
```

###### Handling Conflicts {#docs:current:sql:dialect:keywords_and_identifiers::handling-conflicts}

In case of a conflict, when the same identifier is spelt with different cases, one will be selected randomly. For example:

```sql
CREATE TABLE t1 (idfield INTEGER, x INTEGER);
CREATE TABLE t2 (IdField INTEGER, y INTEGER);
INSERT INTO t1 VALUES (1, 123);
INSERT INTO t2 VALUES (1, 456);
SELECT * FROM t1 NATURAL JOIN t2;
```

| idfield |  x  |  y  |
|--------:|----:|----:|
| 1       | 123 | 456 |

###### Disabling Preserving Cases {#docs:current:sql:dialect:keywords_and_identifiers::disabling-preserving-cases}

With the `preserve_identifier_case` [configuration option](#docs:current:configuration:overview::configuration-reference) set to `false`, all identifiers are turned into lowercase:

```sql
SET preserve_identifier_case = false;
CREATE TABLE tbl AS SELECT cos(pi()) AS CosineOfPi;
SELECT CosineOfPi FROM tbl;
```

| cosineofpi |
|-----------:|
| -1.0       |

### Order Preservation {#docs:current:sql:dialect:order_preservation}

For many operations, DuckDB preserves the order of rows, similarly to data frame libraries such as Pandas.

#### Example {#docs:current:sql:dialect:order_preservation::example}

Take the following table for example:

```sql
CREATE TABLE tbl AS
    SELECT *
    FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c')) t(x, y);

SELECT *
FROM tbl;
```

| x | y |
|--:|---|
| 1 | a |
| 2 | b |
| 3 | c |

Let's take the following query that returns the rows where `x` is an odd number:

```sql
SELECT *
FROM tbl
WHERE x % 2 = 1;
```

| x | y |
|--:|---|
| 1 | a |
| 3 | c |

Because the row `(1, 'a')` occurs before `(3, 'c')` in the original table, it is guaranteed to come before that row in this table too.

#### Clauses {#docs:current:sql:dialect:order_preservation::clauses}

The following clauses guarantee that the original row order is preserved:

* `COPY` (see [Insertion Order](#docs:current:sql:dialect:order_preservation::insertion-order))
* `FROM` with a single table
* `LIMIT`
* `OFFSET`
* `SELECT`
* `UNION ALL`
* `WHERE`
* Window functions with an empty `OVER` clause
* Common table expressions and table subqueries as long as they only contain the aforementioned components

> **Tip.** `row_number() OVER ()` allows turning the original row order into an explicit column that can be referenced in the operations that don't preserve row order by default. On materialized tables, the `rowid` pseudo-column can be used to the same effect.
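
A minimal sketch of this tip, assuming hypothetical tables `events` and `events_numbered`:

```sql
CREATE OR REPLACE TABLE events AS
    SELECT *
    FROM (VALUES ('b'), ('a'), ('c')) t(label);

-- Materialize the current row order as an explicit column ...
CREATE OR REPLACE TABLE events_numbered AS
    SELECT row_number() OVER () AS pos, label
    FROM events;

-- ... so that it can be restored after an operation that does not preserve order, such as a join.
SELECT n.pos, n.label
FROM events_numbered n
JOIN events e USING (label)
ORDER BY n.pos;

-- On a materialized table, the rowid pseudo-column serves the same purpose.
SELECT rowid, label
FROM events
ORDER BY rowid;
```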

The following operations **do not** guarantee that the row order is preserved:

* `FROM` with multiple tables and/or subqueries
* `JOIN`
* `UNION`
* `USING SAMPLE`
* Whole-table aggregation (the input order, that is, the order in which rows are fed into [order-sensitive aggregate functions](https://duckdb.org/docs/sql/functions/aggregates.html#order-by-clause-in-aggregate-functions) is not guaranteed unless explicitly specified in the aggregate function)
* `GROUP BY` (neither the input order nor the output order is guaranteed)
* `ORDER BY` (specifically, `ORDER BY` may not use a [stable algorithm](https://en.m.wikipedia.org/wiki/Stable_algorithm))
* Scalar subqueries

#### Insertion Order {#docs:current:sql:dialect:order_preservation::insertion-order}

By default, the following components preserve insertion order:

* [CSV reader](#docs:current:data:csv:overview::order-preservation) (`read_csv` function)
* [JSON reader](#docs:current:data:json:overview::order-preservation) (`read_json` function)
* [Parquet reader](#docs:current:data:parquet:overview::order-preservation) (`read_parquet` function)

Preservation of insertion order is controlled by the `preserve_insertion_order` [configuration option](#docs:current:configuration:overview).
This setting is `true` by default, indicating that the order should be preserved.
To change this setting, use:

```sql
SET preserve_insertion_order = false;
```

### PostgreSQL Compatibility {#docs:current:sql:dialect:postgresql_compatibility}

DuckDB's SQL dialect closely follows the conventions of the PostgreSQL dialect.
The few exceptions to this are listed on this page.

#### Floating-Point Arithmetic {#docs:current:sql:dialect:postgresql_compatibility::floating-point-arithmetic}

DuckDB and PostgreSQL handle floating-point arithmetic differently for division by zero. DuckDB conforms to the [IEEE Standard for Floating-Point Arithmetic (IEEE 754)](https://en.wikipedia.org/wiki/IEEE_754) for both division by zero and operations involving infinity values. PostgreSQL returns an error for division by zero but aligns with IEEE 754 for handling infinity values. To show the differences, run the following SQL queries:

```sql
SELECT 1.0 / 0.0 AS x;
SELECT 0.0 / 0.0 AS x;
SELECT -1.0 / 0.0 AS x;
SELECT 'Infinity'::FLOAT / 'Infinity'::FLOAT AS x;
SELECT 1.0 / 'Infinity'::FLOAT AS x;
SELECT 'Infinity'::FLOAT - 'Infinity'::FLOAT AS x;
SELECT 'Infinity'::FLOAT - 1.0 AS x;
```



| Expression              | PostgreSQL |    DuckDB |  IEEE 754 |
| :---------------------- | ---------: | --------: | --------: |
| 1.0 / 0.0               |      error |  Infinity |  Infinity |
| 0.0 / 0.0               |      error |       NaN |       NaN |
| -1.0 / 0.0              |      error | -Infinity | -Infinity |
| 'Infinity' / 'Infinity' |        NaN |       NaN |       NaN |
| 1.0 / 'Infinity'        |        0.0 |       0.0 |       0.0 |
| 'Infinity' - 'Infinity' |        NaN |       NaN |       NaN |
| 'Infinity' - 1.0        |   Infinity |  Infinity |  Infinity |

#### Division on Integers {#docs:current:sql:dialect:postgresql_compatibility::division-on-integers}

When dividing integers, PostgreSQL performs integer division, while DuckDB performs floating-point division:

```sql
SELECT 1 / 2 AS x;
```

PostgreSQL returns `0`, while DuckDB returns `0.5`.

To perform integer division in DuckDB, use the `//` operator:

```sql
SELECT 1 // 2 AS x;
```

This returns `0`.

#### `UNION` of Boolean and Integer Values {#docs:current:sql:dialect:postgresql_compatibility::union-of-boolean-and-integer-values}

The following query fails in PostgreSQL but successfully completes in DuckDB:

```sql
SELECT true AS x
UNION
SELECT 2;
```

PostgreSQL returns an error:

```console
ERROR:  UNION types boolean and integer cannot be matched
```

DuckDB performs an enforced cast and therefore completes the query, returning the following:

|    x |
| ---: |
|    1 |
|    2 |

#### Implicit Casting on Equality Checks {#docs:current:sql:dialect:postgresql_compatibility::implicit-casting-on-equality-checks}

DuckDB performs implicit casting on equality checks, e.g., converting strings to numeric and boolean values.
Therefore, there are several cases where PostgreSQL throws an error while DuckDB successfully computes the result:



| Expression    | PostgreSQL | DuckDB |
| :------------ | ---------- | ------ |
| '1.1' = 1     | error      | true   |
| '1.1' = 1.1   | true       | true   |
| 1 = 1.1       | false      | false  |
| true = 'true' | true       | true   |
| true = 1      | error      | true   |
| 'true' = 1    | error      | error  |

#### Case Sensitivity for Quoted Identifiers {#docs:current:sql:dialect:postgresql_compatibility::case-sensitivity-for-quoted-identifiers}

PostgreSQL is case-insensitive. It achieves case insensitivity by lowercasing unquoted identifiers within SQL, whereas quoting preserves case. For example, the following script creates a table named `mytable` but then queries `MyTaBLe`, because the quotes preserve the case:

```sql
CREATE TABLE MyTaBLe (x INTEGER);
SELECT * FROM "MyTaBLe";
```

```console
ERROR:  relation "MyTaBLe" does not exist
```

PostgreSQL does not only treat quoted identifiers as case-sensitive; it treats all identifiers as case-sensitive, e.g., this also does not work:

```sql
CREATE TABLE "PreservedCase" (x INTEGER);
SELECT * FROM PreservedCase;
```

```console
ERROR:  relation "preservedcase" does not exist
```

Therefore, case-insensitivity in PostgreSQL only works if you never use quoted identifiers with different cases.

For DuckDB, this behavior was problematic when interfacing with other tools (e.g., Parquet, Pandas) that are case-sensitive by default, since all identifiers would be lowercased all the time.
Therefore, DuckDB achieves case insensitivity by making identifiers fully case-insensitive throughout the system but [_preserving their case_](#docs:current:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity).

In DuckDB, the scripts above complete successfully:

```sql
CREATE TABLE MyTaBLe (x INTEGER);
SELECT * FROM "MyTaBLe";
CREATE TABLE "PreservedCase" (x INTEGER);
SELECT * FROM PreservedCase;
SELECT table_name FROM duckdb_tables();
```

| table_name    |
| ------------- |
| MyTaBLe       |
| PreservedCase |

PostgreSQL's behavior of lowercasing identifiers is accessible using the [`preserve_identifier_case` option](#docs:current:configuration:overview::local-configuration-options):

```sql
SET preserve_identifier_case = false;
CREATE TABLE MyTaBLe (x INTEGER);
SELECT table_name FROM duckdb_tables();
```

| table_name |
| ---------- |
| mytable    |

However, the case-insensitive matching of identifiers cannot be turned off.

#### Using Double Equality Sign for Comparison {#docs:current:sql:dialect:postgresql_compatibility::using-double-equality-sign-for-comparison}

DuckDB supports both `=` and `==` for equality comparison, while PostgreSQL only supports `=`.

```sql
SELECT 1 == 1 AS t;
```

DuckDB returns `true`, while PostgreSQL returns:

```console
postgres=# SELECT 1 == 1 AS t;
ERROR:  operator does not exist: integer == integer
LINE 1: SELECT 1 == 1 AS t;
```

Note that the use of `==` is not encouraged due to its limited portability.

#### Vacuuming Tables {#docs:current:sql:dialect:postgresql_compatibility::vacuuming-tables}

In PostgreSQL, the `VACUUM` statement garbage-collects and analyzes tables.
In DuckDB, the [`VACUUM` statement](#docs:current:sql:statements:vacuum) is only used to rebuild statistics.
For instructions on reclaiming space, refer to the [“Reclaiming space” page](#docs:current:operations_manual:footprint_of_duckdb:reclaiming_space).

#### Strings {#docs:current:sql:dialect:postgresql_compatibility::strings}

Since version 1.3.0, DuckDB escapes characters such as `'` in strings serialized in nested data structures.
PostgreSQL does not do this.

For an example, run:

```sql
SELECT ARRAY[''''];
```

PostgreSQL returns:

```text
{'}
```

DuckDB returns:

```text
['\'']
```

#### Functions {#docs:current:sql:dialect:postgresql_compatibility::functions}

##### `regexp_extract` Function {#docs:current:sql:dialect:postgresql_compatibility::regexp_extract-function}

Unlike PostgreSQL's `regexp_substr` function, DuckDB's `regexp_extract` returns empty strings instead of `NULL`s when there is no match.
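
If `NULL` is preferred for non-matching inputs, one possible workaround (a sketch, not an official recommendation) is to wrap the call in `nullif`:

```sql
SELECT
    regexp_extract('duck', 'x+') = '' AS is_empty_string,  -- no match yields an empty string
    nullif(regexp_extract('duck', 'x+'), '') AS as_null;   -- converts the empty string to NULL
```

Note that this also maps a genuine empty-string match to `NULL`.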

##### `to_date` Function {#docs:current:sql:dialect:postgresql_compatibility::to_date-function}

DuckDB does not support the [`to_date` PostgreSQL date formatting function](https://www.postgresql.org/docs/17/functions-formatting.html).
Instead, please use the [`strptime` function](#docs:current:sql:functions:dateformat::strptime-examples).
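
For instance, a PostgreSQL call such as `to_date('05 Dec 2000', 'DD Mon YYYY')` could be rewritten roughly as follows. The input string and format are illustrative only. `strptime` returns a `TIMESTAMP`, so the result is cast to `DATE`.

```sql
SELECT strptime('05 Dec 2000', '%d %b %Y')::DATE AS d;
```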

##### `date_part` Function {#docs:current:sql:dialect:postgresql_compatibility::date_part-function}

Most parts extracted by the [`date_part` function](#docs:current:sql:functions:datepart) are returned as integers. Since there are no infinite integer values in DuckDB, `NULL`s are returned for infinite timestamps.
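
The difference can be observed by comparing a finite and an infinite timestamp. This is a small sketch with arbitrary aliases:

```sql
SELECT
    date_part('year', TIMESTAMP '2024-03-01 12:00:00') AS finite_year,
    date_part('year', 'infinity'::TIMESTAMP) AS infinite_year;  -- NULL, as described above
```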

#### Resolution of Type Names in the Schema {#docs:current:sql:dialect:postgresql_compatibility::resolution-of-type-names-in-the-schema}

For [`CREATE TABLE` statements](#docs:current:sql:statements:create_table), DuckDB attempts to resolve type names in the schema where a table is created. For example:

```sql
CREATE SCHEMA myschema;
CREATE TYPE myschema.mytype AS ENUM ('as', 'df');
CREATE TABLE myschema.mytable (v mytype);
```

PostgreSQL returns an error on the last statement:

```console
ERROR:  type "mytype" does not exist
LINE 1: CREATE TABLE myschema.mytable (v mytype);
```

DuckDB runs the statement and creates the table successfully, confirmed by the following query:

```sql
DESCRIBE myschema.mytable;
```



| column_name | column_type      | null | key  | default | extra |
| ----------- | ---------------- | ---- | ---- | ------- | ----- |
| v           | ENUM('as', 'df') | YES  | NULL | NULL    | NULL  |

#### Exploiting Functional Dependencies for `GROUP BY` {#docs:current:sql:dialect:postgresql_compatibility::exploiting-functional-dependencies-for-group-by}

PostgreSQL can exploit functional dependencies, such as `i -> j` in the following query:

```sql
CREATE TABLE tbl (i INTEGER, j INTEGER, PRIMARY KEY (i));
SELECT j
FROM tbl
GROUP BY i;
```

PostgreSQL runs the query.

DuckDB fails:

```console
Binder Error:
column "j" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(j)" if the exact value of "j" is not important.
```

To work around this, add the other attributes to the `GROUP BY` clause or use the [`GROUP BY ALL` clause](https://duckdb.org/docs/sql/query_syntax/groupby#group-by-all).
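
Two possible rewrites of the failing query are sketched below. The inserted rows are made-up sample data. Because `i` is the primary key, adding `j` to the grouping columns does not change the groups.

```sql
CREATE OR REPLACE TABLE tbl (i INTEGER, j INTEGER, PRIMARY KEY (i));
INSERT INTO tbl VALUES (1, 10), (2, 20);

-- Option 1: list the dependent column explicitly.
SELECT j
FROM tbl
GROUP BY i, j;

-- Option 2: wrap the column in any_value(), as the error message suggests.
SELECT any_value(j) AS j
FROM tbl
GROUP BY i;
```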

#### Behavior of Regular Expression Match Operators {#docs:current:sql:dialect:postgresql_compatibility::behavior-of-regular-expression-match-operators}

PostgreSQL supports the [POSIX regular expression matching operators](#docs:current:sql:functions:pattern_matching) `~` (case-sensitive partial regex matching) and `~*` (case-insensitive partial regex matching) as well as their negated variants, `!~` and `!~*`, respectively.

In DuckDB, `~` is equivalent to [`regexp_full_match`](#docs:current:sql:functions:text::regexp_full_matchstring-regex) and `!~` is equivalent to `NOT regexp_full_match`.
The operators `~*` and `!~*` are not supported.

The table below shows that the correspondence between these operators in PostgreSQL and DuckDB is almost non-existent.
Avoid using the POSIX regular expression matching operators in DuckDB.





| Expression          | PostgreSQL | DuckDB |
| :------------------ | ---------- | ------ |
| `'aaa' ~ '(a\|b)'`   | true       | false  |
| `'AAA' ~* '(a\|b)'`  | true       | error  |
| `'aaa' !~ '(a\|b)'`  | false      | true   |
| `'AAA' !~* '(a\|b)'` | false      | error  |
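
One way to get partial-match behavior in DuckDB is the `regexp_matches` function, optionally with the `'i'` option for case-insensitive matching. The sketch below mirrors the expressions in the table above and is not an exhaustive mapping of the PostgreSQL operators.

```sql
SELECT
    regexp_matches('aaa', '(a|b)') AS partial_match,                    -- like PostgreSQL's ~
    regexp_matches('AAA', '(a|b)', 'i') AS partial_match_ci,            -- like PostgreSQL's ~*
    NOT regexp_matches('aaa', '(a|b)') AS negated_match,                -- like PostgreSQL's !~
    NOT regexp_matches('AAA', '(a|b)', 'i') AS negated_match_ci,        -- like PostgreSQL's !~*
    regexp_full_match('aaa', '(a|b)') AS full_match;                    -- what DuckDB's ~ computes
```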


### SQL Quirks {#docs:current:sql:dialect:sql_quirks}

Like all programming languages and libraries, DuckDB has its share of idiosyncrasies and inconsistencies.  
Some are vestiges of our feathered friend's evolution; others are inevitable because we strive to adhere to the [SQL Standard](https://blog.ansi.org/sql-standard-iso-iec-9075-2023-ansi-x3-135/) and specifically to PostgreSQL's dialect (see the [“PostgreSQL Compatibility”](#docs:current:sql:dialect:postgresql_compatibility) page for exceptions).
The rest may simply come down to different preferences, or we may even agree on what _should_ be done but just haven’t gotten around to it yet.

Acknowledging these quirks is the best we can do, which is why we have compiled below a list of examples.

#### Aggregating Empty Groups {#docs:current:sql:dialect:sql_quirks::aggregating-empty-groups}

On empty groups, the aggregate functions `sum`, `list`, and `string_agg` all return `NULL` instead of `0`, `[]` and `''`, respectively. This is dictated by the SQL Standard and obeyed by all SQL implementations we know of. This behavior is inherited by the list aggregate [`list_sum`](#docs:current:sql:functions:list::list_-rewrite-functions), but not by the DuckDB original [`list_dot_product`](#docs:current:sql:functions:list::list_dot_productlist1-list2) which returns `0` on empty lists.
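
When the neutral element is preferred over `NULL`, `coalesce` can supply it. The following sketch aggregates over an input that is emptied by `WHERE false`. The column names `x` and `y` are arbitrary.

```sql
SELECT
    sum(x) AS s,                                   -- NULL
    list(x) AS l,                                  -- NULL
    string_agg(y, ',') AS sa,                      -- NULL
    coalesce(sum(x), 0) AS s_or_zero,
    coalesce(list(x), []) AS l_or_empty,
    coalesce(string_agg(y, ','), '') AS sa_or_empty
FROM (SELECT 1 AS x, 'a' AS y)
WHERE false;
```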

#### 0 vs. 1-Based Indexing {#docs:current:sql:dialect:sql_quirks::0-vs-1-based-indexing}

To comply with standard SQL, one-based indexing is used almost everywhere, e.g., array and string indexing and slicing, and window functions (`row_number`, `rank`, `dense_rank`). However, similarly to PostgreSQL, [JSON features use zero-based indexing](#docs:current:data:json:overview::indexing).
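
The contrast is easiest to see side by side. This sketch assumes that the `json` extension is available, which is an assumption about the installation rather than something stated on this page.

```sql
SELECT
    [10, 20, 30][1] AS first_list_item,              -- one-based: index 1 is the first element
    'DuckDB'[1] AS first_character,                  -- one-based
    ('[10, 20, 30]'::JSON)->0 AS first_json_item;    -- zero-based: index 0 is the first element
```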

#### Types {#docs:current:sql:dialect:sql_quirks::types}

##### `UINT8` vs. `INT8` {#docs:current:sql:dialect:sql_quirks::uint8-vs-int8}

`UINT8` and `INT8` are aliases to integer types of different widths:

* `UINT8` corresponds to `UTINYINT` because it's an _8-bit_ unsigned integer
* `INT8` corresponds to `BIGINT` because it's an _8-byte_ signed integer

Explanation: the `n` in the numeric types `INTn` and `UINTn` denotes the width of the number in either bytes or bits.
`INT1`, `INT2`, `INT4` correspond to the number of bytes, while `INT16`, `INT32` and `INT64` correspond to the number of bits.
The same applies to `UINT` values.
However, the value `n = 8` is a valid choice for both the number of bits and bytes.
For unsigned values, `UINT8` corresponds to `UTINYINT` (8 bits).
For signed values, `INT8` corresponds to `BIGINT` (8 bytes).
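
The aliases can be inspected with `typeof`. This is a quick sketch and the column aliases are arbitrary.

```sql
SELECT
    typeof(1::UINT8) AS uint8_type,  -- UTINYINT (8 bits)
    typeof(1::INT8)  AS int8_type,   -- BIGINT (8 bytes)
    typeof(1::INT1)  AS int1_type,   -- TINYINT (1 byte)
    typeof(1::INT16) AS int16_type;  -- SMALLINT (16 bits)
```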

#### Expressions {#docs:current:sql:dialect:sql_quirks::expressions}

##### Results That May Surprise You {#docs:current:sql:dialect:sql_quirks::results-that-may-surprise-you}



| Expression                 | Result  | Note                                                                          |
|----------------------------|---------|-------------------------------------------------------------------------------|
| `-2^2`                     | `4.0`   | PostgreSQL compatibility means the unary minus has higher precedence than the exponentiation operator. Use additional parentheses, e.g., `-(2^2)` or the [`pow` function](#docs:current:sql:functions:numeric::powx-y), e.g., `-pow(2, 2)`, to avoid mistakes. |
| `'t' = true`               | `true`  | Compatible with PostgreSQL.                                                   |
| `1 = '1'`                  | `true`  | Compatible with PostgreSQL.                                                   |
| `1 = ' 1'`                 | `true`  | Compatible with PostgreSQL.                                                   |
| `1 = '01'`                 | `true`  | Compatible with PostgreSQL.                                                   |
| `1 = ' 01 '`               | `true`  | Compatible with PostgreSQL.                                                   |
| `1 = true`                 | `true`  | Not compatible with PostgreSQL.                                               |
| `1 = '1.1'`                | `true`  | Not compatible with PostgreSQL.                                               |
| `1 IN (0, NULL)`           | `NULL`  | Makes sense if you think of the `NULL`s in the input and output as `UNKNOWN`. |
| `1 in [0, NULL]`           | `false` |                                                                               |
| `concat('abc', NULL)`      | `abc`   | Compatible with PostgreSQL. `list_concat` behaves similarly.                  |
| `'abc' || NULL`            | `NULL`  |                                                                               |
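
A few of the rows above can be reproduced in a single query. This is a minimal sketch; the column aliases are illustrative, and the comments repeat the results listed in the table.

```sql
SELECT
    -2^2                AS unary_minus_first,  -- 4.0
    -(2^2)              AS parenthesized,      -- -4.0
    -pow(2, 2)          AS with_pow,           -- -4.0
    concat('abc', NULL) AS concat_fn,          -- abc
    'abc' || NULL       AS concat_op;          -- NULL
```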



##### `NaN` Values {#docs:current:sql:dialect:sql_quirks::nan-values}

`'NaN'::FLOAT = 'NaN'::FLOAT` and `'NaN'::FLOAT > 3` both evaluate to `true`. This violates IEEE-754 but means that floating-point data types have a total order, like all other data types (beware the consequences for `greatest` / `least`).
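
The following sketch illustrates the ordering. The column aliases are illustrative, and the `greatest` result is what one would expect if `NaN` sorts above every other value.

```sql
SELECT
    'NaN'::FLOAT = 'NaN'::FLOAT AS nan_equals_nan,   -- true
    'NaN'::FLOAT > 3            AS nan_above_three,  -- true
    greatest('NaN'::FLOAT, 3)   AS greatest_value;   -- expected to be NaN under this ordering
```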

##### `age` Function {#docs:current:sql:dialect:sql_quirks::age-function}

`age(x)` is `current_date - x` instead of `current_timestamp - x`. Another quirk inherited from PostgreSQL.

##### Extract Functions {#docs:current:sql:dialect:sql_quirks::extract-functions}

`list_extract` / `map_extract` return `NULL` on non-existing keys. `struct_extract` throws an error because keys of structs are like columns.
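
A minimal sketch of the difference, using literal values chosen for illustration. The failing statement is left commented out so that the block runs as a whole.

```sql
-- Position 10 does not exist in the list, so the result is NULL.
SELECT list_extract([1, 2, 3], 10) AS missing_element;

-- Key 'b' does not exist in the struct, so this statement is expected to fail.
-- SELECT struct_extract({'a': 1}, 'b');

-- Extracting an existing key works as usual.
SELECT struct_extract({'a': 1}, 'a') AS existing_key;  -- 1
```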

#### Clauses {#docs:current:sql:dialect:sql_quirks::clauses}

##### Automatic Column Deduplication in `SELECT` {#docs:current:sql:dialect:sql_quirks::automatic-column-deduplication-in-select}

Column names are deduplicated with the first occurrence shadowing the others:

```sql
CREATE TABLE tbl AS SELECT 1 AS a;
SELECT a FROM (SELECT *, 2 AS a FROM tbl);
```

| a |
|--:|
| 1 |

##### Case Insensitivity for `SELECT`ing Columns {#docs:current:sql:dialect:sql_quirks::case-insensitivity-for-selecting-columns}

Due to case-insensitivity, it's not possible to use `SELECT a FROM 'file.parquet'` when a column called `A` appears before the desired column `a` in `file.parquet`.

##### `USING SAMPLE` {#docs:current:sql:dialect:sql_quirks::using-sample}

The `USING SAMPLE` clause is syntactically placed after the `WHERE` and `GROUP BY` clauses (same as the `LIMIT` clause) but is semantically applied before both (unlike the `LIMIT` clause).
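
One practical consequence, sketched under the rule above: to sample from the filtered rows, move the filter into a subquery. The table `events` and its column `i` are hypothetical names used only for this illustration.

```sql
CREATE TABLE events AS SELECT i FROM range(1000) t(i);

-- The sample is taken before the filter, so fewer than 5 rows may remain.
SELECT *
FROM events
WHERE i < 500
USING SAMPLE 5 ROWS;

-- Filter first in a subquery, then sample from the filtered rows.
SELECT *
FROM (SELECT * FROM events WHERE i < 500)
USING SAMPLE 5 ROWS;
```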

## Samples {#docs:current:sql:samples}

Samples are used to randomly select a subset of a dataset.

#### Examples {#docs:current:sql:samples::examples}

Select a sample of exactly 5 rows from `tbl` using `reservoir` sampling:

```sql
SELECT *
FROM tbl
USING SAMPLE 5;
```

Select a sample of *approximately* 10% of the table using `system` sampling:

```sql
SELECT *
FROM tbl
USING SAMPLE 10%;
```

> **Warning.** By default, when you specify a percentage, each [*vector*](#docs:current:internals:vector) is included in the sample with that probability. If your table contains fewer than ~10k rows, it makes sense to specify the `bernoulli` sampling option instead, which applies the probability to each row independently. Even then, you'll sometimes get more and sometimes less than the specified percentage of the number of rows, but it is much less likely that you get no rows at all. To get exactly 10% of rows (up to rounding), you must use the `reservoir` sampling option.

Select a sample of *approximately* 10% of the table using `bernoulli` sampling:

```sql
SELECT *
FROM tbl
USING SAMPLE 10 PERCENT (bernoulli);
```

Select a sample of *exactly* 10% (up to rounding) of the table using `reservoir` sampling:

```sql
SELECT *
FROM tbl
USING SAMPLE 10 PERCENT (reservoir);
```

Select a sample of *exactly* 50 rows of the table using reservoir sampling with a fixed seed (100):

```sql
SELECT *
FROM tbl
USING SAMPLE reservoir(50 ROWS)
REPEATABLE (100);
```

Select a sample of *approximately* 20% of the table using `system` sampling with a fixed seed (377):

```sql
SELECT *
FROM tbl
USING SAMPLE 20% (system, 377);
```

Select a sample of 20% of `tbl` **before** the join with `tbl2`:

```sql
SELECT *
FROM tbl TABLESAMPLE reservoir(20%), tbl2
WHERE tbl.i = tbl2.i;
```

Select a sample of 20% of the join result, i.e., sample **after** joining `tbl` with `tbl2`:

```sql
SELECT *
FROM tbl, tbl2
WHERE tbl.i = tbl2.i
USING SAMPLE reservoir(20%);
```

#### Syntax {#docs:current:sql:samples::syntax}



Samples allow you to randomly extract a subset of a dataset. Samples are useful for exploring a dataset faster, as often you might not be interested in the exact answers to queries, but only in rough indications of what the data looks like and what is in the data. Samples allow you to get approximate answers to queries faster, as they reduce the amount of data that needs to pass through the query engine.

DuckDB supports three different types of sampling methods: `reservoir`, `bernoulli` and `system`. By default, DuckDB uses `reservoir` sampling when an exact number of rows is sampled, and `system` sampling when a percentage is specified. The sampling methods are described in detail below.

Samples require a *sample size*, which is an indication of how many elements will be sampled from the total population. Samples can either be given as a percentage (`10%` or `10 PERCENT`) or as a fixed number of rows (`10` or `10 ROWS`). All three sampling methods support sampling over a percentage, but **only** reservoir sampling supports sampling a fixed number of rows.

Samples are probabilistic, that is to say, samples can differ between runs *unless* a seed is explicitly specified. Specifying the seed *only* guarantees that the sample is the same if multi-threading is not enabled (i.e., `SET threads = 1`). When multiple threads process a sample, the results are not necessarily consistent even with a fixed seed.
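
A minimal sketch of a repeatable sample, assuming a generated input of 1,000 rows and an arbitrary seed of 42. With a single thread, both queries should return the same rows.

```sql
-- Disable multi-threading so that the seed fully determines the sample.
SET threads = 1;

SELECT * FROM range(1000) t(i) USING SAMPLE reservoir(5 ROWS) REPEATABLE (42);
SELECT * FROM range(1000) t(i) USING SAMPLE reservoir(5 ROWS) REPEATABLE (42);

-- Restore the default thread count afterwards.
RESET threads;
```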

#### Sampling Methods {#docs:current:sql:samples::sampling-methods}

##### `reservoir` {#docs:current:sql:samples::reservoir}

Reservoir sampling is a stream sampling technique that selects a random sample by keeping a *reservoir* of size equal to the sample size, and randomly replacing elements as more elements come in. Reservoir sampling allows us to specify *exactly* how many elements we want in the resulting sample (by selecting the size of the reservoir). As a result, reservoir sampling *always* outputs the same number of elements, unlike system and bernoulli sampling.
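
To illustrate the fixed output size, the sketch below counts the rows of a reservoir sample drawn from a generated input of 100,000 rows (an arbitrary size chosen for the example).

```sql
SELECT count(*) AS sample_size
FROM (SELECT * FROM range(100000) USING SAMPLE reservoir(50 ROWS));
-- sample_size is 50 on every run
```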

Reservoir sampling is only recommended for small sample sizes, and is not recommended for use with percentages. That is because reservoir sampling needs to materialize the entire sample and randomly replace tuples within the materialized sample. The larger the sample size, the higher the performance hit incurred by this process.

Reservoir sampling also incurs an additional performance penalty when multi-threading is used, since the reservoir must be shared among the different threads to ensure unbiased sampling. This is not a big problem when the reservoir is very small, but becomes costly when the sample is large.

> **Best practice.** Avoid using reservoir sampling with large sample sizes if possible.
> Reservoir sampling requires the entire sample to be materialized in memory.

##### `bernoulli` {#docs:current:sql:samples::bernoulli}

Bernoulli sampling can only be used when a sampling percentage is specified. It is rather straightforward: every row in the underlying table is included with a chance equal to the specified percentage. As a result, bernoulli sampling can return a different number of tuples even if the same percentage is specified. The *expected* number of rows is equal to the specified percentage of the table, but there will be some *variance*.

Because bernoulli sampling is completely independent (there is no shared state), there is no penalty for using bernoulli sampling together with multiple threads.
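
The variance can be observed by running the same count several times. This sketch assumes a generated input of 100,000 rows; the result should be close to 10,000 but will typically differ from run to run.

```sql
SELECT count(*) AS sample_size
FROM (SELECT * FROM range(100000) USING SAMPLE 10 PERCENT (bernoulli));
```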

##### `system` {#docs:current:sql:samples::system}

System sampling is a variant of bernoulli sampling with one crucial difference: every *vector* is included with a chance equal to the sampling percentage. This is a form of cluster sampling. System sampling is more efficient than bernoulli sampling, as no per-tuple selections have to be performed.

The *expected* number of rows is still equal to the specified percentage of the table, but the *variance* is `vectorSize` times higher. As such, system sampling is not suitable for datasets with fewer than ~10k rows, where it can happen that all rows will be filtered out, or all the data will be included, even when you ask for `50 PERCENT`.
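
The effect on small inputs can be sketched with a generated input of 1,000 rows. Assuming that all rows arrive in a single vector, repeated runs of the first query will often return either 0 or 1000 rather than a value near 500, whereas the `bernoulli` variant should stay close to 500.

```sql
-- Vector-level sampling: all-or-nothing when the input fits into one vector.
SELECT count(*) AS system_sample_size
FROM (SELECT * FROM range(1000) USING SAMPLE 50 PERCENT (system));

-- Row-level sampling on the same input for comparison.
SELECT count(*) AS bernoulli_sample_size
FROM (SELECT * FROM range(1000) USING SAMPLE 50 PERCENT (bernoulli));
```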

#### Table Samples {#docs:current:sql:samples::table-samples}

The `TABLESAMPLE` and `USING SAMPLE` clauses are identical in terms of syntax and effect, with one important difference: `TABLESAMPLE` samples directly from the table for which it is specified, whereas `USING SAMPLE` samples after the entire `FROM` clause has been resolved. This is relevant when there are joins present in the query plan.

The `TABLESAMPLE` clause is essentially equivalent to creating a subquery with the `USING SAMPLE` clause, i.e., the following two queries are identical:

Sample 20% of `tbl` **before** the join:

```sql
SELECT *
FROM
    tbl TABLESAMPLE reservoir(20%),
    tbl2
WHERE tbl.i = tbl2.i;
```

Equivalently, sample 20% of `tbl` **before** the join using a subquery with `USING SAMPLE`:

```sql
SELECT *
FROM
    (SELECT * FROM tbl USING SAMPLE reservoir(20%)) tbl,
    tbl2
WHERE tbl.i = tbl2.i;
```

Sample 20% **after** the join (i.e., sample 20% of the join result):

```sql
SELECT *
FROM tbl, tbl2
WHERE tbl.i = tbl2.i
USING SAMPLE reservoir(20%);
```

# Configuration {#configuration}

## Configuration {#docs:current:configuration:overview}

DuckDB has a number of configuration options that can be used to change the behavior of the system.

The configuration options can be set using either the [`SET` statement](#docs:current:sql:statements:set) or the [`PRAGMA` statement](#docs:current:configuration:pragmas).
They can be reset to their original values using the [`RESET` statement](#docs:current:sql:statements:set::reset).
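
As a brief sketch, the three statements can be combined as follows. The thread counts are arbitrary values chosen for illustration.

```sql
-- Set an option with SET, then override it with the equivalent PRAGMA form.
SET threads = 4;
PRAGMA threads = 2;

SELECT current_setting('threads') AS threads;  -- 2

-- Return the option to its default value.
RESET threads;
```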

The values of configuration options can be queried via the [`current_setting()` scalar function](#docs:current:sql:functions:utility) or using the [`duckdb_settings()` table function](#docs:current:sql:meta:duckdb_table_functions::duckdb_settings). For example:

```sql
SELECT current_setting('memory_limit') AS memlimit;
```

Or:

```sql
SELECT value AS memlimit
FROM duckdb_settings()
WHERE name = 'memory_limit';
```

#### Examples {#docs:current:configuration:overview::examples}

Set the memory limit of the system to 10 GB:

```sql
SET memory_limit = '10GB';
```

Configure the system to use 1 thread:

```sql
SET threads TO 1;
```

Turn logging on and set the logging level to `debug`.
For additional details on logging levels, see [Log Level](#docs:current:operations_manual:logging:overview::log-level).

```sql
SET enable_logging = true;
SET logging_level = 'debug';
```

Write a single log message with the `debug` level and a `connection` scope:

```sql
SELECT write_log('A new client has connected.', level := 'debug', scope := 'connection');
```

Write a single log message with a `debug` level and a `connection` scope and a custom `log_type`:

```sql
SELECT write_log(
        'A new duck has connected to the lake.', 
        level := 'debug', 
        scope := 'connection', 
        log_type := 'duckdb.docs.example.quack'
    );
```

Check logs with the `DEBUG` log level:

```sql
SELECT * FROM duckdb_logs WHERE log_level = 'DEBUG';
```

Check logs with the `QueryLog` type:

```sql
SELECT * FROM duckdb_logs WHERE type = 'QueryLog';
```

Check current logging settings:

```sql
SELECT * FROM duckdb_settings() WHERE name LIKE '%logging%';
```

Enable printing of a progress bar during long-running queries:

```sql
SET enable_progress_bar = true;
```

Set the default null order to `NULLS LAST`:

```sql
SET default_null_order = 'nulls_last';
```

Return the current value of a specific setting:

```sql
SELECT current_setting('threads') AS threads;
```

| threads |
| ------: |
|      10 |

Query a specific setting:

```sql
SELECT *
FROM duckdb_settings()
WHERE name = 'threads';
```

| name    | value | description                                     | input_type | scope  |
| ------- | ----- | ----------------------------------------------- | ---------- | ------ |
| threads | 1     | The number of total threads used by the system. | BIGINT     | GLOBAL |

Show a list of all available settings:

```sql
SELECT *
FROM duckdb_settings();
```

Reset the memory limit of the system back to the default:

```sql
RESET memory_limit;
```

#### Secrets Manager {#docs:current:configuration:overview::secrets-manager}

DuckDB has a [Secrets manager](#docs:current:sql:statements:create_secret), which provides a unified user interface for secrets across all backends (e.g., AWS S3) that use them.

#### Configuration Reference {#docs:current:configuration:overview::configuration-reference}



Configuration options come with different default [scopes](#docs:current:sql:statements:set::scopes): `GLOBAL` and `LOCAL`. Below is a list of all available configuration options by scope.
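
To see how the options of a running instance are distributed across scopes, the `scope` column of `duckdb_settings()` can be aggregated. This is a minimal sketch; the counts depend on the DuckDB version and the loaded extensions.

```sql
SELECT scope, count(*) AS option_count
FROM duckdb_settings()
GROUP BY scope
ORDER BY scope;
```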

##### Global Configuration Options {#docs:current:configuration:overview::global-configuration-options}

|                     Name                      |                                                                                                  Description                                                                                                  |    Type     |                    Default value                    |
|----|--------|--|---|
| `Calendar`                                    | The current calendar                                                                                                                                                                                          | `VARCHAR`   | System (locale) calendar                            |
| `TimeZone`                                    | The current time zone                                                                                                                                                                                         | `VARCHAR`   | System (locale) timezone                            |
| `__delta_only_variant_encoding_enabled`       | Enables the Parquet reader to identify a Variant structurally.                                                                                                                                                | `BOOLEAN`   | `false`                                             |
| `access_mode`                                 | Access mode of the database (` AUTOMATIC`, `READ_ONLY` or `READ_WRITE`)                                                                                                                                        | `VARCHAR`   | `automatic`                                         |
| `allocator_background_threads`                | Whether to enable the allocator background thread.                                                                                                                                                            | `BOOLEAN`   | `false`                                             |
| `allocator_bulk_deallocation_flush_threshold` | If a bulk deallocation larger than this occurs, flush outstanding allocations.                                                                                                                                | `VARCHAR`   | `512.0 MiB`                                         |
| `allocator_flush_threshold`                   | Peak allocation threshold at which to flush the allocator after completing a task.                                                                                                                            | `VARCHAR`   | `128.0 MiB`                                         |
| `allow_asterisks_in_http_paths`               | Allow '*' character in URLs users can query                                                                                                                                                                   | `BOOLEAN`   | `false`                                             |
| `allow_community_extensions`                  | Allow loading community-built extensions                                                                                                                                                                      | `BOOLEAN`   | `true`                                              |
| `allow_extensions_metadata_mismatch`          | Allow loading extensions with incompatible metadata                                                                                                                                                           | `BOOLEAN`   | `false`                                             |
| `allow_parser_override_extension`             | Allow extensions to override the current parser                                                                                                                                                               | `VARCHAR`   | `DEFAULT`                                           |
| `allow_persistent_secrets`                    | Allow the creation of persistent secrets, that are stored and loaded on restarts                                                                                                                              | `BOOLEAN`   | `true`                                              |
| `allow_unredacted_secrets`                    | Allow printing unredacted secrets                                                                                                                                                                             | `BOOLEAN`   | `false`                                             |
| `allow_unsigned_extensions`                   | Allow to load extensions with invalid or missing signatures                                                                                                                                                   | `BOOLEAN`   | `false`                                             |
| `allowed_configs`                             | List of configuration options that are ALWAYS allowed to be changed - even when lock_configuration is true                                                                                                    | `VARCHAR[]` | `[]`                                                |
| `allowed_directories`                         | List of directories/prefixes that are ALWAYS allowed to be queried - even when enable_external_access is false                                                                                                | `VARCHAR[]` | `[]`                                                |
| `allowed_paths`                               | List of files that are ALWAYS allowed to be queried - even when enable_external_access is false                                                                                                               | `VARCHAR[]` | `[]`                                                |
| `arrow_large_buffer_size`                     | Whether Arrow buffers for strings, blobs, uuids and bits should be exported using large buffers                                                                                                               | `BOOLEAN`   | `false`                                             |
| `arrow_lossless_conversion`                   | Whenever a DuckDB type does not have a clear native or canonical extension match in Arrow, export the types with a duckdb.type_name extension name.                                                           | `BOOLEAN`   | `false`                                             |
| `arrow_output_list_view`                      | Whether export to Arrow format should use ListView as the physical layout for LIST columns                                                                                                                    | `BOOLEAN`   | `false`                                             |
| `arrow_output_version`                        | Whether strings should be produced by DuckDB in Utf8View format instead of Utf8                                                                                                                               | `VARCHAR`   | `1.0`                                               |
| `asof_loop_join_threshold`                    | The maximum number of rows we need on the left side of an ASOF join to use a nested loop join                                                                                                                 | `UBIGINT`   | `64`                                                |
| `auto_checkpoint_skip_wal_threshold`          | The estimated WAL write size at which point we will skip writing to the WAL and only checkpoint. Skipping writing to the WAL means concurrent commits are blocked while the checkpoint is happening.          | `UBIGINT`   | `100000`                                            |
| `auto_fallback_to_full_download`              | Allows automatically falling back to full file downloads when possible.                                                                                                                                       | `BOOLEAN`   | `true`                                              |
| `autoinstall_extension_repository`            | Overrides the custom endpoint for extension installation on autoloading                                                                                                                                       | `VARCHAR`   |                                                     |
| `autoinstall_known_extensions`                | Whether known extensions are allowed to be automatically installed when a query depends on them                                                                                                               | `BOOLEAN`   | `true`                                              |
| `autoload_known_extensions`                   | Whether known extensions are allowed to be automatically loaded when a query depends on them                                                                                                                  | `BOOLEAN`   | `true`                                              |
| `binary_as_string`                            | In Parquet files, interpret binary data as a string.                                                                                                                                                          | `BOOLEAN`   | `false`                                             |
| `block_allocator_memory`                      | Physical memory that the block allocator is allowed to use (this memory is never freed and cannot be reduced).                                                                                                | `VARCHAR`   | `0 bytes`                                           |
| `ca_cert_file`                                | Path to a custom certificate file for self-signed certificates.                                                                                                                                               | `VARCHAR`   |                                                     |
| `catalog_error_max_schemas`                   | The maximum number of schemas the system will scan for "did you mean..." style errors in the catalog                                                                                                          | `UBIGINT`   | `100`                                               |
| `checkpoint_threshold`, `wal_autocheckpoint`  | The WAL size threshold at which to automatically trigger a checkpoint (e.g., 1GB)                                                                                                                             | `VARCHAR`   | `16.0 MiB`                                          |
| `current_transaction_invalidation_policy`     | Which types of exceptions invalidate the database for the current transaction                                                                                                                                 | `VARCHAR`   | `STANDARD_POLICY`                                   |
| `custom_extension_repository`                 | Overrides the custom endpoint for remote extension installation                                                                                                                                               | `VARCHAR`   |                                                     |
| `custom_user_agent`                           | Metadata from DuckDB callers                                                                                                                                                                                  | `VARCHAR`   |                                                     |
| `default_block_size`                          | The default block size for new duckdb database files (new as-in, they do not yet exist).                                                                                                                      | `UBIGINT`   | `262144`                                            |
| `default_collation`                           | The collation setting used when none is specified                                                                                                                                                             | `VARCHAR`   |                                                     |
| `default_null_order`, `null_order`            | NULL ordering used when none is specified (`NULLS_FIRST` or `NULLS_LAST`)                                                                                                                                     | `VARCHAR`   | `NULLS_LAST`                                        |
| `default_order`                               | The order type used when none is specified (`ASC` or `DESC`)                                                                                                                                                  | `VARCHAR`   | `ASCENDING`                                         |
| `default_secret_storage`                      | Allows switching the default storage for secrets                                                                                                                                                              | `VARCHAR`   | `local_file`                                        |
| `deprecated_using_key_syntax`                 | Configures the use of the deprecated union syntax for USING KEY CTEs.                                                                                                                                         | `VARCHAR`   | `DEFAULT`                                           |
| `disable_database_invalidation`               | Disables invalidating the database instance when encountering a fatal error. Should be used with great care, as DuckDB cannot guarantee correct behavior after a fatal error.                                 | `BOOLEAN`   | `false`                                             |
| `disable_parquet_prefetching`                 | Disable the prefetching mechanism in Parquet                                                                                                                                                                  | `BOOLEAN`   | `false`                                             |
| `disable_timestamptz_casts`                   | Disable casting from timestamp to timestamptz                                                                                                                                                                 | `BOOLEAN`   | `false`                                             |
| `disabled_compression_methods`                | Disable a specific set of compression methods (comma separated)                                                                                                                                               | `VARCHAR`   |                                                     |
| `disabled_filesystems`                        | Disable specific file systems preventing access (e.g., LocalFileSystem)                                                                                                                                       | `VARCHAR`   |                                                     |
| `disabled_log_types`                          | Sets the list of disabled loggers                                                                                                                                                                             | `VARCHAR`   |                                                     |
| `duckdb_api`                                  | DuckDB API surface                                                                                                                                                                                            | `VARCHAR`   | `cli`                                               |
| `dynamic_or_filter_threshold`                 | The maximum amount of OR filters we generate dynamically from a hash join                                                                                                                                     | `UBIGINT`   | `50`                                                |
| `enable_curl_server_cert_verification`        | Enable server side certificate verification for CURL backend.                                                                                                                                                 | `BOOLEAN`   | `true`                                              |
| `enable_external_access`                      | Allow the database to access external state (through e.g., loading/installing modules, COPY TO/FROM, CSV readers, pandas replacement scans, etc)                                                              | `BOOLEAN`   | `true`                                              |
| `enable_external_file_cache`                  | Allow the database to cache external files (e.g., Parquet) in memory.                                                                                                                                         | `BOOLEAN`   | `true`                                              |
| `enable_fsst_vectors`                         | Allow scans on FSST compressed segments to emit compressed vectors to utilize late decompression                                                                                                              | `BOOLEAN`   | `false`                                             |
| `enable_geoparquet_conversion`                | Attempt to decode/encode geometry data in/as GeoParquet files if the spatial extension is present.                                                                                                            | `BOOLEAN`   | `true`                                              |
| `enable_global_s3_configuration`              | Automatically fetch AWS credentials from environment variables.                                                                                                                                               | `BOOLEAN`   | `true`                                              |
| `enable_http_metadata_cache`                  | Whether or not the global http metadata is used to cache HTTP metadata                                                                                                                                        | `BOOLEAN`   | `false`                                             |
| `enable_logging`                              | Enables the logger                                                                                                                                                                                            | `BOOLEAN`   | `1`                                                 |
| `enable_macro_dependencies`                   | Enable created MACROs to create dependencies on the referenced objects (such as tables)                                                                                                                       | `BOOLEAN`   | `false`                                             |
| `enable_object_cache`                         | [PLACEHOLDER] Legacy setting - does nothing                                                                                                                                                                   | `BOOLEAN`   | `false`                                             |
| `enable_server_cert_verification`             | Enable server side certificate verification.                                                                                                                                                                  | `BOOLEAN`   | `false`                                             |
| `enable_view_dependencies`                    | Enable created VIEWs to create dependencies on the referenced objects (such as tables)                                                                                                                        | `BOOLEAN`   | `false`                                             |
| `enabled_log_types`                           | Sets the list of enabled loggers                                                                                                                                                                              | `VARCHAR`   |                                                     |
| `errors_as_json`                              | Output error messages as structured `JSON` instead of as a raw string                                                                                                                                         | `BOOLEAN`   | `false`                                             |
| `experimental_metadata_reuse`                 | EXPERIMENTAL: Re-use row group and table metadata when checkpointing.                                                                                                                                         | `BOOLEAN`   | `true`                                              |
| `explain_output`                              | Output of EXPLAIN statements (` ALL`, `OPTIMIZED_ONLY`, `PHYSICAL_ONLY`)                                                                                                                                       | `VARCHAR`   | `PHYSICAL_ONLY`                                     |
| `extension_directories`                       | Set the directories to store extensions in                                                                                                                                                                    | `VARCHAR[]` | `[]`                                                |
| `extension_directory`                         | Set the directory to store extensions in                                                                                                                                                                      | `VARCHAR`   |                                                     |
| `external_threads`                            | The number of external threads that work on DuckDB tasks.                                                                                                                                                     | `UBIGINT`   | `1`                                                 |
| `file_search_path`                            | A comma separated list of directories to search for input files                                                                                                                                               | `VARCHAR`   |                                                     |
| `force_download_threshold`                    | Forces upfront download of files smaller than the given size in bytes                                                                                                                                         | `UBIGINT`   | `0`                                                 |
| `force_download`                              | Forces upfront download of file                                                                                                                                                                               | `BOOLEAN`   | `false`                                             |
| `force_mbedtls_unsafe`                        | Enable mbedtls for encryption (WARNING: unsafe to use)                                                                                                                                                        | `BOOLEAN`   | `false`                                             |
| `force_variant_shredding`                     | Forces the VARIANT shredding that happens at checkpoint to use the provided schema for the shredding.                                                                                                         | `VARCHAR`   | `INVALID`                                           |
| `geometry_minimum_shredding_size`             | Minimum size of a rowgroup to enable GEOMETRY shredding, or set to -1 to disable entirely. Defaults to 1/4th of a rowgroup                                                                                    | `BIGINT`    | `30000`                                             |
| `home_directory`                              | Sets the home directory used by the system                                                                                                                                                                    | `VARCHAR`   |                                                     |
| `http_keep_alive`                             | Keep alive connections. Setting this to false can help when running into connection failures                                                                                                                  | `BOOLEAN`   | `true`                                              |
| `http_proxy_password`                         | Password for HTTP proxy                                                                                                                                                                                       | `VARCHAR`   |                                                     |
| `http_proxy_username`                         | Username for HTTP proxy                                                                                                                                                                                       | `VARCHAR`   |                                                     |
| `http_proxy`                                  | HTTP proxy host                                                                                                                                                                                               | `VARCHAR`   |                                                     |
| `http_retries`                                | HTTP retries on I/O error                                                                                                                                                                                     | `UBIGINT`   | `3`                                                 |
| `http_retry_backoff`                          | Backoff factor for exponentially increasing retry wait time                                                                                                                                                   | `FLOAT`     | `4`                                                 |
| `http_retry_wait_ms`                          | Time between retries                                                                                                                                                                                          | `UBIGINT`   | `100`                                               |
| `http_timeout`                                | HTTP timeout read/write/connection/retry (in seconds)                                                                                                                                                         | `UBIGINT`   | `30`                                                |
| `httpfs_client_implementation`                | Select which is the HTTPUtil implementation to be used                                                                                                                                                        | `VARCHAR`   | `default`                                           |
| `httpfs_connection_caching`                   | Enable connection caching for HTTP requests                                                                                                                                                                   | `BOOLEAN`   | `false`                                             |
| `ieee_floating_point_ops`                     | Use IEEE 754-compliant floating point operations (returning NAN instead of errors/NULL).                                                                                                                      | `BOOLEAN`   | `true`                                              |
| `ignore_unknown_crs`                          | Ignore unknown Coordinate Reference Systems (CRS) when creating geometry types or importing geospatial data.                                                                                                  | `BOOLEAN`   | `false`                                             |
| `immediate_transaction_mode`                  | Whether transactions should be started lazily when needed, or immediately when BEGIN TRANSACTION is called                                                                                                    | `BOOLEAN`   | `false`                                             |
| `index_scan_max_count`                        | The maximum index scan count sets a threshold for index scans. If fewer than MAX(index_scan_max_count, index_scan_percentage * total_row_count) rows match, we perform an index scan instead of a table scan. | `UBIGINT`   | `2048`                                              |
| `index_scan_percentage`                       | The index scan percentage sets a threshold for index scans. If fewer than MAX(index_scan_max_count, index_scan_percentage * total_row_count) rows match, we perform an index scan instead of a table scan.    | `DOUBLE`    | `0.001`                                             |
| `integer_division`                            | Whether or not the / operator defaults to integer division, or to floating point division                                                                                                                     | `BOOLEAN`   | `false`                                             |
| `lambda_syntax`                               | Configures the use of the deprecated single arrow operator (->) for lambda functions.                                                                                                                         | `VARCHAR`   | `DEFAULT`                                           |
| `late_materialization_max_rows`               | The maximum amount of rows in the LIMIT/SAMPLE for which we trigger late materialization                                                                                                                      | `UBIGINT`   | `50`                                                |
| `lock_configuration`                          | Whether or not configurations can be altered                                                                                                                                                                  | `BOOLEAN`   | `false`                                             |
| `log_query_path`                              | Specifies the path to which queries should be logged (default: NULL, queries are not logged)                                                                                                                  | `VARCHAR`   |                                                     |
| `logging_level`                               | The log level which will be recorded in the log                                                                                                                                                               | `VARCHAR`   | `WARNING`                                           |
| `logging_mode`                                | Determines which types of log messages are logged                                                                                                                                                             | `VARCHAR`   | `LEVEL_ONLY`                                        |
| `logging_storage`                             | Set the logging storage (memory/stdout/file/<custom>)                                                                                                                                                         | `VARCHAR`   | `shell_log_storage`                                 |
| `max_expression_depth`                        | The maximum expression depth limit in the parser. WARNING: increasing this setting and using very deep expressions might lead to stack overflow errors.                                                       | `UBIGINT`   | `1000`                                              |
| `max_memory`, `memory_limit`                  | The maximum memory of the system (e.g., 1GB)                                                                                                                                                                  | `VARCHAR`   | 80% of RAM                                          |
| `max_temp_directory_size`                     | The maximum amount of data stored inside the 'temp_directory' (when set) (e.g., 1GB)                                                                                                                          | `VARCHAR`   | `90% of available disk space`                       |
| `max_vacuum_tasks`                            | The maximum vacuum tasks to schedule during a checkpoint.                                                                                                                                                     | `UBIGINT`   | `100`                                               |
| `merge_http_secret_into_s3_request`           | Merges http secret params into S3 requests                                                                                                                                                                    | `BOOLEAN`   | `true`                                              |
| `merge_join_threshold`                        | The maximum number of rows on either table to choose a merge join                                                                                                                                             | `UBIGINT`   | `1000`                                              |
| `nested_loop_join_threshold`                  | The maximum number of rows on either table to choose a nested loop join                                                                                                                                       | `UBIGINT`   | `5`                                                 |
| `old_implicit_casting`                        | Allow implicit casting to/from VARCHAR                                                                                                                                                                        | `BOOLEAN`   | `false`                                             |
| `order_by_non_integer_literal`                | Allow ordering by non-integer literals - ordering by such literals has no effect.                                                                                                                             | `BOOLEAN`   | `false`                                             |
| `ordered_aggregate_threshold`                 | The number of rows to accumulate before sorting, used for tuning                                                                                                                                              | `UBIGINT`   | `262144`                                            |
| `parquet_metadata_cache`                      | Cache Parquet metadata - useful when reading the same files multiple times                                                                                                                                    | `BOOLEAN`   | `false`                                             |
| `partitioned_write_flush_threshold`           | The threshold in number of rows after which we flush a thread state when writing using `PARTITION_BY`                                                                                                         | `UBIGINT`   | `524288`                                            |
| `partitioned_write_max_open_files`            | The maximum amount of files the system can keep open before flushing to disk when writing using `PARTITION_BY`                                                                                                | `UBIGINT`   | `100`                                               |
| `password`                                    | The password to use. Ignored for legacy compatibility.                                                                                                                                                        | `VARCHAR`   |                                                     |
| `perfect_ht_threshold`                        | Threshold in bytes for when to use a perfect hash table                                                                                                                                                       | `UBIGINT`   | `12`                                                |
| `pin_threads`                                 | Whether to pin threads to cores (Linux only, default AUTO: on when there are more than 64 cores)                                                                                                              | `VARCHAR`   | `auto`                                              |
| `pivot_filter_threshold`                      | The threshold to switch from using filtered aggregates to LIST with a dedicated pivot operator                                                                                                                | `UBIGINT`   | `20`                                                |
| `pivot_limit`                                 | The maximum number of pivot columns in a pivot statement                                                                                                                                                      | `UBIGINT`   | `100000`                                            |
| `prefer_range_joins`                          | Force use of range joins with mixed predicates                                                                                                                                                                | `BOOLEAN`   | `false`                                             |
| `prefetch_all_parquet_files`                  | Use the prefetching mechanism for all types of parquet files                                                                                                                                                  | `BOOLEAN`   | `false`                                             |
| `preserve_identifier_case`                    | Whether or not to preserve the identifier case, instead of always lowercasing all non-quoted identifiers                                                                                                      | `BOOLEAN`   | `true`                                              |
| `preserve_insertion_order`                    | Whether or not to preserve insertion order. If set to false the system is allowed to re-order any results that do not contain ORDER BY clauses.                                                               | `BOOLEAN`   | `true`                                              |
| `produce_arrow_string_view`                   | Whether Arrow strings should be produced by DuckDB in Utf8View format instead of Utf8                                                                                                                         | `BOOLEAN`   | `false`                                             |
| `s3_access_key_id`                            | S3 Access Key ID                                                                                                                                                                                              | `VARCHAR`   | NULL                                                |
| `s3_allow_recursive_globbing`                 | Whether globs on S3-like storage are optimized with recursive strategy (alternative is listing)                                                                                                               | `BOOLEAN`   | `true`                                              |
| `s3_endpoint`                                 | S3 Endpoint                                                                                                                                                                                                   | `VARCHAR`   | NULL                                                |
| `s3_kms_key_id`                               | S3 KMS Key ID                                                                                                                                                                                                 | `VARCHAR`   | NULL                                                |
| `s3_region`                                   | S3 Region                                                                                                                                                                                                     | `VARCHAR`   | NULL                                                |
| `s3_requester_pays`                           | S3 use requester pays mode                                                                                                                                                                                    | `BOOLEAN`   | `false`                                             |
| `s3_secret_access_key`                        | S3 Access Key                                                                                                                                                                                                 | `VARCHAR`   | NULL                                                |
| `s3_session_token`                            | S3 Session Token                                                                                                                                                                                              | `VARCHAR`   | NULL                                                |
| `s3_uploader_max_filesize`                    | S3 Uploader max filesize (between 50GB and 5TB)                                                                                                                                                               | `VARCHAR`   | `800GB`                                             |
| `s3_uploader_max_parts_per_file`              | S3 Uploader max parts per file (between 1 and 10000)                                                                                                                                                          | `UBIGINT`   | `10000`                                             |
| `s3_uploader_thread_limit`                    | S3 Uploader global thread limit                                                                                                                                                                               | `UBIGINT`   | `50`                                                |
| `s3_url_compatibility_mode`                   | Disable Globs and Query Parameters on S3 URLs                                                                                                                                                                 | `BOOLEAN`   | `false`                                             |
| `s3_url_style`                                | S3 URL style                                                                                                                                                                                                  | `VARCHAR`   | `vhost`                                             |
| `s3_use_ssl`                                  | S3 use SSL                                                                                                                                                                                                    | `BOOLEAN`   | `true`                                              |
| `s3_version_id_pinning`                       | Pin S3 reads to a specific object version for consistency                                                                                                                                                     | `BOOLEAN`   | `false`                                             |
| `scalar_subquery_error_on_multiple_rows`      | When a scalar subquery returns multiple rows, return an error. If set to false, return a random row instead.                                                                                                  | `BOOLEAN`   | `true`                                              |
| `scheduler_process_partial`                   | Partially process tasks before rescheduling - allows for more scheduler fairness between separate queries                                                                                                     | `BOOLEAN`   | `false`                                             |
| `secret_directory`                            | Set the directory to which persistent secrets are stored                                                                                                                                                      | `VARCHAR`   | `~/.duckdb/stored_secrets`                          |
| `storage_block_prefetch`                      | In which scenarios to use storage block prefetching                                                                                                                                                           | `VARCHAR`   | `REMOTE_ONLY`                                       |
| `storage_compatibility_version`               | Serialize on checkpoint with compatibility for a given duckdb version                                                                                                                                         | `VARCHAR`   | `v0.10.2`                                           |
| `temp_directory`                              | Set the directory to which to write temp files                                                                                                                                                                | `VARCHAR`   | `⟨database_name⟩.tmp` or `.tmp` (in in-memory mode) |
| `temp_file_encryption`                        | Encrypt all temporary files if database is encrypted                                                                                                                                                          | `BOOLEAN`   | `false`                                             |
| `threads`, `worker_threads`                   | The number of total threads used by the system.                                                                                                                                                               | `BIGINT`    | # CPU cores                                         |
| `unsafe_disable_etag_checks`                  | Disable checks on ETag consistency                                                                                                                                                                            | `BOOLEAN`   | `false`                                             |
| `user`, `username`                            | The username to use. Ignored for legacy compatibility.                                                                                                                                                        | `VARCHAR`   |                                                     |
| `vacuum_rebuild_indexes`                      | (Experimental) Allow vacuum to compact row groups on tables with bound ART indexes, rebuilding the indexes afterward. Tables with a row count exceeding this threshold are skipped. 0 = disabled.             | `UBIGINT`   | `0`                                                 |
| `validate_external_file_cache`                | Cache validation mode: `VALIDATE_ALL` (default, validate all cache entries), `VALIDATE_REMOTE` (validate only remote cache entries), or `NO_VALIDATION` (disable cache validation).                           | `VARCHAR`   | `VALIDATE_ALL`                                      |
| `variant_minimum_shredding_size`              | Minimum size of a rowgroup to enable VARIANT shredding, or set to -1 to disable entirely. Defaults to 1/4th of a rowgroup                                                                                     | `BIGINT`    | `30000`                                             |
| `wal_autocheckpoint_entries`                  | Trigger automatic checkpoint when WAL entry count reaches or exceeds N (0 = disabled)                                                                                                                         | `UBIGINT`   | `0`                                                 |
| `warnings_as_errors`                          | Escalate all warnings to errors.                                                                                                                                                                              | `BOOLEAN`   | `false`                                             |
| `write_buffer_row_group_count`                | The amount of row groups to buffer in bulk ingestion prior to flushing them together. Reducing this setting can reduce memory consumption.                                                                    | `UBIGINT`   | `5`                                                 |
| `zstd_min_string_length`                      | The (average) length at which to enable ZSTD compression, defaults to 4096                                                                                                                                    | `UBIGINT`   | `4096`                                              |

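The options in the table above can be changed at runtime with `SET` and restored with `RESET`. The following is a minimal sketch, assuming an interactive DuckDB session. The values (`4GB`, `4` threads) and the 10-million-row table size are illustrative assumptions, not recommendations.

```sql
-- Change global options (values are illustrative)
SET memory_limit = '4GB';
SET threads = 4;

-- Allow DuckDB to re-order results of queries without ORDER BY
SET preserve_insertion_order = false;

-- Inspect the current value of a single option
SELECT current_setting('threads') AS threads;

-- List name, value and description for selected options
SELECT name, value, description
FROM duckdb_settings()
WHERE name IN ('memory_limit', 'threads', 'preserve_insertion_order');

-- Index scan threshold using the defaults listed above,
-- for a hypothetical table of 10 million rows:
-- MAX(index_scan_max_count, index_scan_percentage * total_row_count)
SELECT greatest(2048, 0.001 * 10000000) AS index_scan_row_threshold;

-- Restore the defaults
RESET memory_limit;
RESET threads;
RESET preserve_insertion_order;
```

In the last query, 0.001 × 10,000,000 = 10,000 exceeds `index_scan_max_count` (2,048), so under these assumptions an index scan is chosen when fewer than 10,000 rows match.
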
##### Local Configuration Options {#docs:current:configuration:overview::local-configuration-options}

|                 Name                 |                                              Description                                               |   Type    |                                                                                                                                                                                                                                                                                                                                                                                                        Default value                                                                                                                                                                                                                                                                                                                                                                                                        |
|----|--------|--|---|
| `custom_profiling_settings`          | Accepts a `JSON` enabling custom metrics                                                               | `VARCHAR` | `{"ATTACH_LOAD_STORAGE_LATENCY": "true", "ATTACH_REPLAY_WAL_LATENCY": "true", "BLOCKED_THREAD_TIME": "true", "CHECKPOINT_LATENCY": "true", "COMMIT_LOCAL_STORAGE_LATENCY": "true", "CPU_TIME": "true", "CUMULATIVE_CARDINALITY": "true", "CUMULATIVE_ROWS_SCANNED": "true", "EXTRA_INFO": "true", "LATENCY": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_NAME": "true", "OPERATOR_ROWS_SCANNED": "true", "OPERATOR_TIMING": "true", "OPERATOR_TYPE": "true", "QUERY_NAME": "true", "RESULT_SET_SIZE": "true", "ROWS_RETURNED": "true", "SYSTEM_PEAK_BUFFER_MEMORY": "true", "SYSTEM_PEAK_TEMP_DIR_SIZE": "true", "TOTAL_BYTES_READ": "true", "TOTAL_BYTES_WRITTEN": "true", "TOTAL_MEMORY_ALLOCATED": "true", "WAITING_TO_ATTACH_LATENCY": "true", "WAL_REPLAY_ENTRY_COUNT": "true", "WRITE_TO_WAL_LATENCY": "true"}` |
| `enable_http_logging`                | (deprecated) Enables HTTP logging                                                                      | `BOOLEAN` | `true`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `enable_profiling`                   | Enables profiling, and sets the output format (`JSON`, `QUERY_TREE`, `QUERY_TREE_OPTIMIZER`)            | `VARCHAR` | NULL |
| `enable_progress_bar_print`          | Controls the printing of the progress bar, when 'enable_progress_bar' is true                          | `BOOLEAN` | `true`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `enable_progress_bar`                | Enables the progress bar, printing progress to the terminal for long queries                           | `BOOLEAN` | `true`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `http_logging_output`                | (deprecated) The file to which HTTP logging output should be saved, or empty to print to the terminal  | `VARCHAR` |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| `profile_output`, `profiling_output` | The file to which profile output should be saved, or empty to print to the terminal                    | `VARCHAR` |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| `profiling_coverage`                 | The profiling coverage (`SELECT` or `ALL`)                                                             | `VARCHAR` | `SELECT` |
| `profiling_mode`                     | The profiling mode (`STANDARD` or `DETAILED`)                                                           | `VARCHAR` | NULL |
| `progress_bar_time`                  | Sets the time (in milliseconds) how long a query needs to take before we start printing a progress bar | `BIGINT`  | `2000`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `schema`                             | Sets the default search schema. Equivalent to setting search_path to a single value.                   | `VARCHAR` | `main`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `search_path`                        | Sets the default catalog search path as a comma-separated list of values                               | `VARCHAR` |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| `streaming_buffer_size`              | The maximum memory to buffer between fetching from a streaming result (e.g., 1GB)                      | `VARCHAR` | `976.5 KiB`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |

## Pragmas {#docs:current:configuration:pragmas}



The `PRAGMA` statement is a SQL extension adopted by DuckDB from SQLite. `PRAGMA` statements can be issued in a similar manner to regular SQL statements. `PRAGMA` commands may alter the internal state of the database engine, and can influence the subsequent execution or behavior of the engine.

`PRAGMA` statements that assign a value to an option can also be issued using the [`SET` statement](#docs:current:sql:statements:set) and the value of an option can be retrieved using `SELECT current_setting(option_name)`.
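
As a minimal sketch of this equivalence, the following assigns the `threads` option (chosen here only for illustration) once with `PRAGMA` and once with `SET`, then reads the value back:

```sql
-- Both statements assign the same option
PRAGMA threads = 4;
SET threads = 4;

-- Retrieve the current value of the option
SELECT current_setting('threads') AS threads;
```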

For DuckDB's built-in configuration options, see the [Configuration Reference](#docs:current:configuration:overview::configuration-reference).
DuckDB [extensions](#docs:current:extensions:overview) may register additional configuration options.
These are documented in the respective extensions' documentation pages.

This page contains the supported `PRAGMA` settings.

#### Metadata {#docs:current:configuration:pragmas::metadata}

###### Schema Information {#docs:current:configuration:pragmas::schema-information}

List all databases:

```sql
PRAGMA database_list;
```

List all tables:

```sql
PRAGMA show_tables;
```

List all tables, with extra information, similarly to [`DESCRIBE`](#docs:current:guides:meta:describe):

```sql
PRAGMA show_tables_expanded;
```

To list all functions:

```sql
PRAGMA functions;
```

For queries targeting non-existing schemas, DuckDB generates “did you mean...” style error messages.
When there are thousands of attached databases, these errors can take a long time to generate.
To limit the number of schemas DuckDB looks through, use the `catalog_error_max_schemas` option:

```sql
SET catalog_error_max_schemas = 10;
```

###### Table Information {#docs:current:configuration:pragmas::table-information}

Get info for a specific table:

```sql
PRAGMA table_info('table_name');
CALL pragma_table_info('table_name');
```

`table_info` returns information about the columns of the table with name `table_name`. The exact format of the table returned is given below:

```sql
cid INTEGER,        -- cid of the column
name VARCHAR,       -- name of the column
type VARCHAR,       -- type of the column
notnull BOOLEAN,    -- if the column is marked as NOT NULL
dflt_value VARCHAR, -- default value of the column, or NULL if not specified
pk BOOLEAN          -- part of the primary key or not
```
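
A minimal sketch of reading these columns, assuming a hypothetical table `tbl` that is created only for this illustration. Since `pragma_table_info` is a table function, it can also be used in a `FROM` clause and filtered or sorted like any other relation:

```sql
-- Hypothetical table used only for illustration
CREATE TABLE tbl (id INTEGER PRIMARY KEY, name VARCHAR NOT NULL, score DOUBLE DEFAULT 0.0);

-- Select the columns listed above;
-- "notnull" is quoted to avoid clashing with the NOTNULL keyword
SELECT cid, name, type, "notnull", dflt_value, pk
FROM pragma_table_info('tbl')
ORDER BY cid;
```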

###### Database Size {#docs:current:configuration:pragmas::database-size}

Get the file and memory size of each database:

```sql
PRAGMA database_size;
CALL pragma_database_size();
```

`database_size` returns information about the file and memory size of each database. The column types of the returned results are given below:

```sql
database_name VARCHAR, -- database name
database_size VARCHAR, -- total block count times the block size
block_size BIGINT,     -- database block size
total_blocks BIGINT,   -- total blocks in the database
used_blocks BIGINT,    -- used blocks in the database
free_blocks BIGINT,    -- free blocks in the database
wal_size VARCHAR,      -- write ahead log size
memory_usage VARCHAR,  -- memory used by the database buffer manager
memory_limit VARCHAR   -- maximum memory allowed for the database
```
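
Because `pragma_database_size()` can be called as a table function, the result can be narrowed down to the columns of interest. A short sketch:

```sql
-- Project a subset of the columns described above
SELECT database_name, database_size, used_blocks, free_blocks, wal_size
FROM pragma_database_size();
```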

###### Storage Information {#docs:current:configuration:pragmas::storage-information}

To get storage information:

```sql
PRAGMA storage_info('table_name');
CALL pragma_storage_info('table_name');
```

This call returns the following information for the given table:

| Name           | Type      | Description                                           |
|----------------|-----------|-------------------------------------------------------|
| `row_group_id` | `BIGINT`  |                                                                                                                                                    |
| `column_name`  | `VARCHAR` |                                                                                                                                                    |
| `column_id`    | `BIGINT`  |                                                                                                                                                    |
| `column_path`  | `VARCHAR` |                                                                                                                                                    |
| `segment_id`   | `BIGINT`  |                                                                                                                                                    |
| `segment_type` | `VARCHAR` |                                                                                                                                                    |
| `start`        | `BIGINT`  | The start row id of this chunk                                                                                                                     |
| `count`        | `BIGINT`  | The number of entries in this storage chunk                                                                                                        |
| `compression`  | `VARCHAR` | Compression type used for this column – see the [“Lightweight Compression in DuckDB” blog post](https://duckdb.org/2022/10/28/lightweight-compression) |
| `stats`        | `VARCHAR` |                                                                                                                                                    |
| `has_updates`  | `BOOLEAN` |                                                                                                                                                    |
| `persistent`   | `BOOLEAN` | `false` if temporary table                                                                                                                         |
| `block_id`     | `BIGINT`  | Empty unless persistent                                                                                                                            |
| `block_offset` | `BIGINT`  | Empty unless persistent                                                                                                                            |

See [Storage](#docs:current:internals:storage) for more information.
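
The following sketch shows one way to summarize this output per column. The table `storage_demo` is hypothetical and exists only for this illustration; which compression methods appear depends on the data and on whether the database is persistent.

```sql
-- Hypothetical table used only for illustration
CREATE TABLE storage_demo AS
    SELECT i AS id, i % 10 AS bucket FROM range(100000) t(i);

-- Summarize segments per column and compression type;
-- "count" is quoted because it is also a function name
SELECT column_name, segment_type, compression,
       count(*) AS segments, sum("count") AS entries
FROM pragma_storage_info('storage_demo')
GROUP BY column_name, segment_type, compression
ORDER BY column_name, segment_type;
```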

###### Show Databases {#docs:current:configuration:pragmas::show-databases}

The following statement is equivalent to the [`SHOW DATABASES` statement](#docs:current:sql:statements:attach):

```sql
PRAGMA show_databases;
```

#### Resource Management {#docs:current:configuration:pragmas::resource-management}

###### Memory Limit {#docs:current:configuration:pragmas::memory-limit}

Set the memory limit for the buffer manager:

```sql
SET memory_limit = '1GB';
```

> **Warning.** The specified memory limit is only applied to the buffer manager.
> For most queries, the buffer manager handles the majority of the data processed.
> However, certain in-memory data structures such as [vectors](#docs:current:internals:vector) and query results are allocated outside of the buffer manager.
> Additionally, [aggregate functions](#docs:current:sql:functions:aggregates) with complex state (e.g., `list`, `mode`, `quantile`, `string_agg`, and `approx` functions) use memory outside of the buffer manager.
> Therefore, the actual memory consumption can be higher than the specified memory limit.

###### Threads {#docs:current:configuration:pragmas::threads}

Set the number of threads for parallel query execution:

```sql
SET threads = 4;
```
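
To check the values currently in effect for both resource settings, one option is to query the `duckdb_settings()` table function. The sketch below also restores the defaults with `RESET`:

```sql
-- Inspect the current resource settings
SELECT name, value
FROM duckdb_settings()
WHERE name IN ('memory_limit', 'threads');

-- Restore the default values
RESET memory_limit;
RESET threads;
```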

#### Collations {#docs:current:configuration:pragmas::collations}

List all available collations:

```sql
PRAGMA collations;
```

Set the default collation to one of the available ones:

```sql
SET default_collation = 'nocase';
```

#### Default Ordering for NULLs {#docs:current:configuration:pragmas::default-ordering-for-nulls}

Set the default ordering for NULLs to be either `NULLS_FIRST`, `NULLS_LAST`, `NULLS_FIRST_ON_ASC_LAST_ON_DESC` or `NULLS_LAST_ON_ASC_FIRST_ON_DESC`:

```sql
SET default_null_order = 'NULLS_FIRST';
SET default_null_order = 'NULLS_LAST_ON_ASC_FIRST_ON_DESC';
```
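
As a small illustration of the effect, the sketch below sorts an inline list of values that contains a `NULL`. An explicit `NULLS FIRST` or `NULLS LAST` in the query still takes precedence over the default:

```sql
SET default_null_order = 'NULLS_FIRST';

-- Uses the default: NULL is returned first, followed by 1 and 2
SELECT x FROM (VALUES (2), (NULL), (1)) t(x) ORDER BY x;

-- Explicit modifier overrides the default: 1, 2, then NULL
SELECT x FROM (VALUES (2), (NULL), (1)) t(x) ORDER BY x NULLS LAST;

-- Restore the default value
RESET default_null_order;
```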

Set the default result set ordering direction to `ASCENDING` or `DESCENDING`:

```sql
SET default_order = 'ASCENDING';
SET default_order = 'DESCENDING';
```

#### Ordering by Non-Integer Literals {#docs:current:configuration:pragmas::ordering-by-non-integer-literals}

By default, ordering by non-integer literals is not allowed:

```sql
SELECT 42 ORDER BY 'hello world';
```

```console
-- Binder Error: ORDER BY non-integer literal has no effect.
```

To allow this behavior, use the `order_by_non_integer_literal` option:

```sql
SET order_by_non_integer_literal = true;
```

#### Implicit Casting to `VARCHAR` {#docs:current:configuration:pragmas::implicit-casting-to-varchar}

Prior to version 0.10.0, DuckDB automatically allowed any type to be implicitly cast to `VARCHAR` during function binding. As a result, it was possible, for example, to compute the substring of an integer without an explicit cast. In v0.10.0 and later, an explicit cast is required. To revert to the old behavior that performs implicit casting, set the `old_implicit_casting` variable to `true`:

```sql
SET old_implicit_casting = true;
```
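
In most cases, adding the explicit cast is preferable to reverting to the old behavior. A minimal sketch of the substring-of-an-integer case with an explicit cast:

```sql
-- Explicit cast using the standard CAST syntax
SELECT substring(CAST(42 AS VARCHAR), 1, 1) AS first_digit;

-- Equivalent shorthand cast
SELECT substring(42::VARCHAR, 1, 1) AS first_digit;
```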

#### Python: Scan All Dataframes {#docs:current:configuration:pragmas::python-scan-all-dataframes}

Prior to version 1.1.0, DuckDB's [replacement scan mechanism](#docs:current:clients:c:replacement_scans) in Python scanned the global Python namespace. To revert to this old behavior, use the following setting:

```sql
SET python_scan_all_frames = true;
```

#### Information on DuckDB {#docs:current:configuration:pragmas::information-on-duckdb}

###### Version {#docs:current:configuration:pragmas::version}

Show DuckDB version:

```sql
PRAGMA version;
CALL pragma_version();
```

###### Platform {#docs:current:configuration:pragmas::platform}

`platform` returns an identifier for the platform the current DuckDB executable has been compiled for, e.g., `osx_arm64`.
The format of this identifier matches the platform name as described in the [extension loading explainer](#docs:current:extensions:extension_distribution::platforms):

```sql
PRAGMA platform;
CALL pragma_platform();
```

###### User Agent {#docs:current:configuration:pragmas::user-agent}

The following statement returns the user agent information, e.g., `duckdb/v0.10.0(osx_arm64)`:

```sql
PRAGMA user_agent;
```

###### Metadata Information {#docs:current:configuration:pragmas::metadata-information}

The following statement returns information on the metadata store (`block_id`, `total_blocks`, `free_blocks`, and `free_list`):

```sql
PRAGMA metadata_info;
```

#### Progress Bar {#docs:current:configuration:pragmas::progress-bar}

Show progress bar when running queries:

```sql
PRAGMA enable_progress_bar;
```

Or:

```sql
PRAGMA enable_print_progress_bar;
```

Don't show a progress bar for running queries:

```sql
PRAGMA disable_progress_bar;
```

Or:

```sql
PRAGMA disable_print_progress_bar;
```

#### EXPLAIN Output {#docs:current:configuration:pragmas::explain-output}

The output of [`EXPLAIN`](#docs:current:sql:statements:profiling) can be configured to show only the physical plan.

The default configuration of `EXPLAIN`:

```sql
SET explain_output = 'physical_only';
```

To only show the optimized query plan:

```sql
SET explain_output = 'optimized_only';
```

To show all query plans:

```sql
SET explain_output = 'all';
```
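
A short sketch of how the setting interacts with `EXPLAIN`, using a trivial query chosen only for illustration:

```sql
-- Show the unoptimized, optimized, and physical plans
SET explain_output = 'all';
EXPLAIN SELECT 42 AS answer;

-- Return to the default configuration
RESET explain_output;
```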

#### Profiling {#docs:current:configuration:pragmas::profiling}

##### Enable Profiling {#docs:current:configuration:pragmas::enable-profiling}

The following query enables profiling with the default format, `query_tree`.
Independent of the format, `enable_profiling` is **mandatory** to enable profiling.

```sql
PRAGMA enable_profiling;
PRAGMA enable_profile;
```

##### Profiling Coverage {#docs:current:configuration:pragmas::profiling-coverage}

By default, the profiling coverage is set to `SELECT`.
`SELECT` runs the profiler for each operator in the physical plan of a `SELECT` statement.

```sql
SET profiling_coverage = 'SELECT';
```

By default, the profiler **does not** emit profiling information for other statement types (`INSERT INTO`, `ATTACH`, etc.).
To run the profiler for all statement types, change this setting to `ALL`.

```sql
SET profiling_coverage = 'ALL';
```

##### Profiling Format {#docs:current:configuration:pragmas::profiling-format}

The format of `enable_profiling` can be specified as `query_tree`, `json`, `query_tree_optimizer`, or `no_output`.
Each format prints its output to the configured output, except `no_output`.

The default format is `query_tree`.
It prints the physical query plan and the metrics of each operator in the tree.

```sql
SET enable_profiling = 'query_tree';
```

Alternatively, `json` returns the physical query plan as JSON:

```sql
SET enable_profiling = 'json';
```

> **Tip.** To visualize query plans, consider using the [DuckDB execution plan visualizer](https://db.cs.uni-tuebingen.de/explain/) developed by the [Database Systems Research Group at the University of Tübingen](https://github.com/DBatUTuebingen).

To return the physical query plan, including optimizer and planner metrics:

```sql
SET enable_profiling = 'query_tree_optimizer';
```

Database drivers and other applications can also access profiling information through API calls, in which case users can disable any other output.
Even though the parameter reads `no_output`, it is essential to note that this **only** affects printing to the configurable output.
When accessing profiling information through API calls, it is still crucial to enable profiling:

```sql
SET enable_profiling = 'no_output';
```

##### Profiling Output {#docs:current:configuration:pragmas::profiling-output}

By default, DuckDB prints profiling information to the standard output.
However, if you prefer to write the profiling information to a file, you can use the `profiling_output` option to specify a file path.

> **Warning.** The file contents will be overwritten for every newly issued query.
> Hence, the file will only contain the profiling information of the last run query:

```sql
SET profiling_output = '/path/to/file.json';
SET profile_output = '/path/to/file.json';
```
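
Putting the format and output settings together, a typical session might look like the following sketch. The file name `profile.json` and the query are placeholders chosen for illustration:

```sql
-- Enable profiling in JSON format and write the result to a file
SET enable_profiling = 'json';
SET profiling_output = 'profile.json'; -- hypothetical path

-- The query to profile; the file is overwritten by each subsequent query
SELECT sum(i) FROM range(1000000) t(i);

-- Turn profiling off again
PRAGMA disable_profiling;
```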

##### Profiling Mode {#docs:current:configuration:pragmas::profiling-mode}

By default, a limited amount of profiling information is provided (`standard`).

```sql
SET profiling_mode = 'standard';
```

For more details, use the detailed profiling mode by setting `profiling_mode` to `detailed`.
The output of this mode includes profiling of the planner and optimizer stages.

```sql
SET profiling_mode = 'detailed';
```

##### Custom Metrics {#docs:current:configuration:pragmas::custom-metrics}

By default, profiling enables all metrics except those activated by detailed profiling.

Using the `custom_profiling_settings` `PRAGMA`, each metric, including those from detailed profiling, can be individually enabled or disabled.
This `PRAGMA` accepts a JSON object with metric names as keys and Boolean values to toggle them on or off.
Settings specified by this `PRAGMA` override the default behavior.

> **Note.** This only affects the metrics when `enable_profiling` is set to `json` or `no_output`.
> The `query_tree` and `query_tree_optimizer` formats always use a default set of metrics.

In the following example, the `CPU_TIME` metric is disabled.
The `EXTRA_INFO`, `OPERATOR_CARDINALITY`, and `OPERATOR_TIMING` metrics are enabled.

```sql
SET custom_profiling_settings = '{"CPU_TIME": "false", "EXTRA_INFO": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TIMING": "true"}';
```

The profiling documentation contains an overview of the available [metrics](#docs:current:dev:profiling::metrics).

##### Disable Profiling {#docs:current:configuration:pragmas::disable-profiling}

To disable profiling:

```sql
PRAGMA disable_profiling;
PRAGMA disable_profile;
```

#### Query Optimization {#docs:current:configuration:pragmas::query-optimization}

###### Optimizer {#docs:current:configuration:pragmas::optimizer}

To disable the query optimizer:

```sql
PRAGMA disable_optimizer;
```

To enable the query optimizer:

```sql
PRAGMA enable_optimizer;
```

###### Selectively Disabling Optimizers {#docs:current:configuration:pragmas::selectively-disabling-optimizers}

The `disabled_optimizers` option allows selectively disabling optimization steps.
For example, to disable `filter_pushdown` and `statistics_propagation`, run:

```sql
SET disabled_optimizers = 'filter_pushdown,statistics_propagation';
```

The available optimizations can be queried using the [`duckdb_optimizers()` table function](#docs:current:sql:meta:duckdb_table_functions::duckdb_optimizers).

To re-enable the optimizers, run:

```sql
SET disabled_optimizers = '';
```
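
When debugging, it can help to list the optimizer names first and then compare plans with and without a given step. A minimal sketch, using a trivial query chosen only for illustration:

```sql
-- List the names accepted by disabled_optimizers
SELECT name FROM duckdb_optimizers() ORDER BY name;

-- Inspect the plan with filter pushdown disabled
SET disabled_optimizers = 'filter_pushdown';
EXPLAIN SELECT * FROM range(10) t(i) WHERE i > 5;

-- Re-enable all optimizers and inspect the plan again
SET disabled_optimizers = '';
EXPLAIN SELECT * FROM range(10) t(i) WHERE i > 5;
```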

> **Warning.** The `disabled_optimizers` option should only be used for debugging performance issues and should be avoided in production.

#### Logging {#docs:current:configuration:pragmas::logging}

Set a path for query logging:

```sql
SET log_query_path = '/tmp/duckdb_log/';
```

Disable query logging:

```sql
SET log_query_path = '';
```

#### Full-Text Search Indexes {#docs:current:configuration:pragmas::full-text-search-indexes}

The `create_fts_index` and `drop_fts_index` options are only available when the [`fts` extension](#docs:current:core_extensions:full_text_search) is loaded. Their usage is documented on the [Full-Text Search extension page](#docs:current:core_extensions:full_text_search).

#### Verification {#docs:current:configuration:pragmas::verification}

###### Verification of External Operators {#docs:current:configuration:pragmas::verification-of-external-operators}

Enable verification of external operators:

```sql
PRAGMA verify_external;
```

Disable verification of external operators:

```sql
PRAGMA disable_verify_external;
```

###### Verification of Round-Trip Capabilities {#docs:current:configuration:pragmas::verification-of-round-trip-capabilities}

Enable verification of round-trip capabilities for supported logical plans:

```sql
PRAGMA verify_serializer;
```

Disable verification of round-trip capabilities:

```sql
PRAGMA disable_verify_serializer;
```

#### Object Cache {#docs:current:configuration:pragmas::object-cache}

Enable caching of objects, e.g., Parquet metadata:

```sql
PRAGMA enable_object_cache;
```

Disable caching of objects:

```sql
PRAGMA disable_object_cache;
```

#### Checkpointing {#docs:current:configuration:pragmas::checkpointing}

###### Compression {#docs:current:configuration:pragmas::compression}

During checkpointing, the existing column data and any new changes are compressed.
Several pragmas influence which compression functions are considered.

####### Force Compression {#docs:current:configuration:pragmas::force-compression}

Prefer using this compression method over any other method if possible:

```sql
PRAGMA force_compression = 'bitpacking';
```

####### Disabled Compression Methods {#docs:current:configuration:pragmas::disabled-compression-methods}

Avoid using any of the compression methods in the comma-separated list:

```sql
PRAGMA disabled_compression_methods = 'fsst,rle';
```
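
To observe the effect of these pragmas, one approach is to checkpoint and then inspect the `compression` column of `storage_info`. The sketch below assumes a persistent database (an in-memory database may not compress data on checkpoint) and uses a hypothetical table `compression_demo`. The value `'auto'` is assumed here to restore automatic selection:

```sql
-- Prefer run-length encoding where it is applicable
PRAGMA force_compression = 'rle';

-- Hypothetical table with long runs of repeated values
CREATE TABLE compression_demo AS
    SELECT i // 1000 AS v FROM range(100000) t(i);

-- Compression is applied when the data is checkpointed
CHECKPOINT;

-- Inspect which compression methods were chosen for the data segments
SELECT DISTINCT column_name, compression
FROM pragma_storage_info('compression_demo')
WHERE segment_type <> 'VALIDITY';

-- Assumption: 'auto' lets DuckDB choose the method again
PRAGMA force_compression = 'auto';
```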

###### Force Checkpoint {#docs:current:configuration:pragmas::force-checkpoint}

Force a checkpoint even if no changes have been made when [`CHECKPOINT`](#docs:current:sql:statements:checkpoint) is called:

```sql
PRAGMA force_checkpoint;
```

###### Checkpoint on Shutdown {#docs:current:configuration:pragmas::checkpoint-on-shutdown}

Run a `CHECKPOINT` on successful shutdown and delete the WAL, to leave only a single database file behind:

```sql
PRAGMA enable_checkpoint_on_shutdown;
```

Don't run a `CHECKPOINT` on shutdown:

```sql
PRAGMA disable_checkpoint_on_shutdown;
```

#### Temp Directory for Spilling Data to Disk {#docs:current:configuration:pragmas::temp-directory-for-spilling-data-to-disk}

By default, DuckDB uses a temporary directory named `⟨database_file_name⟩.tmp` to spill to disk, located in the same directory as the database file. To change this, use:

```sql
SET temp_directory = '/path/to/temp_dir.tmp/';
```
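To confirm which directory is in effect, the setting can be read back with `current_setting`. This is a small sketch; the path below is hypothetical, and it is assumed that the setting is changed before any data has been spilled.

```sql
-- Show the directory currently used for spilling.
SELECT current_setting('temp_directory') AS temp_directory;

-- Point spilling at another location (hypothetical path), then confirm the change.
SET temp_directory = '/mnt/fast_disk/duckdb_spill.tmp/';
SELECT current_setting('temp_directory') AS temp_directory;
```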

#### Returning Errors as JSON {#docs:current:configuration:pragmas::returning-errors-as-json}

The `errors_as_json` option can be set to obtain error information in raw JSON format. For certain errors, extra information or decomposed information is provided for easier machine processing. For example:

```sql
SET errors_as_json = true;
```

Then, running a query that results in an error produces a JSON output:

```sql
SELECT * FROM nonexistent_tbl;
```

```json
{
   "exception_type":"Catalog",
   "exception_message":"Table with name nonexistent_tbl does not exist!\nDid you mean \"temp.information_schema.tables\"?",
   "name":"nonexistent_tbl",
   "candidates":"temp.information_schema.tables",
   "position":"14",
   "type":"Table",
   "error_subtype":"MISSING_ENTRY"
}
```

#### IEEE Floating-Point Operation Semantics {#docs:current:configuration:pragmas::ieee-floating-point-operation-semantics}

DuckDB follows IEEE floating-point operation semantics. If you would like to turn this off, run:

```sql
SET ieee_floating_point_ops = false;
```

In this case, floating-point division by zero (e.g., `1.0 / 0.0`, `0.0 / 0.0` and `-1.0 / 0.0`) returns `NULL`.
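The difference between the two modes can be observed side by side. The sketch below casts the operands to `DOUBLE` explicitly so that the division is unambiguously a floating-point operation; the column aliases are illustrative only.

```sql
-- IEEE 754 semantics (the default).
SET ieee_floating_point_ops = true;
SELECT
    1.0::DOUBLE / 0.0::DOUBLE    AS pos_div,   -- IEEE 754: +infinity
    0.0::DOUBLE / 0.0::DOUBLE    AS zero_div,  -- IEEE 754: NaN
    (-1.0)::DOUBLE / 0.0::DOUBLE AS neg_div;   -- IEEE 754: -infinity

-- With IEEE semantics turned off, the same divisions return NULL, as described above.
SET ieee_floating_point_ops = false;
SELECT
    1.0::DOUBLE / 0.0::DOUBLE    AS pos_div,
    0.0::DOUBLE / 0.0::DOUBLE    AS zero_div,
    (-1.0)::DOUBLE / 0.0::DOUBLE AS neg_div;

-- Restore the default.
RESET ieee_floating_point_ops;
```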

#### Query Verification (for Development) {#docs:current:configuration:pragmas::query-verification-for-development}

The following `PRAGMA`s are mostly used for development and internal testing.

Enable query verification:

```sql
PRAGMA enable_verification;
```

Disable query verification:

```sql
PRAGMA disable_verification;
```

Enable forced parallel query processing:

```sql
PRAGMA verify_parallelism;
```

Disable forced parallel query processing:

```sql
PRAGMA disable_verify_parallelism;
```

#### Block Sizes {#docs:current:configuration:pragmas::block-sizes}

When persisting a database to disk, DuckDB writes to a dedicated file containing a list of blocks holding the data.
In the case of a file that only holds very little data, e.g., a small table, the default block size of 256 kB might not be ideal.
Therefore, DuckDB's storage format supports different block sizes.

There are a few constraints on possible block size values.

* Must be a power of two.
* Must be greater than or equal to 16384 (16 kB).
* Must be less than or equal to 262144 (256 kB).

You can set the default block size for all new DuckDB files created by an instance like so:

```sql
SET default_block_size = '16384';
```

It is also possible to set the block size on a per-file basis, see [`ATTACH`](#docs:current:sql:statements:attach) for details.
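As a sketch of the per-file variant, the example below attaches a new database with a 16 kB block size and then reads the block size back. The file name and alias are hypothetical, and it is assumed that the file does not exist yet, since the block size is chosen when the file is created.

```sql
-- Hypothetical file name and alias; assumes the file does not exist yet.
ATTACH 'small_blocks.duckdb' AS small_db (BLOCK_SIZE 16384);

-- Confirm the block size of each attached database.
SELECT database_name, block_size
FROM pragma_database_size();
```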

## Secrets Manager {#docs:current:configuration:secrets_manager}

The **Secrets manager** provides a unified user interface for secrets across all backends that use them. Secrets can be scoped, so different storage prefixes can have different secrets, which makes it possible, for example, to join data across organizations in a single query. Secrets can also be persisted, so that they do not need to be specified every time DuckDB is launched.

> **Warning.** Persistent secrets are stored in unencrypted binary format on the disk.

#### Types of Secrets {#docs:current:configuration:secrets_manager::types-of-secrets}

Secrets are typed: a secret's type identifies which service it is for.
Most secrets are not included in DuckDB by default; instead, they are registered by extensions.
Currently, the following secret types are available:

| Secret type   | Service / protocol   | Extension                                                                         |
| ------------- | -------------------- | --------------------------------------------------------------------------------- |
| `azure`       | Azure Blob Storage   | [`azure`](#docs:current:core_extensions:azure)                       |
| `ducklake`    | DuckLake             | [`ducklake`](https://ducklake.select/docs/stable/duckdb/usage/connecting#secrets) |
| `gcs`         | Google Cloud Storage | [`httpfs`](#docs:current:core_extensions:httpfs:s3api)               |
| `http`        | HTTP and HTTPS       | [`httpfs`](#docs:current:core_extensions:httpfs:https)               |
| `huggingface` | Hugging Face         | [`httpfs`](#docs:current:core_extensions:httpfs:hugging_face)        |
| `iceberg`     | Iceberg REST Catalog | [`httpfs`](#docs:current:core_extensions:httpfs:s3api), [`iceberg`](#docs:current:core_extensions:iceberg:iceberg_rest_catalogs) |
| `mysql`       | MySQL                | [`mysql`](#docs:current:core_extensions:mysql)                       |
| `postgres`    | PostgreSQL           | [`postgres`](#docs:current:core_extensions:postgres)                 |
| `r2`          | Cloudflare R2        | [`httpfs`](#docs:current:core_extensions:httpfs:s3api)               |
| `s3`          | AWS S3               | [`httpfs`](#docs:current:core_extensions:httpfs:s3api)               |


For each type, there are one or more “secret providers” that specify how the secret is created. Secrets can also have an optional scope, which is a file path prefix that the secret applies to. When fetching a secret for a path, the secret scopes are compared to the path, returning the matching secret for the path. In the case of multiple matching secrets, the longest prefix is chosen.

#### Creating a Secret {#docs:current:configuration:secrets_manager::creating-a-secret}

Secrets can be created using the [`CREATE SECRET` SQL statement](#docs:current:sql:statements:create_secret).
Secrets can be **temporary** or **persistent**. Temporary secrets are used by default – and are stored in-memory for the life span of the DuckDB instance similar to how settings worked previously. Persistent secrets are stored in **unencrypted binary format** in the `~/.duckdb/stored_secrets` directory. On startup of DuckDB, persistent secrets are read from this directory and automatically loaded.

##### Secret Providers {#docs:current:configuration:secrets_manager::secret-providers}

To create a secret, a **Secret Provider** needs to be used. A Secret Provider is a mechanism through which a secret is generated. To illustrate this, for the `S3`, `GCS`, `R2`, and `AZURE` secret types, DuckDB currently supports two providers: `CONFIG` and `credential_chain`. The `CONFIG` provider requires the user to pass all configuration information into the `CREATE SECRET`, whereas the `credential_chain` provider will automatically try to fetch credentials. When no Secret Provider is specified, the `CONFIG` provider is used. For more details on how to create secrets using different providers check out the respective pages on [httpfs](#docs:current:core_extensions:httpfs:overview::configuration-and-authentication-using-secrets) and [azure](#docs:current:core_extensions:azure::authentication-with-secret).
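The contrast between the two providers can be sketched as follows. The secret names are hypothetical, and the `credential_chain` example assumes that the extension supplying that provider for S3 is available and that credentials can be discovered in the environment.

```sql
-- Hypothetical secret name. Credentials are discovered automatically rather than listed here.
CREATE SECRET my_chain_secret (
    TYPE s3,
    PROVIDER credential_chain
);

-- Equivalent to omitting PROVIDER: every value is passed explicitly.
CREATE SECRET my_config_secret (
    TYPE s3,
    PROVIDER config,
    KEY_ID 'my_secret_key',
    SECRET 'my_secret_value',
    REGION 'my_region'
);
```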

##### Temporary Secrets {#docs:current:configuration:secrets_manager::temporary-secrets}

To create a temporary unscoped secret to access S3, we can now use the following:

```sql
CREATE SECRET my_secret (
    TYPE s3,
    KEY_ID 'my_secret_key',
    SECRET 'my_secret_value',
    REGION 'my_region'
);
```

Note that we implicitly use the default `CONFIG` secret provider here.

##### Persistent Secrets {#docs:current:configuration:secrets_manager::persistent-secrets}

In order to persist secrets between DuckDB database instances, we can now use the `CREATE PERSISTENT SECRET` command, e.g.:

```sql
CREATE PERSISTENT SECRET my_persistent_secret (
    TYPE s3,
    KEY_ID 'my_secret_key',
    SECRET 'my_secret_value'
);
```

By default, this will write the secret (unencrypted) to the `~/.duckdb/stored_secrets` directory. To change the secrets directory, issue:

```sql
SET secret_directory = 'path/to/my_secrets_dir';
```

Note that setting the value of the `home_directory` configuration option has no effect on the location of the secrets.

#### Deleting Secrets {#docs:current:configuration:secrets_manager::deleting-secrets}

Secrets can be deleted using the [`DROP SECRET` statement](#docs:current:sql:statements:create_secret::syntax-for-drop-secret), e.g.:

```sql
DROP PERSISTENT SECRET my_persistent_secret;
```
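Temporary secrets can be dropped in the same way. A short sketch, reusing the `my_secret` name from the temporary secret example above:

```sql
-- Drop a temporary secret created earlier in the session.
DROP TEMPORARY SECRET my_secret;

-- Do not raise an error if the secret does not exist (any more).
DROP SECRET IF EXISTS my_secret;
```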

#### Creating Multiple Secrets for the Same Service Type {#docs:current:configuration:secrets_manager::creating-multiple-secrets-for-the-same-service-type}

If two secrets exist for a service type, the scope can be used to decide which one should be used. For example:

```sql
CREATE SECRET secret1 (
    TYPE s3,
    KEY_ID 'my_secret_key1',
    SECRET 'my_secret_value1',
    SCOPE 's3://⟨my-bucket⟩'
);
```

```sql
CREATE SECRET secret2 (
    TYPE s3,
    KEY_ID 'my_secret_key2',
    SECRET 'my_secret_value2',
    SCOPE 's3://⟨my-other-bucket⟩'
);
```

Now, if the user queries something from `s3://⟨my-other-bucket⟩/something`, secret `secret2` will be chosen automatically for that request. To see which secret is being used, use the `which_secret` table function, which takes a path and a secret type as parameters:

```sql
FROM which_secret('s3://⟨my-other-bucket⟩/file.parquet', 's3');
```
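Scopes can also be nested within the same bucket. The sketch below uses hypothetical secret names, bucket, and prefix to illustrate the longest-prefix rule: both scopes match the queried path, so the secret with the longer scope (`team_prefix`) is the one expected to be selected.

```sql
-- Hypothetical names and paths, for illustration only.
CREATE SECRET bucket_wide (
    TYPE s3,
    KEY_ID 'my_secret_key1',
    SECRET 'my_secret_value1',
    SCOPE 's3://my-bucket'
);

CREATE SECRET team_prefix (
    TYPE s3,
    KEY_ID 'my_secret_key2',
    SECRET 'my_secret_value2',
    SCOPE 's3://my-bucket/team-a'
);

-- Both scopes are prefixes of this path; the longer one should win.
FROM which_secret('s3://my-bucket/team-a/data.parquet', 's3');
```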

#### Listing Secrets {#docs:current:configuration:secrets_manager::listing-secrets}

Secrets can be listed using the built-in [`duckdb_secrets()` table function](#docs:current:sql:meta:duckdb_table_functions::duckdb_secrets):

```sql
FROM duckdb_secrets();
```

Sensitive information will be redacted.
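Since `duckdb_secrets()` is an ordinary table function, its output can be filtered and projected. A small sketch that lists only S3 secrets together with their provider, persistence, and scope:

```sql
SELECT name, type, provider, persistent, scope
FROM duckdb_secrets()
WHERE type = 's3'
ORDER BY name;
```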

# Extensions {#extensions}

## Extensions {#docs:current:extensions:overview}

DuckDB has a flexible extension mechanism that allows for dynamically loading extensions.
Extensions can enhance DuckDB's functionality by providing support for additional file formats, introducing new types, and domain-specific functionality.

> Extensions are loadable on all clients (e.g., Python and R).
> Extensions distributed via the Core and Community repositories are built and tested on macOS, Windows and Linux. All operating systems are supported for both the AMD64 and the ARM64 architectures.

#### Listing Extensions {#docs:current:extensions:overview::listing-extensions}

To get a list of extensions, use the `duckdb_extensions` function:

```sql
SELECT extension_name, installed, description
FROM duckdb_extensions();
```

| extension_name    | installed | description                                                  |
|-------------------|-----------|--------------------------------------------------------------|
| arrow             | false     | A zero-copy data integration between Apache Arrow and DuckDB |
| autocomplete      | false     | Adds support for autocomplete in the shell                   |
| ...               | ...       | ...                                                          |

This list shows which extensions are available, which are installed, at which version, where they are installed, and more.
The list includes most, but not all, available core extensions. For the full list, we maintain a [list of core extensions](#docs:current:core_extensions:overview).
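To narrow the list down, the function's output can be filtered like any other table. For instance, this sketch shows only the installed extensions along with their load status, version, and location on disk:

```sql
SELECT extension_name, loaded, extension_version, install_path
FROM duckdb_extensions()
WHERE installed
ORDER BY extension_name;
```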

> **Tip.** We provide an endpoint that serves weekly extension download statistics as JSON files: [core extensions](https://extensions.duckdb.org/downloads-last-week.json) and [community extensions](https://community-extensions.duckdb.org/downloads-last-week.json).

#### Built-In Extensions {#docs:current:extensions:overview::built-in-extensions}

DuckDB's binary distribution comes standard with a few built-in extensions. They are statically linked into the binary and can be used as is.
For example, to use the built-in [`json` extension](#docs:current:data:json:overview) to read a JSON file:

```sql
SELECT *
FROM 'test.json';
```

To make the DuckDB distribution lightweight, only a few essential extensions are built-in, varying slightly per distribution. Which extension is built-in on which platform is documented in the [list of core extensions](#docs:current:core_extensions:overview::default-extensions).

#### Installing More Extensions {#docs:current:extensions:overview::installing-more-extensions}

To make an extension that is not built-in available in DuckDB, two steps need to happen:

1. **Extension installation** is the process of downloading the extension binary and verifying its metadata. During installation, DuckDB stores the downloaded extension and some metadata in a local directory. From this directory DuckDB can then load the extension whenever it needs to. This means that installation needs to happen only once.

2. **Extension loading** is the process of dynamically loading the binary into a DuckDB instance. DuckDB will search the local extension directory for the installed extension, then load it to make its features available. This means that every time DuckDB is restarted, all extensions that are used need to be (re)loaded.

> Extension installation and loading are subject to a few [limitations](#docs:current:extensions:installing_extensions::limitations).

There are two main methods of making DuckDB perform the **installation** and **loading** steps for an installable extension: **explicitly** and through **autoloading**.

##### Explicit `INSTALL` and `LOAD` {#docs:current:extensions:overview::explicit-install-and-load}

In DuckDB, extensions can be explicitly installed and loaded. Both non-autoloadable and autoloadable extensions can be installed this way.
To explicitly install and load an extension, DuckDB has the dedicated SQL statements `INSTALL` and `LOAD`. For example,
to install and load the [`spatial` extension](#docs:current:core_extensions:spatial:overview), run:

```sql
INSTALL spatial;
LOAD spatial;
```

With these statements, DuckDB will ensure the spatial extension is installed (ignoring the `INSTALL` statement if it is already installed), then proceed
to `LOAD` the spatial extension (again ignoring the statement if it is already loaded).

###### Extension Repository {#docs:current:extensions:overview::extension-repository}

Optionally, a repository can be provided where the extension should be installed from, by appending `FROM ⟨repository⟩` to the `INSTALL` / `FORCE INSTALL` command.
This repository can either be an alias, such as [`community`](#community_extensions:index), or it can be a direct URL, provided as a single-quoted string.

After installing/loading an extension, the [`duckdb_extensions` function](#::listing-extensions) can be used to get more information.
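Put together, an explicit installation from a named repository followed by a check of the recorded metadata might look like the sketch below. The `spatial` extension and the `core` alias are used purely as an example.

```sql
-- Install from an explicitly named repository, then load.
INSTALL spatial FROM core;
LOAD spatial;

-- Check where the extension came from and whether it is loaded.
SELECT extension_name, installed, loaded, installed_from, install_mode
FROM duckdb_extensions()
WHERE extension_name = 'spatial';
```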

##### Autoloading Extensions {#docs:current:extensions:overview::autoloading-extensions}

For many of DuckDB's core extensions, explicitly loading and installing extensions is not necessary. DuckDB contains an autoloading mechanism
which can install and load the core extensions as soon as they are used in a query. For example, when running:

```sql
SELECT *
FROM 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv';
```

DuckDB will automatically install and load the [`httpfs`](#docs:current:core_extensions:httpfs:overview) extension. No explicit `INSTALL` or `LOAD` statements are required.

Not all extensions can be autoloaded. This can have various reasons: some extensions make several changes to the running DuckDB instance, making autoloading technically not (yet) possible. For others, it is preferred to have users opt-in to the extension explicitly before use due to the way they modify behavior in DuckDB.

To see which extensions can be autoloaded, check the [core extensions list](#docs:current:core_extensions:overview).
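Autoloading is controlled by configuration options. As a sketch, the statements below read the current values of the `autoinstall_known_extensions` and `autoload_known_extensions` settings and then turn both off, which is one way to require explicit `INSTALL` and `LOAD` statements in a locked-down environment.

```sql
-- Inspect the current autoloading configuration.
SELECT
    current_setting('autoinstall_known_extensions') AS autoinstall,
    current_setting('autoload_known_extensions')    AS autoload;

-- Require explicit INSTALL and LOAD statements from here on.
SET autoinstall_known_extensions = false;
SET autoload_known_extensions = false;
```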

##### Community Extensions {#docs:current:extensions:overview::community-extensions}

DuckDB supports installing third-party [community extensions](#community_extensions:index). For example, you can install the [`avro` community extension](#community_extensions:extensions:avro) via:

```sql
INSTALL avro FROM community;
```

Community extensions are contributed by community members but they are built, [signed](#docs:current:extensions:extension_distribution::signed-extensions), and distributed in a centralized repository.

#### Updating Extensions {#docs:current:extensions:overview::updating-extensions}

While built-in extensions are tied to a DuckDB release due to their nature of being built into the DuckDB binary, installable extensions
can and do receive updates. To ensure all currently installed extensions are on the most recent version, call:

```sql
UPDATE EXTENSIONS;
```

For more details on extension versions, refer to the [Extension Versioning page](#docs:current:extensions:versioning_of_extensions).
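Updates can also be limited to specific extensions by listing them in parentheses. A brief sketch, using `httpfs` and `spatial` as example names:

```sql
-- Update only the listed extensions instead of all installed ones.
UPDATE EXTENSIONS (httpfs, spatial);
```

An extension that is already loaded keeps running its previous version until the DuckDB process is restarted.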

#### Developing Extensions {#docs:current:extensions:overview::developing-extensions}

The same API that the core extensions use is available for developing extensions. This allows users to extend the functionality of DuckDB such that it suits their domain the best.
A template for creating extensions is available in the [`extension-template` repository](https://github.com/duckdb/extension-template/). This template also holds some documentation on how to get started building your own extension.

#### Working with Extensions {#docs:current:extensions:overview::working-with-extensions}

See the [installation instructions](#docs:current:extensions:installing_extensions) and the [advanced installation methods page](#docs:current:extensions:advanced_installation_methods).

## Installing Extensions {#docs:current:extensions:installing_extensions}

To install core DuckDB extensions, use the `INSTALL` command.
For example:

```sql
INSTALL httpfs;
```

This installs the extension from the default repository (`core`).

#### Extension Repositories {#docs:current:extensions:installing_extensions::extension-repositories}

By default, DuckDB extensions are installed from a single repository containing extensions built and signed by the core DuckDB team.
This ensures the stability and security of the core set of extensions.
These extensions live in the default `core` repository, which points to `http://extensions.duckdb.org`.

Besides the core repository, DuckDB also supports installing extensions from other repositories. For example, the `core_nightly` repository contains nightly builds for core extensions
that are built for the latest stable release of DuckDB. This allows users to try out new features in extensions before they are officially published.

##### Installing Extensions from Different Repositories {#docs:current:extensions:installing_extensions::installing-extensions-from-different-repositories}

To install extensions from the default repository (`core`), run:

```sql
INSTALL httpfs;
```

To explicitly install an extension from the core repository, run:

```sql
INSTALL httpfs FROM core;
-- or
INSTALL httpfs FROM 'http://extensions.duckdb.org';
```

To install an extension from the core nightly repository:

```sql
INSTALL spatial FROM core_nightly;
-- or
INSTALL spatial FROM 'http://nightly-extensions.duckdb.org';
```

To install an extension from a custom repository:

```sql
INSTALL ⟨custom_extension⟩ FROM 'https://my-custom-extension-repository';
```

Alternatively, the `custom_extension_repository` setting can be used to change the default repository used by DuckDB:

```sql
SET custom_extension_repository = 'http://nightly-extensions.duckdb.org';
```

DuckDB contains the following predefined repositories:

| Alias                 | URL                                      | Description                                                                            |
|:----------------------|:-----------------------------------------|:---------------------------------------------------------------------------------------|
| `core`                | `http://extensions.duckdb.org`           | DuckDB core extensions                                                                 |
| `core_nightly`        | `http://nightly-extensions.duckdb.org`   | Nightly builds for `core`                                                              |
| `community`           | `http://community-extensions.duckdb.org` | DuckDB community extensions                                                            |
| `local_build_debug`   | `./build/debug/repository`               | Repository created when building DuckDB from source in debug mode (for development)    |
| `local_build_release` | `./build/release/repository`             | Repository created when building DuckDB from source in release mode (for development)  |

#### Working with Multiple Repositories {#docs:current:extensions:installing_extensions::working-with-multiple-repositories}

When working with extensions from different repositories, especially mixing `core` and `core_nightly`, it is important to know the origins and version of the different extensions.
For this reason, DuckDB keeps track of this in the extension installation metadata.
For example:

```sql
INSTALL httpfs FROM core;
INSTALL aws FROM core_nightly;
SELECT extension_name, extension_version, installed_from, install_mode
FROM duckdb_extensions();
```

This outputs:



| extension_name  | extension_version  | installed_from | install_mode |
|:----------------|:-------------------|:---------------|:-------------|
| httpfs          | 62d61a417f         | core           | REPOSITORY   |
| aws             | 42c78d3            | core_nightly   | REPOSITORY   |
| ...             | ...                | ...            | ...          |

#### Force Installing to Upgrade Extensions {#docs:current:extensions:installing_extensions::force-installing-to-upgrade-extensions}

When DuckDB installs an extension, it is copied to a local directory, where it is cached to avoid future network traffic.
Any subsequent calls to `INSTALL ⟨extension_name⟩` will use the local version instead of downloading the extension again.
To force re-downloading the extension, run:

```sql
FORCE INSTALL extension_name;
```

Force installing can also be used to overwrite an extension with an extension of the same name from another repository.

For example, first, `spatial` is installed from the core repository:

```sql
INSTALL spatial;
```

Then, to overwrite this installation with the `spatial` extension from the `core_nightly` repository:

```sql
FORCE INSTALL spatial FROM core_nightly;
```

##### Switching between Repositories {#docs:current:extensions:installing_extensions::switching-between-repositories}

To switch repositories for an extension, use the `FORCE INSTALL` command.
For example, if you have installed `httpfs` from the `core_nightly` repository but would like to switch back to using `core`, run:

```sql
FORCE INSTALL httpfs FROM core;
```

#### Installing Extensions through Client APIs {#docs:current:extensions:installing_extensions::installing-extensions-through-client-apis}

For many clients, using SQL to load and install extensions is the preferred method. However, some clients have a dedicated
API to install and load extensions. For example, the [Python client](#docs:current:clients:python:overview::loading-and-installing-extensions) has dedicated `install_extension(name: str)` and `load_extension(name: str)` methods. For more details on a specific client API, refer
to the [Client API documentation](#docs:current:clients:overview).

#### Installation Location {#docs:current:extensions:installing_extensions::installation-location}

By default, extensions are installed under the user's home directory:

```text
~/.duckdb/extensions/⟨duckdb_version⟩/⟨platform_name⟩/
```

For stable DuckDB releases, the `⟨duckdb_version⟩` will be equal to the version tag of that release. For nightly DuckDB builds, it will be equal
to the short git hash of the build. So for example, the extensions for DuckDB version v0.10.3 on macOS ARM64 (Apple Silicon) are installed to `~/.duckdb/extensions/v0.10.3/osx_arm64/`.
An example installation path for a nightly DuckDB build could be `~/.duckdb/extensions/fc2e4b26a6/linux_amd64`.

To change the default location where DuckDB stores its extensions, use the `extension_directory` configuration option:

```sql
SET extension_directory = '/path/to/your/extension/directory';
```

To specify multiple directories for loading extensions (e.g., for package managers or air-gapped environments), use the `extension_directories` option:

```sql
SET extension_directories = ['/usr/lib/duckdb/extensions', '/opt/duckdb/extensions'];
```

Note that setting the value of the `home_directory` configuration option has no effect on the location of the extensions.
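The components of the installation path can be looked up from within DuckDB. The sketch below reads the version tag and git hash, the platform name, and the configured extension directory; the last value may be empty when the default location is in use.

```sql
-- Version tag (used for stable releases) and short git hash (used for nightly builds).
SELECT library_version, source_id
FROM pragma_version();

-- Platform name, e.g., the ⟨platform_name⟩ component of the path.
PRAGMA platform;

-- Custom base directory, if one has been configured.
SELECT current_setting('extension_directory') AS extension_directory;
```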

#### Uninstalling Extensions {#docs:current:extensions:installing_extensions::uninstalling-extensions}

Currently, DuckDB does not provide a command to uninstall extensions.
To uninstall an extension, navigate to the extension's [Installation Location](#::installation-location) and remove its `.duckdb_extension` binary file.
For example:

```batch
rm ~/.duckdb/extensions/v1.2.1/osx_arm64/excel.duckdb_extension
```

#### Sharing Extensions between Clients {#docs:current:extensions:installing_extensions::sharing-extensions-between-clients}

The shared installation location allows extensions to be shared between the client APIs _of the same DuckDB version_, as long as they share the same `platform` or ABI. For example, if an extension is installed with version 1.2.1 of the CLI client on macOS, it is available from the Python, R, etc. client libraries provided that they have access to the user's home directory and use DuckDB version 1.2.1.

#### Limitations {#docs:current:extensions:installing_extensions::limitations}

DuckDB's extension mechanism has the following limitations:

* Extensions cannot be unloaded.
* Extensions cannot be reloaded. If you [update extensions](#docs:current:sql:statements:update_extensions), restart the DuckDB process to use newer extensions.

## Advanced Installation Methods {#docs:current:extensions:advanced_installation_methods}

#### Downloading Extensions Directly from S3 {#docs:current:extensions:advanced_installation_methods::downloading-extensions-directly-from-s3}

Downloading an extension directly can be helpful when building a [Lambda service](https://aws.amazon.com/pm/lambda/) or container that uses DuckDB.
DuckDB extensions are stored in public S3 buckets, but the directory structure of those buckets is not searchable.
As a result, a direct URL to the file must be used.
To download an extension file directly, use the following format:

```text
http://extensions.duckdb.org/v⟨duckdb_version⟩/⟨platform_name⟩/⟨extension_name⟩.duckdb_extension.gz
```

For example:

```text
http://extensions.duckdb.org/v1.5.2/windows_amd64/json.duckdb_extension.gz
```
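As a rough sketch, the download URL for the running DuckDB build can be assembled in SQL. It assumes a stable release, where the `library_version` value returned by `pragma_version()` already carries the leading `v`, and it assumes that `pragma_platform()` can be called as a table function; `json` is just an example extension name.

```sql
-- Assemble the direct download URL for the current version and platform.
SELECT format(
    'http://extensions.duckdb.org/{}/{}/{}.duckdb_extension.gz',
    v.library_version,  -- assumed to include the leading "v" on stable releases
    p.platform,
    'json'              -- example extension name
) AS extension_url
FROM pragma_version() AS v, pragma_platform() AS p;
```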

#### Installing an Extension from an Explicit Path {#docs:current:extensions:advanced_installation_methods::installing-an-extension-from-an-explicit-path}

The `INSTALL` command can be used with the path to a `.duckdb_extension` file:

```sql
INSTALL 'path/to/httpfs.duckdb_extension';
```

Note that compressed `.duckdb_extension.gz` files need to be decompressed beforehand. It is also possible to specify remote paths.

#### Loading an Extension from an Explicit Path {#docs:current:extensions:advanced_installation_methods::loading-an-extension-from-an-explicit-path}

`LOAD` can be used with the path to a `.duckdb_extension` file.
For example, if the file was available at the (relative) path `path/to/httpfs.duckdb_extension`, you can load it as follows:

```sql
LOAD 'path/to/httpfs.duckdb_extension';
```

This will skip any currently installed extensions and load the specified extension directly.

Note that using remote paths for compressed files is currently not possible.

#### Building and Installing Extensions from Source {#docs:current:extensions:advanced_installation_methods::building-and-installing-extensions-from-source}

For building and installing extensions from source, see the [Building DuckDB guide](#docs:current:dev:building:overview).

##### Statically Linking Extensions {#docs:current:extensions:advanced_installation_methods::statically-linking-extensions}

To statically link extensions, follow the [developer documentation's “Using extension config files” section](https://github.com/duckdb/duckdb/blob/main/extension/README.md#using-extension-config-files).

## Extension Distribution {#docs:current:extensions:extension_distribution}

#### Platforms {#docs:current:extensions:extension_distribution::platforms}

Extension binaries are distributed for several platforms (see below).
For platforms where packages for certain extensions are not available, users can build them from source and [install the resulting binaries manually](#docs:current:extensions:advanced_installation_methods::installing-an-extension-from-an-explicit-path).

All official extensions are distributed for the following platforms.

| Platform name   | Operating system | Architecture    | CPU types                      |
| --------------- | ---------------- | --------------- | ------------------------------ |
| `linux_amd64`   | Linux            | x86_64  (AMD64) |                                |
| `linux_arm64`   | Linux            | AArch64 (ARM64) | AWS Graviton, Snapdragon, etc. |
| `osx_amd64`     | macOS            | x86_64  (AMD64) | Intel                          |
| `osx_arm64`     | macOS            | AArch64 (ARM64) | Apple Silicon M1, M2, etc.     |
| `windows_amd64` | Windows          | x86_64  (AMD64) | Intel, AMD, etc.               |
| `windows_arm64` | Windows          | AArch64 (ARM64) | Copilot+ PC with Qualcomm CPU  |

Some extensions are distributed for the following platforms:

* `windows_amd64_mingw`
* `wasm_eh` and `wasm_mvp` (see [DuckDB-Wasm's extensions](#docs:current:clients:wasm:extensions))

For platforms outside the ones listed above, we do not officially distribute extensions (e.g., `linux_arm64_android`).

#### Extensions Signing {#docs:current:extensions:extension_distribution::extensions-signing}

##### Signed Extensions {#docs:current:extensions:extension_distribution::signed-extensions}

Extensions can be signed with a cryptographic key.
By default, DuckDB uses its built-in public keys to verify the integrity of extensions before loading them.
All core and community extensions are signed by the DuckDB team.

Signing extensions simplifies their distribution: they can be distributed over HTTP without the need for HTTPS,
which is itself supported through an extension ([`httpfs`](#docs:current:core_extensions:httpfs:overview)).

##### Unsigned Extensions {#docs:current:extensions:extension_distribution::unsigned-extensions}

> **Warning.** Only load unsigned extensions from sources you trust.
> Avoid loading unsigned extensions over HTTP.
> Consult the [Securing DuckDB page](#docs:current:operations_manual:securing_duckdb:securing_extensions) for guidelines on how to set up DuckDB in a secure manner.

If you wish to load your own extensions or extensions from third parties, you will need to enable the `allow_unsigned_extensions` flag.
To load unsigned extensions using the [CLI client](#docs:current:clients:cli:overview), pass the `-unsigned` flag to it on startup:

```batch
duckdb -unsigned
```

Now any extension can be loaded, signed or not:

```sql
LOAD './some/local/ext.duckdb_extension';
```

For client APIs, the `allow_unsigned_extensions` database configuration option needs to be set; see the respective [Client API docs](#docs:current:clients:overview).
For example, for the Python client, see the [Loading and Installing Extensions section in the Python API documentation](#docs:current:clients:python:overview::loading-and-installing-extensions).
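To check whether the current instance was started with unsigned extensions allowed, the option can be read back from SQL. This is only a sketch for inspecting the value; it does not change the option.

```sql
-- Returns the value the instance was configured with at startup.
SELECT current_setting('allow_unsigned_extensions') AS allow_unsigned_extensions;
```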

#### Binary Compatibility {#docs:current:extensions:extension_distribution::binary-compatibility}

To avoid binary compatibility issues, the binary extensions distributed by DuckDB are tied both to a specific DuckDB version and a [platform](#::platforms).
This means that DuckDB can automatically detect binary compatibility between it and a loadable extension.
When trying to load an extension that was compiled for a different version or platform, DuckDB will throw an error and refuse to load the extension.

#### Creating a Custom Repository {#docs:current:extensions:extension_distribution::creating-a-custom-repository}

You can create a custom DuckDB extension repository.
A DuckDB repository is an HTTP, HTTPS, S3, or local file-based directory that serves the extension files in a specific structure.
This structure is described in the [“Downloading Extensions Directly from S3” section](#docs:current:extensions:advanced_installation_methods::downloading-extensions-directly-from-s3), and is the same
for local paths and remote servers, for example:

```text
base_repository_path_or_url
└── v1.0.0
    └── osx_arm64
        ├── autocomplete.duckdb_extension
        ├── httpfs.duckdb_extension
        ├── icu.duckdb_extension
        ├── inet.duckdb_extension
        ├── json.duckdb_extension
        ├── parquet.duckdb_extension
        ├── tpcds.duckdb_extension
        └── tpch.duckdb_extension
```

See the [`extension-template` repository](https://github.com/duckdb/extension-template/) for all necessary code and scripts
to set up a repository.

When installing an extension from a custom repository, DuckDB will search for both a gzipped and non-gzipped version. For example:

```sql
INSTALL icu FROM '⟨custom_repository⟩';
```

The execution of this statement will first look for `icu.duckdb_extension.gz`, then `icu.duckdb_extension` in the repository's directory structure.
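The lookup order described above can be sketched in Python. This is only an illustration of the directory layout and the gzipped-first order, not DuckDB's actual implementation. The helper name `candidate_paths` and the example repository URL are made up for this sketch.

```python
def candidate_paths(base: str, duckdb_version: str, platform: str, name: str) -> list[str]:
    """Return the locations checked for an extension, in lookup order.

    Hypothetical helper: it only mirrors the
    <base>/<version>/<platform>/<name>.duckdb_extension layout shown above.
    """
    directory = f"{base.rstrip('/')}/{duckdb_version}/{platform}"
    plain = f"{directory}/{name}.duckdb_extension"
    # The gzipped file is looked up first, then the uncompressed one.
    return [plain + ".gz", plain]


# Example with a made-up repository URL.
for path in candidate_paths("https://example.com/my_repo", "v1.0.0", "osx_arm64", "icu"):
    print(path)
```

The same function works for local repositories, since the structure is the same for local paths and remote servers.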

If the custom repository is served over HTTPS or S3, the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) is required. DuckDB will attempt to [autoload](#docs:current:extensions:overview::autoloading-extensions)
the `httpfs` extension when an installation over HTTPS or S3 is attempted.

## Versioning of Extensions {#docs:current:extensions:versioning_of_extensions}

#### Extension Versioning {#docs:current:extensions:versioning_of_extensions::extension-versioning}

Most software has some sort of version number. Version numbers serve a few important goals:

* Tie a binary to a specific state of the source code
* Allow determining the expected feature set
* Allow determining the state of the APIs
* Allow efficient processing of bug reports (e.g., bug `#1337` was introduced in version `v3.4.5`)
* Allow determining chronological order of releases (e.g., version `v1.2.3` is older than `v1.2.4`)
* Give an indication of expected stability (e.g., `v0.0.1` is likely not very stable, whereas `v13.11.0` probably is stable)

Just like [DuckDB itself](#release_calendar), DuckDB extensions have their own version number. To ensure consistent semantics
of these version numbers across the various extensions, DuckDB's [Core Extensions](#docs:current:core_extensions:overview) use
a versioning scheme that prescribes how extensions should be versioned. The versioning scheme for Core Extensions is made up of 3 different stability levels: **unstable**, **pre-release**, and **stable**.
Let's go over each of the 3 levels and describe their format:

##### Unstable Extensions {#docs:current:extensions:versioning_of_extensions::unstable-extensions}

Unstable extensions are extensions that can't (or don't want to) give any guarantees regarding their current stability,
or their goals of becoming stable. Unstable extensions are tagged with the **short git hash** of the extension.

For example, at the time of writing this, the `vss` extension is an unstable extension with version `690bfc5`.

What to expect from an extension that has a version number in the **unstable** format?

* The state of the source code of the extension can be found by looking up the hash in the extension repository
* Functionality may change or be removed completely with every release
* This extension's API could change with every release
* This extension may not follow a structured release cycle, new (breaking) versions can be pushed at any time

##### Pre-Release Extensions {#docs:current:extensions:versioning_of_extensions::pre-release-extensions}

Pre-release extensions are the next step up from unstable extensions. They are tagged with a version in the **[SemVer](https://semver.org/)** format, more specifically, one in the `v0.y.z` format.
In semantic versioning, versions starting with `v0` have a special meaning: they indicate that the stricter semantics of regular (`>=v1.0.0`) versions do not yet apply. It basically means that an extension is working
towards becoming a stable extension, but is not quite there yet.

For example, at the time of writing this, the `delta` extension is a pre-release extension with version `v0.1.0`.

What to expect from an extension that has a version number in the **pre-release** format?

* The extension is compiled from the source code corresponding to the tag.
* Semantic Versioning semantics apply. See the [Semantic Versioning](https://semver.org/) specification for details.
* The extension follows a release cycle where new features are tested in nightly builds before being grouped into a release and pushed to the `core` repository.
* Release notes describing what has been added each release should be available to make it easy to understand the difference between versions.

##### Stable Extensions {#docs:current:extensions:versioning_of_extensions::stable-extensions}

Stable extensions are the final step of extension stability. This is denoted by using a **stable SemVer** of format `vx.y.z` where `x>0`.

For example, at the time of writing this, the `parquet` extension is a stable extension with version `v1.0.0`.

What to expect from an extension that has a version number in the **stable** format? Essentially the same as pre-release extensions, but now the more
strict SemVer semantics apply: the API of the extension should now be stable and will only change in backwards incompatible ways when the major version is bumped.
See the SemVer specification for details.
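The three version formats can be told apart mechanically. The following Python sketch classifies a version string by stability level. The regular expressions are assumptions made for illustration (a short git hash is taken to be 7 to 10 lowercase hexadecimal characters, and SemVer tags are taken to be plain `vX.Y.Z` without suffixes). The function `stability_level` is hypothetical and not part of DuckDB.

```python
import re

# Assumed patterns for this sketch, see the lead-in above.
GIT_HASH = re.compile(r"[0-9a-f]{7,10}")
SEMVER_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def stability_level(version: str) -> str:
    """Classify an extension version as 'unstable', 'pre-release' or 'stable'."""
    match = SEMVER_TAG.fullmatch(version)
    if match:
        major = int(match.group(1))
        # v0.y.z is pre-release, any higher major version is stable.
        return "stable" if major > 0 else "pre-release"
    if GIT_HASH.fullmatch(version):
        return "unstable"
    raise ValueError(f"unrecognized version format: {version!r}")


for version in ["690bfc5", "v0.1.0", "v1.0.0"]:
    print(version, "->", stability_level(version))
```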

#### Release Cycle of Pre-Release and Stable Core Extensions {#docs:current:extensions:versioning_of_extensions::release-cycle-of-pre-release-and-stable-core-extensions}

In general, the release cycle of an extension depends on its stability level. **Unstable** extensions are often in
sync with DuckDB's release cycle, but may also be quietly updated between DuckDB releases. **Pre-release** and **stable**
extensions follow their own release cycle, which may or may not coincide with DuckDB releases. To find out more about the release cycle of a specific
extension, refer to the documentation or GitHub page of the respective extension. Generally, **pre-release** and **stable** extensions document
their releases as GitHub releases, an example of which you can see in the [`delta` extension](https://github.com/duckdb/duckdb-delta/releases).

Finally, there is a small exception: All [in-tree](#docs:current:extensions:advanced_installation_methods::in-tree-vs-out-of-tree) extensions simply
follow DuckDB's release cycle.

#### Nightly Builds {#docs:current:extensions:versioning_of_extensions::nightly-builds}

Just like DuckDB itself, DuckDB's core extensions have nightly or dev builds that can be used to try out features before they are officially released.
This can be useful when your workflow depends on a new feature, or when you need to confirm that your stack is compatible with the upcoming version.

Nightly builds for extensions are slightly complicated because DuckDB extension binaries are currently tightly bound to a single DuckDB version. Because of this tight connection,
there is a potential risk of a combinatorial explosion. Therefore, not all combinations of nightly extension build and nightly DuckDB build are available.

In general, there are 2 ways of using nightly builds: using a nightly DuckDB build and using a stable DuckDB build. Let's go over the differences between the two:

##### From Stable DuckDB {#docs:current:extensions:versioning_of_extensions::from-stable-duckdb}

In most cases, users will be interested in a nightly build of a specific extension, but don't necessarily want to switch to using the nightly build of DuckDB itself. This allows using a specific bleeding-edge
feature while limiting the exposure to unstable code.

To achieve this, Core Extensions tend to regularly push builds to the [`core_nightly` repository](#docs:current:extensions:installing_extensions::extension-repositories). Let's look at an example:

First we install a [**stable DuckDB build**](https://duckdb.org/install/index.html).

Then we can install and load a **nightly** extension like this:

```sql
INSTALL aws FROM core_nightly;
LOAD aws;
```

In this example, we are using the latest **nightly** build of the `aws` extension with the latest **stable** version of DuckDB.

##### From Nightly DuckDB {#docs:current:extensions:versioning_of_extensions::from-nightly-duckdb}

When DuckDB CI produces a nightly binary of DuckDB itself, the binaries are distributed with a set of extensions that are pinned at a specific version. This extension version will be tested for that specific build of DuckDB, but might not be the latest dev build. Let's look at an example:

First, we install a [**nightly DuckDB build**](https://duckdb.org/install/index.html). Then, we can install and load the `aws` extension as expected:

```sql
INSTALL aws;
LOAD aws;
```

#### Updating Extensions {#docs:current:extensions:versioning_of_extensions::updating-extensions}

DuckDB has a dedicated statement that will automatically update all extensions to their latest version. The output will
give the user information on which extensions were updated to/from which version. For example:

```sql
UPDATE EXTENSIONS;
```

| extension_name | repository   | update_result         | previous_version | current_version |
|:---------------|:-------------|:----------------------|:-----------------|:----------------|
| httpfs         | core         | NO_UPDATE_AVAILABLE   | 70fd6a8a24       | 70fd6a8a24      |
| delta          | core         | UPDATED               | d9e5cc1          | 04c61e4         |
| azure          | core         | NO_UPDATE_AVAILABLE   | 49b63dc          | 49b63dc         |
| aws            | core_nightly | NO_UPDATE_AVAILABLE   | 42c78d3          | 42c78d3         |

Note that DuckDB will look for updates in the source repository for each extension. So if an extension was installed from
`core_nightly`, it will be updated with the latest nightly build.

The update statement can also be provided with a list of specific extensions to update:

```sql
UPDATE EXTENSIONS (httpfs, azure);
```

| extension_name | repository   | update_result         | previous_version | current_version |
|:---------------|:-------------|:----------------------|:-----------------|:----------------|
| httpfs         | core         | NO_UPDATE_AVAILABLE   | 70fd6a8a24       | 70fd6a8a24      |
| azure          | core         | NO_UPDATE_AVAILABLE   | 49b63dc          | 49b63dc         |

#### Target DuckDB Version {#docs:current:extensions:versioning_of_extensions::target-duckdb-version}

Currently, when extensions are compiled, they are tied to a specific version of DuckDB. What this means is that, for example, an extension binary compiled for version 0.10.3 does not work for version 1.0.0. In most cases, this will not cause any issues and is fully transparent; DuckDB will automatically ensure it installs the correct binary for its version. For extension developers, this means that they must ensure that new binaries are created whenever a new version of DuckDB is released. However, note that DuckDB provides an [extension template](https://github.com/duckdb/extension-template) that makes this fairly simple.

#### In-Tree vs. Out-of-Tree {#docs:current:extensions:versioning_of_extensions::in-tree-vs-out-of-tree}

Originally, DuckDB extensions lived exclusively in the DuckDB main repository, `github.com/duckdb/duckdb`. These extensions are called in-tree. Later, the concept
of out-of-tree extensions was added, where each extension is separated into its own repository.

While from a user's perspective, there are generally no noticeable differences, there are some minor differences related to versioning:

* in-tree extensions use the version of DuckDB instead of having their own version
* in-tree extensions do not have dedicated release notes, their changes are reflected in the regular [DuckDB release notes](https://github.com/duckdb/duckdb/releases)
* core out-of-tree extensions tend to live in repositories named `github.com/duckdb/duckdb-⟨extension_name⟩` but the name may vary. See the [full list](#docs:current:core_extensions:overview) of core extensions for details.

## Troubleshooting of Extensions {#docs:current:extensions:troubleshooting}

You might have been directed to this page by a DuckDB error message, similar to:

```sql
INSTALL non_existing;
```

```console
HTTP Error:
Failed to download extension "non_existing" at URL "http://extensions.duckdb.org/v1.4.0/osx_arm64/non_existing.duckdb_extension.gz" (HTTP 404)

Candidate extensions: "inet", "encodings", "core_functions", "sqlite_scanner", "postgres_scanner"
For more info, visit https://duckdb.org/docs/lts/extensions/troubleshooting?version=v1.4.0&platform=osx_arm64&extension=non_existing
```

There are multiple scenarios in which an extension might not be available in a given extension repository at a given time:

* The extension has not been uploaded yet. Some delay after a given release date might be expected. Consider checking the issues at [`duckdb/duckdb`](https://github.com/duckdb/duckdb) or [`duckdb/community-extensions`](https://github.com/duckdb/community-extensions), or creating one yourself.
* The extension is available, but in a different repository. Try, for example, `INSTALL ⟨name⟩ FROM core;`, `INSTALL ⟨name⟩ FROM community;`, or `INSTALL ⟨name⟩ FROM core_nightly;` (see the [Installing Extensions page](#docs:current:extensions:installing_extensions::extension-repositories)).
* There are networking issues: the extension exists at the endpoint, but it is not reachable from your local DuckDB. In this case, try opening the URL from the error message directly in a browser.

If you are on a development version of DuckDB, that is, any version for which `PRAGMA version` returns a `library_version` not starting with a `v`, then extensions might no longer be available in the default extension repository.

When in doubt, consider raising an issue in [`duckdb/duckdb`](https://github.com/duckdb/duckdb).

#### Manual Process to Download Extensions via the Browser {#docs:current:extensions:troubleshooting::manual-process-to-download-extensions-via-the-browser}

To check if an extension is available, try downloading the relevant extension resource, for example, by visiting <https://extensions.duckdb.org/v1.4.4/osx_arm64/spatial.duckdb_extension.gz> or any other link that has been provided in your browser. Note that `http://` has been deprecated in favor of `https://`.

If successful, this will download and unpack the extension to the default `Downloads` folder, so that from SQL you can run:

```sql
INSTALL '~/Downloads/spatial.duckdb_extension';
-- or
FORCE INSTALL '~/Downloads/spatial.duckdb_extension';
```

After running this command, the extension is installed as usual.
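The download link can also be derived from the troubleshooting link printed in the error message, since that link carries the version, platform and extension name as query parameters. The following Python sketch does this with the standard library. It assumes the URL pattern of the default repository shown in the examples above. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def download_url_from_troubleshooting_link(link: str) -> str:
    """Build the direct download URL from the link printed in the error message."""
    query = parse_qs(urlsplit(link).query)
    version = query["version"][0]
    platform = query["platform"][0]
    extension = query["extension"][0]
    # Assumed pattern: <repository>/<version>/<platform>/<name>.duckdb_extension.gz
    return (
        f"https://extensions.duckdb.org/{version}/{platform}/"
        f"{extension}.duckdb_extension.gz"
    )


link = (
    "https://duckdb.org/docs/lts/extensions/troubleshooting"
    "?version=v1.4.0&platform=osx_arm64&extension=non_existing"
)
print(download_url_from_troubleshooting_link(link))
```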

# Core Extensions {#core_extensions}

## Core Extensions {#docs:current:core_extensions:overview}

#### List of Core Extensions {#docs:current:core_extensions:overview::list-of-core-extensions}



| Name                                                                     | Description                                                             | Maintainer       | Support&nbsp;tier | Aliases                 |
| :----------------------------------------------------------------------- | :---------------------------------------------------------------------- | ---------------- | :---------------- | :---------------------- |
| [autocomplete](#docs:current:core_extensions:autocomplete)   | Adds support for autocomplete in the shell                              | DuckDB&nbsp;team | Secondary         |                         |
| [avro](#docs:current:core_extensions:avro)                   | Adds support for reading Avro files                                     | DuckDB&nbsp;team | Secondary         |                         |
| [aws](#docs:current:core_extensions:aws)                     | Provides features that depend on the AWS SDK                            | DuckDB&nbsp;team | Secondary         |                         |
| [azure](#docs:current:core_extensions:azure)                 | Adds a filesystem abstraction for Azure blob storage to DuckDB          | DuckDB&nbsp;team | Secondary         |                         |
| [delta](#docs:current:core_extensions:delta)                 | Adds support for Delta Lake                                             | DuckDB&nbsp;team | Secondary         |                         |
| [ducklake](#docs:current:core_extensions:ducklake)           | Adds support for DuckLake                                               | DuckDB&nbsp;team | Secondary         |                         |
| [encodings](#docs:current:core_extensions:encodings)         | Adds support for encodings available in the ICU data repository         | DuckDB&nbsp;team | Secondary         |                         |
| [excel](#docs:current:core_extensions:excel)                 | Adds support for reading and writing Excel files                        | DuckDB&nbsp;team | Secondary         |                         |
| [fts](#docs:current:core_extensions:full_text_search)        | Adds support for full-text search indexes                               | DuckDB&nbsp;team | Secondary         |                         |
| [httpfs](#docs:current:core_extensions:httpfs:overview)      | Adds support for reading/writing files over an HTTP(S) or S3 connection | DuckDB&nbsp;team | Primary           | http, https, s3         |
| [iceberg](#docs:current:core_extensions:iceberg:overview)    | Adds support for Apache Iceberg                                         | DuckDB&nbsp;team | Secondary         |                         |
| [icu](#docs:current:core_extensions:icu)                     | Adds support for time zones and collations using the ICU library        | DuckDB&nbsp;team | Primary           |                         |
| [inet](#docs:current:core_extensions:inet)                   | Adds support for IP-related data types and functions                    | DuckDB&nbsp;team | Secondary         |                         |
| [jemalloc](#docs:current:core_extensions:jemalloc)           | Overwrites the system allocator with jemalloc                           | DuckDB&nbsp;team | Secondary         |                         |
| [json](#docs:current:data:json:overview)                     | Adds support for JSON operations                                        | DuckDB&nbsp;team | Primary           |                         |
| [lance](#docs:current:core_extensions:lance)                 | Adds support to read and write Lance tables                             | Third party      |                   |                         |
| [motherduck](#docs:current:core_extensions:motherduck)       | Allows connecting to MotherDuck                                         | Third party      |                   | md                      |
| [mysql](#docs:current:core_extensions:mysql)                 | Adds support for reading from and writing to a MySQL database           | DuckDB&nbsp;team | Secondary         | mysql_scanner           |
| [odbc](#docs:current:core_extensions:odbc:overview)          | Adds support for accessing remote databases over ODBC drivers           | DuckDB&nbsp;team | Secondary         | odbc_scanner            |
| [parquet](#docs:current:data:parquet:overview)               | Adds support for reading and writing Parquet files                      | DuckDB&nbsp;team | Primary           |                         |
| [postgres](#docs:current:core_extensions:postgres)           | Adds support for reading from and writing to a PostgreSQL database      | DuckDB&nbsp;team | Secondary         | postgres_scanner        |
| [spatial](#docs:current:core_extensions:spatial:overview)    | Adds support for working with geospatial data and functions             | DuckDB&nbsp;team | Secondary         |                         |
| [sqlite](#docs:current:core_extensions:sqlite)               | Adds support for reading from and writing to SQLite database files      | DuckDB&nbsp;team | Secondary         | sqlite_scanner, sqlite3 |
| [tpcds](#docs:current:core_extensions:tpcds)                 | Adds TPC-DS data generation and query support                           | DuckDB&nbsp;team | Secondary         |                         |
| [tpch](#docs:current:core_extensions:tpch)                   | Adds TPC-H data generation and query support                            | DuckDB&nbsp;team | Secondary         |                         |
| [unity_catalog](#docs:current:core_extensions:unity_catalog) | Adds support for connecting to Unity Catalog                            | DuckDB&nbsp;team | Secondary         | uc_catalog              |
| [ui](#docs:current:core_extensions:ui)                       | Adds local UI for DuckDB                                                | Third party      |                   |                         |
| [vortex](#docs:current:core_extensions:vortex)               | Adds support for reading and writing Vortex files                       | Third party      |                   |                         |
| [vss](#docs:current:core_extensions:vss)                     | Adds support for vector similarity search queries                       | DuckDB&nbsp;team | Secondary         |                         |

The **Maintainer** column denotes whether the extension is maintained by the DuckDB team or by a third party.
For the extensions maintained by the DuckDB team, the **Support tier** column denotes the extension's support status.
_Primary extensions_ are covered by [community support](https://duckdblabs.com/community_support_policy/).
_Secondary extensions_ are supported on a best-effort basis. That said, they still receive frequent bugfixes/updates and are shipped with new DuckDB releases.

## AutoComplete Extension {#docs:current:core_extensions:autocomplete}

The `autocomplete` extension adds support for autocomplete in the [CLI client](#docs:current:clients:cli:overview).
The extension is shipped by default with the CLI client.

#### Behavior {#docs:current:core_extensions:autocomplete::behavior}

For the behavior of the `autocomplete` extension, see the [documentation of the CLI client](#docs:current:clients:cli:autocomplete).

#### Functions {#docs:current:core_extensions:autocomplete::functions}

| Function                          | Description                                          |
|:----------------------------------|:-----------------------------------------------------|
| `sql_auto_complete(query_string)` | Attempts autocompletion on the given `query_string`. |

#### Example {#docs:current:core_extensions:autocomplete::example}

```sql
SELECT *
FROM sql_auto_complete('SEL');
```

Returns:

| suggestion  | suggestion_start |
|-------------|------------------|
| SELECT      |                0 |
| DELETE      |                0 |
| INSERT      |                0 |
| CALL        |                0 |
| LOAD        |                0 |
| CALL        |                0 |
| ALTER       |                0 |
| BEGIN       |                0 |
| EXPORT      |                0 |
| CREATE      |                0 |
| PREPARE     |                0 |
| EXECUTE     |                0 |
| EXPLAIN     |                0 |
| ROLLBACK    |                0 |
| DESCRIBE    |                0 |
| SUMMARIZE   |                0 |
| CHECKPOINT  |                0 |
| DEALLOCATE  |                0 |
| UPDATE      |                0 |
| DROP        |                0 |

## Avro Extension {#docs:current:core_extensions:avro}

The `avro` extension enables DuckDB to read [Apache Avro](https://avro.apache.org) files.

> The `avro` extension was [released as a community extension in late 2024](https://duckdb.org/2024/12/09/duckdb-avro-extension) and became a core extension in early 2025.

#### The `read_avro` Function {#docs:current:core_extensions:avro::the-read_avro-function}

The extension adds a single DuckDB function, `read_avro`. This function can be used like so:

```sql
FROM read_avro('⟨some_file⟩.avro');
```

This function will expose the contents of the Avro file as a DuckDB table. You can then use any arbitrary SQL constructs to further transform this table.

#### File IO {#docs:current:core_extensions:avro::file-io}

The `read_avro` function is integrated into DuckDB's file system abstraction, meaning you can read Avro files directly from, e.g., HTTP or S3 sources. For example:

```sql
FROM read_avro('https://blobs.duckdb.org/data/userdata1.avro');
FROM read_avro('s3://⟨your-bucket⟩/⟨some_file⟩.avro');
```

Both queries should "just" work.

You can also *glob* multiple files in a single read call or pass a list of files to the function:

```sql
FROM read_avro('some_file_*.avro');
FROM read_avro(['some_file_1.avro', 'some_file_2.avro']);
```

If the filenames somehow contain valuable information (as is unfortunately all-too-common), you can pass the `filename` argument to `read_avro`:

```sql
FROM read_avro('some_file_*.avro', filename=true);
```

This will result in an additional column in the result set that contains the actual filename of the Avro file. 

#### Schema Conversion {#docs:current:core_extensions:avro::schema-conversion}

This extension automatically translates the Avro Schema to the DuckDB schema. *All* Avro types can be translated, except for *recursive type definitions*, which DuckDB does not support.

The type mapping is very straightforward except for Avro's "unique" way of handling `NULL`. Unlike other systems, Avro does not treat `NULL` as a possible value in a range of, e.g., `INTEGER`, but instead represents `NULL` as a union of the actual type with a special `NULL` type. This is different from DuckDB, where any value can be `NULL`. Of course, DuckDB also supports `UNION` types, but these would be quite cumbersome to work with.

This extension *simplifies* the Avro schema where possible: An Avro union of any type and the special null type is simplified to just the non-null type. For example, an Avro record of the union type `["int","null"]` becomes a DuckDB `INTEGER`, which just happens to be `NULL` sometimes. Similarly, an Avro union that contains only a single type is converted to the type it contains. For example, an Avro record of the union type `["int"]` also becomes a DuckDB `INTEGER`.

The extension also "flattens" the Avro schema. Avro defines tables as root-level "record" fields, which are the same as DuckDB `STRUCT` fields. For more convenient handling, this extension turns the entries of a single top-level record into top-level columns.
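The simplification and flattening rules above can be illustrated with a short Python sketch that operates on an Avro schema given as a parsed JSON object. This is not the extension's actual code. The helper names are hypothetical, the result keeps Avro type names rather than mapping them to DuckDB types, and unions with more than one non-null member are simply left unchanged, which is an assumption of this sketch.

```python
def simplify_union(avro_type):
    """Simplify an Avro union (a JSON list of types) as described above.

    Returns a (type, nullable) pair. Non-union types are returned unchanged.
    """
    if not isinstance(avro_type, list):
        return avro_type, False
    non_null = [t for t in avro_type if t != "null"]
    nullable = len(non_null) < len(avro_type)
    if len(non_null) == 1:
        # ["int", "null"] and ["int"] both collapse to "int".
        return non_null[0], nullable
    # Assumption: any other union is left as-is in this sketch.
    return avro_type, nullable


def flatten_record(schema):
    """Turn the fields of a top-level record into (column, type, nullable) rows."""
    return [(field["name"], *simplify_union(field["type"])) for field in schema["fields"]]


# Made-up example schema.
schema = {
    "type": "record",
    "name": "user",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "age", "type": ["int", "null"]},
        {"name": "score", "type": ["int"]},
    ],
}
for column in flatten_record(schema):
    print(column)
```

Each field of the top-level record becomes one column, and the two single-type unions both end up as plain `int` columns, with only `age` marked as nullable.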

#### Implementation {#docs:current:core_extensions:avro::implementation}

Internally, this extension uses the "official" [Apache Avro C API](https://avro.apache.org/docs/++version++/api/c/), albeit with some minor patching to allow reading of Avro files from memory.

#### Limitations and Future Plans {#docs:current:core_extensions:avro::limitations-and-future-plans}

* This extension currently does not make use of **parallelism** when reading either a single (large) Avro file or when reading a list of files. Adding support for parallelism in the latter case is on the roadmap.
* There is currently no support for either projection or filter **pushdown**, but this is also planned at a later stage.
* There is currently no support for the Wasm or the Windows-MinGW builds of DuckDB due to issues with the Avro library dependency. We plan to fix this eventually.
* As mentioned above, DuckDB cannot express the recursive type definitions that Avro has. This is unlikely to ever change.
* There is no support for providing a separate Avro schema file. This is unlikely to change, as all Avro files we have seen so far had their schema embedded.
* There is currently no support for the `union_by_name` flag that other readers in DuckDB support. This is planned for the future.

## AWS Extension {#docs:current:core_extensions:aws}

The `aws` extension adds functionality, e.g., authentication, on top of the `httpfs` extension's [S3 capabilities](#docs:current:core_extensions:httpfs:overview::s3-api), using the AWS SDK.

#### Installing and Loading {#docs:current:core_extensions:aws::installing-and-loading}

The `aws` extension will be transparently [autoloaded](#docs:current:core_extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL aws;
LOAD aws;
```

> In most cases, the `aws` extension works in conjunction with the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview).

#### Configuration and Authentication {#docs:current:core_extensions:aws::configuration-and-authentication}

The preferred way to configure and authenticate to AWS S3 endpoints is to use [secrets](#docs:current:sql:statements:create_secret).

##### `config` Provider {#docs:current:core_extensions:aws::config-provider}

The default provider, `config` (i.e., user-configured), allows access to the S3 bucket by manually providing a key. For example:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER config,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩'
);
```

> **Tip.** If you get an IO Error (`Connection error for HTTP HEAD`), configure the endpoint explicitly via `ENDPOINT 's3.⟨your-region⟩.amazonaws.com'`.

Now, to query using the above secret, simply query any `s3://` prefixed file:

```sql
SELECT *
FROM 's3://⟨your-bucket⟩/⟨your_file⟩.parquet';
```

##### `credential_chain` Provider {#docs:current:core_extensions:aws::credential_chain-provider}

The `credential_chain` provider allows automatically fetching credentials using mechanisms provided by the AWS SDK. For example, to use the AWS SDK default provider:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain
);
```

Again, to query a file using the above secret, simply query any `s3://` prefixed file.

DuckDB also allows specifying a specific chain using the `CHAIN` keyword. This takes a semicolon-separated list (`a;b;c`) of providers that will be tried in order. For example:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN 'env;config'
);
```

The possible values for `CHAIN` are the following:

* [`config`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html)
* [`sts`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_s_t_s_assume_role_web_identity_credentials_provider.html)
* [`sso`](https://aws.amazon.com/what-is/sso/)
* [`env`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html)
* [`instance`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html)
* [`process`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_process_credentials_provider.html)
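
When the chain is assembled programmatically, it can help to validate the provider names before sending the statement to DuckDB. The following is a minimal sketch, not part of the extension: `build_chain_secret` and `AWS_CHAIN_PROVIDERS` are hypothetical names introduced here, and the set of allowed values is simply copied from the list above.

```python
# Provider names accepted by CHAIN, copied from the list above.
AWS_CHAIN_PROVIDERS = {"config", "sts", "sso", "env", "instance", "process"}


def build_chain_secret(name, providers):
    """Return a CREATE SECRET statement that tries `providers` in order (hypothetical helper)."""
    if not name.isidentifier():
        raise ValueError(f"not a plain identifier: {name!r}")
    if not providers:
        raise ValueError("CHAIN needs at least one provider")
    unknown = [p for p in providers if p not in AWS_CHAIN_PROVIDERS]
    if unknown:
        raise ValueError(f"unsupported CHAIN providers: {unknown}")
    chain = ";".join(providers)  # order is preserved: the first entry is tried first
    return (
        f"CREATE OR REPLACE SECRET {name} (\n"
        "    TYPE s3,\n"
        "    PROVIDER credential_chain,\n"
        f"    CHAIN '{chain}'\n"
        ");"
    )


print(build_chain_secret("secret", ["env", "config"]))
```

The printed statement is the same as the `CHAIN 'env;config'` example above and can be passed to any DuckDB client.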

The `credential_chain` provider also allows overriding the automatically fetched config. For example, to automatically load credentials, and then override the region, run:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN config,
    REGION '⟨eu-west-1⟩'
);
```

##### Validation {#docs:current:core_extensions:aws::validation}

The AWS `credential_chain` provider looks for the required credentials when `CREATE SECRET` runs and fails if they are absent or unavailable.

This behavior may be configured via the `VALIDATION` option as follows:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain,
    VALIDATION 'exists'
);
```

Two validation modes are supported:

* `exists` (default) requires credentials to be present.
* `none` allows `CREATE SECRET` to succeed for the `credential_chain` provider even when no credentials are available.

> `VALIDATION 'exists'` validates only the __presence__ of a credential, __not its operational readiness__. Thus, no attempt is made to
> convert it into an access token or to perform a read, write, etc.

##### Auto-Refresh {#docs:current:core_extensions:aws::auto-refresh}

Some AWS endpoints require periodic refreshing of the credentials.
This can be specified with the `REFRESH auto` option:

```sql
CREATE SECRET env_test (
    TYPE s3,
    PROVIDER credential_chain,
    REFRESH auto
);
```

#### Legacy Features {#docs:current:core_extensions:aws::legacy-features}

> **Deprecated.** The `load_aws_credentials` function is deprecated.

Prior to version 0.10.0, DuckDB did not have a [Secrets manager](#docs:current:sql:statements:create_secret). To load the credentials automatically, the AWS extension provided
a special function that loads the AWS credentials using the [legacy authentication method](#docs:current:core_extensions:httpfs:s3api_legacy_authentication).

| Function | Type | Description |
|---|---|-------|
| `load_aws_credentials` | `PRAGMA` function | Loads the AWS credentials through the [AWS Default Credentials Provider Chain](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials-chain.html) |

##### Load AWS Credentials (Legacy) {#docs:current:core_extensions:aws::load-aws-credentials-legacy}

To load the AWS credentials, run:

```sql
CALL load_aws_credentials();
```



| loaded_access_key_id | loaded_secret_access_key | loaded_session_token | loaded_region |
|----------------------|--------------------------|----------------------|---------------|
| AKIAIOSFODNN7EXAMPLE | `<redacted>`             | NULL                 | us-east-2     |

The function takes a string parameter to specify a specific profile:

```sql
CALL load_aws_credentials('minio-testing-2');
```



| loaded_access_key_id | loaded_secret_access_key | loaded_session_token | loaded_region |
|----------------------|--------------------------|----------------------|---------------|
| minio_duckdb_user_2  | `<redacted>`             | NULL                 | NULL          |

There are several parameters to tweak the behavior of the call:

```sql
CALL load_aws_credentials('minio-testing-2', set_region = false, redact_secret = false);
```



| loaded_access_key_id | loaded_secret_access_key     | loaded_session_token | loaded_region |
|----------------------|------------------------------|----------------------|---------------|
| minio_duckdb_user_2  | minio_duckdb_user_password_2 | NULL                 | NULL          |

## Azure Extension {#docs:current:core_extensions:azure}

The `azure` extension is a loadable extension that adds a filesystem abstraction for [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) to DuckDB, enabling both reading and writing data.

#### Installing and Loading {#docs:current:core_extensions:azure::installing-and-loading}

The `azure` extension will be transparently [autoloaded](#docs:current:core_extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL azure;
LOAD azure;
```

#### Usage {#docs:current:core_extensions:azure::usage}

Once the [authentication](#docs:current:core_extensions:azure::authentication) is set up, you can query Azure storage as follows:

##### Azure Blob Storage {#docs:current:core_extensions:azure::azure-blob-storage}

Allowed URI schemes: `az` or `azure`

```sql
SELECT count(*)
FROM 'az://⟨my_container⟩/⟨path⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
```

Globs are also supported:

```sql
SELECT *
FROM 'az://⟨my_container⟩/⟨path⟩/*.csv';
```

```sql
SELECT *
FROM 'az://⟨my_container⟩/⟨path⟩/**';
```

Or with a fully qualified path syntax:

```sql
SELECT count(*)
FROM 'az://⟨my_storage_account⟩.blob.core.windows.net/⟨my_container⟩/⟨path⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
```

```sql
SELECT *
FROM 'az://⟨my_storage_account⟩.blob.core.windows.net/⟨my_container⟩/⟨path⟩/*.csv';
```

##### Azure Data Lake Storage (ADLS) {#docs:current:core_extensions:azure::azure-data-lake-storage-adls}

Allowed URI schemes: `abfss`

```sql
SELECT count(*)
FROM 'abfss://⟨my_filesystem⟩/⟨path⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
```

Globs are also supported:

```sql
SELECT *
FROM 'abfss://⟨my_filesystem⟩/⟨path⟩/*.csv';
```

```sql
SELECT *
FROM 'abfss://⟨my_filesystem⟩/⟨path⟩/**';
```

Or with a fully qualified path syntax:

```sql
SELECT count(*)
FROM 'abfss://⟨my_storage_account⟩.dfs.core.windows.net/⟨my_filesystem⟩/⟨path⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
```

```sql
SELECT *
FROM 'abfss://⟨my_storage_account⟩.dfs.core.windows.net/⟨my_filesystem⟩/⟨path⟩/*.csv';
```

#### Writing to Azure Blob Storage {#docs:current:core_extensions:azure::writing-to-azure-blob-storage}

You can write data directly to Azure Blob or ADLSv2 Storage using the [`COPY` statement](#docs:current:sql:statements:copy).

```sql
-- Write query results to a Parquet file on Blob Storage
COPY (SELECT * FROM my_table)
TO 'az://⟨my_container⟩/⟨path⟩/output.parquet';
```

```sql
-- Write a table to a CSV file on ADLSv2 Storage
COPY my_table
TO 'abfss://⟨my_container⟩/⟨path⟩/output.csv';
```

You can also use fully qualified paths:

```sql
COPY my_table
TO 'az://⟨my_storage_account⟩.blob.core.windows.net/⟨my_container⟩/⟨path⟩/output.parquet';
```

#### Configuration {#docs:current:core_extensions:azure::configuration}

Use the following [configuration options](#docs:current:configuration:overview) to control how the extension reads remote files:

| Name | Description | Type | Default |
|:---|:---|:---|:---|
| `azure_http_stats` | Include HTTP info from Azure Storage in the [`EXPLAIN ANALYZE` statement](#docs:current:dev:profiling). | `BOOLEAN` | `false` |
| `azure_read_transfer_concurrency` | Maximum number of threads the Azure client can use for a single parallel read. If `azure_read_transfer_chunk_size` is less than `azure_read_buffer_size`, setting this to a value greater than 1 allows the Azure client to issue concurrent requests to fill the buffer. | `BIGINT` | `5` |
| `azure_read_transfer_chunk_size` | Maximum size in bytes that the Azure client will read in a single request. It is recommended that this is a factor of `azure_read_buffer_size`. | `BIGINT` | `1024*1024` |
| `azure_read_buffer_size` | Size of the read buffer. It is recommended that this is evenly divisible by `azure_read_transfer_chunk_size`. | `UBIGINT` | `1024*1024` |
| `azure_transport_option_type` | Underlying [adapter](https://github.com/Azure/azure-sdk-for-cpp/blob/main/doc/HttpTransportAdapter.md) to use in the Azure SDK. Valid values are: `default` or `curl`. | `VARCHAR` | `default` |
| `azure_context_caching` | Enable/disable the caching of the underlying Azure SDK HTTP connection in the DuckDB connection context when performing queries. If you suspect that this is causing some side effect, you can try to disable it by setting it to false (not recommended). | `BOOLEAN` | `true` |

> Setting `azure_transport_option_type` explicitly to `curl` will have the following effect:
> * On Linux, this may solve certificate issues (`Error: Invalid Error: Fail to get a new connection for: https://storage_account_name.blob.core.windows.net/. Problem with the SSL CA cert (path? access rights?)`) because, when `curl` is specified, the extension will try to find the certificate bundle in various paths (this is not done by *curl* by default, and its default path might be wrong due to static linking).
> * On Windows, this replaces the default adapter (*WinHTTP*), allowing you to use all *curl* capabilities (for example, a SOCKS proxy).
> * On all operating systems, it will honor the following environment variables:
>   * `CURL_CA_INFO`: Path to a PEM encoded file containing the certificate authorities sent to libcurl. Note that this option is known to only work on Linux and might throw if set on other platforms.
>   * `CURL_CA_PATH`: Path to a directory which holds PEM encoded files, containing the certificate authorities sent to libcurl.

Example:

```sql
SET azure_http_stats = false;
SET azure_read_transfer_concurrency = 5;
SET azure_read_transfer_chunk_size = 1_048_576;
SET azure_read_buffer_size = 1_048_576;
```
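
To see how the three read settings interact, the following back-of-the-envelope sketch estimates how one buffer fill is split into requests. It is a simplified model under stated assumptions, not the extension's actual scheduler: `read_plan` is a hypothetical helper, and it assumes that the buffer is filled by equally sized chunk requests with at most `azure_read_transfer_concurrency` of them in flight.

```python
import math

MIB = 1024 * 1024


def read_plan(buffer_size, chunk_size, concurrency):
    """Estimate how one buffer fill is split into requests (simplified model).

    Returns (requests, max_in_flight, rounds).
    """
    requests = math.ceil(buffer_size / chunk_size)
    max_in_flight = min(requests, concurrency)
    rounds = math.ceil(requests / concurrency)
    return requests, max_in_flight, rounds


# Defaults from the table: one chunk fills the buffer, so concurrency is unused.
print(read_plan(1 * MIB, 1 * MIB, 5))   # (1, 1, 1)
# A 4 MiB buffer with 1 MiB chunks: four requests can run at the same time.
print(read_plan(4 * MIB, 1 * MIB, 5))   # (4, 4, 1)
# An 8 MiB buffer needs eight requests, at most five of them concurrently.
print(read_plan(8 * MIB, 1 * MIB, 5))   # (8, 5, 2)
```

Under this model, raising `azure_read_transfer_concurrency` only has an effect once the buffer is larger than a single chunk, which matches the description in the table.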

#### Authentication {#docs:current:core_extensions:azure::authentication}

The Azure extension has two ways to configure the authentication. The preferred way is to use Secrets.

##### Authentication with Secret {#docs:current:core_extensions:azure::authentication-with-secret}

Multiple [Secret Providers](#docs:current:configuration:secrets_manager::secret-providers) are available for the Azure extension. Keep the following in mind when creating secrets:

* If you need to define different secrets for different storage accounts, use the [`SCOPE` configuration](#docs:current:configuration:secrets_manager::creating-multiple-secrets-for-the-same-service-type). Note that the `SCOPE` requires a trailing slash (`SCOPE 'azure://some_container/'`).
* If you use a fully qualified path, the `ACCOUNT_NAME` attribute is optional.
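
The sketch below models how a scoped secret could be chosen for a given path. It assumes that the secret whose scope is the longest prefix of the path wins, as described in the Secrets Manager documentation. The function `pick_secret`, the secret names and the account names are hypothetical, and the code is an illustration rather than DuckDB's implementation.

```python
def pick_secret(path, secrets):
    """Return the name of the secret whose scope is the longest prefix of `path`."""
    best_name, best_length = None, -1
    for name, scope in secrets.items():
        if path.startswith(scope) and len(scope) > best_length:
            best_name, best_length = name, len(scope)
    return best_name


# Hypothetical secrets, keyed by name, each with a single SCOPE value.
secrets = {
    "account_a": "az://accounta.blob.core.windows.net/",
    "account_b": "az://accountb.blob.core.windows.net/",
}

print(pick_secret("az://accountb.blob.core.windows.net/data/file.parquet", secrets))

# Under prefix matching, a scope without a trailing slash also matches longer names.
print("az://some_container2/file.csv".startswith("az://some_container"))   # True
print("az://some_container2/file.csv".startswith("az://some_container/"))  # False
```

The last two lines show one practical effect of the trailing slash under plain prefix matching: it prevents a scope from matching a different container whose name merely starts with the same characters.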

###### `CONFIG` Provider {#docs:current:core_extensions:azure::config-provider}

The default provider, `CONFIG` (i.e., user-configured), allows access to the storage account using a connection string or anonymously. For example:

```sql
CREATE SECRET secret1 (
    TYPE azure,
    CONNECTION_STRING '⟨value⟩'
);
```

If you do not use authentication, you still need to specify the storage account name. For example:

```sql
CREATE SECRET secret2 (
    TYPE azure,
    PROVIDER config,
    ACCOUNT_NAME '⟨storage_account_name⟩'
);
```

The default `PROVIDER` is `CONFIG`.

###### `credential_chain` Provider {#docs:current:core_extensions:azure::credential_chain-provider}

The `credential_chain` provider allows connecting using credentials automatically fetched by the Azure SDK via the Azure credential chain.
By default, the `DefaultAzureCredential` chain is used, which tries credentials according to the order specified by the [Azure documentation](https://learn.microsoft.com/en-us/javascript/api/@azure/identity/defaultazurecredential?view=azure-node-latest#@azure-identity-defaultazurecredential-constructor).
For example:

```sql
CREATE SECRET secret3 (
    TYPE azure,
    PROVIDER credential_chain,
    ACCOUNT_NAME '⟨storage_account_name⟩'
);
```

DuckDB also allows specifying a custom chain using the `CHAIN` keyword. This takes a semicolon-separated list (`a;b;c`) of providers that will be tried in order. For example:

```sql
CREATE SECRET secret4 (
    TYPE azure,
    PROVIDER credential_chain,
    CHAIN 'cli;env',
    ACCOUNT_NAME '⟨storage_account_name⟩'
);
```

The possible values are the following:

* [`cli`](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli)
* [`managed_identity`](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview)
* [`workload_identity`](https://learn.microsoft.com/en-us/entra/workload-id/workload-identities-overview)
* [`env`](https://github.com/Azure/azure-sdk-for-cpp/blob/azure-identity_1.6.0/sdk/identity/azure-identity/README.md#environment-variables)
* [`default`](https://github.com/Azure/azure-sdk-for-cpp/blob/azure-identity_1.6.0/sdk/identity/azure-identity/README.md#defaultazurecredential)

If no explicit `CHAIN` is provided, [`default`](https://github.com/Azure/azure-sdk-for-cpp/blob/azure-identity_1.6.0/sdk/identity/azure-identity/README.md#defaultazurecredential) is used.

###### Managed Identity {#docs:current:core_extensions:azure::managed-identity}

Managed Identity (MI) can be used automatically via the `credential_chain` provider. In typical
cases where the executor has a single MI available, no configuration is needed.

If your execution environment has multiple identities, use the `MANAGED_IDENTITY` provider and specify
which identity to use. This provider allows identity specification via one of
`CLIENT_ID`, `OBJECT_ID` or `RESOURCE_ID`, e.g.:

```sql
CREATE SECRET secret1 (
    TYPE AZURE,
    PROVIDER MANAGED_IDENTITY,
    ACCOUNT_NAME '⟨storage account name⟩',
    CLIENT_ID '⟨user-assigned managed identity client id⟩'
);
```

The provider may be used without specifying an ID. If only a single ID is available, this provider
functions identically to the `credential_chain` provider and uses the single available ID. If
multiple IDs are available, the behavior is undefined (or, more specifically, defined by the Azure SDK),
so we recommend setting the identity explicitly in this situation.


###### `SERVICE_PRINCIPAL` Provider {#docs:current:core_extensions:azure::service_principal-provider}

The `SERVICE_PRINCIPAL` provider allows connecting using an [Azure Service Principal (SPN)](https://learn.microsoft.com/en-us/entra/architecture/service-accounts-principal).

Either with a secret:

```sql
CREATE SECRET azure_spn (
    TYPE azure,
    PROVIDER service_principal,
    TENANT_ID '⟨tenant_id⟩',
    CLIENT_ID '⟨client_id⟩',
    CLIENT_SECRET '⟨client_secret⟩',
    ACCOUNT_NAME '⟨storage_account_name⟩'
);
```

Or with a certificate:

```sql
CREATE SECRET azure_spn_cert (
    TYPE azure,
    PROVIDER service_principal,
    TENANT_ID '⟨tenant_id⟩',
    CLIENT_ID '⟨client_id⟩',
    CLIENT_CERTIFICATE_PATH '⟨client_cert_path⟩',
    ACCOUNT_NAME '⟨storage_account_name⟩'
);
```

###### Configuring a Proxy {#docs:current:core_extensions:azure::configuring-a-proxy}

To configure proxy information when using secrets, you can add `HTTP_PROXY`, `PROXY_USER_NAME` and `PROXY_PASSWORD` in the secret definition. For example:

```sql
CREATE SECRET secret5 (
    TYPE azure,
    CONNECTION_STRING '⟨value⟩',
    HTTP_PROXY 'http://localhost:3128',
    PROXY_USER_NAME 'john',
    PROXY_PASSWORD 'doe'
);
```

> * When using secrets, the `HTTP_PROXY` environment variable will still be honored except if you provide an explicit value for it.
> * When using secrets, the `SET` variables described in the *Authentication with Variables* section will be ignored.
> * For the Azure `credential_chain` provider, the actual token is fetched at query time, not when the secret is created.

##### Authentication with Variables (Deprecated) {#docs:current:core_extensions:azure::authentication-with-variables-deprecated}

```sql
SET variable_name = variable_value;
```

Where `variable_name` can be one of the following:

| Name | Description | Type | Default |
|:---|:---|:---|:---|
| `azure_storage_connection_string` | Azure connection string, used for authenticating and configuring Azure requests. | `STRING` | - |
| `azure_account_name` | Azure account name. When set, the extension will attempt to automatically detect credentials (not used if you pass the connection string). | `STRING` | - |
| `azure_endpoint` | Override the Azure endpoint for when the Azure credential providers are used. | `STRING` | `blob.core.windows.net` |
| `azure_credential_chain`| Ordered list of Azure credential providers, in string format separated by `;`. For example: `'cli;managed_identity;env'`. See the list of possible values in the [`credential_chain` provider section](#docs:current:core_extensions:azure::credential_chain-provider). Not used if you pass the connection string. | `STRING` | - |
| `azure_http_proxy` | Proxy to use when logging in and performing requests to Azure. | `STRING` | `HTTP_PROXY` environment variable (if set). |
| `azure_proxy_user_name` | HTTP proxy username if needed. | `STRING` | - |
| `azure_proxy_password` | HTTP proxy password if needed. | `STRING` | - |

#### Additional Information {#docs:current:core_extensions:azure::additional-information}

##### Logging {#docs:current:core_extensions:azure::logging}

The Azure extension relies on the Azure SDK to connect to Azure Blob storage and supports printing the SDK logs to the console.
To control the log level, set the [`AZURE_LOG_LEVEL`](https://github.com/Azure/azure-sdk-for-cpp/blob/main/sdk/core/azure-core/README.md#sdk-log-messages) environment variable.

For instance, verbose logs can be enabled in Python as follows:

```python
import os
import duckdb

os.environ["AZURE_LOG_LEVEL"] = "verbose"

duckdb.sql("CREATE SECRET myaccount (TYPE azure, PROVIDER credential_chain, SCOPE 'az://myaccount.blob.core.windows.net/')")
duckdb.sql("SELECT count(*) FROM 'az://myaccount.blob.core.windows.net/path/to/blob.parquet'")
```

##### Difference between ADLS and Blob Storage {#docs:current:core_extensions:azure::difference-between-adls-and-blob-storage}

Even though ADLS implements functionality similar to Blob storage, there are some important performance benefits to using the ADLS endpoints for globbing, especially when using (complex) glob patterns.

To demonstrate, let's look at an example of how a glob is performed internally using the Blob and ADLS endpoints, respectively.

Using the following filesystem:

```text
root
├── l_receipmonth=1997-10
│   ├── l_shipmode=AIR
│   │   └── data_0.csv
│   ├── l_shipmode=SHIP
│   │   └── data_0.csv
│   └── l_shipmode=TRUCK
│       └── data_0.csv
├── l_receipmonth=1997-11
│   ├── l_shipmode=AIR
│   │   └── data_0.csv
│   ├── l_shipmode=SHIP
│   │   └── data_0.csv
│   └── l_shipmode=TRUCK
│       └── data_0.csv
└── l_receipmonth=1997-12
    ├── l_shipmode=AIR
    │   └── data_0.csv
    ├── l_shipmode=SHIP
    │   └── data_0.csv
    └── l_shipmode=TRUCK
        └── data_0.csv
```

The following query is performed through the Blob endpoint:

```sql
SELECT count(*)
FROM 'az://root/l_receipmonth=1997-*/l_shipmode=SHIP/*.csv';
```

It will perform the following steps:

* List all the files with the prefix `root/l_receipmonth=1997-`
    * `root/l_receipmonth=1997-10/l_shipmode=SHIP/data_0.csv`
    * `root/l_receipmonth=1997-10/l_shipmode=AIR/data_0.csv`
    * `root/l_receipmonth=1997-10/l_shipmode=TRUCK/data_0.csv`
    * `root/l_receipmonth=1997-11/l_shipmode=SHIP/data_0.csv`
    * `root/l_receipmonth=1997-11/l_shipmode=AIR/data_0.csv`
    * `root/l_receipmonth=1997-11/l_shipmode=TRUCK/data_0.csv`
    * `root/l_receipmonth=1997-12/l_shipmode=SHIP/data_0.csv`
    * `root/l_receipmonth=1997-12/l_shipmode=AIR/data_0.csv`
    * `root/l_receipmonth=1997-12/l_shipmode=TRUCK/data_0.csv`
* Filter the result with the requested pattern `root/l_receipmonth=1997-*/l_shipmode=SHIP/*.csv`
    * `root/l_receipmonth=1997-10/l_shipmode=SHIP/data_0.csv`
    * `root/l_receipmonth=1997-11/l_shipmode=SHIP/data_0.csv`
    * `root/l_receipmonth=1997-12/l_shipmode=SHIP/data_0.csv`

Meanwhile, the same query can be performed through the datalake endpoint as follows:

```sql
SELECT count(*)
FROM 'abfss://root/l_receipmonth=1997-*/l_shipmode=SHIP/*.csv';
```

This will perform the following steps:

* List all directories in `root/`
    * `root/l_receipmonth=1997-10`
    * `root/l_receipmonth=1997-11`
    * `root/l_receipmonth=1997-12`
* Filter and list subdirectories: `root/l_receipmonth=1997-10`, `root/l_receipmonth=1997-11`, `root/l_receipmonth=1997-12`
    * `root/l_receipmonth=1997-10/l_shipmode=SHIP`
    * `root/l_receipmonth=1997-10/l_shipmode=AIR`
    * `root/l_receipmonth=1997-10/l_shipmode=TRUCK`
    * `root/l_receipmonth=1997-11/l_shipmode=SHIP`
    * `root/l_receipmonth=1997-11/l_shipmode=AIR`
    * `root/l_receipmonth=1997-11/l_shipmode=TRUCK`
    * `root/l_receipmonth=1997-12/l_shipmode=SHIP`
    * `root/l_receipmonth=1997-12/l_shipmode=AIR`
    * `root/l_receipmonth=1997-12/l_shipmode=TRUCK`
* Filter and list subdirectories: `root/l_receipmonth=1997-10/l_shipmode=SHIP`, `root/l_receipmonth=1997-11/l_shipmode=SHIP`, `root/l_receipmonth=1997-12/l_shipmode=SHIP`
    * `root/l_receipmonth=1997-10/l_shipmode=SHIP/data_0.csv`
    * `root/l_receipmonth=1997-11/l_shipmode=SHIP/data_0.csv`
    * `root/l_receipmonth=1997-12/l_shipmode=SHIP/data_0.csv`

As you can see, because the Blob endpoint does not support the notion of directories, the filter can only be applied after the listing, whereas the ADLS endpoint lists directory by directory and only descends into directories that match the pattern. Especially with higher partition/directory counts, the performance difference can be very significant.
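
The two strategies can be modeled in a few lines of Python. This is a simplified simulation over an in-memory list of paths, not the extension's implementation: `build_tree`, `blob_glob` and `adls_glob` are hypothetical helpers, and the number of listed entries is used only as a rough proxy for the amount of listing work.

```python
from fnmatch import fnmatchcase

PATTERN = "root/l_receipmonth=1997-*/l_shipmode=SHIP/*.csv"


def build_tree(files_per_dir=1):
    """File paths for the example layout, with a configurable number of files per leaf."""
    return [
        f"root/l_receipmonth=1997-{month}/l_shipmode={mode}/data_{i}.csv"
        for month in ("10", "11", "12")
        for mode in ("AIR", "SHIP", "TRUCK")
        for i in range(files_per_dir)
    ]


def matches(path, pattern):
    """Match segment by segment so that `*` never crosses a `/`."""
    p, g = path.split("/"), pattern.split("/")
    return len(p) == len(g) and all(fnmatchcase(a, b) for a, b in zip(p, g))


def blob_glob(files, pattern):
    """Flat namespace: list everything under the literal prefix, then filter."""
    prefix = pattern.split("*", 1)[0]
    listed = [f for f in files if f.startswith(prefix)]
    return len(listed), sorted(f for f in listed if matches(f, pattern))


def adls_glob(files, pattern):
    """Hierarchical namespace: list one level at a time and prune between levels."""
    parts = pattern.split("/")
    frontier, listed = [parts[0]], 0
    for depth, part in enumerate(parts[1:], start=2):
        children = {
            "/".join(f.split("/")[:depth])
            for f in files
            for d in frontier
            if f.startswith(d + "/")
        }
        listed += len(children)
        frontier = sorted(c for c in children if fnmatchcase(c.rsplit("/", 1)[1], part))
    return listed, frontier


for n in (1, 100):
    files = build_tree(n)
    blob_listed, blob_hits = blob_glob(files, PATTERN)
    adls_listed, adls_hits = adls_glob(files, PATTERN)
    assert blob_hits == adls_hits
    print(n, blob_listed, adls_listed, len(blob_hits))
```

With one file per directory, the model lists 9 entries for the flat approach and 15 (directories plus files) for the hierarchical one. With 100 files per directory, the flat approach lists 900 entries while the hierarchical one lists 312, because the files in the `AIR` and `TRUCK` directories are never listed.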

## Delta Extension {#docs:current:core_extensions:delta}

The `delta` extension adds support for the [Delta Lake open-source storage format](https://delta.io/). It is built using the [Delta Kernel](https://github.com/delta-incubator/delta-kernel-rs). The extension offers **read support** for Delta tables, both local and remote.

For implementation details, see the [announcement blog post](https://duckdb.org/2024/06/10/delta).

> **Warning.** We are aware of a regression in Azure Onelake which appears to be a consequence of a change in `delta-kernel-rs`. You can track the issue [on GitHub](https://github.com/duckdb/duckdb-delta/issues/307).

> To connect to Unity Catalog, DuckDB has the [`unity_catalog` experimental extension](https://github.com/duckdb/unity_catalog).
> Please note that this extension is a proof-of-concept and not production-ready.

#### Installing and Loading {#docs:current:core_extensions:delta::installing-and-loading}

The `delta` extension will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL delta;
LOAD delta;
```

#### Usage {#docs:current:core_extensions:delta::usage}

To scan a local Delta table, run:

```sql
SELECT *
FROM delta_scan('file:///some/path/on/local/machine');
```

##### Reading from an S3 Bucket {#docs:current:core_extensions:delta::reading-from-an-s3-bucket}

To scan a Delta table in an [S3 bucket](#docs:current:core_extensions:httpfs:s3api), run:

```sql
SELECT *
FROM delta_scan('s3://some/delta/table');
```

For authenticating to S3 buckets, DuckDB [Secrets](#docs:current:configuration:secrets_manager) are supported:

```sql
CREATE SECRET (
    TYPE s3,
    PROVIDER credential_chain
);
SELECT *
FROM delta_scan('s3://some/delta/table/with/auth');
```

To scan public buckets on S3, you may need to pass the correct region by creating a secret containing the region of your public S3 bucket:

```sql
CREATE SECRET (
    TYPE s3,
    REGION 'my-region'
);
SELECT *
FROM delta_scan('s3://some/public/table/in/my-region');
```

##### Reading from Azure Blob Storage {#docs:current:core_extensions:delta::reading-from-azure-blob-storage}

To scan a Delta table in an [Azure Blob Storage container](#docs:current:core_extensions:azure::azure-blob-storage), run:

```sql
SELECT *
FROM delta_scan('az://my-container/my-table');
```

For authenticating to Azure Blob Storage, DuckDB [Secrets](#docs:current:configuration:secrets_manager) are supported:

```sql
CREATE SECRET (
    TYPE azure,
    PROVIDER credential_chain
);
SELECT *
FROM delta_scan('az://my-container/my-table-with-auth');
```

##### Credential Chains in Delta {#docs:current:core_extensions:delta::credential-chains-in-delta}

DuckDB Delta uses `delta-kernel-rs` and `object_store` for some network operations.
These systems have a different ordering (and inclusion defaults) for credential
chains. If your system has multiple credential sources available, e.g., both
Service Principal via the environment and a CLI-based option, credential loading behavior
may be inconsistent.

To avoid ambiguities, we recommend that you configure exactly one available
credential type in your production chain secrets.

#### Features {#docs:current:core_extensions:delta::features}

While the `delta` extension is still experimental, many (scanning) features and optimizations are already supported:

* multithreaded scans and Parquet metadata reading
* data skipping/filter pushdown
    * skipping row groups in file (based on Parquet metadata)
    * skipping complete files (based on Delta partition information)
* projection pushdown
* scanning tables with deletion vectors
* all primitive types
* structs
* S3 support with secrets

More optimizations are going to be released in the future.
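
To illustrate the two levels of data skipping listed above, here is a conceptual sketch in Python. It is not the extension's code: `RowGroup`, `DataFile` and `plan_scan` are hypothetical, and the sketch assumes a table partitioned by a `year` column with per-row-group minimum and maximum statistics for an `amount` column.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    min_value: int
    max_value: int


@dataclass
class DataFile:
    partition: dict      # partition column -> value, e.g. {"year": 2024}
    row_groups: list     # RowGroup statistics for the "amount" column


def plan_scan(files, year, min_amount):
    """Plan a scan for: WHERE year = <year> AND amount >= <min_amount>."""
    selected = []
    for file_index, data_file in enumerate(files):
        # File-level skipping: the partition value rules out the whole file.
        if data_file.partition["year"] != year:
            continue
        for group_index, group in enumerate(data_file.row_groups):
            # Row-group skipping: no value in this group can satisfy the filter.
            if group.max_value < min_amount:
                continue
            selected.append((file_index, group_index))
    return selected


files = [
    DataFile({"year": 2023}, [RowGroup(0, 50), RowGroup(60, 500)]),
    DataFile({"year": 2024}, [RowGroup(0, 90), RowGroup(100, 900)]),
]
print(plan_scan(files, year=2024, min_amount=100))  # [(1, 1)]
```

Only the second row group of the second file has to be read: the first file is skipped based on its partition value, and the first row group of the second file is skipped based on its statistics.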

#### Supported Platforms {#docs:current:core_extensions:delta::supported-platforms}

The `delta` extension currently only supports the following platforms:

* Linux (x86_64 and ARM64): `linux_amd64` and `linux_arm64`
* macOS Intel and Apple Silicon: `osx_amd64` and `osx_arm64`
* Windows AMD64: `windows_amd64`

Support for the [other DuckDB platforms](#docs:current:extensions:extension_distribution::platforms) is work-in-progress.

#### Using delta-rs with DuckDB {#docs:current:core_extensions:delta::using-delta-rs-with-duckdb}

In this example, we create a Delta table with the `delta-rs` Python binding, then we use the `delta` extension of DuckDB to read it. We also showcase how to do other read operations with DuckDB, like reading the change data feed using the Arrow zero-copy integration. This operation can also be lazy if reading bigger data by using [Arrow Datasets](https://delta-io.github.io/delta-rs/integrations/delta-lake-arrow/).



<details markdown='1'>
<summary markdown='span'>
Click here to see the full example.
</summary>

```python
import deltalake as dl
import duckdb
import pyarrow as pa

# Create a delta table and read it with DuckDB Delta extension
dl.write_deltalake(
    "tmp/some_table",
    pa.table({
        "id": [1, 2, 3],
        "value": ["a", "b", "c"]
    })
)
with duckdb.connect() as conn:
    conn.execute("""
        INSTALL delta;
        LOAD delta;
    """)
    conn.sql("""
        SELECT * FROM delta_scan('tmp/some_table')
    """).show()

# Append some data and read the data change feed using the PyArrow integration
dl.write_deltalake(
    "tmp/some_table",
    pa.table({
        "id": [4, 5],
        "value": ["d", "e"]
    }),
    mode="append"
)
table = dl.DeltaTable("tmp/some_table").load_cdf(starting_version=1, ending_version=2)
with duckdb.connect() as conn:
    conn.register("t", table)
    conn.sql("SELECT * FROM t").show()
```

</details>


## DuckLake {#docs:current:core_extensions:ducklake}

> DuckLake 1.0 was released in April 2026.
> Read the [announcement blog post](https://ducklake.select/2026/04/13/ducklake-10/).

The `ducklake` extension adds support for attaching to databases stored in the [DuckLake format](http://ducklake.select/). 
The complete documentation of this extension is available at the [DuckLake website](https://ducklake.select/docs/stable/duckdb/introduction).

#### Installing and Loading {#docs:current:core_extensions:ducklake::installing-and-loading}

To install `ducklake`, run:

```sql
INSTALL ducklake;
```

The `ducklake` extension will be transparently [autoloaded](#docs:current:core_extensions:overview::autoloading-extensions) on first use in an `ATTACH` clause.
If you would like to load it manually, run:

```sql
LOAD ducklake;
```

#### Usage {#docs:current:core_extensions:ducklake::usage}

```sql
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'data_files');
USE my_ducklake;
```

#### Tables {#docs:current:core_extensions:ducklake::tables}

In DuckDB, the `ducklake` extension stores the [catalog tables](http://ducklake.select/docs/stable/specification/tables/overview) for a DuckLake named
`my_ducklake` in the
`__ducklake_metadata_⟨my_ducklake⟩` catalog.

#### Functions {#docs:current:core_extensions:ducklake::functions}

Note that DuckLake registers several functions.
These should be called with the catalog name as the first argument, e.g.:

```sql
FROM ducklake_snapshots('my_ducklake');
```

```text
┌─────────────┬────────────────────────────┬────────────────┬──────────────────────────┐
│ snapshot_id │       snapshot_time        │ schema_version │         changes          │
│    int64    │  timestamp with time zone  │     int64      │ map(varchar, varchar[])  │
├─────────────┼────────────────────────────┼────────────────┼──────────────────────────┤
│      0      │ 2025-05-26 11:41:10.838+02 │       0        │ {schemas_created=[main]} │
└─────────────┴────────────────────────────┴────────────────┴──────────────────────────┘
```

##### `ducklake_snapshots` {#docs:current:core_extensions:ducklake::ducklake_snapshots}

The `ducklake_snapshots` function returns the snapshots stored in the DuckLake catalog named `catalog`.

| Parameter name | Parameter type | Named parameter | Description |
| -------------- | -------------- | --------------- | ----------- |
| `catalog`      | `VARCHAR`      | no              |             |

The information is encoded into a table with the following schema:

| Column name      | Column type                |
| ---------------- | -------------------------- |
| `snapshot_id`    | `BIGINT`                   |
| `snapshot_time`  | `TIMESTAMP WITH TIME ZONE` |
| `schema_version` | `BIGINT`                   |
| `changes`        | `MAP(VARCHAR, VARCHAR[])`  |

##### `ducklake_table_info` {#docs:current:core_extensions:ducklake::ducklake_table_info}

The `ducklake_table_info` function returns information on the tables stored in the DuckLake catalog named `catalog`.

| Parameter name | Parameter type | Named parameter | Description |
| -------------- | -------------- | --------------- | ----------- |
| `catalog`      | `VARCHAR`      | no              |             |

The information is encoded into a table with the following schema:

| Column name              | Column type |
| ------------------------ | ----------- |
| `table_name`             | `VARCHAR`   |
| `schema_id`              | `BIGINT`    |
| `table_id`               | `BIGINT`    |
| `table_uuid`             | `UUID`      |
| `file_count`             | `BIGINT`    |
| `file_size_bytes`        | `BIGINT`    |
| `delete_file_count`      | `BIGINT`    |
| `delete_file_size_bytes` | `BIGINT`    |

##### `ducklake_table_insertions` {#docs:current:core_extensions:ducklake::ducklake_table_insertions}

The `ducklake_table_insertions` function returns the rows inserted in a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether `start_snapshot` and `end_snapshot` have types `BIGINT` or `TIMESTAMP WITH TIME ZONE`.

| Parameter name   | Parameter type                        | Named parameter | Description |
| ---------------- | ------------------------------------- | --------------- | ----------- |
| `catalog`        | `VARCHAR`                             | no              |             |
| `schema_name`    | `VARCHAR`                             | no              |             |
| `table_name`     | `VARCHAR`                             | no              |             |
| `start_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no              |             |
| `end_snapshot`   | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no              |             |

The schema of the table returned by the function is equivalent to that of the table `table_name`.

##### `ducklake_table_deletions` {#docs:current:core_extensions:ducklake::ducklake_table_deletions}

The `ducklake_table_deletions` function returns the rows deleted from a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether `start_snapshot` and `end_snapshot` have types `BIGINT` or `TIMESTAMP WITH TIME ZONE`.

| Parameter name   | Parameter type                        | Named parameter | Description |
| ---------------- | ------------------------------------- | --------------- | ----------- |
| `catalog`        | `VARCHAR`                             | no              |             |
| `schema_name`    | `VARCHAR`                             | no              |             |
| `table_name`     | `VARCHAR`                             | no              |             |
| `start_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no              |             |
| `end_snapshot`   | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no              |             |

The schema of the table returned by the function is equivalent to that of the table `table_name`.
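
The timestamp variant can be sketched in the same way. The catalog, schema, and table names as well as the two timestamps below are placeholders chosen for illustration:

```sql
-- Hypothetical names and timestamps; adjust them to your own DuckLake.
-- Returns the rows deleted from the table between the two points in time.
FROM ducklake_table_deletions(
    'my_ducklake', 'main', 'tbl',
    '2025-01-01 00:00:00+00'::TIMESTAMPTZ,
    '2025-02-01 00:00:00+00'::TIMESTAMPTZ
);
```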

##### `ducklake_table_changes` {#docs:current:core_extensions:ducklake::ducklake_table_changes}

The `ducklake_table_changes` function returns the rows changed in a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether `start_snapshot` and `end_snapshot` have types `BIGINT` or `TIMESTAMP WITH TIME ZONE`.

| Parameter name   | Parameter type                        | Named parameter | Description |
| ---------------- | ------------------------------------- | --------------- | ----------- |
| `catalog`        | `VARCHAR`                             | no              |             |
| `schema_name`    | `VARCHAR`                             | no              |             |
| `table_name`     | `VARCHAR`                             | no              |             |
| `start_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no              |             |
| `end_snapshot`   | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no              |             |

The schema of the table returned by the function contains the following three columns plus the schema of the table `table_name`.

| Column name   | Column type | Description                              |
| ------------- | ----------- | ---------------------------------------- |
| `snapshot_id` | `BIGINT`    |                                          |
| `rowid`       | `BIGINT`    |                                          |
| `change_type` | `VARCHAR`   | The type of change: `insert` or `delete` |
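
Because the result carries `snapshot_id` and `change_type` alongside the table's own columns, it can be aggregated like any other table. The following sketch, which assumes a hypothetical catalog `my_ducklake` with a table `main.tbl` and snapshot versions 1 to 3, counts the changed rows per snapshot and change type:

```sql
-- Hypothetical names: catalog 'my_ducklake', schema 'main', table 'tbl'.
SELECT snapshot_id, change_type, count(*) AS num_rows
FROM ducklake_table_changes('my_ducklake', 'main', 'tbl', 1, 3)
GROUP BY ALL
ORDER BY snapshot_id, change_type;
```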

#### Commands {#docs:current:core_extensions:ducklake::commands}

##### `ducklake_cleanup_old_files` {#docs:current:core_extensions:ducklake::ducklake_cleanup_old_files}

The `ducklake_cleanup_old_files` function cleans up old files in the DuckLake denoted by `catalog`.
Upon success, it returns a table with a single column (`Success`) and 0 rows.

| Parameter name | Parameter type             | Named parameter | Description |
| -------------- | -------------------------- | --------------- | ----------- |
| `catalog`      | `VARCHAR`                  | no              |             |
| `cleanup_all`  | `BOOLEAN`                  | yes             |             |
| `dry_run`      | `BOOLEAN`                  | yes             |             |
| `older_than`   | `TIMESTAMP WITH TIME ZONE` | yes             |             |
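
A possible invocation is sketched below. The catalog name `my_ducklake` is a placeholder, and the comment on `dry_run` reflects an assumption based on the parameter's name, since the table above does not describe it:

```sql
-- Hypothetical catalog name 'my_ducklake'.
-- Assumption: with dry_run enabled, the call only reports what it would clean up
-- instead of deleting anything.
CALL ducklake_cleanup_old_files(
    'my_ducklake',
    dry_run = true,
    older_than = now() - INTERVAL '1 week'
);
```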

##### `ducklake_expire_snapshots` {#docs:current:core_extensions:ducklake::ducklake_expire_snapshots}

The `ducklake_expire_snapshots` function expires snapshots with the versions specified by the `versions` parameter or the ones older than the `older_than` parameter.
Upon success, it returns a table with a single column (`Success`) and 0 rows.

| Parameter name | Parameter type             | Named parameter | Description |
| -------------- | -------------------------- | --------------- | ----------- |
| `catalog`      | `VARCHAR`                  | no              |             |
| `versions`     | `UBIGINT[]`                | yes             |             |
| `older_than`   | `TIMESTAMP WITH TIME ZONE` | yes             |             |
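
The two ways of selecting snapshots can be sketched as follows, assuming a DuckLake attached under the hypothetical catalog name `my_ducklake` in which a snapshot with version 2 exists:

```sql
-- Expire a specific snapshot by its version (hypothetical version number).
CALL ducklake_expire_snapshots('my_ducklake', versions = [2]);

-- Expire all snapshots older than one week.
CALL ducklake_expire_snapshots('my_ducklake', older_than = now() - INTERVAL '1 week');
```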

##### `ducklake_merge_adjacent_files` {#docs:current:core_extensions:ducklake::ducklake_merge_adjacent_files}

The `ducklake_merge_adjacent_files` function merges adjacent files in the storage.
Upon success, it returns a table with a single column (`Success`) and 0 rows.

| Parameter name | Parameter type | Named parameter | Description |
| -------------- | -------------- | --------------- | ----------- |
| `catalog`      | `VARCHAR`      | no              |             |
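
Since the function only takes the catalog, a call is short. The catalog name `my_ducklake` below is a placeholder:

```sql
-- Hypothetical catalog name 'my_ducklake'.
CALL ducklake_merge_adjacent_files('my_ducklake');
```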

#### Compatibility Matrix {#docs:current:core_extensions:ducklake::compatibility-matrix}

The DuckLake specification and the `ducklake` DuckDB extension are currently released together.
See the [Compatibility Matrix](https://ducklake.select/release_calendar#compatibility-matrix).

## Encodings Extension {#docs:current:core_extensions:encodings}

The `encodings` extension adds support for reading CSVs using more than 1,000 character encodings.

For a complete list of supported encodings, see [All Supported Encodings](#docs:current:core_extensions:encodings::all-supported-encodings). For detailed information on character encoding, see the [ICU data repository](https://github.com/unicode-org/icu-data/tree/main/charset/data/ucm).

#### Installing and Loading {#docs:current:core_extensions:encodings::installing-and-loading}

```sql
INSTALL encodings;
LOAD encodings;
```

#### Usage {#docs:current:core_extensions:encodings::usage}

Specify the `encoding` parameter when reading a file.

To read a `.csv` file with `shift_jis` encoding:

```sql
FROM read_csv('my_shift_jis.csv', encoding = 'shift_jis');
```

To read a `.csv` file with `windows-1251-2000` encoding:

```sql
FROM read_csv('my_windows_1251_2000.csv', encoding = 'windows-1251-2000');
```

To read a `.csv` file with `windows-1252-2000` encoding:

```sql
FROM read_csv('my_windows_1252_2000.csv', encoding = 'windows-1252-2000');
```

To read a `.csv` file with `EUC_CN` encoding:

```sql
FROM read_csv('my_euc_cn.csv', encoding = 'EUC_CN');
```
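
DuckDB stores strings as UTF-8 internally, so a file read with a non-UTF-8 encoding can be written back out to re-encode it. The following is a sketch with hypothetical input and output file names:

```sql
-- Hypothetical file names.
-- Reads a Shift JIS-encoded CSV file and writes its contents to a new CSV file.
COPY (FROM read_csv('my_shift_jis.csv', encoding = 'shift_jis'))
TO 'my_shift_jis_utf8.csv' (FORMAT csv, HEADER);
```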

#### All Supported Encodings {#docs:current:core_extensions:encodings::all-supported-encodings}

The following table is an alphabetized list of all `encoding` values supported by DuckDB using the `encodings` core extension:

|   Id | Encoding                      |
|-----:|:------------------------------|
|1|`5601`|
|2|`8859_1`|
|3|`8859_10`|
|4|`8859_15`|
|5|`8859_2`|
|6|`8859_3`|
|7|`8859_4`|
|8|`8859_5`|
|9|`8859_6`|
|10|`8859_7`|
|11|`8859_8`|
|12|`8859_9`|
|13|`aix-IBM_udcJP-4.3.6`|
|14|`ANSI_X3.110`|
|15|`ascii`|
|16|`ASMO_449`|
|17|`BALTIC`|
|18|`big5`|
|19|`CNS11643.1986_1`|
|20|`CNS11643.1986_2`|
|21|`CNS-11643-1992`|
|22|`cp037`|
|23|`cp1026`|
|24|`CP1250`|
|25|`CP1251`|
|26|`CP1252`|
|27|`CP1253`|
|28|`CP1254`|
|29|`CP1255`|
|30|`CP1256`|
|31|`CP1257`|
|32|`CP1258`|
|33|`cp273`|
|34|`cp424`|
|35|`cp437`|
|36|`cp500`|
|37|`CP737`|
|38|`CP775`|
|39|`cp850`|
|40|`cp852`|
|41|`cp855`|
|42|`cp857`|
|43|`cp860`|
|44|`cp861`|
|45|`cp862`|
|46|`cp863`|
|47|`cp864`|
|48|`cp865`|
|49|`cp866`|
|50|`cp869`|
|51|`cp949`|
|52|`CSN_369103`|
|53|`CWI`|
|54|`DEC_MCS`|
|55|`EBCDIC_AT_DE`|
|56|`EBCDIC_AT_DE_A`|
|57|`EBCDIC_CA_FR`|
|58|`EBCDIC_DK_NO`|
|59|`EBCDIC_DK_NO_A`|
|60|`EBCDIC_ES`|
|61|`EBCDIC_ES_A`|
|62|`EBCDIC_ES_S`|
|63|`EBCDIC_FI_SE`|
|64|`EBCDIC_FI_SE_A`|
|65|`EBCDIC_FR`|
|66|`EBCDIC_IS_FRISS`|
|67|`EBCDIC_IT`|
|68|`EBCDIC_PT`|
|69|`EBCDIC_UK`|
|70|`EBCDIC_US`|
|71|`EUC_CN`|
|72|`EUC_JP`|
|73|`EUC_KR`|
|74|`EUC_TW`|
|75|`euc-jp-2007`|
|76|`eucTH`|
|77|`euc-tw-2014`|
|78|`gb18030`|
|79|`glibc-ANSI_X3.110-2.3.3`|
|80|`glibc-ARMSCII_8-2.3.3`|
|81|`glibc-BIG5-2.3.3`|
|82|`glibc-BIG5HKSCS-2.3.3`|
|83|`glibc-BS_4730-2.3.3`|
|84|`glibc-CP10007-2.3.3`|
|85|`glibc-CP1125-2.3.3`|
|86|`glibc-CP932-2.3.3`|
|87|`glibc-CSA_Z243.4_1985_1-2.3.3`|
|88|`glibc-CSA_Z243.4_1985_2-2.3.3`|
|89|`glibc-DIN_66003-2.3.3`|
|90|`glibc-DS_2089-2.3.3`|
|91|`glibc-ECMA_CYRILLIC-2.3.3`|
|92|`glibc-ES-2.3.3`|
|93|`glibc-ES2-2.3.3`|
|94|`glibc-EUC_CN-2.3.3`|
|95|`glibc-EUC_JP_MS-2.3.3`|
|96|`glibc-EUC_JP-2.3.3`|
|97|`glibc-EUC_KR-2.3.3`|
|98|`glibc-GB_1988_80-2.3.3`|
|99|`glibc-GBK-2.3.3`|
|100|`glibc-GEORGIAN_ACADEMY-2.3.3`|
|101|`glibc-GEORGIAN_PS-2.3.3`|
|102|`glibc-IBM1046-2.3.3`|
|103|`glibc-IBM1124-2.3.3`|
|104|`glibc-IBM1129-2.3.3`|
|105|`glibc-IBM1132-2.3.3`|
|106|`glibc-IBM1133-2.3.3`|
|107|`glibc-IBM1160-2.3.3`|
|108|`glibc-IBM1161-2.3.3`|
|109|`glibc-IBM1162-2.3.3`|
|110|`glibc-IBM1163-2.3.3`|
|111|`glibc-IBM1164-2.3.3`|
|112|`glibc-IBM856-2.3.3`|
|113|`glibc-IBM864-2.3.3`|
|114|`glibc-IBM866NAV-2.3.3`|
|115|`glibc-IBM870-2.3.3`|
|116|`glibc-IBM874-2.3.3`|
|117|`glibc-IBM922-2.3.3`|
|118|`glibc-IBM943-2.3.3`|
|119|`glibc-ISIRI_3342-2.3.3`|
|120|`glibc-ISO_5428-2.3.3`|
|121|`glibc-ISO_6937-2.3.3`|
|122|`glibc-ISO_8859_13-2.3.3`|
|123|`glibc-ISO_8859_16-2.3.3`|
|124|`glibc-ISO_8859_7-2.3.3`|
|125|`glibc-ISO_8859_8-2.3.3`|
|126|`glibc-ISO_IR_209-2.3.3`|
|127|`glibc-IT-2.3.3`|
|128|`glibc-JIS_C6220_1969_RO-2.3.3`|
|129|`glibc-JIS_C6229_1984_B-2.3.3`|
|130|`glibc-JOHAB-2.3.3`|
|131|`glibc-JUS_I.B1.002-2.3.3`|
|132|`glibc-KOI8_R-2.3.3`|
|133|`glibc-KOI8_T-2.3.3`|
|134|`glibc-KOI8_U-2.3.3`|
|135|`glibc-KSC5636-2.3.3`|
|136|`glibc-MAC_SAMI-2.3.3`|
|137|`glibc-MACINTOSH-2.3.3`|
|138|`glibc-MSZ_7795.3-2.3.3`|
|139|`glibc-NC_NC00_10-2.3.3`|
|140|`glibc-NF_Z_62_010_1973-2.3.3`|
|141|`glibc-NF_Z_62_010-2.3.3`|
|142|`glibc-NS_4551_1-2.3.3`|
|143|`glibc-NS_4551_2-2.3.3`|
|144|`glibc-PT154-2.3.3`|
|145|`glibc-PT-2.3.3`|
|146|`glibc-PT2-2.3.3`|
|147|`glibc-RK1048-2.3.3`|
|148|`glibc-SEN_850200_B-2.3.3`|
|149|`glibc-SEN_850200_C-2.3.3`|
|150|`glibc-SJIS-2.3.3`|
|151|`glibc-T.61_8BIT-2.3.3`|
|152|`glibc-UHC-2.3.3`|
|153|`glibc-VISCII-2.3.3`|
|154|`glibc-WIN_SAMI_2-2.3.3`|
|155|`GOST_19768_74`|
|156|`GREEK_CCITT`|
|157|`GREEK7`|
|158|`GREEK7_OLD`|
|159|`HP_ROMAN8`|
|160|`hpux-big5-11.11`|
|161|`hpux-cp1140-11.11`|
|162|`hpux-cp1141-11.11`|
|163|`hpux-cp1142-11.11`|
|164|`hpux-cp1143-11.11`|
|165|`hpux-cp1144-11.11`|
|166|`hpux-cp1145-11.11`|
|167|`hpux-cp1146-11.11`|
|168|`hpux-cp1147-11.11`|
|169|`hpux-cp1148-11.11`|
|170|`hpux-cp1149-11.11`|
|171|`hpux-cp1250-11.11`|
|172|`hpux-cp1251-11.11`|
|173|`hpux-cp1252-11.11`|
|174|`hpux-cp1253-11.11`|
|175|`hpux-cp1254-11.11`|
|176|`hpux-cp1255-11.11`|
|177|`hpux-cp1256-11.11`|
|178|`hpux-cp1257-11.11`|
|179|`hpux-cp1258-11.11`|
|180|`hpux-cp437-11.11`|
|181|`hpux-cp737-11.11`|
|182|`hpux-cp775-11.11`|
|183|`hpux-cp850-11.11`|
|184|`hpux-cp852-11.11`|
|185|`hpux-cp855-11.11`|
|186|`hpux-cp857-11.11`|
|187|`hpux-cp860-11.11`|
|188|`hpux-cp861-11.11`|
|189|`hpux-cp862-11.11`|
|190|`hpux-cp863-11.11`|
|191|`hpux-cp864-11.11`|
|192|`hpux-cp865-11.11`|
|193|`hpux-cp866-11.11`|
|194|`hpux-cp869-11.11`|
|195|`hpux-cp874-11.11`|
|196|`hpux-eucJP0201-11.11`|
|197|`hpux-eucJP-11.11`|
|198|`hpux-eucJPMS-11.11`|
|199|`hpux-eucKR-11.11`|
|200|`hpux-eucTW-11.11`|
|201|`hpux-greee-11.11`|
|202|`hpux-hkbig5-11.11`|
|203|`hpux-hp15CN-11.11`|
|204|`hpux-iso87-11.11`|
|205|`hpux-roc15-11.11`|
|206|`hpux-sjis0201-11.11`|
|207|`hpux-sjis-11.11`|
|208|`hpux-sjisMS-11.11`|
|209|`IBM_1046`|
|210|`IBM_1124`|
|211|`IBM_1129`|
|212|`IBM_1252`|
|213|`IBM_850`|
|214|`IBM_856`|
|215|`IBM_858`|
|216|`IBM_932`|
|217|`IBM_943`|
|218|`IBM_eucJP`|
|219|`IBM_eucKR`|
|220|`IBM_eucTW`|
|221|`IBM_udcJP_GR`|
|222|`IBM038`|
|223|`IBM1004`|
|224|`ibm-1004_P100-1995`|
|225|`ibm-1006_P100-1995`|
|226|`ibm-1006_X100-1995`|
|227|`ibm-1008_P100-1995`|
|228|`ibm-1008_X100-1995`|
|229|`ibm-1009_P100-1995`|
|230|`ibm-1010_P100-1995`|
|231|`ibm-1011_P100-1995`|
|232|`ibm-1012_P100-1995`|
|233|`ibm-1013_P100-1995`|
|234|`ibm-1014_P100-1995`|
|235|`ibm-1015_P100-1995`|
|236|`ibm-1016_P100-1995`|
|237|`ibm-1017_P100-1995`|
|238|`ibm-1018_P100-1995`|
|239|`ibm-1019_P100-1995`|
|240|`ibm-1020_P100-2003`|
|241|`ibm-1021_P100-2003`|
|242|`ibm-1023_P100-2003`|
|243|`ibm-1025_P100-1995`|
|244|`ibm-1026_P100-1995`|
|245|`ibm-1027_P100-1995`|
|246|`ibm-1040_P100-1995`|
|247|`ibm-1041_P100-1995`|
|248|`ibm-1042_P100-1995`|
|249|`ibm-1043_P100-1995`|
|250|`ibm-1046_X110-1999`|
|251|`IBM1047`|
|252|`ibm-1047_P100-1995`|
|253|`ibm-1051_P100-1999`|
|254|`ibm-1088_P100-1995`|
|255|`ibm-1089_P100-1995`|
|256|`ibm-1097_P100-1995`|
|257|`ibm-1097_X100-1995`|
|258|`ibm-1098_P100-1995`|
|259|`ibm-1098_X100-1995`|
|260|`ibm-1100_P100-2003`|
|261|`ibm-1101_P100-2003`|
|262|`ibm-1102_P100-2003`|
|263|`ibm-1103_P100-2003`|
|264|`ibm-1104_P100-2003`|
|265|`ibm-1105_P100-2003`|
|266|`ibm-1106_P100-2003`|
|267|`ibm-1107_P100-2003`|
|268|`ibm-1112_P100-1995`|
|269|`ibm-1114_P100-1995`|
|270|`ibm-1114_P100-2001`|
|271|`ibm-1115_P100-1995`|
|272|`ibm-1122_P100-1999`|
|273|`ibm-1123_P100-1995`|
|274|`ibm-1124_X100-1996`|
|275|`ibm-1125_P100-1997`|
|276|`ibm-1126_P100_P100-1997_U3`|
|277|`ibm-1126_P100-1997`|
|278|`ibm-1127_P100-2004`|
|279|`ibm-1129_P100-1997`|
|280|`ibm-1130_P100-1997`|
|281|`ibm-1131_P100-1997`|
|282|`ibm-1132_P100-1997`|
|283|`ibm-1132_P100-1998`|
|284|`ibm-1133_P100-1997`|
|285|`ibm-1137_P100-1999`|
|286|`ibm-1137_PMOD-1999`|
|287|`ibm-1140_P100-1997`|
|288|`ibm-1141_P100-1997`|
|289|`ibm-1142_P100-1997`|
|290|`ibm-1143_P100-1997`|
|291|`ibm-1144_P100-1997`|
|292|`ibm-1145_P100-1997`|
|293|`ibm-1146_P100-1997`|
|294|`ibm-1147_P100-1997`|
|295|`ibm-1148_P100-1997`|
|296|`ibm-1149_P100-1997`|
|297|`ibm-1153_P100-1999`|
|298|`ibm-1154_P100-1999`|
|299|`ibm-1155_P100-1999`|
|300|`ibm-1156_P100-1999`|
|301|`ibm-1157_P100-1999`|
|302|`ibm-1158_P100-1999`|
|303|`ibm-1159_P100-1999`|
|304|`ibm-1160_P100-1999`|
|305|`ibm-1161_P100-1999`|
|306|`ibm-1162_P100-1999`|
|307|`ibm-1163_P100-1999`|
|308|`ibm-1164_P100-1999`|
|309|`ibm-1165_P101-2000`|
|310|`ibm-1166_P100-2002`|
|311|`ibm-1167_P100-2002`|
|312|`ibm-1168_P100-2002`|
|313|`ibm-1174_X100-2007`|
|314|`ibm-1250_P100-1999`|
|315|`ibm-1251_P100-1995`|
|316|`ibm-1252_P100-2000`|
|317|`ibm-1253_P100-1995`|
|318|`ibm-1254_P100-1995`|
|319|`ibm-1255_P100-1995`|
|320|`ibm-1256_P110-1997`|
|321|`ibm-1257_P100-1995`|
|322|`ibm-1258_P100-1997`|
|323|`ibm-12712_P100-1998`|
|324|`ibm-1275_P100-1995`|
|325|`ibm-1275_X100-1995`|
|326|`ibm-1276_P100-1995`|
|327|`ibm-1277_P100-1995`|
|328|`ibm-1280_P100-1996`|
|329|`ibm-1281_P100-1996`|
|330|`ibm-1282_P100-1996`|
|331|`ibm-1283_P100-1996`|
|332|`ibm-1284_P100-1996`|
|333|`ibm-1285_P100-1996`|
|334|`ibm-13121_P100-1995`|
|335|`ibm-13124_P100-1995`|
|336|`ibm-13124_P10A-1995`|
|337|`ibm-13125_P100-1997`|
|338|`ibm-13140_P101-2000`|
|339|`ibm-13143_P101-2000`|
|340|`ibm-13145_P101-2000`|
|341|`ibm-13156_P101-2000`|
|342|`ibm-13157_P101-2000`|
|343|`ibm-13162_P101-2000`|
|344|`ibm-13218_P100-1996`|
|345|`ibm-1350_P110-1997`|
|346|`ibm-1351_P110-1997`|
|347|`ibm-1362_P100-1997`|
|348|`ibm-1362_P110-1999`|
|349|`ibm-1363_P100-1997`|
|350|`ibm-1363_P10A-1997`|
|351|`ibm-1363_P10B-1998`|
|352|`ibm-1363_P110-1999`|
|353|`ibm-1363_P11A-1999`|
|354|`ibm-1363_P11B-1999`|
|355|`ibm-1363_P11C-2006`|
|356|`ibm-1364_P100-2007`|
|357|`ibm-1364_P110-2007`|
|358|`ibm-13676_P102-2001`|
|359|`ibm-1370_P100-1999`|
|360|`ibm-1370_X100-1999`|
|361|`ibm-1371_P100-1999`|
|362|`ibm-1371_X100-1999`|
|363|`ibm-1373_P100-2002`|
|364|`ibm-1374_P100_P100-2005_MS`|
|365|`ibm-1374_P100-2005`|
|366|`ibm-1375_P100-2004`|
|367|`ibm-1375_P100-2006`|
|368|`ibm-1375_P100-2007`|
|369|`ibm-1375_P100-2008`|
|370|`ibm-1375_X100-2004`|
|371|`ibm-1377_P100_P100-2006_U3`|
|372|`ibm-1377_P100-2006`|
|373|`ibm-1377_P100-2008`|
|374|`ibm-1380_P100-1995`|
|375|`ibm-1380_X100-1995`|
|376|`ibm-1381_P110-1999`|
|377|`ibm-1381_X110-1999`|
|378|`ibm-1382_P100-1995`|
|379|`ibm-1382_X100-1995`|
|380|`ibm-1383_P110-1999`|
|381|`ibm-1383_X110-1999`|
|382|`ibm-1385_P100-1997`|
|383|`ibm-1385_P100-2005`|
|384|`ibm-1386_P100-2001`|
|385|`ibm-1386_P110-1997`|
|386|`ibm-1388_P100-2024`|
|387|`ibm-1388_P103-2001`|
|388|`ibm-1388_P110-2000`|
|389|`ibm-1390_P100-1999`|
|390|`ibm-1390_P110-2003`|
|391|`ibm-1399_P100-1999`|
|392|`ibm-1399_P110-2003`|
|393|`ibm-16684_P100-1999`|
|394|`ibm-16684_P110-2003`|
|395|`ibm-16804_X110-1999`|
|396|`ibm-17221_P100-2001`|
|397|`ibm-17240_P101-2000`|
|398|`ibm-17248_X110-1999`|
|399|`ibm-20780_P100-1999`|
|400|`ibm-21344_P101-2000`|
|401|`ibm-21427_P100-1999`|
|402|`ibm-21427_X100-1999`|
|403|`ibm-25546_P100-1997`|
|404|`IBM256`|
|405|`ibm-256_P100-1995`|
|406|`ibm-259_P100-1995`|
|407|`ibm-259_X100-1995`|
|408|`ibm-273_P100-1999`|
|409|`IBM274`|
|410|`ibm-274_P100-2000`|
|411|`IBM275`|
|412|`ibm-275_P100-1995`|
|413|`IBM277`|
|414|`ibm-277_P100-1999`|
|415|`IBM278`|
|416|`ibm-278_P100-1999`|
|417|`IBM280`|
|418|`ibm-280_P100-1999`|
|419|`IBM281`|
|420|`ibm-282_P100-1995`|
|421|`IBM284`|
|422|`ibm-284_P100-1999`|
|423|`IBM285`|
|424|`ibm-285_P100-1999`|
|425|`ibm-286_P100-2003`|
|426|`ibm-28709_P100-1995`|
|427|`IBM290`|
|428|`ibm-290_P100-1995`|
|429|`ibm-293_P100-1995`|
|430|`ibm-293_X100-1995`|
|431|`IBM297`|
|432|`ibm-297_P100-1999`|
|433|`ibm-300_P110-1997`|
|434|`ibm-300_P120-2006`|
|435|`ibm-300_X110-1997`|
|436|`ibm-301_P110-1997`|
|437|`ibm-301_X110-1997`|
|438|`ibm-33058_P100-2000`|
|439|`ibm-33722_P120-1999`|
|440|`ibm-33722_P12A_P12A-2004_U2`|
|441|`ibm-33722_P12A_P12A-2009_U2`|
|442|`ibm-33722_P12A-1999`|
|443|`ibm-367_P100-1995`|
|444|`ibm-37_P100-1999`|
|445|`IBM420`|
|446|`ibm-420_X110-1999`|
|447|`ibm-420_X120-1999`|
|448|`IBM423`|
|449|`ibm-423_P100-1995`|
|450|`ibm-424_P100-1995`|
|451|`ibm-425_P101-2000`|
|452|`ibm-437_P100-1995`|
|453|`ibm-4517_P100-2005`|
|454|`ibm-4899_P100-1998`|
|455|`ibm-4904_P101-2000`|
|456|`ibm-4909_P100-1999`|
|457|`ibm-4930_P100-1997`|
|458|`ibm-4930_P110-1999`|
|459|`ibm-4933_P100-1996`|
|460|`ibm-4933_P100-2002`|
|461|`ibm-4944_P101-2000`|
|462|`ibm-4945_P101-2000`|
|463|`ibm-4948_P100-1995`|
|464|`ibm-4951_P100-1995`|
|465|`ibm-4952_P100-1995`|
|466|`ibm-4954_P101-2000`|
|467|`ibm-4955_P101-2000`|
|468|`ibm-4956_P101-2000`|
|469|`ibm-4957_P101-2000`|
|470|`ibm-4958_P101-2000`|
|471|`ibm-4959_P101-2000`|
|472|`ibm-4960_P100-1995`|
|473|`ibm-4960_X100-1995`|
|474|`ibm-4961_P101-2000`|
|475|`ibm-4962_P101-2000`|
|476|`ibm-4963_P101-2000`|
|477|`ibm-4971_P100-1999`|
|478|`ibm-500_P100-1999`|
|479|`ibm-5012_P100-1999`|
|480|`ibm-5026_P120-1999`|
|481|`ibm-5026_X120-1999`|
|482|`ibm-5035_P120_P12A-2005_U2`|
|483|`ibm-5035_P120-1999`|
|484|`ibm-5035_X120-1999`|
|485|`ibm-5039_P110-1996`|
|486|`ibm-5039_P11A-1998`|
|487|`ibm-5048_P100-1995`|
|488|`ibm-5049_P100-1995`|
|489|`ibm-5050_P120-1999`|
|490|`ibm-5050_P12A-1999`|
|491|`ibm-5067_P100-1995`|
|492|`ibm-5104_X110-1999`|
|493|`ibm-5123_P100-1999`|
|494|`ibm-5142_P100-1995`|
|495|`ibm-5210_P100-1999`|
|496|`ibm-5233_P100-2011`|
|497|`ibm-5346_P100-1998`|
|498|`ibm-5347_P100-1998`|
|499|`ibm-5348_P100-1997`|
|500|`ibm-5349_P100-1998`|
|501|`ibm-5350_P100-1998`|
|502|`ibm-5351_P100-1998`|
|503|`ibm-5352_P100-1998`|
|504|`ibm-5353_P100-1998`|
|505|`ibm-5354_P100-1998`|
|506|`ibm-53685_P101-2000`|
|507|`ibm-54191_P100-2006`|
|508|`ibm-5470_P100_P100-2005_MS`|
|509|`ibm-5470_P100-2005`|
|510|`ibm-5471_P100-2006`|
|511|`ibm-5471_P100-2007`|
|512|`ibm-5473_P100-2006`|
|513|`ibm-5478_P100-1995`|
|514|`ibm-5486_P100-1999`|
|515|`ibm-5487_P100-2001`|
|516|`ibm-5488_P100-2001`|
|517|`ibm-5495_P100-1999`|
|518|`ibm-62383_P100-2007`|
|519|`ibm-720_P100-1997`|
|520|`ibm-737_P100-1997`|
|521|`ibm-775_P100-1996`|
|522|`ibm-803_P100-1999`|
|523|`ibm-806_P100-1998`|
|524|`ibm-808_P100-1999`|
|525|`ibm-813_P100-1995`|
|526|`ibm-819_P100-1999`|
|527|`ibm-833_P100-1995`|
|528|`ibm-834_P100-1995`|
|529|`ibm-834_X100-1995`|
|530|`ibm-835_P100-1995`|
|531|`ibm-835_X100-1995`|
|532|`ibm-836_P100-1995`|
|533|`ibm-837_P100-1995`|
|534|`ibm-837_P100-2011`|
|535|`ibm-837_X100-1995`|
|536|`ibm-838_P100-1995`|
|537|`ibm-848_P100-1999`|
|538|`ibm-8482_P100-1999`|
|539|`ibm-849_P100-1999`|
|540|`ibm-850_P100-1999`|
|541|`IBM851`|
|542|`ibm-851_P100-1995`|
|543|`ibm-852_P100-1999`|
|544|`ibm-855_P100-1995`|
|545|`ibm-856_P100-1995`|
|546|`ibm-857_P100-1995`|
|547|`ibm-858_P100-1997`|
|548|`ibm-859_P100-1999`|
|549|`ibm-860_P100-1995`|
|550|`ibm-861_P100-1995`|
|551|`ibm-8612_P100-1995`|
|552|`ibm-8612_X110-1995`|
|553|`ibm-862_P100-1995`|
|554|`ibm-863_P100-1995`|
|555|`ibm-864_X110-1999`|
|556|`ibm-864_X120-2012`|
|557|`ibm-865_P100-1995`|
|558|`ibm-866_P100-1995`|
|559|`ibm-867_P100-1998`|
|560|`IBM868`|
|561|`ibm-868_P100-1995`|
|562|`ibm-868_X100-1995`|
|563|`ibm-869_P100-1995`|
|564|`IBM870`|
|565|`ibm-870_P100-1999`|
|566|`IBM871`|
|567|`ibm-871_P100-1999`|
|568|`ibm-872_P100-1999`|
|569|`IBM874`|
|570|`ibm-874_P100-1995`|
|571|`IBM875`|
|572|`ibm-875_P100-1995`|
|573|`ibm-878_P100-1996`|
|574|`IBM880`|
|575|`ibm-880_P100-1995`|
|576|`IBM891`|
|577|`ibm-891_P100-1995`|
|578|`ibm-895_P100-1995`|
|579|`ibm-896_P100-1995`|
|580|`ibm-897_P100-1995`|
|581|`ibm-9005_X100-2005`|
|582|`ibm-9005_X110-2007`|
|583|`ibm-901_P100-1999`|
|584|`ibm-902_P100-1999`|
|585|`ibm-9027_P100-1999`|
|586|`ibm-9027_X100-1999`|
|587|`IBM903`|
|588|`ibm-903_P100-1995`|
|589|`ibm-9030_P100-1995`|
|590|`IBM904`|
|591|`ibm-904_P100-1995`|
|592|`ibm-9042_P101-2000`|
|593|`ibm-9044_P100-1999`|
|594|`ibm-9048_P100-1998`|
|595|`ibm-9049_P100-1999`|
|596|`IBM905`|
|597|`ibm-905_P100-1995`|
|598|`ibm-9056_P100-1995`|
|599|`ibm-9061_P100-1999`|
|600|`ibm-9064_P101-2000`|
|601|`ibm-9066_P100-1995`|
|602|`ibm-9067_X100-2005`|
|603|`ibm-912_P100-1999`|
|604|`ibm-913_P100-2000`|
|605|`ibm-914_P100-1995`|
|606|`ibm-9145_P110-1997`|
|607|`ibm-9145_X110-1997`|
|608|`ibm-915_P100-1995`|
|609|`ibm-916_P100-1995`|
|610|`IBM918`|
|611|`ibm-918_P100-1995`|
|612|`ibm-918_X100-1995`|
|613|`ibm-920_P100-1995`|
|614|`ibm-921_P100-1995`|
|615|`ibm-922_P100-1999`|
|616|`ibm-923_P100-1998`|
|617|`ibm-9238_X110-1999`|
|618|`ibm-924_P100-1998`|
|619|`ibm-926_P100-2000`|
|620|`ibm-927_P100-1995`|
|621|`ibm-927_X100-1995`|
|622|`ibm-928_P100-1995`|
|623|`ibm-930_P120_P12A-2006_U2`|
|624|`ibm-930_P120-1999`|
|625|`ibm-930_X120-1999`|
|626|`ibm-9306_P101-2000`|
|627|`ibm-931_P120-1999`|
|628|`ibm-931_X120-1999`|
|629|`ibm-932_P120-1999`|
|630|`ibm-932_P12A_P12A-2000_U2`|
|631|`ibm-932_P12A-1999`|
|632|`ibm-933_P110-1999`|
|633|`ibm-933_X110-1999`|
|634|`ibm-935_P110-1999`|
|635|`ibm-935_X110-1999`|
|636|`ibm-937_P110-1999`|
|637|`ibm-937_X110-1999`|
|638|`ibm-939_P120_P12A-2005_U2`|
|639|`ibm-939_P120-1999`|
|640|`ibm-939_X120-1999`|
|641|`ibm-941_P120-1996`|
|642|`ibm-941_P12A-1996`|
|643|`ibm-941_P130-2001`|
|644|`ibm-941_P13A-2001`|
|645|`ibm-941_X110-1996`|
|646|`ibm-941_X11A-1996`|
|647|`ibm-942_P120-1999`|
|648|`ibm-942_P12A_P12A-2000_U2`|
|649|`ibm-942_P12A-1999`|
|650|`ibm-943_P130-1999`|
|651|`ibm-943_P14A-1999`|
|652|`ibm-943_P15A-2003`|
|653|`ibm-944_P100-1995`|
|654|`ibm-944_X100-1995`|
|655|`ibm-9444_P100_P100-2005_MS`|
|656|`ibm-9444_P100-2001`|
|657|`ibm-9444_P100-2005`|
|658|`ibm-9447_P100-2002`|
|659|`ibm-9448_X100-2005`|
|660|`ibm-9449_P100-2002`|
|661|`ibm-946_P100-1995`|
|662|`ibm-947_P100-1995`|
|663|`ibm-947_X100-1995`|
|664|`ibm-948_P110-1999`|
|665|`ibm-948_X110-1999`|
|666|`ibm-949_P110-1999`|
|667|`ibm-949_P11A-1999`|
|668|`ibm-949_X110-1999`|
|669|`ibm-950_P110-1999`|
|670|`ibm-950_X110-1999`|
|671|`ibm-951_P100-1995`|
|672|`ibm-951_X100-1995`|
|673|`ibm-952_P110-1997`|
|674|`ibm-953_P100-2000`|
|675|`ibm-954_P101-2007`|
|676|`ibm-955_P110-1997`|
|677|`ibm-9577_P100-2001`|
|678|`ibm-9580_P110-1999`|
|679|`ibm-960_P100-2000`|
|680|`ibm-963_P100-1995`|
|681|`ibm-964_P110-1999`|
|682|`ibm-964_X110-1999`|
|683|`ibm-970_P110_P110-2006_U2`|
|684|`ibm-970_P110-1999`|
|685|`ibm-971_P100-1995`|
|686|`IEC_P27_1`|
|687|`INIS`|
|688|`INIS_8`|
|689|`INIS_CYRILLIC`|
|690|`ISO_10367_BOX`|
|691|`ISO_5427`|
|692|`ISO_5427_EXT`|
|693|`ISO_5428`|
|694|`ISO_8859_1`|
|695|`ISO_8859_10`|
|696|`ISO_8859_11`|
|697|`ISO_8859_13`|
|698|`ISO_8859_14`|
|699|`ISO_8859_15`|
|700|`ISO_8859_2`|
|701|`ISO_8859_3`|
|702|`ISO_8859_4`|
|703|`ISO_8859_5`|
|704|`ISO_8859_6`|
|705|`ISO_8859_7`|
|706|`ISO_8859_8`|
|707|`ISO_8859_9`|
|708|`ISO_IR_197`|
|709|`ISO646_US`|
|710|`iso81`|
|711|`iso815`|
|712|`iso82`|
|713|`iso85`|
|714|`iso86`|
|715|`iso87`|
|716|`iso88`|
|717|`ISO8859_1`|
|718|`iso-8859_10-1998`|
|719|`iso-8859_11-2001`|
|720|`iso-8859_1-1998`|
|721|`iso-8859_13-1998`|
|722|`iso-8859_14-1998`|
|723|`ISO8859_15`|
|724|`iso-8859_15-1999`|
|725|`iso-8859_16-2001`|
|726|`ISO8859_2`|
|727|`iso-8859_2-1999`|
|728|`ISO8859_3`|
|729|`iso-8859_3-1999`|
|730|`ISO8859_4`|
|731|`iso-8859_4-1998`|
|732|`ISO8859_5`|
|733|`iso-8859_5-1999`|
|734|`ISO8859_6`|
|735|`iso-8859_6-1999`|
|736|`ISO8859_7`|
|737|`iso-8859_7-1987`|
|738|`iso-8859_7-2003`|
|739|`ISO8859_8`|
|740|`iso-8859_8-1999`|
|741|`ISO8859_9`|
|742|`iso-8859_9-1999`|
|743|`iso89`|
|744|`java-ASCII-1.3_P`|
|745|`java-Big5-1.3_P`|
|746|`java-Cp037-1.3_P`|
|747|`java-Cp1006-1.3_P`|
|748|`java-Cp1025-1.3_P`|
|749|`java-Cp1026-1.3_P`|
|750|`java-Cp1097-1.3_P`|
|751|`java-Cp1098-1.3_P`|
|752|`java-Cp1112-1.3_P`|
|753|`java-Cp1122-1.3_P`|
|754|`java-Cp1123-1.3_P`|
|755|`java-Cp1124-1.3_P`|
|756|`java-Cp1250-1.3_P`|
|757|`java-Cp1251-1.3_P`|
|758|`java-Cp1252-1.3_P`|
|759|`java-Cp1253-1.3_P`|
|760|`java-Cp1254-1.3_P`|
|761|`java-Cp1255-1.3_P`|
|762|`java-Cp1256-1.3_P`|
|763|`java-Cp1257-1.3_P`|
|764|`java-Cp1258-1.3_P`|
|765|`java-Cp1381-1.3_P`|
|766|`java-Cp1383-1.3_P`|
|767|`java-Cp273-1.3_P`|
|768|`java-Cp277-1.3_P`|
|769|`java-Cp278-1.3_P`|
|770|`java-Cp280-1.3_P`|
|771|`java-Cp284-1.3_P`|
|772|`java-Cp285-1.3_P`|
|773|`java-Cp297-1.3_P`|
|774|`java-Cp33722-1.3_P`|
|775|`java-Cp420-1.3_P`|
|776|`java-Cp424-1.3_P`|
|777|`java-Cp437-1.3_P`|
|778|`java-Cp500-1.3_P`|
|779|`java-Cp737-1.3_P`|
|780|`java-Cp775-1.3_P`|
|781|`java-Cp838-1.3_P`|
|782|`java-Cp850-1.3_P`|
|783|`java-Cp852-1.3_P`|
|784|`java-Cp855-1.3_P`|
|785|`java-Cp856-1.3_P`|
|786|`java-Cp857-1.3_P`|
|787|`java-Cp860-1.3_P`|
|788|`java-Cp861-1.3_P`|
|789|`java-Cp862-1.3_P`|
|790|`java-Cp863-1.3_P`|
|791|`java-Cp864-1.3_P`|
|792|`java-Cp865-1.3_P`|
|793|`java-Cp866-1.3_P`|
|794|`java-Cp868-1.3_P`|
|795|`java-Cp869-1.3_P`|
|796|`java-Cp870-1.3_P`|
|797|`java-Cp871-1.3_P`|
|798|`java-Cp874-1.3_P`|
|799|`java-Cp875-1.3_P`|
|800|`java-Cp918-1.3_P`|
|801|`java-Cp921-1.3_P`|
|802|`java-Cp922-1.3_P`|
|803|`java-Cp930-1.3_P`|
|804|`java-Cp933-1.3_P`|
|805|`java-Cp935-1.3_P`|
|806|`java-Cp937-1.3_P`|
|807|`java-Cp939-1.3_P`|
|808|`java-Cp942-1.3_P`|
|809|`java-Cp942C-1.3_P`|
|810|`java-Cp943-1.2.2`|
|811|`java-Cp943C-1.3_P`|
|812|`java-Cp948-1.3_P`|
|813|`java-Cp949-1.3_P`|
|814|`java-Cp949C-1.3_P`|
|815|`java-Cp950-1.3_P`|
|816|`java-Cp964-1.3_P`|
|817|`java-Cp970-1.3_P`|
|818|`java-EUC_CN-1.3_P`|
|819|`java-EUC_JP-1.3_P`|
|820|`java-EUC_KR-1.3_P`|
|821|`java-EUC_TW-1.3_P`|
|822|`java-ISO2022JP-1.3_P`|
|823|`java-ISO2022KR-1.3_P`|
|824|`java-ISO8859_1-1.3_P`|
|825|`java-ISO8859_13-1.3_P`|
|826|`java-ISO8859_2-1.3_P`|
|827|`java-ISO8859_3-1.3_P`|
|828|`java-ISO8859_4-1.3_P`|
|829|`java-ISO8859_5-1.3_P`|
|830|`java-ISO8859_6-1.3_P`|
|831|`java-ISO8859_7-1.3_P`|
|832|`java-ISO8859_8-1.3_P`|
|833|`java-ISO8859_9-1.3_P`|
|834|`java-Johab-1.3_P`|
|835|`java-KOI8_R-1.3_P`|
|836|`java-MS874-1.3_P`|
|837|`java-MS932-1.3_P`|
|838|`java-MS949-1.3_P`|
|839|`java-SJIS-1.3_P`|
|840|`java-TIS620-1.3_P`|
|841|`JISX0201.1976_0`|
|842|`JISX0201.1976_GR`|
|843|`JISX0208.1983_0`|
|844|`JISX0208.1983_GR`|
|845|`KOI_8`|
|846|`KOI8_R`|
|847|`KOI8_U`|
|848|`KSC5601.1987_0`|
|849|`LATIN_GREEK`|
|850|`LATIN_GREEK_1`|
|851|`latin-1`|
|852|`MAC_IS`|
|853|`mac_roman`|
|854|`MAC_UK`|
|855|`macos-0_1-10.2`|
|856|`macos-0_2-10.2`|
|857|`macos-1024-10.2`|
|858|`macos-1040-10.2`|
|859|`macos-1049-10.2`|
|860|`macos-1057-10.2`|
|861|`macos-1059-10.2`|
|862|`macos-1280-10.2`|
|863|`macos-1281-10.2`|
|864|`macos-1282-10.2`|
|865|`macos-1283-10.2`|
|866|`macos-1284-10.2`|
|867|`macos-1285-10.2`|
|868|`macos-1286-10.2`|
|869|`macos-1287-10.2`|
|870|`macos-1288-10.2`|
|871|`macos-1536-10.2`|
|872|`macos-21-10.5`|
|873|`macos-2562-10.2`|
|874|`macos-2563-10.2`|
|875|`macos-2566-10.2`|
|876|`macos-2817-10.2`|
|877|`macos-29-10.2`|
|878|`macos-3074-10.2`|
|879|`macos-33-10.5`|
|880|`macos-34-10.2`|
|881|`macos-35-10.2`|
|882|`macos-36_1-10.2`|
|883|`macos-36_2-10.2`|
|884|`macos-37_2-10.2`|
|885|`macos-37_3-10.2`|
|886|`macos-37_4-10.2`|
|887|`macos-37_5-10.2`|
|888|`macos-38_1-10.2`|
|889|`macos-38_2-10.2`|
|890|`macos-513-10.2`|
|891|`macos-514-10.2`|
|892|`macos-515-10.2`|
|893|`macos-516-10.2`|
|894|`macos-517-10.2`|
|895|`macos-518-10.2`|
|896|`macos-519-10.2`|
|897|`macos-520-10.2`|
|898|`macos-521-10.2`|
|899|`macos-527-10.2`|
|900|`macos-6_2-10.4`|
|901|`macos-6-10.2`|
|902|`macos-7_1-10.2`|
|903|`macos-7_2-10.2`|
|904|`macos-7_3-10.2`|
|905|`NATS_DANO`|
|906|`NATS_SEFI`|
|907|`osd-EBCDIC-DF03-IRV`|
|908|`osd-EBCDIC-DF04-1`|
|909|`osd-EBCDIC-DF04-15`|
|910|`PCK`|
|911|`roma8`|
|912|`shift_jis`|
|913|`solaris-zh_HK.hkscs-5.9`|
|914|`solaris-zh_TW_big5-2.7`|
|915|`thai8`|
|916|`TIS_620`|
|917|`utf-16`|
|918|`utf-8`|
|919|`windows-10000-2000`|
|920|`windows-10001-2000`|
|921|`windows-10002-2000`|
|922|`windows-10003-2000`|
|923|`windows-10004-2000`|
|924|`windows-10005-2000`|
|925|`windows-10006-2000`|
|926|`windows-10007-2000`|
|927|`windows-10008-2000`|
|928|`windows-10010-2000`|
|929|`windows-10017-2000`|
|930|`windows-10021-2000`|
|931|`windows-10029-2000`|
|932|`windows-10079-2000`|
|933|`windows-10081-2000`|
|934|`windows-10082-2000`|
|935|`windows-1026-2000`|
|936|`windows-1047-2000`|
|937|`windows-1140-2000`|
|938|`windows-1141-2000`|
|939|`windows-1142-2000`|
|940|`windows-1143-2000`|
|941|`windows-1144-2000`|
|942|`windows-1145-2000`|
|943|`windows-1146-2000`|
|944|`windows-1147-2000`|
|945|`windows-1148-2000`|
|946|`windows-1149-2000`|
|947|`windows-1250-2000`|
|948|`windows-1251-2000`|
|949|`windows-1252-2000`|
|950|`windows-1253-2000`|
|951|`windows-1254-2000`|
|952|`windows-1255-2000`|
|953|`windows-1256-2000`|
|954|`windows-1257-2000`|
|955|`windows-1258_db-2013`|
|956|`windows-1258-2000`|
|957|`windows-1361-2000`|
|958|`windows-20000-2000`|
|959|`windows-20001-2000`|
|960|`windows-20002-2000`|
|961|`windows-20003-2000`|
|962|`windows-20004-2000`|
|963|`windows-20005-2000`|
|964|`windows-20105-2000`|
|965|`windows-20106-2000`|
|966|`windows-20107-2000`|
|967|`windows-20108-2000`|
|968|`windows-20127-2000`|
|969|`windows-20261-2000`|
|970|`windows-20269-2000`|
|971|`windows-20273-2000`|
|972|`windows-20277-2000`|
|973|`windows-20278-2000`|
|974|`windows-20280-2000`|
|975|`windows-20284-2000`|
|976|`windows-20285-2000`|
|977|`windows-20290-2000`|
|978|`windows-20297-2000`|
|979|`windows-20420-2000`|
|980|`windows-20423-2000`|
|981|`windows-20424-2000`|
|982|`windows-20833-2000`|
|983|`windows-20838-2000`|
|984|`windows-20866-2000`|
|985|`windows-20871-2000`|
|986|`windows-20880-2000`|
|987|`windows-20905-2000`|
|988|`windows-20924-2000`|
|989|`windows-20932-2000`|
|990|`windows-20936-2000`|
|991|`windows-20949-2000`|
|992|`windows-21025-2000`|
|993|`windows-21027-2000`|
|994|`windows-21866-2000`|
|995|`windows-28591-2000`|
|996|`windows-28592-2000`|
|997|`windows-28593-2000`|
|998|`windows-28594-2000`|
|999|`windows-28595-2000`|
|1000|`windows-28596-2000`|
|1001|`windows-28597-2000`|
|1002|`windows-28598-2000`|
|1003|`windows-28599-2000`|
|1004|`windows-28603-vista`|
|1005|`windows-28605-2000`|
|1006|`windows-37-2000`|
|1007|`windows-38598-2000`|
|1008|`windows-437-2000`|
|1009|`windows-500-2000`|
|1010|`windows-51932-2006`|
|1011|`windows-51936-2000`|
|1012|`windows-51949-2000`|
|1013|`windows-708-2000`|
|1014|`windows-720-2000`|
|1015|`windows-737-2000`|
|1016|`windows-775-2000`|
|1017|`windows-850-2000`|
|1018|`windows-852-2000`|
|1019|`windows-855-2000`|
|1020|`windows-857-2000`|
|1021|`windows-858-2000`|
|1022|`windows-860-2000`|
|1023|`windows-861-2000`|
|1024|`windows-862-2000`|
|1025|`windows-863-2000`|
|1026|`windows-864-2000`|
|1027|`windows-865-2000`|
|1028|`windows-866-2000`|
|1029|`windows-869-2000`|
|1030|`windows-870-2000`|
|1031|`windows-874-2000`|
|1032|`windows-875-2000`|
|1033|`windows-932-2000`|
|1034|`windows-936-2000`|
|1035|`windows-949-2000`|
|1036|`windows-950_hkscs-2001`|
|1037|`windows-950-2000`|
|1038|`zh_CN.euc`|
|1039|`zh_CN.gbk`|
|1040|`zh_CN_cp935`|
|1041|`zh_TW_cp937`|
|1042|`zh_TW_euc`|

## Excel Extension {#docs:current:core_extensions:excel}

The `excel` extension provides functions to format numbers per Excel's formatting rules by wrapping the [i18npool library](https://www.openoffice.org/l10n/i18n_framework/index.html) and to read/write Excel (` .xlsx`) files. However, please note that `.xls` files are not supported.

> **Tip.** Previously, reading and writing Excel files was handled through the [`spatial` extension](#docs:current:core_extensions:spatial:overview), which happened to support XLSX files through one of its dependencies. The `excel` extension is more efficient and provides more control over the import/export process. If the `excel` extension is insufficient for your use case, you can still try the [`spatial` extension](#docs:current:core_extensions:spatial:overview); see the [Excel Import](#docs:current:guides:file_formats:excel_import) and [Excel Export](#docs:current:guides:file_formats:excel_export) pages for instructions. Be aware, however, that XLSX support may be removed from the `spatial` extension in the future.

#### Installing and Loading {#docs:current:core_extensions:excel::installing-and-loading}

The `excel` extension will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL excel;
LOAD excel;
```

#### Excel Scalar Functions {#docs:current:core_extensions:excel::excel-scalar-functions}

| Function                            | Description                                                          |
| :---------------------------------- | :------------------------------------------------------------------- |
| `excel_text(number, format_string)` | Format the given `number` per the rules given in the `format_string` |
| `text(number, format_string)`       | Alias for `excel_text`                                               |

#### Examples {#docs:current:core_extensions:excel::examples}

```sql
SELECT excel_text(1_234_567.897, 'h:mm AM/PM') AS timestamp;
```

| timestamp |
| --------- |
| 9:31 PM   |

```sql
SELECT excel_text(1_234_567.897, 'h AM/PM') AS timestamp;
```

| timestamp |
| --------- |
| 9 PM      |
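
As a further sketch, the same function can be called with a numeric pattern instead of a time pattern, or through its `text` alias. The format string `'#,##0.00'` is an assumption here: it is a common Excel number format (thousands separator, two decimals) that does not appear in the examples above, and the results are not shown because they were not verified against the extension.

```sql
-- Assumed format string: thousands separator with two decimal places
SELECT excel_text(1_234_567.897, '#,##0.00') AS formatted;

-- The alias takes the same arguments as excel_text
SELECT text(1_234_567.897, 'h:mm AM/PM') AS timestamp;
```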

#### Reading XLSX Files {#docs:current:core_extensions:excel::reading-xlsx-files}

Reading a `.xlsx` file is as simple as selecting from it directly, e.g.:

```sql
SELECT *
FROM 'test.xlsx';
```

|   a |   b |
| --: | --: |
| 1.0 | 2.0 |
| 3.0 | 4.0 |

However, if you want to set additional options to control the import process, you can use the `read_xlsx` function instead. The following named parameters are supported.

| Option             | Type      | Default                  | Description                                                                                                                                                                                                                                                                                           |
| ------------------ | --------- | ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `header`           | `BOOLEAN` | _automatically inferred_ | Whether to treat the first row as containing the names of the resulting columns.                                                                                                                                                                                                                      |
| `sheet`            | `VARCHAR` | _automatically inferred_ | The name of the sheet in the xlsx file to read. Default is the first sheet.                                                                                                                                                                                                                           |
| `all_varchar`      | `BOOLEAN` | `false`                  | Whether to read all cells as containing `VARCHAR`s.                                                                                                                                                                                                                                                   |
| `ignore_errors`    | `BOOLEAN` | `false`                  | Whether to ignore errors and silently replace cells that can't be cast to the corresponding inferred column type with `NULL`s.                                                                                                                                                                         |
| `range`            | `VARCHAR` | _automatically inferred_ | The range of cells to read, in spreadsheet notation. For example, `A1:B2` reads the cells from A1 to B2. If not specified, the range is inferred as the rectangular region of cells between the first row of consecutive non-empty cells and the first empty row spanning the same columns.          |
| `stop_at_empty`    | `BOOLEAN` | _automatically inferred_ | Whether to stop reading the file when an empty row is encountered. If an explicit `range` option is provided, this is `false` by default, otherwise `true`.                                                                                                                                           |
| `empty_as_varchar` | `BOOLEAN` | `false`                  | Whether to treat empty cells as `VARCHAR` instead of `DOUBLE` when trying to automatically infer column types.                                                                                                                                                                                        |

```sql
SELECT *
FROM read_xlsx('test.xlsx', header = true);
```

|   a |   b |
| --: | --: |
| 1.0 | 2.0 |
| 3.0 | 4.0 |
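
Several of the options above can be combined in one call. The following is a minimal sketch; the sheet name `Sheet1` and the range `A1:B3` are assumptions about the layout of `test.xlsx`, not something the file is known to contain.

```sql
-- Read only the cells A1:B3 of the sheet named 'Sheet1' (assumed to exist),
-- treat the first row of the range as the header,
-- and skip type inference by reading every cell as VARCHAR.
SELECT *
FROM read_xlsx(
    'test.xlsx',
    sheet = 'Sheet1',
    range = 'A1:B3',
    header = true,
    all_varchar = true
);
```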

Alternatively, the `COPY` statement with the `XLSX` format option can be used to import an Excel file into an existing table, in which case the types of the columns in the target table will be used to coerce the types of the cells in the Excel file.

```sql
CREATE TABLE test (a DOUBLE, b DOUBLE);
COPY test FROM 'test.xlsx' WITH (FORMAT xlsx, HEADER);
SELECT * FROM test;
```

##### Type and Range Inference {#docs:current:core_extensions:excel::type-and-range-inference}

Because Excel itself only really stores numbers or strings in cells, and does not enforce that all cells in a column are of the same type, the `excel` extension has to do some guesswork to "infer" and decide the types of the columns when importing an Excel sheet. While almost all columns are inferred as either `DOUBLE` or `VARCHAR`, there are some caveats:

* `TIMESTAMP`, `TIME`, `DATE` and `BOOLEAN` types are inferred when possible based on the _format_ applied to the cell.
* Text cells containing `TRUE` and `FALSE` are inferred as `BOOLEAN`.
* Empty cells are considered to be `DOUBLE` by default, unless the `empty_as_varchar` option is set to `true`, in which case they are typed as `VARCHAR`.

If the `all_varchar` option is set to `true`, none of the above applies and all cells are read as `VARCHAR`.

When no types are specified explicitly (e.g., when using the `read_xlsx` function instead of `COPY ... FROM '⟨file⟩.xlsx'`),
the types of the resulting columns are inferred based on the first "data" row in the sheet, that is:

* If no explicit range is given
  * The first row after the header if a header is found or forced by the `header` option
  * The first non-empty row in the sheet if no header is found or forced
* If an explicit range is given
  * The second row of the range if a header is found in the first row or forced by the `header` option
  * The first row of the range if no header is found or forced

This can sometimes lead to issues if the first "data row" is not representative of the rest of the sheet (e.g., it contains empty cells), in which case the `ignore_errors` or `empty_as_varchar` options can be used to work around this.
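
The two workarounds could look as follows. This is a sketch that assumes `test.xlsx` has a header row and an unrepresentative first data row; which option fits depends on the data.

```sql
-- Type empty cells in the first data row as VARCHAR instead of DOUBLE
SELECT *
FROM read_xlsx('test.xlsx', header = true, empty_as_varchar = true);

-- Keep the inferred types and replace cells that cannot be cast with NULL
SELECT *
FROM read_xlsx('test.xlsx', header = true, ignore_errors = true);
```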

However, when the `COPY ... FROM '⟨file⟩.xlsx'` syntax is used, no type inference is done and the types of the resulting columns are determined by the types of the columns in the table being copied to. All cells will simply be converted by casting from `DOUBLE` or `VARCHAR` to the target column type.

#### Writing XLSX Files {#docs:current:core_extensions:excel::writing-xlsx-files}

Writing `.xlsx` files is supported using the `COPY` statement with `XLSX` given as the format. The following additional parameters are supported.

| Option            | Type      | Default   | Description                                                                          |
| ----------------- | --------- | --------- | ------------------------------------------------------------------------------------ |
| `header`          | `BOOLEAN` | `false`   | Whether to write the column names as the first row in the sheet                      |
| `sheet`           | `VARCHAR` | `Sheet1`  | The name of the sheet in the xlsx file to write.                                     |
| `sheet_row_limit` | `INTEGER` | `1048576` | The maximum number of rows in a sheet. An error is thrown if this limit is exceeded. |

> **Warning.** Many tools only support a maximum of 1,048,576 rows in a sheet, so increasing the `sheet_row_limit` may render the resulting file unreadable by other software.

These are passed as options to the `COPY` statement after the `FORMAT`, e.g.:

```sql
CREATE TABLE test AS
    SELECT *
    FROM (VALUES (1, 2), (3, 4)) AS t(a, b);
COPY test TO 'test.xlsx' WITH (FORMAT xlsx, HEADER true);
```
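
The remaining options are passed the same way. A minimal sketch, assuming the `test` table from the example above; the sheet name `Results` and the limit of 100 rows are arbitrary illustrative values.

```sql
-- Write to a sheet called 'Results' and fail if the table has more than 100 rows
COPY test TO 'test.xlsx' WITH (
    FORMAT xlsx,
    HEADER true,
    SHEET 'Results',
    SHEET_ROW_LIMIT 100
);
```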

##### Type Conversions {#docs:current:core_extensions:excel::type-conversions}

Because XLSX files only really support storing numbers or strings (the equivalent of `DOUBLE` and `VARCHAR`), the following type conversions are applied when writing XLSX files.

* Numeric types are cast to `DOUBLE` when writing to an XLSX file.
* Temporal types (`TIMESTAMP`, `DATE`, `TIME`, etc.) are converted to Excel "serial" numbers, that is, the number of days since 1900-01-01 for dates and the fraction of a day for times. These are then styled with a "number format" so that they appear as dates or times when opened in Excel.
* `TIMESTAMP_TZ` and `TIME_TZ` are cast to UTC `TIMESTAMP` and `TIME` respectively, with the timezone information being lost.
* `BOOLEAN`s are converted to `1` and `0`, with a "number format" applied to make them appear as `TRUE` and `FALSE` in Excel.
* All other types are cast to `VARCHAR` and then written as text cells.
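
One way to observe these conversions is to write a small table with mixed types and inspect what is inferred when reading it back. This is a sketch only: the table name `typed_test` and the file name `typed_test.xlsx` are hypothetical, and the output is not shown because it was not verified.

```sql
CREATE TABLE typed_test AS
    SELECT
        42::INTEGER AS n,          -- numeric: cast to DOUBLE
        DATE '2024-01-15' AS d,    -- temporal: serial number with a date format
        true AS flag,              -- boolean: 1 with a boolean number format
        [1, 2, 3] AS l;            -- any other type: cast to VARCHAR
COPY typed_test TO 'typed_test.xlsx' WITH (FORMAT xlsx, HEADER true);

-- Show the column types inferred when the file is read back
DESCRIBE SELECT * FROM read_xlsx('typed_test.xlsx', header = true);
```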

## Full-Text Search Extension {#docs:current:core_extensions:full_text_search}

Full-Text Search is an extension to DuckDB that allows for search through strings, similar to [SQLite's FTS5 extension](https://www.sqlite.org/fts5.html).

#### Installing and Loading {#docs:current:core_extensions:full_text_search::installing-and-loading}

The `fts` extension will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL fts;
LOAD fts;
```

#### Usage {#docs:current:core_extensions:full_text_search::usage}

The extension adds two `PRAGMA` statements to DuckDB: one to create, and one to drop an index. Additionally, a scalar macro `stem` is added, which is used internally by the extension.

##### `PRAGMA create_fts_index` {#docs:current:core_extensions:full_text_search::pragma-create_fts_index}

```python
create_fts_index(input_table, input_id, *input_values, stemmer = 'porter',
                 stopwords = 'english', ignore = '(\\.|[^a-z])+',
                 strip_accents = 1, lower = 1, overwrite = 0)
```

`PRAGMA` that creates an FTS index for the specified table.



| Name | Type | Description |
|:--|:--|:----------|
| `input_table` | `VARCHAR` | Qualified name of specified table, e.g., `'table_name'` or `'main.table_name'` |
| `input_id` | `VARCHAR` | Column name of document identifier, e.g., `'document_identifier'` |
| `input_values...` | `VARCHAR` | Column names of the text fields to be indexed (vararg), e.g., `'text_field_1'`, `'text_field_2'`, ..., `'text_field_N'`, or `'*'` for all columns in `input_table` of type `VARCHAR` |
| `stemmer` | `VARCHAR` | The type of stemmer to be used. One of `'arabic'`, `'basque'`, `'catalan'`, `'danish'`, `'dutch'`, `'english'`, `'finnish'`, `'french'`, `'german'`, `'greek'`, `'hindi'`, `'hungarian'`, `'indonesian'`, `'irish'`, `'italian'`, `'lithuanian'`, `'nepali'`, `'norwegian'`, `'porter'`, `'portuguese'`, `'romanian'`, `'russian'`, `'serbian'`, `'spanish'`, `'swedish'`, `'tamil'`, `'turkish'`, or `'none'` if no stemming is to be used. Defaults to `'porter'` |
| `stopwords` | `VARCHAR` | Qualified name of table containing a single `VARCHAR` column containing the desired stopwords, or `'none'` if no stopwords are to be used. Defaults to `'english'` for a pre-defined list of 571 English stopwords |
| `ignore` | `VARCHAR` | Regular expression of patterns to be ignored. Defaults to `'(\\.|[^a-z])+'`, which ignores escaped characters and every character that is not a lowercase letter |
| `strip_accents` | `BOOLEAN` | Whether to remove accents (e.g., convert `á` to `a`). Defaults to `1` |
| `lower` | `BOOLEAN` | Whether to convert all text to lowercase. Defaults to `1` |
| `overwrite` | `BOOLEAN` | Whether to overwrite an existing index on a table. Defaults to `0` |



This `PRAGMA` builds the index under a newly created schema. The schema will be named after the input table: if an index is created on table `'main.table_name'`, then the schema will be named `'fts_main_table_name'`.

##### `PRAGMA drop_fts_index` {#docs:current:core_extensions:full_text_search::pragma-drop_fts_index}

```python
drop_fts_index(input_table)
```

Drops an FTS index for the specified table.

| Name | Type | Description |
|:--|:--|:-----------|
| `input_table` | `VARCHAR` | Qualified name of input table, e.g., `'table_name'` or `'main.table_name'` |
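
For instance, assuming an index was previously created on a table named `documents` (the table used in the example usage section of this page), it could be dropped as follows:

```sql
PRAGMA drop_fts_index('documents');
```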

##### `match_bm25` Function {#docs:current:core_extensions:full_text_search::match_bm25-function}

```python
match_bm25(input_id, query_string, fields := NULL, k := 1.2, b := 0.75, conjunctive := 0)
```

Building an index also creates this retrieval macro, which is used to search the index.

| Name | Type | Description |
|:--|:--|:----------|
| `input_id` | `VARCHAR` | Column name of document identifier, e.g., `'document_identifier'` |
| `query_string` | `VARCHAR` | The string to search the index for |
| `fields` | `VARCHAR` | Comma-separated list of fields to search in, e.g., `'text_field_2, text_field_N'`. Defaults to `NULL` to search all indexed fields |
| `k` | `DOUBLE` | Parameter _k<sub>1</sub>_ in the Okapi BM25 retrieval model. Defaults to `1.2` |
| `b` | `DOUBLE` | Parameter _b_ in the Okapi BM25 retrieval model. Defaults to `0.75` |
| `conjunctive` | `BOOLEAN` | Whether to make the query conjunctive, i.e., all terms in the query string must be present in order for a document to be retrieved. Defaults to `0` |

##### `stem` Function {#docs:current:core_extensions:full_text_search::stem-function}

```python
stem(input_string, stemmer)
```

Reduces words to their base. Used internally by the extension.

| Name | Type | Description |
|:--|:--|:----------|
| `input_string` | `VARCHAR` | The column or constant to be stemmed. |
| `stemmer` | `VARCHAR` | The type of stemmer to be used. One of `'arabic'`, `'basque'`, `'catalan'`, `'danish'`, `'dutch'`, `'english'`, `'finnish'`, `'french'`, `'german'`, `'greek'`, `'hindi'`, `'hungarian'`, `'indonesian'`, `'irish'`, `'italian'`, `'lithuanian'`, `'nepali'`, `'norwegian'`, `'porter'`, `'portuguese'`, `'romanian'`, `'russian'`, `'serbian'`, `'spanish'`, `'swedish'`, `'tamil'`, `'turkish'`, or `'none'` if no stemming is to be used. |
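
Although the macro is mainly used internally, it can also be called directly. A minimal sketch; the input word `'feathers'` is an arbitrary choice, and the result is not shown because it was not verified.

```sql
-- Stem a constant using the Porter stemmer
SELECT stem('feathers', 'porter') AS stemmed;
```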

#### Example Usage {#docs:current:core_extensions:full_text_search::example-usage}

Create a table and fill it with text data:

```sql
CREATE TABLE documents (
    document_identifier VARCHAR,
    text_content VARCHAR,
    author VARCHAR,
    doc_version INTEGER
);
INSERT INTO documents
    VALUES ('doc1',
            'The mallard is a dabbling duck that breeds throughout the temperate.',
            'Hannes Mühleisen',
            3),
           ('doc2',
            'The cat is a domestic species of small carnivorous mammal.',
            'Laurens Kuiper',
            2
           );
```

Build the index, and make both the `text_content` and `author` columns searchable.

```sql
PRAGMA create_fts_index(
    'documents', 'document_identifier', 'text_content', 'author'
);
```

Search the `author` field index for documents that are authored by `Muhleisen`. This retrieves `doc1`:

```sql
SELECT document_identifier, text_content, score
FROM (
    SELECT *, fts_main_documents.match_bm25(
        document_identifier,
        'Muhleisen',
        fields := 'author'
    ) AS score
    FROM documents
) sq
WHERE score IS NOT NULL
  AND doc_version > 2
ORDER BY score DESC;
```

| document_identifier |                             text_content                             | score |
|---------------------|----------------------------------------------------------------------|------:|
| doc1                | The mallard is a dabbling duck that breeds throughout the temperate. | 0.0   |

Search for documents about `small cats`. This retrieves `doc2`:

```sql
SELECT document_identifier, text_content, score
FROM (
    SELECT *, fts_main_documents.match_bm25(
        document_identifier,
        'small cats'
    ) AS score
    FROM documents
) sq
WHERE score IS NOT NULL
ORDER BY score DESC;
```

| document_identifier |                        text_content                        | score |
|---------------------|------------------------------------------------------------|------:|
| doc2                | The cat is a domestic species of small carnivorous mammal. | 0.0   |
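
The same search can be made conjunctive, so that a document is only retrieved if all query terms are present. This sketch reuses the query above and changes only the `conjunctive` parameter; the result is not shown because it depends on how the terms are stemmed and filtered.

```sql
SELECT document_identifier, text_content, score
FROM (
    SELECT *, fts_main_documents.match_bm25(
        document_identifier,
        'small cats',
        conjunctive := 1
    ) AS score
    FROM documents
) sq
WHERE score IS NOT NULL
ORDER BY score DESC;
```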

> **Warning.** The FTS index will not update automatically when the input table changes.
> As a workaround, recreate the index to refresh it.
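
A refresh along the lines of the warning above could be sketched as follows, assuming the `documents` table and index from this section. The inserted row is made-up illustrative data, and `overwrite = 1` replaces the existing index.

```sql
-- Hypothetical new row that the existing index does not yet cover
INSERT INTO documents
    VALUES ('doc3', 'Ducks are waterfowl.', 'Jane Doe', 1);

-- Rebuild the index over the same columns, replacing the old one
PRAGMA create_fts_index(
    'documents', 'document_identifier', 'text_content', 'author',
    overwrite = 1
);
```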

## httpfs (HTTP and S3) {#core_extensions:httpfs}

### httpfs Extension for HTTP and S3 Support {#docs:current:core_extensions:httpfs:overview}

The `httpfs` extension is an autoloadable extension implementing a file system that allows reading and writing remote files.
For plain HTTP(S), only file reading is supported. For object storage using the S3 API, the `httpfs` extension supports reading/writing/[globbing](#docs:current:sql:functions:pattern_matching::globbing) files.

#### Installation and Loading {#docs:current:core_extensions:httpfs:overview::installation-and-loading}

The `httpfs` extension will be, by default, autoloaded on first use of any functionality exposed by this extension.

To manually install and load the `httpfs` extension, run:

```sql
INSTALL httpfs;
LOAD httpfs;
```

#### HTTP(S) {#docs:current:core_extensions:httpfs:overview::https}

The `httpfs` extension supports connecting to [HTTP(S) endpoints](#docs:current:core_extensions:httpfs:https).

#### S3 API {#docs:current:core_extensions:httpfs:overview::s3-api}

The `httpfs` extension supports connecting to [S3 API endpoints](#docs:current:core_extensions:httpfs:s3api).

### HTTP(S) Support {#docs:current:core_extensions:httpfs:https}

With the `httpfs` extension, it is possible to directly query files over the HTTP(S) protocol. This works for all files supported by DuckDB or its various extensions, and provides read-only access.

```sql
SELECT *
FROM 'https://domain.tld/file.extension';
```

#### Partial Reading {#docs:current:core_extensions:httpfs:https::partial-reading}

For CSV files, files will be downloaded entirely in most cases, due to the row-based nature of the format.
For Parquet files, DuckDB supports [partial reading](#docs:current:data:parquet:overview::partial-reading), i.e., it can use a combination of the Parquet metadata and [HTTP range requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests) to only download the parts of the file that are actually required by the query. For example, the following query will only read the Parquet metadata and the data for the `column_a` column:

```sql
SELECT column_a
FROM 'https://domain.tld/file.parquet';
```

Some queries need no actual data at all because they can be answered from the metadata alone:

```sql
SELECT count(*)
FROM 'https://domain.tld/file.parquet';
```

#### Scanning Multiple Files {#docs:current:core_extensions:httpfs:https::scanning-multiple-files}

Scanning multiple files over HTTP(S) is also supported:

```sql
SELECT *
FROM read_parquet([
    'https://domain.tld/file1.parquet',
    'https://domain.tld/file2.parquet'
]);
```

#### Authenticating {#docs:current:core_extensions:httpfs:https::authenticating}

To authenticate for an HTTP(S) endpoint, create an `HTTP` secret using the [Secrets Manager](#docs:current:configuration:secrets_manager):

```sql
CREATE SECRET http_auth (
    TYPE http,
    BEARER_TOKEN '⟨token⟩'
);
```

Or:

```sql
CREATE SECRET http_auth (
    TYPE http,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer ⟨token⟩'
    }
);
```

#### HTTP Proxy {#docs:current:core_extensions:httpfs:https::http-proxy}

DuckDB supports HTTP proxies.

You can add an HTTP proxy using the [Secrets Manager](#docs:current:configuration:secrets_manager):

```sql
CREATE SECRET http_proxy (
    TYPE http,
    HTTP_PROXY '⟨http_proxy_url⟩',
    HTTP_PROXY_USERNAME '⟨username⟩',
    HTTP_PROXY_PASSWORD '⟨password⟩'
);
```

You can also set the scope for an HTTP proxy using the `SCOPE` keyword.

```sql
CREATE SECRET http_proxy (
    TYPE HTTP, 
    SCOPE ['⟨https://duckdb.org⟩', '⟨https://some-other-website.org⟩'], 
    HTTP_PROXY '⟨http_proxy_url⟩',
    HTTP_PROXY_USERNAME '⟨username⟩',
    HTTP_PROXY_PASSWORD '⟨password⟩'
);
```

Alternatively, you can add it via [configuration options](#docs:current:configuration:pragmas):

```sql
SET http_proxy = '⟨http_proxy_url⟩';
SET http_proxy_username = '⟨username⟩';
SET http_proxy_password = '⟨password⟩';
```

Note: You cannot set a proxy scope using the configuration options.

#### Using a Custom Certificate File {#docs:current:core_extensions:httpfs:https::using-a-custom-certificate-file}

To use the `httpfs` extension with a custom certificate file, load the extension, then set the following [configuration options](#docs:current:configuration:pragmas):

```sql
LOAD httpfs;
SET ca_cert_file = '⟨certificate_file⟩';
SET enable_server_cert_verification = true;
```

If you would like to disable SSL verification for all HTTP requests, you can do so with an HTTP secret:

```sql
CREATE SECRET disable_ssl (
    TYPE HTTP, 
    VERIFY_SSL 0
);
```

To enable it again for one specific endpoint, you can take advantage of the scope parameter:

```sql
CREATE SECRET enable_ssl_for_your_website (
    TYPE HTTP, 
    SCOPE 'https://⟨your-website.com⟩', 
    VERIFY_SSL 1
); 
```

### Hugging Face Support {#docs:current:core_extensions:httpfs:hugging_face}

The `httpfs` extension introduces support for the `hf://` protocol to access datasets hosted in [Hugging Face](https://huggingface.co/) repositories.
See the [announcement blog post](https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb) for details.

#### Usage {#docs:current:core_extensions:httpfs:hugging_face::usage}

Hugging Face repositories can be queried using the following URL pattern:

```text
hf://datasets/⟨my_username⟩/⟨my_dataset⟩/⟨path_to_file⟩
```

For example, to read a CSV file, you can use the following query:

```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
```

Where:

* `datasets-examples` is the name of the user/organization
* `doc-formats-csv-1` is the name of the dataset repository
* `data.csv` is the file path in the repository

The result of the query is:

|  kind   | sound |
|---------|-------|
| dog     | woof  |
| cat     | meow  |
| pokemon | pika  |
| human   | hello |

To read a JSONL file, you can run:

```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-jsonl-1/data.jsonl';
```

Finally, for reading a Parquet file, use the following query:

```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-parquet-1/data/train-00000-of-00001.parquet';
```

Each of these queries reads the file in the given format and returns its contents as a table.

#### Creating a Local Table {#docs:current:core_extensions:httpfs:hugging_face::creating-a-local-table}

To avoid accessing the remote endpoint for every query, you can save the data in a DuckDB table by running a [`CREATE TABLE ... AS` command](#docs:current:sql:statements:create_table::create-table--as-select-ctas). For example:

```sql
CREATE TABLE data AS
    SELECT *
    FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
```

Then, simply query the `data` table as follows:

```sql
SELECT *
FROM data;
```

#### Multiple Files {#docs:current:core_extensions:httpfs:hugging_face::multiple-files}

To query all files under a specific directory, you can use a [glob pattern](#docs:current:data:multiple_files:overview::multi-file-reads-and-globs). For example:

```sql
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet';
```

| count |
|------:|
| 173   |

Glob patterns let you run a single query across all matching files.
For example, the following query counts the astronomy questions that contain the word “planet”:

```sql
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet'
WHERE question LIKE '%planet%';
```

| count |
|------:|
| 21    |

#### Versioning and Revisions {#docs:current:core_extensions:httpfs:hugging_face::versioning-and-revisions}

In Hugging Face repositories, dataset versions or revisions are different dataset updates. Each version is a snapshot at a specific time, allowing you to track changes and improvements. In git terms, it can be understood as a branch or specific commit.

You can query different dataset versions/revisions by using the following URL:

```text
hf://datasets/⟨my_username⟩/⟨my_dataset⟩@⟨my_branch⟩/⟨path_to_file⟩
```

For example:

```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet';
```

|  kind   | sound |
|---------|-------|
| dog     | woof  |
| cat     | meow  |
| pokemon | pika  |
| human   | hello |

The previous query will read all Parquet files under the `~parquet` revision. This is a special branch where Hugging Face automatically generates the Parquet files of every dataset to enable efficient scanning.

#### Authentication {#docs:current:core_extensions:httpfs:hugging_face::authentication}

Configure your Hugging Face Token in the DuckDB Secrets Manager to access private or gated datasets.
First, visit [Hugging Face Settings – Tokens](https://huggingface.co/settings/tokens) to obtain your access token.
Second, set it in your DuckDB session using DuckDB’s [Secrets Manager](#docs:current:configuration:secrets_manager). DuckDB supports two providers for managing secrets:

##### `CONFIG` {#docs:current:core_extensions:httpfs:hugging_face::config}

The user must pass all configuration information into the `CREATE SECRET` statement. To create a secret using the `CONFIG` provider, use the following command:

```sql
CREATE SECRET hf_token (
    TYPE huggingface,
    TOKEN 'your_hf_token'
);
```

##### `credential_chain` {#docs:current:core_extensions:httpfs:hugging_face::credential_chain}

Automatically tries to fetch credentials. For the Hugging Face token, it will try to get it from `~/.cache/huggingface/token`. To create a secret using the `credential_chain` provider, use the following command:

```sql
CREATE SECRET hf_token (
    TYPE huggingface,
    PROVIDER credential_chain
);
```
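
Once either secret exists, private or gated datasets are queried with the same `hf://` URL pattern as public ones. The path below consists of placeholders rather than a real repository:

```sql
SELECT *
FROM 'hf://datasets/⟨my_username⟩/⟨my_private_dataset⟩/⟨path_to_file⟩'
LIMIT 10;
```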

### S3 API Support {#docs:current:core_extensions:httpfs:s3api}

The `httpfs` extension supports reading/writing/[globbing](#docs:current:sql:functions:pattern_matching::globbing) files on object storage servers using the S3 API. S3 offers a standard API to read and write remote files, whereas regular HTTP servers, which predate S3, do not offer a common write API. DuckDB conforms to the S3 API, which is now common among industry storage providers.

#### Platforms {#docs:current:core_extensions:httpfs:s3api::platforms}

The `httpfs` filesystem is tested with [AWS S3](https://aws.amazon.com/s3/), [MinIO](https://min.io/), [Google Cloud](https://cloud.google.com/storage/docs/interoperability) and [lakeFS](https://docs.lakefs.io/integrations/duckdb.html). Other services that implement the S3 API (such as [Cloudflare R2](https://www.cloudflare.com/en-gb/developer-platform/r2/) and [Tigris](https://www.tigrisdata.com/)) should also work, but not all features may be supported.

The following table shows which parts of the S3 API are required for each `httpfs` feature.

| Feature | Required S3 API features |
|:---|:---|
| Public file reads | HTTP Range requests |
| Private file reads | Secret key or session token authentication |
| File glob | [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) |
| File writes | [Multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) |

#### Configuration and Authentication {#docs:current:core_extensions:httpfs:s3api::configuration-and-authentication}

The preferred way to configure and authenticate to S3 endpoints is to use [secrets](#docs:current:sql:statements:create_secret). Multiple secret providers are available.

To migrate from the [deprecated S3 API](#docs:current:core_extensions:httpfs:s3api_legacy_authentication), use a defined secret with a profile.
See the [“Loading a Secret Based on a Profile” section](#docs:current:core_extensions:httpfs:s3api::loading-a-secret-based-on-a-profile).

##### `config` Provider {#docs:current:core_extensions:httpfs:s3api::config-provider}

The default provider, `config` (i.e., user-configured), allows access to the S3 bucket by manually providing a key. For example:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER config,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩'
);
```

> **Tip.** If you get an IO Error (`Connection error for HTTP HEAD`), configure the endpoint explicitly via `ENDPOINT 's3.⟨your-region⟩.amazonaws.com'`.
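
Following the tip, a secret with an explicit endpoint could be sketched like this. It reuses the example key values from above, and the region placeholder is an assumption to be replaced with your own region.

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER config,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨your-region⟩',
    -- Explicit endpoint, matching the region above
    ENDPOINT 's3.⟨your-region⟩.amazonaws.com'
);
```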

Now, to query using the above secret, simply query any `s3://` prefixed file:

```sql
SELECT *
FROM 's3://⟨your-bucket⟩/⟨your_file⟩.parquet';
```

##### `credential_chain` Provider {#docs:current:core_extensions:httpfs:s3api::credential_chain-provider}

The `credential_chain` provider allows automatically fetching credentials using mechanisms provided by the AWS SDK. For example, to use the AWS SDK default provider:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain
);
```

Again, to query a file using the above secret, simply query any `s3://` prefixed file.

DuckDB also allows specifying a specific chain using the `CHAIN` keyword. This takes a semicolon-separated list (`a;b;c`) of providers that will be tried in order. For example:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN 'env;config'
);
```

The possible values for `CHAIN` are the following:

* [`config`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html)
* [`sts`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_s_t_s_assume_role_web_identity_credentials_provider.html)
* [`sso`](https://aws.amazon.com/what-is/sso/)
* [`env`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html)
* [`instance`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html)
* [`process`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_process_credentials_provider.html)

The `credential_chain` provider also allows overriding the automatically fetched config. For example, to automatically load credentials, and then override the region, run:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN config,
    REGION '⟨eu-west-1⟩'
);
```

###### Loading a Secret Based on a Profile {#docs:current:core_extensions:httpfs:s3api::loading-a-secret-based-on-a-profile}

To load credentials based on a profile which is not defined as a default from the `AWS_PROFILE` environment variable or as a default profile based on AWS SDK precedence, run:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN config,
    PROFILE '⟨my_profile⟩'
);
```

This approach is equivalent to the `load_aws_credentials('⟨my_profile⟩')` method of the [deprecated S3 API](#docs:current:core_extensions:httpfs:s3api_legacy_authentication).

##### Overview of S3 Secret Parameters {#docs:current:core_extensions:httpfs:s3api::overview-of-s3-secret-parameters}

Below is a complete list of the supported parameters that can be used for both the `config` and `credential_chain` providers:

| Name                          | Description                                                                           | Secret            | Type      | Default                                     |
|:------------------------------|:--------------------------------------------------------------------------------------|:------------------|:----------|:--------------------------------------------|
| `ENDPOINT`                    | Specify a custom S3 endpoint                                                          | `S3`, `GCS`, `R2` | `STRING`  | `s3.amazonaws.com` for `S3`,                |
| `KEY_ID`                      | The ID of the key to use                                                              | `S3`, `GCS`, `R2` | `STRING`  | -                                           |
| `REGION`                      | The region for which to authenticate (should match the region of the bucket to query) | `S3`, `GCS`, `R2` | `STRING`  | `us-east-1`                                 |
| `SECRET`                      | The secret of the key to use                                                          | `S3`, `GCS`, `R2` | `STRING`  | -                                           |
| `SESSION_TOKEN`               | Optionally, a session token can be passed to use temporary credentials                | `S3`, `GCS`, `R2` | `STRING`  | -                                           |
| `URL_COMPATIBILITY_MODE`      | Can help when URLs contain problematic characters                                     | `S3`, `GCS`, `R2` | `BOOLEAN` | `true`                                      |
| `URL_STYLE`                   | Either `vhost` or `path`                                                              | `S3`, `GCS`, `R2` | `STRING`  | `vhost` for `S3`, `path` for `R2` and `GCS` |
| `USE_SSL`                     | Whether to use HTTPS or HTTP                                                          | `S3`, `GCS`, `R2` | `BOOLEAN` | `true`                                      |
| `VERIFY_SSL`                  | Whether to verify the SSL certificate of the server                                   | `S3`, `GCS`, `R2` | `BOOLEAN` | `true`                                      |
| `ACCOUNT_ID`                  | The R2 account ID to use for generating the endpoint URL                              | `R2`              | `STRING`  | -                                           |
| `KMS_KEY_ID`                  | AWS KMS (Key Management Service) key for Server Side Encryption S3                    | `S3`              | `STRING`  | -                                           |
| `REQUESTER_PAYS`              | Allows use of "requester pays" S3 buckets                                             | `S3`              | `BOOLEAN` | `false`                                     |
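
For illustration, several of these parameters can be combined to target a self-hosted, S3-compatible object store. This is a sketch under assumptions: the secret name, the endpoint `localhost:9000` and the credentials are hypothetical, and the appropriate `URL_STYLE` and `USE_SSL` values depend on the server:

```sql
CREATE OR REPLACE SECRET local_object_store (
    TYPE s3,
    PROVIDER config,
    KEY_ID '⟨my_access_key⟩',
    SECRET '⟨my_secret_key⟩',
    ENDPOINT 'localhost:9000',  -- hypothetical host:port
    URL_STYLE 'path',           -- path-style URLs instead of the vhost default
    USE_SSL false               -- plain HTTP; only sensible for local testing
);
```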

##### Platform-Specific Secret Types {#docs:current:core_extensions:httpfs:s3api::platform-specific-secret-types}

###### S3 Secrets {#docs:current:core_extensions:httpfs:s3api::s3-secrets}

The httpfs extension supports [Server Side Encryption via the AWS Key Management Service (KMS) on S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) using the `KMS_KEY_ID` option:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN config,
    REGION '⟨eu-west-1⟩',
    KMS_KEY_ID 'arn:aws:kms:⟨region⟩:⟨account_id⟩:⟨key⟩/⟨key_id⟩',
    SCOPE 's3://⟨bucket-sub-path⟩'
);
```
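
The `SCOPE` parameter in the example above restricts the secret to a path prefix. To check which secret DuckDB would select for a given path, the secrets manager offers the `duckdb_secrets()` and `which_secret` table functions. A brief sketch, with a placeholder path:

```sql
-- List the secrets known to the current session
FROM duckdb_secrets();

-- Show which secret of type s3 would be used for a given path
FROM which_secret('s3://⟨bucket-sub-path⟩/⟨your_file⟩.parquet', 's3');
```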

###### R2 Secrets {#docs:current:core_extensions:httpfs:s3api::r2-secrets}

While [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2) uses the regular S3 API, DuckDB has a special Secret type, `R2`, to make configuring it a bit simpler:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE r2,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    ACCOUNT_ID '⟨my_account_id⟩'
);
```

Note the addition of `ACCOUNT_ID`, which is used to generate the correct endpoint URL for you. `R2` secrets can use both the `config` and `credential_chain` providers. However, since DuckDB uses an AWS client internally, the `credential_chain` provider searches for AWS credentials in the standard AWS credential locations (environment variables, credential files, etc.). Therefore, your R2 credentials must be made available as AWS environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) for the credential chain to work properly. Finally, `R2` secrets are only available when using URLs starting with `r2://`, for example:

```sql
SELECT *
FROM read_parquet('r2://⟨some-file-that-uses-an-r2-secret⟩.parquet');
```

###### GCS Secrets {#docs:current:core_extensions:httpfs:s3api::gcs-secrets}

While [Google Cloud Storage](https://cloud.google.com/storage) is accessed by DuckDB using the S3 API, DuckDB has a special Secret type, `GCS`, to make configuring it a bit simpler:

```sql
CREATE OR REPLACE SECRET secret (
    TYPE gcs,
    KEY_ID '⟨my_hmac_access_id⟩',
    SECRET '⟨my_hmac_secret_key⟩'
);
```


**Important**: The `KEY_ID` and `SECRET` values must be HMAC keys generated specifically for Google Cloud Storage interoperability. These are not the same as regular GCP service account keys or access tokens. You can create HMAC keys by following the [Google Cloud documentation for managing HMAC keys](https://cloud.google.com/storage/docs/authentication/managing-hmackeys).

Note that the above secret will automatically have the correct Google Cloud Storage endpoint configured. `GCS` secrets can use both the `config` and `credential_chain` providers. However, since DuckDB uses an AWS client internally, the `credential_chain` provider searches for AWS credentials in the standard AWS credential locations (environment variables, credential files, etc.). Therefore, your GCS HMAC keys must be made available as AWS environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) for the credential chain to work properly. Finally, `GCS` secrets are only available when using URLs starting with `gcs://` or `gs://`, for example:

```sql
SELECT *
FROM read_parquet('gcs://⟨some/file/that/uses/a/gcs/secret⟩.parquet');
```

#### Reading {#docs:current:core_extensions:httpfs:s3api::reading}

Reading files from S3 is now as simple as:

```sql
SELECT *
FROM 's3://⟨your-bucket⟩/⟨filename⟩.⟨extension⟩';
```

##### Partial Reading {#docs:current:core_extensions:httpfs:s3api::partial-reading}

The `httpfs` extension supports [partial reading](#docs:current:core_extensions:httpfs:https::partial-reading) from S3 buckets.

##### Reading Multiple Files {#docs:current:core_extensions:httpfs:s3api::reading-multiple-files}

Multiple files can also be read in a single query, for example:

```sql
SELECT *
FROM read_parquet([
    's3://⟨your-bucket⟩/⟨filename-1⟩.parquet',
    's3://⟨your-bucket⟩/⟨filename-2⟩.parquet'
]);
```

##### Globbing {#docs:current:core_extensions:httpfs:s3api::globbing}

File [globbing](#docs:current:sql:functions:pattern_matching::globbing) is implemented using the ListObjectsV2 API call and allows using filesystem-like glob patterns to match multiple files, for example:

```sql
SELECT *
FROM read_parquet('s3://⟨your-bucket⟩/*.parquet');
```

This query matches all files in the root of the bucket with the [Parquet extension](#docs:current:data:parquet:overview).

Several features for matching are supported, such as `*` to match any number of any character, `?` for any single character or `[0-9]` for a single character in a range of characters:

```sql
SELECT count(*) FROM read_parquet('s3://⟨your-bucket⟩/folder*/100?/t[0-9].parquet');
```

A useful feature when using globs is the `filename` option, which adds a column named `filename` that encodes the file that a particular row originated from:

```sql
SELECT *
FROM read_parquet('s3://⟨your-bucket⟩/*.parquet', filename = true);
```

This could for example result in:

| column_a | column_b | filename |
|:---|:---|:---|
| 1 | examplevalue1 | s3://bucket-name/file1.parquet |
| 2 | examplevalue1 | s3://bucket-name/file2.parquet |
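
Because `filename` is a regular column, it can be used in filters and aggregations. As a small sketch, the following counts the rows contributed by each matched file:

```sql
SELECT filename, count(*) AS row_count
FROM read_parquet('s3://⟨your-bucket⟩/*.parquet', filename = true)
GROUP BY filename
ORDER BY filename;
```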

##### Hive Partitioning {#docs:current:core_extensions:httpfs:s3api::hive-partitioning}

DuckDB also offers support for the [Hive partitioning scheme](#docs:current:data:partitioning:hive_partitioning), which is available when using HTTP(S) and S3 endpoints.
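
As a sketch, assuming a bucket laid out in `⟨part_col_a⟩=⟨val⟩/⟨part_col_b⟩=⟨val⟩/` directories (the layout and column names are placeholders), the partition columns can be exposed and filtered on by enabling the `hive_partitioning` option:

```sql
SELECT *
FROM read_parquet(
    's3://⟨your-bucket⟩/partitioned/*/*/*.parquet',
    hive_partitioning = true
)
-- Filter on a partition column derived from the directory names
WHERE ⟨part_col_a⟩ = '⟨val⟩';
```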

#### Writing {#docs:current:core_extensions:httpfs:s3api::writing}

Writing to S3 uses the multipart upload API. This allows DuckDB to robustly upload files at high speed. Writing to S3 works for both CSV and Parquet:

```sql
COPY table_name TO 's3://⟨your-bucket⟩/⟨filename⟩.⟨extension⟩';
```

Partitioned copy to S3 also works:

```sql
COPY table_name TO 's3://⟨your-bucket⟩/partitioned' (
    FORMAT parquet,
    PARTITION_BY (⟨part_col_a⟩, ⟨part_col_b⟩)
);
```

An automatic check is performed for existing files/directories, which is currently quite conservative (and on S3 will add a bit of latency). To disable this check and force writing, set the `OVERWRITE_OR_IGNORE` flag:

```sql
COPY table_name TO 's3://⟨your-bucket⟩/partitioned' (
    FORMAT parquet,
    PARTITION_BY (⟨part_col_a⟩, ⟨part_col_b⟩),
    OVERWRITE_OR_IGNORE true
);
```

The naming scheme of the written files looks like this:

```text
s3://⟨your-bucket⟩/partitioned/part_col_a=⟨val⟩/part_col_b=⟨val⟩/data_⟨thread_number⟩.parquet
```

##### Configuration {#docs:current:core_extensions:httpfs:s3api::configuration}

Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.

| Name | Description |
|:---|:---|
| `s3_uploader_max_parts_per_file` | Used for part size calculation, see [AWS docs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html) |
| `s3_uploader_max_filesize` | Used for part size calculation, see [AWS docs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html) |
| `s3_uploader_thread_limit` | Maximum number of uploader threads |

### Legacy Authentication Scheme for S3 API {#docs:current:core_extensions:httpfs:s3api_legacy_authentication}

Prior to version 0.10.0, DuckDB did not have a [Secrets manager](#docs:current:sql:statements:create_secret). Hence, the configuration of and authentication to S3 endpoints was handled via variables. This page documents the legacy authentication scheme for the S3 API.

> **Warning.** This page describes a legacy method to store secrets as DuckDB settings.
> This increases the risk of accidentally leaking secrets (e.g., by printing their values).
> Therefore, avoid using these methods for storing secrets.
> The recommended way to configure and authenticate to S3 endpoints is to use [secrets](#docs:current:core_extensions:httpfs:s3api::configuration-and-authentication).

#### Legacy Authentication Scheme {#docs:current:core_extensions:httpfs:s3api_legacy_authentication::legacy-authentication-scheme}

To be able to read from or write to S3, the correct region should be set:

```sql
SET s3_region = 'us-east-1';
```

Optionally, the endpoint can be configured in case a non-AWS object storage server is used:

```sql
SET s3_endpoint = '⟨domain⟩.⟨tld⟩:⟨port⟩';
```

If the endpoint is not SSL-enabled then run:

```sql
SET s3_use_ssl = false;
```

Switching between [path-style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#path-style-access) and [vhost-style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#virtual-hosted-style-access) URLs is possible using:

```sql
SET s3_url_style = 'path';
```

However, note that this may also require updating the endpoint. For example, for AWS S3, the endpoint must be changed to `s3.⟨region⟩.amazonaws.com`.

After configuring the correct endpoint and region, public files can be read. To also read private files, authentication credentials can be added:

```sql
SET s3_access_key_id = '⟨aws_access_key_id⟩';
SET s3_secret_access_key = '⟨aws_secret_access_key⟩';
```

Alternatively, temporary S3 credentials are also supported. They require setting an additional session token:

```sql
SET s3_session_token = '⟨aws_session_token⟩';
```

The [`aws` extension](#docs:current:core_extensions:aws) allows for loading AWS credentials.
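
For comparison, the following sketch shows how the legacy settings above map onto a secret, based on the parameter names of the `S3` secret type. The secret name is hypothetical and all values are placeholders:

```sql
CREATE OR REPLACE SECRET migrated_secret (
    TYPE s3,
    PROVIDER config,
    REGION 'us-east-1',                   -- replaces s3_region
    KEY_ID '⟨aws_access_key_id⟩',         -- replaces s3_access_key_id
    SECRET '⟨aws_secret_access_key⟩',     -- replaces s3_secret_access_key
    SESSION_TOKEN '⟨aws_session_token⟩',  -- replaces s3_session_token (optional)
    ENDPOINT '⟨domain⟩.⟨tld⟩:⟨port⟩',     -- replaces s3_endpoint (optional)
    USE_SSL true                          -- replaces s3_use_ssl (optional)
);
```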

#### Per-Request Configuration {#docs:current:core_extensions:httpfs:s3api_legacy_authentication::per-request-configuration}

Aside from the global S3 configuration described above, specific configuration values can be used on a per-request basis. This allows for use of multiple sets of credentials, regions, etc. These are used by including them on the S3 URI as query parameters. All the individual configuration values listed above can be set as query parameters. For instance:

```sql
SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey&s3_secret_access_key=secretKey';
```

Multiple configurations per query are also allowed:

```sql
SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey1&s3_secret_access_key=secretKey1' t1
INNER JOIN 's3://bucket/file.csv?s3_access_key_id=accessKey2&s3_secret_access_key=secretKey2' t2
    USING (⟨join_column⟩);
```

#### Configuration {#docs:current:core_extensions:httpfs:s3api_legacy_authentication::configuration}

Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.

Additionally, most of the configuration options can be set via environment variables:

| DuckDB setting         | Environment variable       | Note                                     |
|:-----------------------|:---------------------------|:-----------------------------------------|
| `s3_region`            | `AWS_REGION`               | Takes priority over `AWS_DEFAULT_REGION` |
| `s3_region`            | `AWS_DEFAULT_REGION`       |                                          |
| `s3_access_key_id`     | `AWS_ACCESS_KEY_ID`        |                                          |
| `s3_secret_access_key` | `AWS_SECRET_ACCESS_KEY`    |                                          |
| `s3_session_token`     | `AWS_SESSION_TOKEN`        |                                          |
| `s3_endpoint`          | `DUCKDB_S3_ENDPOINT`       |                                          |
| `s3_use_ssl`           | `DUCKDB_S3_USE_SSL`        |                                          |
| `s3_requester_pays`    | `DUCKDB_S3_REQUESTER_PAYS` |                                          |

## Iceberg {#core_extensions:iceberg}

### Iceberg Extension {#docs:current:core_extensions:iceberg:overview}

The `iceberg` extension implements support for the [Apache Iceberg open table format](https://iceberg.apache.org/).
This page covers the basic usage of the extension without attaching to an Iceberg catalog. For full support – including write support – see [how to attach Iceberg REST catalogs](#docs:current:core_extensions:iceberg:iceberg_rest_catalogs).

#### Installing and Loading {#docs:current:core_extensions:iceberg:overview::installing-and-loading}

The `iceberg` extension is installed and loaded automatically on first use.
If you would like to install and load it manually, run:

```sql
INSTALL iceberg;
LOAD iceberg;
```

#### Updating the Extension {#docs:current:core_extensions:iceberg:overview::updating-the-extension}

The `iceberg` extension often receives updates between DuckDB releases.
To make sure that you have the latest version, [update your extensions](#docs:current:sql:statements:update_extensions):

```sql
UPDATE EXTENSIONS;
```

#### Usage {#docs:current:core_extensions:iceberg:overview::usage}

To test the examples, download the [`iceberg_data.zip`](https://duckdb.org/data/iceberg_data.zip) file and unzip it.

##### Common Parameters {#docs:current:core_extensions:iceberg:overview::common-parameters}

| Parameter                    | Type        | Default                                    | Description                                                |
| ---------------------------- | ----------- | ------------------------------------------ | ---------------------------------------------------------- |
| `allow_moved_paths`          | `BOOLEAN`   | `false`                                    | Allows scanning Iceberg tables that are moved              |
| `metadata_compression_codec` | `VARCHAR`   | `''`                                       | Treats metadata files as gzipped when set to `'gzip'`      |
| `snapshot_from_id`           | `UBIGINT`   | `NULL`                                     | Access snapshot with a specific `id`                       |
| `snapshot_from_timestamp`    | `TIMESTAMP` | `NULL`                                     | Access snapshot with a specific `timestamp`                |
| `version`                    | `VARCHAR`   | `'?'`                                      | Provides an explicit version string, hint file or guessing |
| `version_name_format`        | `VARCHAR`   | `'v%s%s.metadata.json,%s%s.metadata.json'` | Controls how versions are converted to metadata file names |

##### Querying Individual Tables {#docs:current:core_extensions:iceberg:overview::querying-individual-tables}

```sql
SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
```

| count_star() |
|-------------:|
| 51793        |

> The `allow_moved_paths` option ensures that some path resolution is performed, 
> which allows scanning Iceberg tables that are moved.

You can also specify the current manifest directly in the query. The manifest may be resolved from the catalog prior to the query, and its version may be a UUID rather than a number as in this example.
To do so, navigate to the `data/iceberg` directory and run:

```sql
SELECT count(*)
FROM iceberg_scan('lineitem_iceberg/metadata/v1.metadata.json');
```

| count_star() |
|-------------:|
| 60175        |

The `iceberg` extension works together with the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) or the [`azure` extension](#docs:current:core_extensions:azure) to access Iceberg tables in object stores such as S3 or Azure Blob Storage.

```sql
SELECT count(*)
FROM iceberg_scan('s3://bucketname/lineitem_iceberg/metadata/v1.metadata.json');
```

##### Access Iceberg Metadata {#docs:current:core_extensions:iceberg:overview::access-iceberg-metadata}

To access Iceberg Metadata, you can use the `iceberg_metadata` function:

```sql
SELECT *
FROM iceberg_metadata('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
```



|                             manifest_path                              | manifest_sequence_number | manifest_content | status  | content  |                                     file_path                                      | file_format | record_count |
|------------------------------------------------------------------------|--------------------------|------------------|---------|----------|------------------------------------------------------------------------------------|-------------|--------------|
| lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m1.avro | 2                        | DATA             | ADDED   | EXISTING | lineitem_iceberg/data/00041-414-f3c73457-bbd6-4b92-9c15-17b241171b16-00001.parquet | PARQUET     | 51793        |
| lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m0.avro | 2                        | DATA             | DELETED | EXISTING | lineitem_iceberg/data/00000-411-0792dcfe-4e25-4ca3-8ada-175286069a47-00001.parquet | PARQUET     | 60175        |

##### Visualizing Snapshots {#docs:current:core_extensions:iceberg:overview::visualizing-snapshots}

To visualize the snapshots in an Iceberg table, use the `iceberg_snapshots` function:

```sql
SELECT *
FROM iceberg_snapshots('data/iceberg/lineitem_iceberg');
```



| sequence_number |     snapshot_id     |      timestamp_ms       |                                         manifest_list                                          |
|-----------------|---------------------|-------------------------|------------------------------------------------------------------------------------------------|
| 1               | 3776207205136740581 | 2023-02-15 15:07:54.504 | lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro |
| 2               | 7635660646343998149 | 2023-02-15 15:08:14.73  | lineitem_iceberg/metadata/snap-7635660646343998149-1-10eaca8a-1e1c-421e-ad6d-b232e5ee23d3.avro |

> `iceberg_snapshots` does not take `allow_moved_paths`, `snapshot_from_id` or `snapshot_from_timestamp` as parameters.
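
The snapshot IDs and timestamps listed by `iceberg_snapshots` can be passed to `iceberg_scan` through the `snapshot_from_id` and `snapshot_from_timestamp` parameters from the table of common parameters. A sketch using the values shown in the output above:

```sql
-- Scan the table as of a specific snapshot ID
SELECT count(*)
FROM iceberg_scan(
    'data/iceberg/lineitem_iceberg',
    allow_moved_paths = true,
    snapshot_from_id = 3776207205136740581
);

-- Scan the table as of a snapshot timestamp
SELECT count(*)
FROM iceberg_scan(
    'data/iceberg/lineitem_iceberg',
    allow_moved_paths = true,
    snapshot_from_timestamp = TIMESTAMP '2023-02-15 15:08:14.73'
);
```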

##### Selecting Metadata Versions {#docs:current:core_extensions:iceberg:overview::selecting-metadata-versions}

By default, the `iceberg` extension will look for a `version-hint.text` file to identify the proper metadata version to use. This can be overridden by explicitly supplying a version number via the `version` parameter to the functions of the `iceberg` extension:

```sql
SELECT *
FROM iceberg_snapshots(
    'data/iceberg/lineitem_iceberg',
    version = '1'
);
```

By default, `iceberg` functions will look for both `v{version}.metadata.json` and `{version}.metadata.json` files, or `v{version}.gz.metadata.json` and `{version}.gz.metadata.json` when `metadata_compression_codec = 'gzip'` is specified.
Other compression codecs are not supported.

If any text file is provided through the `version` parameter, it is opened and treated as a version hint file:

```sql
SELECT *
FROM iceberg_snapshots(
    'data/iceberg/lineitem_iceberg',
    version = 'version-hint.txt'
);
```

The `iceberg` extension will open this file and use the **entire content** of the file as a provided version number.
Note that the entire content of the `version-hint.txt` file will be treated as a literal version name, with no encoding, escaping or trimming. This includes any whitespace or unsafe characters, which are passed as-is into the file names constructed by the logic described below.

##### Working with Alternative Metadata Naming Conventions {#docs:current:core_extensions:iceberg:overview::working-with-alternative-metadata-naming-conventions}

The `iceberg` extension can handle different metadata naming conventions by specifying them as a comma-delimited list of format strings via the `version_name_format` parameter. Each format string must contain two `%s` parameters. The first is the location of the version number in the metadata filename and the second is the location of the filename extension specified by the `metadata_compression_codec`. The behavior described above is provided by the default value of `'v%s%s.metadata.json,%s%s.metadata.json'`.
If you had an alternatively named metadata file, e.g., `rev-2.metadata.json.gz`, the table can be read via the following statement:

```sql
SELECT *
FROM iceberg_snapshots(
    'data/iceberg/alternative_metadata_gz_naming',
    version = '2',
    version_name_format = 'rev-%s.metadata.json%s',
    metadata_compression_codec = 'gzip'
);
```
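
To see how a format string expands, the substitution can be mimicked with DuckDB's `printf` function. This only illustrates the two `%s` slots (version first, then compression extension). It is not the extension's actual implementation, and the `.gz` extension for `'gzip'` is inferred from the file names mentioned above:

```sql
SELECT
    -- Alternative naming convention from the example above
    printf('rev-%s.metadata.json%s', '2', '.gz') AS alternative_name,
    -- First default format without and with gzip compression
    printf('v%s%s.metadata.json', '1', '') AS default_name,
    printf('v%s%s.metadata.json', '1', '.gz') AS default_gzip_name;
```

The three columns evaluate to `rev-2.metadata.json.gz`, `v1.metadata.json` and `v1.gz.metadata.json`, respectively.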

##### “Guessing” Metadata Versions {#docs:current:core_extensions:iceberg:overview::guessing-metadata-versions}

By default, either a table version number or a `version-hint.text` **must** be provided for the `iceberg` extension to read a table. This is typically provided by an external data catalog. In the event neither is present, the `iceberg` extension can attempt to guess the latest version by passing `?` as the `version` parameter:

```sql
SELECT count(*)
FROM iceberg_scan(
    'data/iceberg/lineitem_iceberg_no_hint',
    version = '?',
    allow_moved_paths = true
);
```

The “latest” version is assumed to be the filename that is lexicographically largest when sorting the filenames. Collations are not considered. This behavior is not enabled by default as it may potentially violate ACID constraints. It can be enabled by setting `unsafe_enable_version_guessing` to `true`. When this is set, `iceberg` functions will attempt to guess the latest version by default before failing.

```sql
SET unsafe_enable_version_guessing = true;
SELECT count(*)
FROM iceberg_scan(
    'data/iceberg/lineitem_iceberg_no_hint',
    allow_moved_paths = true
);
```
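
To illustrate why this guess is considered unsafe, the following sketch applies plain lexicographic ordering to three hypothetical metadata file names using `max` on strings. It only mimics the ordering rule described above and is not the extension's actual code:

```sql
SELECT max(filename) AS guessed_latest
FROM (VALUES
    ('v2.metadata.json'),
    ('v9.metadata.json'),
    ('v10.metadata.json')
) t(filename);
```

Under this ordering, `v9.metadata.json` sorts after `v10.metadata.json` because the character `9` compares greater than `1`, so the guessed file is not necessarily the numerically highest version.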

#### Limitations {#docs:current:core_extensions:iceberg:overview::limitations}

The `iceberg` extension does not support the following:

* Inserts into v3 Iceberg specification tables.
* Reads from v3 tables with v2 data types.
* Geometry data type.

For a set of unsupported operations when attaching to an Iceberg catalog, see [Unsupported Operations](#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::unsupported-operations).

### Iceberg REST Catalogs {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs}

The `iceberg` extension supports attaching Iceberg REST Catalogs. Before attaching an Iceberg REST Catalog, you must install the `iceberg` extension by following the instructions located in the [overview](#docs:current:core_extensions:iceberg:overview).

If you are attaching to an Iceberg REST Catalog managed by Amazon, please see the instructions for attaching to [Amazon S3 Tables](#docs:current:core_extensions:iceberg:amazon_s3_tables) or [Amazon SageMaker Lakehouse](#docs:current:core_extensions:iceberg:amazon_sagemaker_lakehouse).

For all other Iceberg REST Catalogs, you can follow the instructions below. Please see the [Examples](#::specific-catalog-examples) section for questions about specific catalogs.

Most Iceberg REST Catalogs authenticate via OAuth2. You can use the existing DuckDB secret workflow to store login credentials for the OAuth2 service.

```sql
CREATE SECRET iceberg_secret (
    TYPE iceberg,
    CLIENT_ID '⟨admin⟩',
    CLIENT_SECRET '⟨password⟩',
    OAUTH2_SERVER_URI '⟨http://iceberg_rest_catalog_url.com/v1/oauth/tokens⟩'
);
```

If you already have a Bearer token, you can pass it directly to your `CREATE SECRET` statement:

```sql
CREATE SECRET iceberg_secret (
    TYPE iceberg,
    TOKEN '⟨bearer_token⟩'
);
```

You can attach the Iceberg catalog with the following [`ATTACH`](#docs:current:sql:statements:attach) statement.

```sql
LOAD httpfs;
ATTACH '⟨warehouse⟩' AS iceberg_catalog (
   TYPE iceberg,
   SECRET iceberg_secret, -- pass a specific secret name to prevent ambiguity
   ENDPOINT '⟨https://rest_endpoint.com⟩'
);
```

To see the available tables, run:

```sql
SHOW ALL TABLES;
```
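
Tables in the attached catalog can then be queried with a fully qualified name. A minimal sketch, where the namespace and table name are placeholders for entries returned by `SHOW ALL TABLES`:

```sql
SELECT count(*)
FROM iceberg_catalog.⟨namespace⟩.⟨table_name⟩;
```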

#### `ATTACH` Options {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::attach-options}

A REST Catalog with OAuth2 authorization can also be attached with just an `ATTACH` statement. See the complete list of `ATTACH` options for a REST Catalog below.

| Parameter                   | Type       | Default              | Description                                                                                                                                                          |
| --------------------------- | ---------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ENDPOINT_TYPE`             | `VARCHAR`  | `NULL`               | Used for attaching S3 Tables or Glue catalogs. Allowed values are `GLUE` and `S3_TABLES`.                                                                            |
| `ENDPOINT`                  | `VARCHAR`  | `NULL`               | URL endpoint to communicate with the REST Catalog. Cannot be used in conjunction with `ENDPOINT_TYPE`.                                                               |
| `SECRET`                    | `VARCHAR`  | `NULL`               | Name of secret used to communicate with the REST Catalog.                                                                                                            |
| `CLIENT_ID`                 | `VARCHAR`  | `NULL`               | `CLIENT_ID` used for Secret.                                                                                                                                         |
| `CLIENT_SECRET`             | `VARCHAR`  | `NULL`               | `CLIENT_SECRET` needed for Secret.                                                                                                                                   |
| `DEFAULT_REGION`            | `VARCHAR`  | `NULL`               | A default region to use when communicating with the storage layer.                                                   |
| `OAUTH2_SERVER_URI`         | `VARCHAR`  | `NULL`               | OAuth2 server URL for getting a Bearer Token.                                                                        |
| `AUTHORIZATION_TYPE`        | `VARCHAR`  | `OAUTH2`             | Pass `SigV4` for catalogs that require SigV4 authorization, `none` for catalogs that don't need authentication.                                                      |
| `ACCESS_DELEGATION_MODE`    | `VARCHAR`  | `vended_credentials` | Access delegation mode. Allowed values are `vended_credentials` and `none`.                                                                                          |
| `EXTRA_HTTP_HEADERS`        | `MAP`      | `NULL`               | Additional HTTP headers to send with REST Catalog requests.                                                                                                          |
| `SUPPORT_NESTED_NAMESPACES` | `BOOLEAN`  | `true`               | Option for catalogs that support nested namespaces.                                                                                                                  |
| `SUPPORT_STAGE_CREATE`      | `BOOLEAN`  | `false`              | Option for catalogs that do not support stage create.                                                                                                                |
| `MAX_TABLE_STALENESS`       | `INTERVAL` | `NULL`               | Option for preventing unnecessary requests to the Iceberg REST Catalog. You can pass human readable interval strings. `10 minutes`, `30 seconds`, `1 year` all work. |
| `PURGE_REQUESTED`           | `BOOLEAN`  | `true`               | Option to send the [PurgeRequested](https://github.com/apache/iceberg/blob/4b4eb38cf6dda7b43faeb40eb00aa5db424d2ecb/open-api/rest-catalog-open-api.yaml#L1144) parameter when dropping a table. |
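
As a sketch of attaching with just an `ATTACH` statement, the OAuth2 credentials from the table above can be passed inline instead of through a named secret. The warehouse, URLs and credentials are placeholders, and the catalog aliases are arbitrary:

```sql
ATTACH '⟨warehouse⟩' AS iceberg_catalog (
    TYPE iceberg,
    ENDPOINT '⟨https://rest_endpoint.com⟩',
    CLIENT_ID '⟨admin⟩',
    CLIENT_SECRET '⟨password⟩',
    OAUTH2_SERVER_URI '⟨http://iceberg_rest_catalog_url.com/v1/oauth/tokens⟩'
);

-- For a catalog that needs no authentication
ATTACH '⟨warehouse⟩' AS open_catalog (
    TYPE iceberg,
    ENDPOINT '⟨https://rest_endpoint.com⟩',
    AUTHORIZATION_TYPE 'none'
);
```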

The following options can only be passed to a `CREATE SECRET` statement and they require `AUTHORIZATION_TYPE` to be `OAUTH2`:

| Parameter           | Type      | Default | Description                                          |
| ------------------- | --------- | ------- | ---------------------------------------------------- |
| `OAUTH2_GRANT_TYPE` | `VARCHAR` | `NULL`  | Grant Type when requesting an OAuth Token.           |
| `OAUTH2_SCOPE`      | `VARCHAR` | `NULL`  | Requested scope for the returned OAuth Access Token. |


##### Supported Operations {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::supported-operations}

The DuckDB Iceberg extension supports the following operations when used with a REST Catalog attached:

* `CREATE/DROP SCHEMA`
* `CREATE/DROP TABLE`
* `INSERT INTO`
* `UPDATE`
* `DELETE`
* `SELECT`

Since these operations are supported, the following will also work:

```sql
COPY FROM DATABASE duckdb_db TO iceberg_datalake;
-- Or
COPY FROM DATABASE iceberg_datalake TO duckdb_db;
```

This functionality enables deep copies between Iceberg and DuckDB storage.

###### Limitations for UPDATE and DELETE {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::limitations-for-update-and-delete}

The `UPDATE` and `DELETE` operations have the following limitations:

* They only work on tables that are **not partitioned** and **not sorted**. Attempting these operations on partitioned or sorted tables results in an error.
* DuckDB-Iceberg only writes **positional deletes**. Copy-on-write functionality is not yet supported.
* DuckDB-Iceberg only supports **merge-on-read semantics**. If a table has `write.update.mode` or `write.delete.mode` properties set to something other than `merge-on-read`, the operation fails.

##### Metadata Operations {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::metadata-operations}

The functions `iceberg_metadata` and `iceberg_snapshots` are also available to use with an Iceberg REST Catalog using a fully qualified path, e.g.:

```sql
SELECT * FROM iceberg_metadata(my_datalake.default.t);

-- Or
SELECT * FROM iceberg_snapshots(my_datalake.default.t);
```

The snapshot information returned by `iceberg_snapshots` can be used for **time travel**, either by snapshot ID or by timestamp:

```sql
-- Using a snapshot id
SELECT * FROM my_datalake.default.t AT (VERSION => ⟨SNAPSHOT_ID⟩);

-- Or using a timestamp
SELECT * FROM my_datalake.default.t AT (TIMESTAMP => TIMESTAMP '2025-09-22 12:32:43.217');
```
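
To illustrate what a timestamp-based lookup means, here is a minimal Python sketch of the usual Iceberg "as of" rule: pick the latest snapshot committed at or before the requested time. The snapshot IDs, commit times, and the `snapshot_as_of` helper are hypothetical, and the sketch assumes (rather than confirms) that DuckDB resolves `AT (TIMESTAMP => ...)` this way.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical (snapshot_id, commit time) pairs, sorted by commit time.
snapshots = [
    (101, datetime(2025, 9, 20, 8, 0, tzinfo=timezone.utc)),
    (102, datetime(2025, 9, 22, 12, 0, tzinfo=timezone.utc)),
    (103, datetime(2025, 9, 23, 9, 30, tzinfo=timezone.utc)),
]


def snapshot_as_of(ts):
    """Return the id of the latest snapshot committed at or before ts, or None."""
    times = [committed for _, committed in snapshots]
    idx = bisect_right(times, ts)  # number of snapshots with commit time <= ts
    return snapshots[idx - 1][0] if idx else None


requested = datetime(2025, 9, 22, 12, 32, 43, 217000, tzinfo=timezone.utc)
print(snapshot_as_of(requested))
```

A timestamp that precedes every snapshot has no matching table state, which the sketch models by returning `None`.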

##### Interoperability with DuckLake {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::interoperability-with-ducklake}

The DuckDB Iceberg extension exposes a function that makes metadata-only copies of the Iceberg metadata to [DuckLake](#docs:current:core_extensions:ducklake), which enables users to query Iceberg tables as if they were DuckLake tables.

```sql
-- Given that we have an Iceberg catalog attached aliased to iceberg_datalake
ATTACH 'ducklake:my_ducklake.ducklake' AS my_ducklake;

CALL iceberg_to_ducklake('iceberg_datalake', 'my_ducklake');
```

To skip a set of tables, pass the `skip_tables` parameter:

```sql
CALL iceberg_to_ducklake('iceberg_datalake', 'my_ducklake', skip_tables := ['table_to_skip']);
```

##### Table Properties Functions {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::table-properties-functions}

DuckDB provides functions to view and modify [Iceberg table properties](https://iceberg.apache.org/spec/#table-metadata-fields):

| Function                                                | Description                                    |
| ------------------------------------------------------- | ---------------------------------------------- |
| `iceberg_table_properties(table)`                       | Returns all properties of the specified table. |
| `set_iceberg_table_properties(table, properties)`       | Sets properties on the specified table.        |
| `remove_iceberg_table_properties(table, property_list)` | Removes properties from the specified table.   |

```sql
-- View table properties
SELECT *
FROM iceberg_table_properties(iceberg_catalog.default.my_table);

-- Set table properties
CALL set_iceberg_table_properties(
    iceberg_catalog.default.my_table,
    {'write.update.mode': 'merge-on-read', 'write.delete.mode': 'merge-on-read'}
);

-- Remove table properties
CALL remove_iceberg_table_properties(
    iceberg_catalog.default.my_table,
    ['some.property']
);
```

You can also create a table with table properties.

```sql
CREATE TABLE test_create_table (a INTEGER)
WITH (
    'format-version' = '2', -- format version will be elevated to format-version when creating a table
    'location' = 's3://path/to/data', -- location will be elevated to location when creating a table
    'property1' = 'value1',
    'property2' = 'value2'
);
```

##### Unsupported Operations {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::unsupported-operations}

The following operations are not supported by the DuckDB Iceberg extension:

* `MERGE INTO`
* `ALTER TABLE`

#### Specific Catalog Examples {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::specific-catalog-examples}

##### Cloudflare R2 Catalog {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::cloudflare-r2-catalog}

To attach to a catalog managed by [Cloudflare R2 Data Catalog](https://developers.cloudflare.com/r2/data-catalog/), first create a secret that holds your R2 API token:

```sql
CREATE SECRET r2_secret (
    TYPE iceberg,
    TOKEN '⟨r2_token⟩'
);
```

You can create a token by following the [create an API token](https://developers.cloudflare.com/r2/data-catalog/get-started/#3-create-an-api-token) steps in getting started. Then, attach the catalog with the following commands.

```sql
ATTACH '⟨warehouse⟩' AS my_r2_catalog (
    TYPE iceberg,
    ENDPOINT '⟨catalog-uri⟩'
);
```

The values for `warehouse` and `catalog-uri` are available under the settings of the R2 Object Storage Catalog (R2 Object Store, Catalog name, Settings).

Once you have attached the R2 Data Catalog, create a schema. You can set it as the default with the `USE` command:

```sql
CREATE SCHEMA my_r2_catalog.my_schema;
USE my_r2_catalog.my_schema;
```

##### Polaris {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::polaris}

To attach to a [Polaris](https://polaris.apache.org) catalog, use the following commands:

```sql
CREATE SECRET polaris_secret (
    TYPE iceberg,
    CLIENT_ID '⟨admin⟩',
    CLIENT_SECRET '⟨password⟩'
);
```

```sql
ATTACH 'quickstart_catalog' AS polaris_catalog (
    TYPE iceberg,
    ENDPOINT '⟨polaris_rest_catalog_endpoint⟩',
    ACCESS_DELEGATION_MODE 'vended_credentials'
);
```

##### Lakekeeper {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::lakekeeper}

To attach to a [Lakekeeper](https://docs.lakekeeper.io) catalog, use the following commands:

```sql
CREATE SECRET lakekeeper_secret (
    TYPE iceberg,
    CLIENT_ID '⟨admin⟩',
    CLIENT_SECRET '⟨password⟩',
    OAUTH2_SCOPE '⟨scope⟩',
    OAUTH2_SERVER_URI '⟨lakekeeper_oauth_url⟩'
);
```

```sql
ATTACH '⟨warehouse⟩' AS lakekeeper_catalog (
    TYPE iceberg,
    ENDPOINT '⟨lakekeeper_irc_url⟩',
    SECRET lakekeeper_secret
);
```

##### Google Cloud BigLake {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::google-cloud-biglake}

To attach to a [Google Cloud BigLake](https://cloud.google.com/biglake) catalog, you can use extra HTTP headers to specify the GCP project for billing purposes.

First, get your Google Cloud access token:

```bash
gcloud auth application-default print-access-token
```

Then create a secret with the token and extra headers:

```sql
CREATE SECRET biglake_secret (
    TYPE iceberg,
    TOKEN '⟨your_access_token⟩',
    EXTRA_HTTP_HEADERS MAP {
        'x-goog-user-project': '⟨your_gcp_project_id⟩'
    }
);
```

Attach to the BigLake catalog:

```sql
ATTACH '⟨gs://your-biglake-bucket⟩' AS biglake_catalog (
    TYPE iceberg,
    ENDPOINT 'https://biglake.googleapis.com/iceberg/v1/restcatalog',
    SECRET biglake_secret
);
```

Example using the [BigLake public dataset](https://opensource.googleblog.com/2026/01/explore-public-datasets-with-apache-iceberg-and-biglake.html):

```sql
CREATE SECRET biglake_public_secret (
    TYPE iceberg,
    TOKEN '⟨your_access_token⟩',
    EXTRA_HTTP_HEADERS MAP {
        'x-goog-user-project': '⟨your_gcp_project_id⟩'
    }
);

ATTACH 'gs://biglake-public-nyc-taxi-iceberg' AS biglake_public (
    TYPE iceberg,
    ENDPOINT 'https://biglake.googleapis.com/iceberg/v1/restcatalog',
    SECRET biglake_public_secret
);

-- Query the data
SELECT count(*) FROM biglake_public.public_data.nyc_taxicab;
```

> **Note.** Google Cloud access tokens expire after 1 hour. For long-running sessions, you'll need to refresh the token periodically.

#### Limitations {#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::limitations}

DuckDB supports Iceberg REST Catalogs backed by S3, S3 Tables, and Google Cloud Storage (GCS). Support for other storage backends is not yet available.

### Amazon S3 Tables {#docs:current:core_extensions:iceberg:amazon_s3_tables}

> Support for S3 Tables is currently experimental.

The `iceberg` extension supports reading Iceberg tables stored in [Amazon S3 Tables](https://aws.amazon.com/s3/features/tables/).

#### Requirements {#docs:current:core_extensions:iceberg:amazon_s3_tables::requirements}

Install the following extensions:

```sql
INSTALL aws;
INSTALL httpfs;
INSTALL iceberg;
```

#### Connecting to Amazon S3 Tables {#docs:current:core_extensions:iceberg:amazon_s3_tables::connecting-to-amazon-s3-tables}

You can let DuckDB detect your AWS credentials and configuration based on the default profile in your `~/.aws` directory by creating the following secret using the [Secrets Manager](#docs:current:configuration:secrets_manager):

```sql
CREATE SECRET (
    TYPE s3,
    PROVIDER credential_chain
);
```

Alternatively, you can set the values manually:

```sql
CREATE SECRET (
    TYPE s3,
    KEY_ID '⟨key_id⟩',
    SECRET '⟨secret⟩',
    REGION '⟨region⟩'
);
```

Then, connect to the catalog using your S3 Tables ARN (available in the AWS Management Console) and the `ENDPOINT_TYPE s3_tables` option:

```sql
ATTACH '⟨s3_tables_arn⟩' AS my_s3_tables_catalog (
   TYPE iceberg,
   ENDPOINT_TYPE s3_tables
);
```

To check whether the attachment worked, list all tables:

```sql
SHOW ALL TABLES;
```

You can query a table as follows:

```sql
SELECT count(*)
FROM my_s3_tables_catalog.⟨namespace_name⟩.⟨table_name⟩;
```

### Amazon SageMaker Lakehouse (AWS Glue) {#docs:current:core_extensions:iceberg:amazon_sagemaker_lakehouse}

> Support for Amazon SageMaker Lakehouse (AWS Glue) is currently experimental.

The `iceberg` extension supports reading Iceberg tables through the [Amazon SageMaker Lakehouse (a.k.a. AWS Glue)](https://aws.amazon.com/sagemaker/lakehouse/) catalog.

#### Requirements {#docs:current:core_extensions:iceberg:amazon_sagemaker_lakehouse::requirements}

To use it, install the following extensions:

```sql
INSTALL aws;
INSTALL httpfs;
INSTALL iceberg;
```

> If you want to switch back to using extensions from the `core` repository,
> follow the [extension documentation](#docs:current:extensions:installing_extensions::force-installing-to-upgrade-extensions).

#### Connecting to Amazon SageMaker Lakehouse (AWS Glue) {#docs:current:core_extensions:iceberg:amazon_sagemaker_lakehouse::connecting-to-amazon-sagemaker-lakehouse-aws-glue}

Create an S3 secret using the [Secrets Manager](#docs:current:configuration:secrets_manager):

```sql
CREATE SECRET (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN sts,
    ASSUME_ROLE_ARN 'arn:aws:iam::⟨account_id⟩:role/⟨role⟩',
    REGION 'us-east-2'
);
```

In this example we use an STS token, but [other authentication methods are supported](#docs:current:core_extensions:aws).

Then, connect to the catalog:

```sql
ATTACH '⟨account_id⟩' AS glue_catalog (
    TYPE iceberg,
    ENDPOINT 'glue.⟨REGION⟩.amazonaws.com/iceberg',
    AUTHORIZATION_TYPE 'sigv4'
);
```

Or alternatively:

```sql
ATTACH '⟨account_id⟩' AS glue_catalog (
    TYPE iceberg,
    ENDPOINT_TYPE 'glue'
);
```

To check whether the attachment worked, list all tables:

```sql
SHOW ALL TABLES;
```

You can query a table as follows:

```sql
SELECT count(*)
FROM glue_catalog.⟨namespace_name⟩.⟨table_name⟩;
```

If you have an S3 Tables federated catalog, you can create a table using the standard `CREATE TABLE` syntax:

```sql
CREATE TABLE glue_catalog.⟨namespace_name⟩.⟨table_name⟩ (a INTEGER, b VARCHAR);
```

If the catalog is not federated by S3 Tables, you may need to pass a `location` table property. You can do so using the `WITH` clause:

```sql
CREATE TABLE glue_catalog.⟨namespace_name⟩.⟨table_name⟩ (a INTEGER, b VARCHAR)
WITH (
    'location' = 's3://path/to/location'
);
```

You can learn more about the `WITH` clause at [Table Properties](#docs:current:core_extensions:iceberg:iceberg_rest_catalogs::table-properties-functions).

### Troubleshooting {#docs:current:core_extensions:iceberg:troubleshooting}

#### Limitations {#docs:current:core_extensions:iceberg:troubleshooting::limitations}

* Reading tables with deletes is not yet supported.

#### Curl Request Fails {#docs:current:core_extensions:iceberg:troubleshooting::curl-request-fails}

##### Problem {#docs:current:core_extensions:iceberg:troubleshooting::problem}

When trying to attach to an Iceberg REST Catalog endpoint, DuckDB returns the following error:

```console
IO Error:
Curl Request to '/v1/oauth/tokens' failed with error: 'URL using bad/illegal format or missing URL'
```

##### Solution {#docs:current:core_extensions:iceberg:troubleshooting::solution}

Make sure that you have the latest Iceberg extension installed:

```batch
duckdb
```

```plsql
FORCE INSTALL iceberg FROM core_nightly;
```

Exit DuckDB and start a new session:

```batch
duckdb
```

```plsql
LOAD iceberg;
```

#### HTTP Error 403 {#docs:current:core_extensions:iceberg:troubleshooting::http-error-403}

##### Problem {#docs:current:core_extensions:iceberg:troubleshooting::problem}

When trying to list the tables in a remote-attached catalog, DuckDB returns the following error:

```sql
SHOW ALL TABLES;
```

```console
Failed to query https://s3tables.us-east-2.amazonaws.com/iceberg/v1/arn:aws:s3tables:... http error 403 thrown.
Message: {"message":"The security token included in the request is invalid."}
```

##### Solution {#docs:current:core_extensions:iceberg:troubleshooting::solution}

Use the `duckdb_secrets()` function to check whether DuckDB loaded the required credentials:

```sql
.mode line
FROM duckdb_secrets();
```

If you do not see your credentials, set them manually using the following secret:

```sql
CREATE SECRET (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩'
);
```

## ICU Extension {#docs:current:core_extensions:icu}

The `icu` extension contains an easy-to-use version of the collation/timezone part of the [ICU library](https://github.com/unicode-org/icu).

#### Installing and Loading {#docs:current:core_extensions:icu::installing-and-loading}

The `icu` extension will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL icu;
LOAD icu;
```

#### Features {#docs:current:core_extensions:icu::features}

The `icu` extension introduces the following features:

* [Region-dependent collations](#docs:current:sql:expressions:collations)
* [Time zones](#docs:current:sql:data_types:timezones), used for [timestamp data types](#docs:current:sql:data_types:timestamp) and [timestamp functions](#docs:current:sql:functions:timestamptz)

## inet Extension {#docs:current:core_extensions:inet}

The `inet` extension defines the `INET` data type for storing [IPv4](https://en.wikipedia.org/wiki/Internet_Protocol_version_4) and [IPv6](https://en.wikipedia.org/wiki/IPv6) Internet addresses. It supports the [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) for subnet masks (e.g., `198.51.100.0/22`, `2001:db8:3c4d::/48`).

#### Installing and Loading {#docs:current:core_extensions:inet::installing-and-loading}

The `inet` extension will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL inet;
LOAD inet;
```

#### Examples {#docs:current:core_extensions:inet::examples}

```sql
SELECT '127.0.0.1'::INET AS ipv4, '2001:db8:3c4d::/48'::INET AS ipv6;
```



|   ipv4    |        ipv6        |
|-----------|--------------------|
| 127.0.0.1 | 2001:db8:3c4d::/48 |

```sql
CREATE TABLE tbl (id INTEGER, ip INET);
INSERT INTO tbl VALUES
    (1, '192.168.0.0/16'),
    (2, '127.0.0.1'),
    (3, '8.8.8.8'),
    (4, 'fe80::/10'),
    (5, '2001:db8:3c4d:15::1a2f:1a2b');
SELECT * FROM tbl;
```



| id |             ip              |
|---:|-----------------------------|
| 1  | 192.168.0.0/16              |
| 2  | 127.0.0.1                   |
| 3  | 8.8.8.8                     |
| 4  | fe80::/10                   |
| 5  | 2001:db8:3c4d:15::1a2f:1a2b |

#### Operations on `INET` Values {#docs:current:core_extensions:inet::operations-on-inet-values}

`INET` values can be compared naturally, and IPv4 will sort before IPv6. Additionally, IP addresses can be modified by adding or subtracting integers.

```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('127.0.0.1'::INET + 10),
    ('fe80::10'::INET - 9),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b');
SELECT cidr FROM tbl ORDER BY cidr ASC;
```



|            cidr             |
|-----------------------------|
| 127.0.0.1                   |
| 127.0.0.11                  |
| 2001:db8:3c4d:15::1a2f:1a2b |
| fe80::7                     |
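
As a cross-check outside DuckDB, the same arithmetic and ordering can be reproduced with Python's standard `ipaddress` module. This is only an illustrative sketch: Python does not compare IPv4 and IPv6 addresses directly, so the sort key `(version, numeric value)` is an assumption chosen to mimic the "IPv4 before IPv6" ordering described above.

```python
import ipaddress

# Adding or subtracting an integer shifts the numeric value of the address.
values = [
    ipaddress.ip_address("127.0.0.1") + 10,
    ipaddress.ip_address("fe80::10") - 9,
    ipaddress.ip_address("127.0.0.1"),
    ipaddress.ip_address("2001:db8:3c4d:15::1a2f:1a2b"),
]

# Sort on (IP version, numeric value) so that IPv4 comes before IPv6.
ordered = sorted(values, key=lambda ip: (ip.version, int(ip)))
result = [str(ip) for ip in ordered]
print(result)
```

Note that `fe80::10` is hexadecimal, so subtracting 9 from the last group (`0x10`, i.e., 16) yields `fe80::7`.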

#### `host` Function {#docs:current:core_extensions:inet::host-function}

The host component of an `INET` value can be extracted using the `host()` function.

```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.0.0/16'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, host(cidr) AS host FROM tbl;
```



|              cidr              |            host             |
|--------------------------------|-----------------------------|
| 192.168.0.0/16                 | 192.168.0.0                 |
| 127.0.0.1                      | 127.0.0.1                   |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | 2001:db8:3c4d:15::1a2f:1a2b |


#### `netmask` Function {#docs:current:core_extensions:inet::netmask-function}

Computes the network mask for the address's network.

```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.5/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, netmask(cidr) AS netmask FROM tbl;
```



|              cidr              |              netmask               |
|--------------------------------|------------------------------------|
| 192.168.1.5/24                 | 255.255.255.0/24                   |
| 127.0.0.1                      | 255.255.255.255                    |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | ffff:ffff:ffff:ffff:ffff:ffff::/96 |

#### `network` Function {#docs:current:core_extensions:inet::network-function}

Returns the network part of the address, zeroing out whatever is to the right of the netmask.

```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.5/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, network(cidr) AS network FROM tbl;
```



|              cidr              |        network        |
|--------------------------------|-----------------------|
| 192.168.1.5/24                 | 192.168.1.0/24        |
| 127.0.0.1                      | 127.0.0.1             |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | 2001:db8:3c4d:15::/96 |

#### `broadcast` Function {#docs:current:core_extensions:inet::broadcast-function}

Computes the broadcast address for the address's network.

```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.5/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, broadcast(cidr) AS broadcast FROM tbl;
```



|              cidr              |           broadcast            |
|--------------------------------|--------------------------------|
| 192.168.1.5/24                 | 192.168.1.255/24               |
| 127.0.0.1                      | 127.0.0.1                      |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | 2001:db8:3c4d:15::ffff:ffff/96 |
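
The `host`, `netmask`, `network`, and `broadcast` values shown in the tables above can be cross-checked with Python's standard `ipaddress` module. The `describe` helper below is hypothetical and only meant as a sketch; unlike DuckDB, Python returns plain addresses without the prefix length suffix (e.g., `192.168.1.0` rather than `192.168.1.0/24`).

```python
import ipaddress


def describe(cidr: str) -> dict:
    """Split an address with an optional prefix length into its components."""
    iface = ipaddress.ip_interface(cidr)  # no prefix means /32 (IPv4) or /128 (IPv6)
    net = iface.network                   # the enclosing network, host bits zeroed
    return {
        "host": str(iface.ip),
        "netmask": str(net.netmask),
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
    }


for cidr in ["192.168.1.5/24", "127.0.0.1", "2001:db8:3c4d:15::1a2f:1a2b/96"]:
    print(cidr, describe(cidr))
```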

#### `<<=` Predicate {#docs:current:core_extensions:inet::-predicate}

Returns `true` if the subnet on the left-hand side is contained by or equal to the subnet on the right-hand side.

```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.0/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, INET '192.168.1.5/32' <<= cidr AS subnet_contained FROM tbl;
```



|              cidr              | subnet_contained |
|--------------------------------|------------------|
| 192.168.1.0/24                 | true             |
| 127.0.0.1                      | false            |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | false            |

#### `>>=` Predicate {#docs:current:core_extensions:inet::-predicate}

Returns `true` if the subnet on the left-hand side contains or is equal to the subnet on the right-hand side.

```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.0/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, INET '192.168.0.0/16' >>= cidr AS subnet_contains FROM tbl;
```



|              cidr              | subnet_contains |
|--------------------------------|-----------------|
| 192.168.1.0/24                 | true            |
| 127.0.0.1                      | false           |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | false           |
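
The containment semantics of `<<=` and `>>=` can be sketched with Python's standard `ipaddress` module. The helper names are hypothetical, and two choices are assumptions of this sketch rather than documented DuckDB behavior: host bits are masked off with `strict=False`, and networks of different IP versions are treated as never containing each other.

```python
import ipaddress


def contained_by_or_equal(left: str, right: str) -> bool:
    """Analogue of `left <<= right`."""
    a = ipaddress.ip_network(left, strict=False)
    b = ipaddress.ip_network(right, strict=False)
    # subnet_of raises TypeError across IP versions, so check the version first.
    return a.version == b.version and a.subnet_of(b)


def contains_or_equal(left: str, right: str) -> bool:
    """Analogue of `left >>= right`: the same test with the operands swapped."""
    return contained_by_or_equal(right, left)


for cidr in ["192.168.1.0/24", "127.0.0.1", "2001:db8:3c4d:15::1a2f:1a2b/96"]:
    print(
        cidr,
        contained_by_or_equal("192.168.1.5/32", cidr),
        contains_or_equal("192.168.0.0/16", cidr),
    )
```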

#### HTML Escape and Unescape Functions {#docs:current:core_extensions:inet::html-escape-and-unescape-functions}

```sql
SELECT html_escape('&');
```

```text
┌──────────────────┐
│ html_escape('&') │
│     varchar      │
├──────────────────┤
│ &amp;            │
└──────────────────┘
```

```sql
SELECT html_unescape('&amp;');
```

```text
┌────────────────────────┐
│ html_unescape('&amp;') │
│        varchar         │
├────────────────────────┤
│ &                      │
└────────────────────────┘
```

## jemalloc Extension {#docs:current:core_extensions:jemalloc}

The `jemalloc` extension replaces the system's memory allocator with [jemalloc](https://jemalloc.net/).
Unlike other DuckDB extensions, the `jemalloc` extension is statically linked and cannot be installed or loaded during runtime.

#### Operating System Support {#docs:current:core_extensions:jemalloc::operating-system-support}

The availability of the `jemalloc` extension depends on the operating system.

##### Linux {#docs:current:core_extensions:jemalloc::linux}

Linux distributions of DuckDB ship with the `jemalloc` extension.
To disable the `jemalloc` extension, [build DuckDB from source](#docs:current:dev:building:overview) and set the `SKIP_EXTENSIONS` flag as follows:

```batch
GEN=ninja SKIP_EXTENSIONS="jemalloc" make
```

##### macOS {#docs:current:core_extensions:jemalloc::macos}

The macOS version of DuckDB does not ship with the `jemalloc` extension but can be [built from source](#docs:current:dev:building:macos) to include it:

```batch
GEN=ninja BUILD_JEMALLOC=1 make
```

##### Windows {#docs:current:core_extensions:jemalloc::windows}

On Windows, this extension is not available.

#### Configuration {#docs:current:core_extensions:jemalloc::configuration}

##### Environment Variables {#docs:current:core_extensions:jemalloc::environment-variables}

The jemalloc allocator in DuckDB can be configured via the `DUCKDB_JE_MALLOC_CONF` environment variable. Setting this is equivalent to setting the [`MALLOC_CONF` environment variable](https://jemalloc.net/jemalloc.3.html#environment) for jemalloc, but DuckDB uses a different variable name to avoid clashes with other applications.

##### Background Threads {#docs:current:core_extensions:jemalloc::background-threads}

By default, jemalloc's [background threads](https://jemalloc.net/jemalloc.3.html#background_thread) are disabled. To enable them, use the following configuration option:

```sql
SET allocator_background_threads = true;
```

Background threads asynchronously purge outstanding allocations so that this doesn't have to be done synchronously by the foreground threads. This improves allocation performance, and should be noticeable in allocation-heavy workloads, especially on many-core CPUs.

## Lance Extension {#docs:current:core_extensions:lance}

The `lance` extension adds support for reading and writing Lance tables. [Lance](https://github.com/lance-format/lance/) is a modern lakehouse format optimized for ML/AI workloads, with native cloud storage support.

#### Installing and Loading {#docs:current:core_extensions:lance::installing-and-loading}

You can install the `lance` extension from DuckDB's core extensions repository and load it using the following commands:

```sql
INSTALL lance;
LOAD lance;
```

#### Usage {#docs:current:core_extensions:lance::usage}

- [Full SQL reference](https://github.com/lance-format/lance-duckdb/blob/main/docs/sql.md)
- [Cloud storage reference](https://github.com/lance-format/lance-duckdb/blob/main/docs/cloud.md)

##### Query a Lance Dataset {#docs:current:core_extensions:lance::query-a-lance-dataset}

Local file:

```sql
SELECT *
FROM '⟨path/to/dataset.lance⟩'
LIMIT 10;
```

S3:

```sql
SELECT *
FROM 's3://⟨bucket/path/to/out.lance⟩'
LIMIT 10;
```

To access object store URIs (e.g., `s3://...`), configure a `TYPE lance` secret using the [Secrets Manager](#docs:current:sql:statements:create_secret):

```sql
CREATE SECRET (
    TYPE lance,
    PROVIDER credential_chain,
    SCOPE 's3://⟨bucket⟩/'
);

SELECT *
FROM 's3://⟨bucket/path/to/out.lance⟩'
LIMIT 10;
```

##### Write a Lance Dataset {#docs:current:core_extensions:lance::write-a-lance-dataset}

Use the [`COPY ... TO ...` statement](#docs:current:sql:statements:copy::copy--to) to materialize query results as a Lance dataset.

```sql
-- Create/overwrite a Lance dataset from a query
COPY (
    SELECT 1::BIGINT AS id, 'a'::VARCHAR AS s
    UNION ALL
    SELECT 2::BIGINT AS id, 'b'::VARCHAR AS s
) TO '⟨path/to/dataset.lance⟩' (
    FORMAT lance,
    MODE 'overwrite'
);

-- Read it back via the replacement scan
SELECT count(*) FROM '⟨path/to/dataset.lance⟩';

-- Append more rows to an existing dataset
COPY (
    SELECT 3::BIGINT AS id, 'c'::VARCHAR AS s
) TO '⟨path/to/dataset.lance⟩' (
    FORMAT lance,
    MODE 'append'
);

-- Optionally create an empty dataset (schema only)
COPY (
    SELECT 1::BIGINT AS id, 'x'::VARCHAR AS s
    LIMIT 0
) TO '⟨path/to/empty.lance⟩' (
    FORMAT lance,
    MODE 'overwrite',
    WRITE_EMPTY_FILE true
);
```

To write to `s3://...` paths, configure a `TYPE lance` secret for that scope using the [Secrets Manager](#docs:current:sql:statements:create_secret):

```sql
CREATE SECRET (
    TYPE lance,
    PROVIDER credential_chain,
    SCOPE 's3://⟨bucket⟩/'
);

COPY (SELECT 1 AS id)
TO 's3://⟨bucket/path/to/out.lance⟩'
(FORMAT lance, MODE 'overwrite');
```

##### Create a Lance Dataset via `CREATE TABLE` (Directory Namespace) {#docs:current:core_extensions:lance::create-a-lance-dataset-via-create-table-directory-namespace}

When you `ATTACH` a directory as a Lance namespace, you can create new datasets using `CREATE TABLE` (schema-only)
or `CREATE TABLE AS SELECT` (CTAS). The dataset is written to `⟨namespace_root⟩/⟨table_name.lance⟩`.

```sql
ATTACH '⟨path/to/dir⟩' AS lance_ns (TYPE lance);

-- Schema-only (creates an empty dataset)
CREATE TABLE lance_ns.main.my_empty (id BIGINT, s VARCHAR);

-- CTAS (writes query results)
CREATE TABLE lance_ns.main.my_dataset AS
    SELECT 1::BIGINT AS id, 'a'::VARCHAR AS s
    UNION ALL
    SELECT 2::BIGINT AS id, 'b'::VARCHAR AS s;

SELECT count(*) FROM lance_ns.main.my_dataset;
```

##### Vector Search {#docs:current:core_extensions:lance::vector-search}

```sql
-- Search a vector column, returning distances in `_distance` (smaller is closer)
SELECT id, label, _distance
FROM lance_vector_search(
    '⟨path/to/dataset.lance⟩', 'vec',
    [0.1, 0.2, 0.3, 0.4]::FLOAT[4],
    k = 5,
    prefilter = true
)
ORDER BY _distance ASC;
```

See the [SQL reference for full parameter documentation](https://github.com/lance-format/lance-duckdb/blob/main/docs/sql.md#search).

##### Full-Text Search (FTS) {#docs:current:core_extensions:lance::full-text-search-fts}

```sql
-- Search a text column, returning BM25-like scores in `_score`
SELECT id, text, _score
FROM lance_fts(
    '⟨path/to/dataset.lance⟩',
    'text',
    'puppy',
    k = 10,
    prefilter = true
)
ORDER BY _score DESC;
```

See the [SQL reference for full parameter documentation](https://github.com/lance-format/lance-duckdb/blob/main/docs/sql.md#search).

##### Hybrid Search (Vector + FTS) {#docs:current:core_extensions:lance::hybrid-search-vector--fts}

```sql
-- Combine vector and text scores, returning `_hybrid_score` in addition to `_distance` / `_score`
SELECT id, _hybrid_score, _distance, _score
FROM lance_hybrid_search('⟨path/to/dataset.lance⟩',
                         'vec', [0.1, 0.2, 0.3, 0.4]::FLOAT[4],
                         'text', 'puppy',
                         k = 10, prefilter = false,
                         alpha = 0.5, oversample_factor = 4)
ORDER BY _hybrid_score DESC;
```

#### Limitations {#docs:current:core_extensions:lance::limitations}

The `lance` extension is currently available for the following [platforms](#docs:lts:dev:building:overview::supported-platforms):

- `linux_amd64`
- `linux_arm64`
- `osx_arm64`
- `windows_amd64`

## MySQL Extension {#docs:current:core_extensions:mysql}

The `mysql` extension allows DuckDB to directly read and write data from/to a running MySQL instance. The data can be queried directly from the underlying MySQL database. Data can be loaded from MySQL tables into DuckDB tables, or vice versa.

#### Installing and Loading {#docs:current:core_extensions:mysql::installing-and-loading}

To install the `mysql` extension, run:

```sql
INSTALL mysql;
```

The extension is loaded automatically upon first use. If you prefer to load it manually, run:

```sql
LOAD mysql;
```

#### Reading Data from MySQL {#docs:current:core_extensions:mysql::reading-data-from-mysql}

To make a MySQL database accessible to DuckDB, use the `ATTACH` command with the `mysql` or the `mysql_scanner` type:

```sql
ATTACH 'host=localhost user=root port=0 database=mysql' AS mysqldb (TYPE mysql);
USE mysqldb;
```

##### Configuration {#docs:current:core_extensions:mysql::configuration}

The connection string determines the parameters for how to connect to MySQL as a set of `key=value` pairs. Any options not provided are replaced by their default values, as per the table below. Connection information can also be specified with [environment variables](https://dev.mysql.com/doc/refman/8.3/en/environment-variables.html). If no option is provided explicitly, the MySQL extension tries to read it from an environment variable.



| Setting     | Default        | Environment variable |
|-------------|----------------|----------------------|
| database    | NULL           | MYSQL_DATABASE       |
| host        | localhost      | MYSQL_HOST           |
| password    |                | MYSQL_PWD            |
| port        | 0              | MYSQL_TCP_PORT       |
| socket      | NULL           | MYSQL_UNIX_PORT      |
| user        | _current user_ | MYSQL_USER           |
| ssl_mode    | preferred      |                      |
| ssl_ca      |                |                      |
| ssl_capath  |                |                      |
| ssl_cert    |                |                      |
| ssl_cipher  |                |                      |
| ssl_crl     |                |                      |
| ssl_crlpath |                |                      |
| ssl_key     |                |                      |
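
The precedence described above (explicit option, then environment variable, then default) can be sketched in Python as follows. The `resolve_settings` helper is hypothetical and is not the extension's actual parser: it assumes whitespace-separated `key=value` pairs without quoting, covers only the settings that have an environment variable in the table, and uses `None` for `NULL` and for the "current user" default.

```python
import os

# (default, environment variable) per setting, copied from the table above.
SETTINGS = {
    "database": (None, "MYSQL_DATABASE"),
    "host": ("localhost", "MYSQL_HOST"),
    "password": ("", "MYSQL_PWD"),
    "port": ("0", "MYSQL_TCP_PORT"),
    "socket": (None, "MYSQL_UNIX_PORT"),
    "user": (None, "MYSQL_USER"),
}


def resolve_settings(connection_string, environ=None):
    """Resolve each setting: explicit option, then environment variable, then default."""
    environ = os.environ if environ is None else environ
    explicit = dict(pair.split("=", 1) for pair in connection_string.split())
    resolved = {}
    for name, (default, env_name) in SETTINGS.items():
        if name in explicit:
            resolved[name] = explicit[name]
        elif env_name in environ:
            resolved[name] = environ[env_name]
        else:
            resolved[name] = default
    return resolved


print(resolve_settings("host=localhost user=root", {"MYSQL_TCP_PORT": "3306"}))
```

In this example, `host` and `user` come from the connection string, `port` from the (simulated) environment, and the remaining settings fall back to their defaults.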

##### Configuring via Secrets {#docs:current:core_extensions:mysql::configuring-via-secrets}

MySQL connection information can also be specified with [secrets](#docs:current:configuration:secrets_manager). The following syntax can be used to create a secret.

```sql
CREATE SECRET (
    TYPE mysql,
    HOST '127.0.0.1',
    PORT 0,
    DATABASE mysql,
    USER 'mysql',
    PASSWORD ''
);
```

The information from the secret will be used when `ATTACH` is called. We can leave the connection string empty to use all of the information stored in the secret.

```sql
ATTACH '' AS mysql_db (TYPE mysql);
```

We can use the connection string to override individual options. For example, to connect to a different database while still using the same credentials, we can override only the database name in the following manner.

```sql
ATTACH 'database=my_other_db' AS mysql_db (TYPE mysql);
```

By default, created secrets are temporary. Secrets can be persisted using the [`CREATE PERSISTENT SECRET` command](#docs:current:configuration:secrets_manager::persistent-secrets). Persistent secrets can be used across sessions.
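
As a minimal sketch of the persistent variant, the secret below reuses the placeholder connection values from the examples above. The secret name `mysql_persistent_secret` is an assumption chosen for illustration.

```sql
-- Same fields as the temporary secret above, but stored across sessions.
-- The name and all connection values are placeholders.
CREATE PERSISTENT SECRET mysql_persistent_secret (
    TYPE mysql,
    HOST '127.0.0.1',
    PORT 0,
    DATABASE mysql,
    USER 'mysql',
    PASSWORD ''
);
```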

###### Managing Multiple Secrets {#docs:current:core_extensions:mysql::managing-multiple-secrets}

Named secrets can be used to manage connections to multiple MySQL database instances. Secrets can be given a name upon creation.

```sql
CREATE SECRET mysql_secret_one (
    TYPE mysql,
    HOST '127.0.0.1',
    PORT 0,
    DATABASE mysql,
    USER 'mysql',
    PASSWORD ''
);
```

The secret can then be explicitly referenced using the `SECRET` parameter in the `ATTACH`.

```sql
ATTACH '' AS mysql_db_one (TYPE mysql, SECRET mysql_secret_one);
```
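
To illustrate managing more than one instance, a second named secret can sit next to the first one. This is a sketch only: the name `mysql_secret_two`, the host `192.0.2.10`, and the database and user names are hypothetical placeholders.

```sql
-- Second secret for a different (hypothetical) MySQL instance
CREATE SECRET mysql_secret_two (
    TYPE mysql,
    HOST '192.0.2.10',
    PORT 0,
    DATABASE reporting,
    USER 'report_user',
    PASSWORD ''
);

-- Each ATTACH picks its credentials through the SECRET parameter
ATTACH '' AS mysql_db_two (TYPE mysql, SECRET mysql_secret_two);
```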

##### SSL Connections {#docs:current:core_extensions:mysql::ssl-connections}

The [`ssl` connection parameters](https://dev.mysql.com/doc/refman/8.4/en/using-encrypted-connections.html) can be used to make SSL connections. Below is a description of the supported parameters.

| Setting     | Description                                                                                                                                      |
|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| ssl_mode    | The security state to use for the connection to the server: `disabled, required, verify_ca, verify_identity or preferred` (default: `preferred`) |
| ssl_ca      | The path name of the Certificate Authority (CA) certificate file                                                                                 |
| ssl_capath  | The path name of the directory that contains trusted SSL CA certificate files                                                                    |
| ssl_cert    | The path name of the client public key certificate file                                                                                          |
| ssl_cipher  | The list of permissible ciphers for SSL encryption                                                                                               |
| ssl_crl     | The path name of the file containing certificate revocation lists                                                                                |
| ssl_crlpath | The path name of the directory that contains files containing certificate revocation lists                                                       |
| ssl_key     | The path name of the client private key file                                                                                                     |
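
A sketch of how these parameters might be combined in a connection string, assuming they are passed as `key=value` pairs like the other settings. The host, user, and certificate path are placeholders.

```sql
-- Require an encrypted connection and verify the server certificate
-- against a CA file; all values below are illustrative placeholders.
ATTACH 'host=db.example.com user=app_user database=mysql ssl_mode=verify_ca ssl_ca=/path/to/ca.pem'
    AS mysql_ssl (TYPE mysql);
```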

##### Reading MySQL Tables {#docs:current:core_extensions:mysql::reading-mysql-tables}

The tables in the MySQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from MySQL at query time.

```sql
SHOW ALL TABLES;
```



|      name       |
|-----------------|
| signed_integers |

```sql
SELECT * FROM signed_integers;
```



|  t   |   s    |    m     |      i      |          b           |
|-----:|-------:|---------:|------------:|---------------------:|
| -128 | -32768 | -8388608 | -2147483648 | -9223372036854775808 |
| 127  | 32767  | 8388607  | 2147483647  | 9223372036854775807  |
| NULL | NULL   | NULL     | NULL        | NULL                 |

It might be desirable to create a copy of the MySQL databases in DuckDB to prevent the system from re-reading the tables from MySQL continuously, particularly for large tables.

Data can be copied over from MySQL to DuckDB using standard SQL, for example:

```sql
CREATE TABLE duckdb_table AS FROM mysqldb.mysql_table;
```

#### Writing Data to MySQL {#docs:current:core_extensions:mysql::writing-data-to-mysql}

In addition to reading data from MySQL, the extension allows you to create tables, ingest data into MySQL, and make other modifications to a MySQL database using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a MySQL database to Parquet, or read data from a Parquet file into MySQL.

Below is a brief example of how to create a new table in MySQL and load data into it.

```sql
ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql);
CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');
```
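
The Parquet round trip mentioned above could look roughly like the following sketch. It assumes the `mysql_db.tbl` table created in the previous example; the file name `tbl_export.parquet` is a placeholder.

```sql
-- Export the result of a query on a MySQL table to a local Parquet file
COPY (SELECT * FROM mysql_db.tbl) TO 'tbl_export.parquet' (FORMAT parquet);

-- Load the rows of a Parquet file into the MySQL table
INSERT INTO mysql_db.tbl
SELECT * FROM read_parquet('tbl_export.parquet');
```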

Many operations on MySQL tables are supported. All these operations directly modify the MySQL database, and the result of subsequent operations can then be read using MySQL.
Note that if modifications are not desired, `ATTACH` can be run with the `READ_ONLY` property which prevents making modifications to the underlying database. For example:

```sql
ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql, READ_ONLY);
```

#### Supported Operations {#docs:current:core_extensions:mysql::supported-operations}

Below is a list of supported operations.

##### `CREATE TABLE` {#docs:current:core_extensions:mysql::create-table}

```sql
CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);
```

##### `INSERT INTO` {#docs:current:core_extensions:mysql::insert-into}

```sql
INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');
```

##### `SELECT` {#docs:current:core_extensions:mysql::select}

```sql
SELECT * FROM mysql_db.tbl;
```

| id |  name  |
|---:|--------|
| 42 | DuckDB |

##### `COPY` {#docs:current:core_extensions:mysql::copy}

```sql
COPY mysql_db.tbl TO 'data.parquet';
COPY mysql_db.tbl FROM 'data.parquet';
```

You may also create a full copy of the database using the [`COPY FROM DATABASE` statement](#docs:current:sql:statements:copy::copy-from-database--to):

```sql
COPY FROM DATABASE mysql_db TO my_duckdb_db;
```

##### `UPDATE` {#docs:current:core_extensions:mysql::update}

```sql
UPDATE mysql_db.tbl
SET name = 'Woohoo'
WHERE id = 42;
```

##### `DELETE` {#docs:current:core_extensions:mysql::delete}

```sql
DELETE FROM mysql_db.tbl
WHERE id = 42;
```

##### `ALTER TABLE` {#docs:current:core_extensions:mysql::alter-table}

```sql
ALTER TABLE mysql_db.tbl
ADD COLUMN k INTEGER;
```

##### `DROP TABLE` {#docs:current:core_extensions:mysql::drop-table}

```sql
DROP TABLE mysql_db.tbl;
```

##### `CREATE VIEW` {#docs:current:core_extensions:mysql::create-view}

```sql
CREATE VIEW mysql_db.v1 AS SELECT 42;
```

##### `CREATE SCHEMA` and `DROP SCHEMA` {#docs:current:core_extensions:mysql::create-schema-and-drop-schema}

```sql
CREATE SCHEMA mysql_db.s1;
CREATE TABLE mysql_db.s1.integers (i INTEGER);
INSERT INTO mysql_db.s1.integers VALUES (42);
SELECT * FROM mysql_db.s1.integers;
```

| i  |
|---:|
| 42 |

```sql
DROP SCHEMA mysql_db.s1;
```

##### Transactions {#docs:current:core_extensions:mysql::transactions}

```sql
CREATE TABLE mysql_db.tmp (i INTEGER);
BEGIN;
INSERT INTO mysql_db.tmp VALUES (42);
SELECT * FROM mysql_db.tmp;
```

This returns:

| i  |
|---:|
| 42 |

```sql
ROLLBACK;
SELECT * FROM mysql_db.tmp;
```

This returns an empty table.

> The DDL statements are not transactional in MySQL.

#### Running SQL Queries in MySQL {#docs:current:core_extensions:mysql::running-sql-queries-in-mysql}

##### The `mysql_query` Table Function {#docs:current:core_extensions:mysql::the-mysql_query-table-function}

The `mysql_query` table function allows you to run arbitrary read queries within an attached database. `mysql_query` takes the name of the attached MySQL database to execute the query in, as well as the SQL query to execute. The result of the query is returned. Single-quote strings are escaped by repeating the single quote twice.

```sql
mysql_query(attached_database::VARCHAR, query::VARCHAR)
```

For example:

```sql
ATTACH 'host=localhost database=mysql' AS mysqldb (TYPE mysql);
SELECT * FROM mysql_query('mysqldb', 'SELECT * FROM cars LIMIT 3');
```
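
When the inner query contains a string literal, its single quotes must be doubled as described above. A small sketch, in which the `brand` column and its value are hypothetical:

```sql
-- The inner query sent to MySQL is: SELECT * FROM cars WHERE brand = 'Toyota'
-- 'brand' is an assumed column name used only for illustration.
SELECT *
FROM mysql_query('mysqldb', 'SELECT * FROM cars WHERE brand = ''Toyota''');
```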

##### The `mysql_execute` Function {#docs:current:core_extensions:mysql::the-mysql_execute-function}

The `mysql_execute` function allows running arbitrary queries within MySQL, including statements that update the schema and content of the database.

```sql
ATTACH 'host=localhost database=mysql' AS mysqldb (TYPE mysql);
CALL mysql_execute('mysqldb', 'CREATE TABLE my_table (i INTEGER)');
```

#### Settings {#docs:current:core_extensions:mysql::settings}

|                 Name                 |                          Description                           |  Default  |
|--------------------------------------|----------------------------------------------------------------|-----------|
| `mysql_bit1_as_boolean`              | Whether or not to convert `BIT(1)` columns to `BOOLEAN`        | `true`    |
| `mysql_debug_show_queries`           | DEBUG SETTING: print all queries sent to MySQL to stdout       | `false`   |
| `mysql_experimental_filter_pushdown` | Whether or not to use filter pushdown (currently experimental) | `false`   |
| `mysql_tinyint1_as_boolean`          | Whether or not to convert `TINYINT(1)` columns to `BOOLEAN`    | `true`    |
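
These settings can be changed with the regular `SET` statement once the extension is loaded. A brief sketch:

```sql
-- Enable the experimental filter pushdown
SET mysql_experimental_filter_pushdown = true;

-- Print every query sent to MySQL to stdout while debugging
SET mysql_debug_show_queries = true;

-- Keep TINYINT(1) columns as integers instead of converting them to BOOLEAN
SET mysql_tinyint1_as_boolean = false;
```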

#### Schema Cache {#docs:current:core_extensions:mysql::schema-cache}

To avoid having to continuously fetch schema data from MySQL, DuckDB keeps schema information – such as the names of tables, their columns, etc. – cached. If changes are made to the schema through a different connection to the MySQL instance, such as new columns being added to a table, the cached schema information might be outdated. In this case, the function `mysql_clear_cache` can be executed to clear the internal caches.

```sql
CALL mysql_clear_cache();
```

## ODBC {#core_extensions:odbc}

### ODBC Extension {#docs:current:core_extensions:odbc:overview}

The `odbc_scanner` extension allows connecting to other databases (using their [ODBC drivers](https://en.wikipedia.org/wiki/Open_Database_Connectivity)) and running queries with the [`odbc_query`](#docs:current:core_extensions:odbc:functions::odbc_query) function or copying data from DuckDB with the [`odbc_copy`](#docs:current:core_extensions:odbc:functions::odbc_copy) function.
The extension is also available under the alias `odbc`.

#### Installing and Loading {#docs:current:core_extensions:odbc:overview::installing-and-loading}

> On Linux and macOS, the extension requires the [unixODBC](https://en.wikipedia.org/wiki/UnixODBC) driver manager to be installed.
> See [below](#::installing-unixodbc-driver-manager-on-linux-or-macos) for installation instructions.

The extension can be installed automatically, but needs to be loaded manually with:

```sql
LOAD odbc;
```

#### Usage Example {#docs:current:core_extensions:odbc:overview::usage-example}

```sql
-- load extension
LOAD odbc;

-- open ODBC connection to a remote DB
SET VARIABLE conn = odbc_connect('Driver={Oracle driver};DBQ=//127.0.0.1:1521/XE', 'scott', 'tiger');

-- simple query
FROM odbc_query(getvariable('conn'), 'SELECT SYSTIMESTAMP FROM DUAL');

-- query with parameters
FROM odbc_query(getvariable('conn'),
    'SELECT CAST(? AS NVARCHAR2(2)) || CAST(? AS VARCHAR2(5)) FROM DUAL',
    params=row('🦆', 'quack'));

-- copy data into remote DB
FROM odbc_copy(getvariable('conn'),
    source_file='https://blobs.duckdb.org/nl_stations.csv',
    dest_table='NL_TRAIN_STATIONS',
    create_table=TRUE);

-- close connection
SELECT odbc_close(getvariable('conn'));
```

#### Installing the Nightly Version {#docs:current:core_extensions:odbc:overview::installing-the-nightly-version}

The ODBC extension is built using the version-independent DuckDB C API. The same binary (for a specific platform, for example `windows_amd64`) can be installed and loaded on DuckDB version `1.2.0` or any newer version.

Binaries with the most recent changes are published to the DuckDB nightly repository and can be installed in the following way:

```sql
INSTALL 'http://nightly-extensions.duckdb.org/v1.2.0/⟨platform⟩/odbc_scanner.duckdb_extension.gz';
```

> The URL with the version `1.2.0` in it should be used even if you are running a later version of DuckDB.

Here, `⟨platform⟩` is one of:

- `linux_amd64`
- `linux_arm64`
- `linux_amd64_musl`
- `linux_arm64_musl`
- `osx_amd64`
- `osx_arm64`
- `windows_amd64`
- `windows_arm64`

To update the installed extension to the latest nightly version, run:

```sql
FORCE INSTALL 'http://nightly-extensions.duckdb.org/v1.2.0/⟨platform⟩/odbc_scanner.duckdb_extension.gz';
```

The installed version (commit ID) can be checked using the following query:

```sql
FROM duckdb_extensions()
WHERE extension_name = 'odbc_scanner';
```

To install a version built from a specific commit, run:

```sql
FORCE INSTALL 'http://nightly-extensions.duckdb.org/odbc_scanner/⟨7_character_commit_id⟩/1.2.0/⟨platform⟩/odbc_scanner.duckdb_extension.gz';
```

#### Support Status of DBMS-Specific Types {#docs:current:core_extensions:odbc:overview::support-status-of-dbms-specific-types}

Tier 1:

- Oracle: [types coverage status](https://github.com/duckdb/odbc-scanner/tree/main/test/sql/oracle/README.md)
- SQL Server: [types coverage status](https://github.com/duckdb/odbc-scanner/blob/main/test/sql/mssql/README.md)
- DB2: [types coverage status](https://github.com/duckdb/odbc-scanner/blob/main/test/sql/db2/README.md)

Tier 2:

- PostgreSQL: basic types covered
- MySQL/MariaDB: basic types covered
- Firebird: [types coverage status](https://github.com/duckdb/odbc-scanner/blob/main/test/sql/firebird/README.md)

Tier 3:

- Snowflake: [types coverage status](https://github.com/duckdb/odbc-scanner/blob/main/test/sql/snowflake/README.md)
- ClickHouse: basic types covered
- Spark: basic types covered
- Arrow Flight SQL: basic types covered

#### Installing unixODBC Driver Manager on Linux or macOS {#docs:current:core_extensions:odbc:overview::installing-unixodbc-driver-manager-on-linux-or-macos}

On Linux `unixODBC` can be installed using the system package manager. Depending on the Linux distribution one of the following installation commands can be used.

Debian, Ubuntu:

```bash
sudo apt-get install unixodbc
```

RHEL, Alma, Rocky, Amazon, Fedora:

```bash
sudo dnf install unixODBC
```

Alpine:

```sh
sudo apk add unixodbc
```

On macOS unixODBC can be installed using the [Homebrew package manager](https://en.wikipedia.org/wiki/Homebrew_(package_manager)):

```bash
brew install unixodbc
```

To use legacy `x86_64` ODBC drivers under the [Rosetta](https://en.wikipedia.org/wiki/Rosetta_(software)) translator, unixODBC must
be installed using the `x86_64` version of Homebrew:

```bash
arch -x86_64 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
/usr/local/bin/brew install unixodbc
```

#### Connection String Examples {#docs:current:core_extensions:odbc:overview::connection-string-examples}

An ODBC connection can be established either with a data source name, in the form `DSN=data_source1_name`, or without a configured data source, in the form `Driver={Driver name};parameter1=value1;...`.

The [`odbc_list_drivers`](#docs:current:core_extensions:odbc:functions::odbc_list_drivers) and [`odbc_list_data_sources`](#docs:current:core_extensions:odbc:functions::odbc_list_data_sources) functions can be used to list the available drivers and data sources.
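
As a sketch of the DSN form, the following opens and closes a connection through a configured data source. It assumes a data source named `data_source1_name` has already been set up in the driver manager; the variable name `dsn_conn` is arbitrary.

```sql
-- Connect through a configured data source instead of a Driver=... string
SET VARIABLE dsn_conn = odbc_connect('DSN=data_source1_name');

-- ... run odbc_query calls on getvariable('dsn_conn') here ...

-- Connections are not closed automatically
SELECT odbc_close(getvariable('dsn_conn'));
```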

Examples of connection strings without a configured data source:

Oracle: 
 
```
Driver={Oracle in instantclient_23_0};DBQ=//127.0.0.1:1521/XE;UID=scott;PWD=tiger;
```

SQL Server: 
 
```
Driver={ODBC Driver 18 for SQL Server};Server=tcp:127.0.0.1,1433;UID=sa;PWD=pwd;TrustServerCertificate=Yes;Database=test_db;
```

DB2: 

```
Driver={IBM DB2 ODBC DRIVER};HostName=127.0.0.1;Port=50000;Database=testdb;UID=db2inst1;PWD=pwd;
```

PostgreSQL: 

```
Driver={PostgreSQL Unicode};Server=127.0.0.1;Port=5432;Username=postgres;Password=postgres;Database=test_db;
```

MySQL/MariaDB: 

```
Driver={MariaDB ODBC 3.1 Driver};SERVER=127.0.0.1;PORT=3306;USER=root;PASSWORD=root;DATABASE=test_db;
```

Firebird: 

```
Driver={Firebird ODBC Driver};Database=127.0.0.1/3050:C:/path/to/test.fdb;UID=SYSDBA;PWD=pwd;CHARSET=UTF8;
```

Snowflake: 

```
Driver={SnowflakeDSIIDriver};Server=foobar-ab12345.snowflakecomputing.com;Database=SNOWFLAKE_SAMPLE_DATA;UID=username;PWD=pwd;
```

ClickHouse: 

```
Driver={ClickHouse ODBC Driver (Unicode)};Server=127.0.0.1;Port=8123;
```

Spark: 

```
Driver={Simba Spark ODBC Driver};Host=127.0.0.1;Port=10000;
```

Arrow Flight SQL (Dremio ODBC + GizmoSQL): 

```
Driver={Dremio Flight SQL ODBC Driver};Host=127.0.0.1;Port=31337;UID=gizmosql_username;PWD=gizmosql_password;useEncryption=true;
```

#### Query Parameters {#docs:current:core_extensions:odbc:overview::query-parameters}

When a DuckDB query is run as a prepared statement, input parameters can be passed from the client code. The extension can forward such input parameters over the ODBC API to queries in remote databases.

Two methods of passing query parameters are supported, using either the `params` or the `params_handle` named argument of the [`odbc_query`](#docs:current:core_extensions:odbc:functions::odbc_query) function.

The `params` argument takes a `STRUCT` value as input. Struct field names are ignored, so the `row()` function can be used to create a `STRUCT` value inline:

```sql
FROM odbc_query(
  getvariable('conn'),
  '
    SELECT CAST(? AS VARCHAR2(3)) || CAST(? AS VARCHAR2(3)) FROM DUAL
  ', 
  params=row(?, ?))
```

If we prepare this query with `duckdb_prepare()`, bind the `VARCHAR` values `foo` and `bar` to it with `duckdb_bind_value()`, and execute it with `duckdb_execute_prepared()`, the input parameters `foo` and `bar` are forwarded to the ODBC query in the remote DB.

The problem with this approach is that DuckDB is unable to resolve the parameter types (specified in the outer query) before `duckdb_execute_prepared()` is called. These types may differ between subsequent invocations of `duckdb_execute_prepared()`, and there is no way to specify them explicitly.

As a result, the inner query is re-prepared in the remote DB every time `duckdb_execute_prepared()` is called.

To avoid this problem, it is possible to use 2-step parameter binding with the `params_handle` named argument of [`odbc_query`](#docs:current:core_extensions:odbc:functions::odbc_query):

```sql
-- create parameters handle
SET VARIABLE params = odbc_create_params();

-- when 'duckdb_prepare()' is called, the inner query will be prepared in the remote DB
FROM odbc_query(
  getvariable('conn'),
  '
    SELECT CAST(? AS VARCHAR2(3)) || CAST(? AS VARCHAR2(3)) FROM DUAL
  ', 
  params_handle=getvariable('params'));

-- now we can repeatedly bind new parameters to the handle using 'odbc_bind_params()'
-- and call 'duckdb_execute_prepared()' to run the prepared query with
-- these new parameters in remote DB
SELECT odbc_bind_params(getvariable('conn'), getvariable('params'), row(?, ?));
```

The parameters handle is tied to the prepared statement and is freed when the statement is destroyed.

#### Connections and Concurrency {#docs:current:core_extensions:odbc:overview::connections-and-concurrency}

DuckDB uses a multi-threaded execution engine to run parts of a query in parallel. ODBC drivers may or may not support
using the same connection from different threads concurrently. To prevent possible concurrency problems, the extension does not
allow the same connection to be used from multiple threads. For example, the following query:

```sql
FROM odbc_query(getvariable('conn'), 'SELECT ''foo'' col1 FROM DUAL')
UNION ALL
FROM odbc_query(getvariable('conn'), 'SELECT ''bar'' col1 FROM DUAL')
```

will fail with:

```
Invalid Input Error:
'odbc_query' error: ODBC connection not found on global init, id: 139760181976192
```

This can be avoided by using multiple ODBC connections:

```sql
FROM odbc_query(getvariable('conn1'), 'SELECT ''foo'' col1 FROM DUAL')
UNION ALL
FROM odbc_query(getvariable('conn2'), 'SELECT ''bar'' col1 FROM DUAL')
```

Alternatively, multi-threaded execution can be disabled by setting the DuckDB `threads` option to `1`.
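
Both workarounds can be sketched as follows. The connection string and credentials are the placeholder Oracle values used elsewhere on this page.

```sql
-- Option 1: open one ODBC connection per branch of the query
SET VARIABLE conn1 = odbc_connect('Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE', 'scott', 'tiger');
SET VARIABLE conn2 = odbc_connect('Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE', 'scott', 'tiger');

-- Option 2: keep a single connection and run DuckDB single-threaded
SET threads = 1;
```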

#### Transaction Management {#docs:current:core_extensions:odbc:overview::transaction-management}

According to the ODBC specification, connections to a remote DB are expected to have auto-commit mode enabled by default.

As a general rule, the transaction commands `BEGIN TRANSACTION`/`COMMIT`/`ROLLBACK` are not supposed to be sent over ODBC as SQL commands. Doing so may or may not be supported by the particular driver. Instead, ODBC provides an API to manage transactions.

This API is exposed in the following functions:

 - [`odbc_begin_transaction`](#docs:current:core_extensions:odbc:functions::odbc_begin_transaction)
 - [`odbc_commit`](#docs:current:core_extensions:odbc:functions::odbc_commit)
 - [`odbc_rollback`](#docs:current:core_extensions:odbc:functions::odbc_rollback)

When [`odbc_begin_transaction`](#docs:current:core_extensions:odbc:functions::odbc_begin_transaction) is called on a connection, auto-commit mode is disabled on that connection and an implicit transaction is started. There is currently no support for re-enabling auto-commit on such a connection.

After the transaction is started, call [`odbc_commit`](#docs:current:core_extensions:odbc:functions::odbc_commit) or [`odbc_rollback`](#docs:current:core_extensions:odbc:functions::odbc_rollback) to complete it. Once the transaction is completed, a new implicit transaction is started on this connection automatically.
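
The call sequence described above can be sketched as follows, assuming `conn` holds a handle returned by `odbc_connect`. The statements run inside the transaction are left as a placeholder comment.

```sql
-- Disable auto-commit and start an implicit transaction
SELECT odbc_begin_transaction(getvariable('conn'));

-- ... run statements on this connection ...

-- Complete the transaction; a new implicit transaction starts afterwards
SELECT odbc_commit(getvariable('conn'));

-- To discard the changes instead, call odbc_rollback in place of odbc_commit:
-- SELECT odbc_rollback(getvariable('conn'));
```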

#### Performance {#docs:current:core_extensions:odbc:overview::performance}

ODBC is not a high-performance API: [`odbc_query`](#docs:current:core_extensions:odbc:functions::odbc_query) uses multiple API calls per row and performs a `UCS-2` to `UTF-8` conversion for every `VARCHAR` value. In addition, query processing is strictly single-threaded.

When [submitting issues](https://github.com/duckdb/odbc-scanner/issues) related only to performance, please first check the performance in comparable scenarios, for example with [pyodbc](https://pypi.org/project/pyodbc/).

### ODBC Extension Functions {#docs:current:core_extensions:odbc:functions}

- [`odbc_begin_transaction`](#::odbc_begin_transaction)
- [`odbc_bind_params`](#::odbc_bind_params)
- [`odbc_close`](#::odbc_close)
- [`odbc_commit`](#::odbc_commit)
- [`odbc_connect`](#::odbc_connect)
- [`odbc_copy`](#::odbc_copy)
- [`odbc_create_params`](#::odbc_create_params)
- [`odbc_list_data_sources`](#::odbc_list_data_sources)
- [`odbc_list_drivers`](#::odbc_list_drivers)
- [`odbc_query`](#::odbc_query)
- [`odbc_rollback`](#::odbc_rollback)

##### odbc_begin_transaction {#docs:current:core_extensions:odbc:functions::odbc_begin_transaction}

```sql
odbc_begin_transaction(conn_handle BIGINT) -> VARCHAR
```

Sets the `SQL_ATTR_AUTOCOMMIT` attribute to `SQL_AUTOCOMMIT_OFF` on the specified connection, effectively starting an implicit transaction. [`odbc_commit`](#::odbc_commit) or [`odbc_rollback`](#::odbc_rollback) must be called on such a connection to complete the transaction. The completion starts another implicit transaction on this connection. See [Transaction Management](#docs:current:core_extensions:odbc:overview::transaction-management) for details.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

 - `conn_handle` (`BIGINT`): ODBC connection handle created with [`odbc_connect`](#::odbc_connect)

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

Always returns `NULL` (`VARCHAR`).

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
SELECT odbc_begin_transaction(getvariable('conn'))
```

##### odbc_bind_params {#docs:current:core_extensions:odbc:functions::odbc_bind_params}

```sql
odbc_bind_params(conn_handle BIGINT, params_handle BIGINT, params STRUCT) -> BIGINT
```

Binds the specified parameter values to the specified parameters handle. Only necessary with 2-step parameter binding; see [Query Parameters](#docs:current:core_extensions:odbc:overview::query-parameters) for details.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

 - `conn_handle` (`BIGINT`): ODBC connection handle created with [`odbc_connect`](#::odbc_connect)
 - `params_handle` (`BIGINT`): parameters handle created with [`odbc_create_params`](#::odbc_create_params)
 - `params` (`STRUCT`): parameter values

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

Parameters handle (`BIGINT`), the same one that was passed as the second argument.

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
SELECT odbc_bind_params(getvariable('conn'), getvariable('params1'), row(42, 'foo'))
```

##### odbc_close {#docs:current:core_extensions:odbc:functions::odbc_close}

```sql
odbc_close(conn_handle BIGINT) -> VARCHAR
```

Closes the specified ODBC connection to a remote DB. Does not throw an error if the connection is already closed.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

 - `conn_handle` (`BIGINT`): ODBC connection handle created with [`odbc_connect`](#::odbc_connect)

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

Always returns `NULL` (`VARCHAR`).

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
SELECT odbc_close(getvariable('conn'))
```

##### odbc_commit {#docs:current:core_extensions:odbc:functions::odbc_commit}

```sql
odbc_commit(conn_handle BIGINT) -> VARCHAR
```

Calls `SQLEndTran` with the `SQL_COMMIT` argument on the specified connection, completing the current transaction. [`odbc_begin_transaction`](#::odbc_begin_transaction) must be called on this connection before this call for the completion to be effective. See [Transaction Management](#docs:current:core_extensions:odbc:overview::transaction-management) for details.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

 - `conn_handle` (`BIGINT`): ODBC connection handle created with [`odbc_connect`](#::odbc_connect)

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

Always returns `NULL` (`VARCHAR`).

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
SELECT odbc_commit(getvariable('conn'))
```

##### odbc_connect {#docs:current:core_extensions:odbc:functions::odbc_connect}

```sql
odbc_connect(conn_string VARCHAR) -> BIGINT
```
```sql
odbc_connect(conn_string VARCHAR, username VARCHAR, password VARCHAR) -> BIGINT
```

Opens an ODBC connection to a remote DB.

If `username` and `password` (positional) parameters are specified, they are appended to the connection string as `UID` and `PWD`.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

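 - `conn_string` (`VARCHAR`): ODBC connection string, passed to the Driver Manager.
 - `username` (`VARCHAR`, optional): user name, appended to the connection string as `UID`
 - `password` (`VARCHAR`, optional): password, appended to the connection string as `PWD`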

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

Connection handle (`BIGINT`) that can be placed into a `VARIABLE`. The connection is not closed automatically and must be closed with [`odbc_close`](#::odbc_close).

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
SET VARIABLE conn = odbc_connect('Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE;UID=scott;PWD=tiger')
```
```sql
SET VARIABLE conn = odbc_connect('Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE', 'scott', 'tiger')
```

##### odbc_copy {#docs:current:core_extensions:odbc:functions::odbc_copy}

```sql
odbc_copy(conn_handle BIGINT[, <optional named parameters>]) -> TABLE
```
```sql
odbc_copy(conn_string VARCHAR[, <optional named parameters>]) -> TABLE
```

Copies rows from a DuckDB accessible file or table into the remote DB.

> **Warning.** Using `odbc_copy` from the [Python Relational API](#docs:current:clients:python:relational_api).
>
> `odbc_copy` is a table function that [returns](#::returns-5) one row for every 2048 copied rows.
> When it is used from Python with `duckdb.sql()`, [lazy evaluation](#docs:current:clients:python:relational_api::lazy-evaluation)
> takes place. Thus, no rows are copied until [a method that triggers execution](#docs:current:clients:python:relational_api::output)
> is called on the resulting relation and all result rows are consumed.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

 - `conn_handle_or_string` (`BIGINT` or `VARCHAR`), one of:
   - ODBC connection handle created with [`odbc_connect`](#::odbc_connect)
   - ODBC connection string, intended for one-off queries; in this case a new ODBC connection is opened and is closed automatically after the query completes

Optional named parameters (source):

> The source query is executed using a DB instance separate from the one on which `odbc_copy` is called.
> Thus, `source_query` cannot refer to pre-existing in-memory tables and cannot open currently opened DuckDB files.
> As a workaround, for complex source queries it is suggested to export the query result into a local Parquet file first and
> then run `odbc_copy` on that file.

 - `source_conn_string` (`VARCHAR`, default: `:memory:`): DuckDB connection string to the source DB, example: `ducklake:postgres:postgresql://username:pwd@127.0.0.1:5432/lake1`
 - `source_file` (`VARCHAR`): path to a Parquet, CSV or JSON file (remote or local) to be read with DuckDB, example: `https://blobs.duckdb.org/nl_stations.csv`; equivalent to setting `source_query` to `SELECT * FROM '<source_file>'`
 - `source_query` (`VARCHAR`): DuckDB SQL query to read the data, example: `FROM nl_train_stations`
 - `source_queries` (`LIST(VARCHAR)`): multiple DuckDB SQL queries executed one by one; the last query must return the result set to copy, results of previous queries are discarded, and results of all queries are materialized in memory, example:

```sql
source_queries=[
  'CREATE SECRET s (TYPE s3 [...])',
  'FROM nl_train_stations'
],
```

 - `source_limit` (`UBIGINT`, default: `0`): the number of records to read from the source query/file at once; when this option is specified, the source query is run multiple times with `LIMIT <limit> OFFSET <offset>` appended to it; must be greater than or equal to `2048` and divisible by `2048` without a remainder
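
The Parquet workaround suggested in the note above could be sketched like this. The table `my_local_table`, the file name `export.parquet`, and the destination table `MY_REMOTE_TABLE` are hypothetical names; `conn` is assumed to hold a handle returned by `odbc_connect`.

```sql
-- Step 1: materialize the source query result into a local Parquet file.
-- 'my_local_table' stands in for any in-memory table or complex query.
COPY (SELECT * FROM my_local_table) TO 'export.parquet' (FORMAT parquet);

-- Step 2: copy the file contents into the remote DB
FROM odbc_copy(getvariable('conn'),
    source_file='export.parquet',
    dest_table='MY_REMOTE_TABLE',
    create_table=TRUE);
```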

Optional named parameters (destination):

 - `dest_table` (`VARCHAR`): destination table name in the remote DB, used in the `INSERT` and `CREATE TABLE` queries; cannot be specified if `dest_query` is specified; different DBs have different rules regarding case sensitivity and the default case, so the name of the destination table may need to be specified in upper case: `TAB1` or in quoted form: `"tab1"` (or with a schema name: `"schema1"."tab1"`)
 - `dest_query` (`VARCHAR`): query to be executed in the remote DB for each source batch; must have the number of ODBC parameter placeholders `?` equal to `source_columns_count * batch_size`; cannot be specified if `dest_table` is specified, example: `CALL import_city(?,?,?,?)`
 - `dest_query_single` (`VARCHAR`): only used when `batch_size>0` and the number of rows read in the last source batch is less than `batch_size`; in this case it is used instead of `dest_query` and must have the number of ODBC parameter placeholders `?` equal to `source_columns_count`
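
A sketch of the placeholder arithmetic for `dest_query`: with four source columns and `batch_size=1`, the destination query needs `4 * 1 = 4` placeholders. The `import_city` procedure is the hypothetical one from the description above, and the source row is made up for illustration.

```sql
-- The source query yields 4 columns; batch_size=1 means one row per call,
-- so dest_query carries exactly 4 '?' placeholders.
FROM odbc_copy(getvariable('conn'),
    source_query='SELECT 1 AS id, ''Amsterdam'' AS name, ''NL'' AS country, 52.37 AS lat',
    dest_query='CALL import_city(?,?,?,?)',
    batch_size=1);
```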

Optional named parameters (create table):

 - `create_table` (`BOOLEAN`, default: `FALSE`): whether to create a table in the destination remote DB using the column names and column types from the source query, effectively implementing CTAS (create table as select)
 - `column_types` (`MAP(VARCHAR, VARCHAR)`): when `create_table=TRUE` is specified, allows providing or overriding the type mapping between source DuckDB types and destination RDBMS types, example:

```sql
create_table=TRUE,
column_types=MAP {
    'DUCKDB_TYPE_VARCHAR': 'VARCHAR2(10)',
    'DUCKDB_TYPE_DECIMAL': 'NUMBER({typmod1},{typmod2})'}
```

 - `column_quotes` (`VARCHAR`, default: `"`): quotation character (or string) to be used to quote column names in the generated `CREATE TABLE` and `INSERT` queries
 - `commit_after_create_table` (`BOOLEAN`, default: `FALSE`): whether to issue a `COMMIT` after executing `CREATE TABLE`, enabled automatically for Firebird

Optional named parameters (query parameters handling):

 - `decimal_params_as_chars` (`BOOLEAN`, default: `false`): pass `DECIMAL` parameters as `VARCHAR`s
 - `integral_params_as_decimals` (`BOOLEAN`, default: `false`): pass (unsigned) `TINYINT`, `SMALLINT`, `INTEGER` and `BIGINT` parameters as `SQL_C_NUMERIC`

Optional named parameters (other):

 - `batch_size` (`UINTEGER`, default: `16`): number of records to be inserted (or executed in case of `dest_query`) in a single `SQLExecute` ODBC call to the remote DB, allowed values: `1`, `2`, `4`, `8`, `16`, `32`, `64`, `128`, `256`, `512`, `1024`, `2048`
 - `use_insert_all` (`BOOLEAN`, default: `FALSE`): generate an `INSERT ALL` batch insert query instead of a batch insert with `INSERT ... VALUES (...), (...), ... (...)`, enabled automatically for Oracle
 - `use_insert_union` (`BOOLEAN`, default: `FALSE`): generate an `INSERT ... SELECT FROM .. UNION ALL ...` batch insert query instead of a batch insert with `INSERT ... VALUES (...), (...), ... (...)`, enabled automatically for Firebird
 - `dummy_table_name` (`VARCHAR`): name of the dummy table used for `INSERT ALL` and `INSERT UNION` queries, `dual` for Oracle
 - `copy_in_transaction` (`BOOLEAN`, default: `TRUE`): begin a transaction in the remote DB for this copy call, commit the transaction when all rows are processed, and roll it back on error
 - `max_records_in_transaction` (`UBIGINT`, default: `0`): when specified, causes the remote transaction to be committed every time the specified number of rows has been processed
 - `close_connection` (`BOOLEAN`, default: `false`): closes the passed connection after the function call is completed, intended to be used with one-shot invocations of `odbc_copy`
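
A one-shot invocation might look like the sketch below, which opens a connection inline and closes it when the copy finishes. The driver name, address and credentials are placeholders, and the three-argument `odbc_connect` form is taken from the `odbc_query` example in this section:

```sql
-- Open a connection inline, copy the file, and close the connection afterwards.
FROM odbc_copy(
  odbc_connect('Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE', 'scott', 'tiger'),
  source_file='https://blobs.duckdb.org/nl_stations.csv',
  dest_table='NL_TRAIN_STATIONS',
  create_table=TRUE,
  close_connection=TRUE);
```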

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

A table with the following columns:

 - `completed` (`BOOLEAN`): flag indicating whether this output row is the last row in the result set
 - `rows_processed` (`UBIGINT`): number of rows read from the source
 - `elapsed_seconds` (`FLOAT`): number of seconds elapsed since the copy process started
 - `rows_per_second` (`FLOAT`): number of rows processed per second
 - `table_ddl` (`VARCHAR`): generated `CREATE TABLE` query that was executed in the remote DB before starting the copy process

One result row is emitted for every `2048` rows read from the source. Only the last row has `completed=TRUE` and a non-null `table_ddl` value (the latter only when `create_table=TRUE` is specified).
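
Because progress rows are emitted throughout the copy, a caller interested only in the final summary can filter on `completed`. A minimal sketch, reusing the source file and destination table from the examples in this section:

```sql
-- Keep only the final summary row of the copy.
SELECT rows_processed, elapsed_seconds, rows_per_second, table_ddl
FROM odbc_copy(getvariable('conn'),
  source_file='https://blobs.duckdb.org/nl_stations.csv',
  dest_table='NL_TRAIN_STATIONS',
  create_table=TRUE)
WHERE completed;
```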

###### Examples: {#docs:current:core_extensions:odbc:functions::examples}

```sql
FROM odbc_copy(getvariable('conn'),
  source_file='https://blobs.duckdb.org/nl_stations.csv',
  dest_table='NL_TRAIN_STATIONS',
  create_table=TRUE)
```
```sql
FROM odbc_copy(getvariable('conn'),
  source_conn_string='ducklake:postgres:postgresql://username:pwd@127.0.0.1:5432/lake1',
  source_queries=[
    'CREATE SECRET s (TYPE s3 [...])',
    'FROM nl_train_stations'
  ],
  dest_table='NL_TRAIN_STATIONS',
  create_table=TRUE,
  batch_size=32,
  max_records_in_transaction=42);
```
##### odbc_create_params {#docs:current:core_extensions:odbc:functions::odbc_create_params}

```sql
odbc_create_params() -> BIGINT
```

Creates a parameters handle. Only necessary with two-step parameter binding; see [Query parameters](#overview::query-parameters) for details.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

None.

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

Parameters handle (`BIGINT`). When the handle is passed to [`odbc_query`](#::odbc_query) it gets tied to the underlying prepared statement and is closed automatically when the statement is closed.

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
SET VARIABLE params1 = odbc_create_params()
```

##### odbc_list_data_sources {#docs:current:core_extensions:odbc:functions::odbc_list_data_sources}

```sql
odbc_list_data_sources() -> TABLE(name VARCHAR, description VARCHAR, type VARCHAR)
```

Returns the list of ODBC data sources registered in the OS. Uses driver manager call `SQLDataSources`.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

None.

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

A table with the following columns:

 - `name` (`VARCHAR`): data source name
 - `description` (`VARCHAR`): data source description
 - `type` (`VARCHAR`): data source type, `USER` or `SYSTEM`

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
FROM odbc_list_data_sources()
```

##### odbc_list_drivers {#docs:current:core_extensions:odbc:functions::odbc_list_drivers}

```sql
odbc_list_drivers() -> TABLE(description VARCHAR, attributes MAP(VARCHAR, VARCHAR))
```

Returns the list of ODBC drivers registered in the OS. Uses driver manager call `SQLDrivers`.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

None.

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

A table with the following columns:

 - `description` (`VARCHAR`): driver description
 - `attributes` (`MAP(VARCHAR, VARCHAR)`): driver attributes as a `name->value` map

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
FROM odbc_list_drivers()
```

##### odbc_query {#docs:current:core_extensions:odbc:functions::odbc_query}

```sql
odbc_query(conn_handle BIGINT, query VARCHAR[, <optional named parameters>]) -> TABLE
```
```sql
odbc_query(conn_string VARCHAR, query VARCHAR[, <optional named parameters>]) -> TABLE
```

Runs the specified query in a remote DB and returns the query result as a table.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

 - `conn_handle_or_string` (`BIGINT` or `VARCHAR`), one of:
   - ODBC connection handle created with [`odbc_connect`](#::odbc_connect)
   - ODBC connection string, intended for one-off queries; in this case a new ODBC connection is opened and then closed automatically after the query completes
 - `query` (`VARCHAR`): SQL query, passed to the remote DBMS

Optional named parameters that can be used to pass query parameters:

 - `params` (`STRUCT`): query parameters to pass to the remote DBMS
 - `params_handle` (`BIGINT`): parameters handle created with [`odbc_create_params`](#::odbc_create_params). Only used with two-step parameter binding; see [Query parameters](#::query-parameters) for details.

Optional named parameters that can change types mapping:

The extension supports a number of options that change how the query parameters are passed and how the resulting data is handled. For known DBs these options are set automatically. They can also be passed as named parameters to the [`odbc_query`](#::odbc_query) function to override the autoconfiguration:

 - `decimal_columns_as_chars` (`BOOLEAN`, default: `false`): read `DECIMAL` values as `VARCHAR`s that are parsed back into `DECIMAL`s before returning them to the client
 - `decimal_columns_precision_through_ard` (`BOOLEAN`, default: `false`): when reading a `DECIMAL`, specify its `precision` and `scale` through the "Application Row Descriptor"
 - `decimal_columns_as_ard_type` (`BOOLEAN`, default: `false`): when reading a `DECIMAL`, use `SQL_ARD_TYPE` instead of `SQL_C_NUMERIC`
 - `decimal_params_as_chars` (`BOOLEAN`, default: `false`): pass `DECIMAL` parameters as `VARCHAR`s
 - `integral_params_as_decimals` (`BOOLEAN`, default: `false`): pass (unsigned) `TINYINT`, `SMALLINT`, `INTEGER` and `BIGINT` parameters as `SQL_C_NUMERIC`
 - `reset_stmt_before_execute` (`BOOLEAN`, default: `false`): reset the prepared statement (using `SQLFreeStmt(h, SQL_CLOSE)`) before executing it
 - `time_params_as_ss_time2` (`BOOLEAN`, default: `false`): pass `TIME` parameters as SQL Server's `TIME2` values
 - `timestamp_columns_as_timestamp_ns` (`BOOLEAN`, default: `false`): read `TIMESTAMP`-like (`TIMESTAMP WITH LOCAL TIME ZONE`, `DATETIME2`, `TIMESTAMP_NTZ`, etc.) columns with nanosecond precision (with nine fractional digits)
 - `timestamp_columns_with_typename_date_as_date` (`BOOLEAN`, default: `false`): read `TIMESTAMP` columns that have the type name `DATE` as DuckDB `DATE`s
 - `timestamp_max_fraction_precision` (`UTINYINT`, default: `9`): maximum number of fractional digits to use when reading a `TIMESTAMP` column with nanosecond precision
 - `timestamp_params_as_sf_timestamp_ntz` (`BOOLEAN`, default: `false`): pass `TIMESTAMP` parameters as Snowflake's `TIMESTAMP_NTZ`
 - `timestamptz_params_as_ss_timestampoffset` (`BOOLEAN`, default: `false`): pass `TIMESTAMP_TZ` parameters as SQL Server's `DATETIMEOFFSET`
 - `var_len_data_single_part` (`BOOLEAN`, default: `false`): read long `VARCHAR` or `VARBINARY` values in a single read (used when a driver does not support [Retrieving Variable-Length Data in Parts](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdata-function?view=sql-server-ver17#retrieving-variable-length-data-in-parts))
 - `var_len_params_long_threshold_bytes` (`UINTEGER`, default: `4000`): length threshold above which `SQL_WVARCHAR` parameters are passed as `SQL_WLONGVARCHAR`
 - `enable_columns_binding` (`BOOLEAN`, default: `false`): whether to allow using `SQLBindCol` instead of `SQLGetData` for fixed-size columns
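
To illustrate overriding the autoconfiguration, the sketch below forces `DECIMAL` columns to be read as strings for a single query. The remote table `products` and its `price` column are hypothetical:

```sql
-- Override one type-mapping option for this call only.
FROM odbc_query(getvariable('conn'),
  'SELECT price FROM products',
  decimal_columns_as_chars=TRUE)
```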

Other optional named parameters:

 - `ignore_exec_failure` (`BOOLEAN`, default: `false`): when a query run in the remote DB can be prepared successfully but may fail at execution time (for example, because of schema state such as table existence), this flag suppresses the error. An empty result set is returned if query execution fails.
 - `close_connection` (`BOOLEAN`, default: `false`): closes the passed connection after the function call is completed, intended to be used with one-shot invocations of `odbc_query`, example:

 ```sql
 FROM odbc_query(
    odbc_connect('Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE', 'scott', 'tiger'),
    'SELECT 42 FROM dual',
    close_connection=TRUE);
 ```
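
The `ignore_exec_failure` flag can be sketched in the same way. Here the remote table `tab1` is hypothetical and may or may not exist when the statement runs; if execution fails, the call is expected to return an empty result set instead of raising an error:

```sql
-- Do not raise an error if the statement fails at execution time.
FROM odbc_query(getvariable('conn'),
  'DROP TABLE tab1',
  ignore_exec_failure=TRUE)
```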

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

A table with the query result.

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
FROM odbc_query(getvariable('conn'), 
  'SELECT CAST(? AS NVARCHAR2(2)) || CAST(? AS VARCHAR2(5)) FROM dual',
  params=row('🦆', 'quack')
)
```

##### odbc_rollback {#docs:current:core_extensions:odbc:functions::odbc_rollback}

```sql
odbc_rollback(conn_handle BIGINT) -> VARCHAR
```

Calls `SQLEndTran` with `SQL_ROLLBACK` argument on the specified connection, completing the current transaction. [`odbc_begin_transaction`](#::odbc_begin_transaction) must be called on this connection before this call for the completion to be effective. See [Transactions management](#overview::transactions-management) for details.

###### Parameters: {#docs:current:core_extensions:odbc:functions::parameters}

 - `conn_handle` (`BIGINT`): ODBC connection handle created with [`odbc_connect`](#::odbc_connect)

###### Returns: {#docs:current:core_extensions:odbc:functions::returns}

Always returns `NULL` (`VARCHAR`).

###### Example: {#docs:current:core_extensions:odbc:functions::example}

```sql
SELECT odbc_rollback(getvariable('conn'))
```
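
Put together, a rolled-back unit of work might look like the following sketch. It assumes that `odbc_begin_transaction` takes the connection handle as its only argument, and that a hypothetical remote table `tab1` with a single integer column exists:

```sql
-- Start a transaction on the remote connection.
SELECT odbc_begin_transaction(getvariable('conn'));
-- Make a change inside the transaction (hypothetical table).
FROM odbc_query(getvariable('conn'), 'INSERT INTO tab1 VALUES (42)');
-- Discard the change.
SELECT odbc_rollback(getvariable('conn'));
```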

## PostgreSQL Extension {#docs:current:core_extensions:postgres}

The `postgres` extension allows DuckDB to directly read and write data from a running PostgreSQL database instance. The data can be queried directly from the underlying PostgreSQL database. Data can be loaded from PostgreSQL tables into DuckDB tables, or vice versa. See the [official announcement](https://duckdb.org/2022/09/30/postgres-scanner) for implementation details and background.

#### Installing and Loading {#docs:current:core_extensions:postgres::installing-and-loading}

The `postgres` extension will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL postgres;
LOAD postgres;
```

#### Connecting {#docs:current:core_extensions:postgres::connecting}

To make a PostgreSQL database accessible to DuckDB, use the `ATTACH` command with the `postgres` or `postgres_scanner` type.

To connect to the `public` schema of the PostgreSQL instance running on localhost in read-write mode, run:

```sql
ATTACH '' AS postgres_db (TYPE postgres);
```

To connect to the PostgreSQL instance with the given parameters in read-only mode, run:

```sql
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE postgres, READ_ONLY);
```

By default, all schemas are attached. When working with large instances, it can be useful to only attach a specific schema. This can be accomplished using the `SCHEMA` option of the `ATTACH` command.

```sql
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE postgres, SCHEMA 'public');
```

##### Configuration {#docs:current:core_extensions:postgres::configuration}

The `ATTACH` command takes as input either a [`libpq` connection string](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING)
or a [PostgreSQL URI](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING-URIS).

Below are some example connection strings and commonly used parameters. A full list of available parameters can be found in the [PostgreSQL documentation](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS).

```text
dbname=postgresscanner
host=localhost port=5432 dbname=mydb connect_timeout=10
```

| Name       | Description                          | Default        |
| ---------- | ------------------------------------ | -------------- |
| `dbname`   | Database name                        | [user]         |
| `host`     | Name of host to connect to           | `localhost`    |
| `hostaddr` | Host IP address                      | `localhost`    |
| `passfile` | Name of file passwords are stored in | `~/.pgpass`    |
| `password` | PostgreSQL password                  | (empty)        |
| `port`     | Port number                          | `5432`         |
| `user`     | PostgreSQL user name                 | _current user_ |

An example URI is `postgresql://username@hostname/dbname`.

##### Configuring via Secrets {#docs:current:core_extensions:postgres::configuring-via-secrets}

PostgreSQL connection information can also be specified with [secrets](https://duckdb.org/docs/configuration/secrets_manager). The following syntax can be used to create a secret.

```sql
CREATE SECRET (
    TYPE postgres,
    HOST '127.0.0.1',
    PORT 5432,
    DATABASE postgres,
    USER 'postgres',
    PASSWORD ''
);
```

The information from the secret will be used when `ATTACH` is called. We can leave the PostgreSQL connection string empty to use all of the information stored in the secret.

```sql
ATTACH '' AS postgres_db (TYPE postgres);
```

We can use the PostgreSQL connection string to override individual options. For example, to connect to a different database while still using the same credentials, we can override only the database name in the following manner.

```sql
ATTACH 'dbname=my_other_db' AS postgres_db (TYPE postgres);
```

By default, created secrets are temporary. Secrets can be persisted using the [`CREATE PERSISTENT SECRET` command](#docs:current:configuration:secrets_manager::persistent-secrets). Persistent secrets can be used across sessions.

###### Managing Multiple Secrets {#docs:current:core_extensions:postgres::managing-multiple-secrets}

Named secrets can be used to manage connections to multiple PostgreSQL database instances. Secrets can be given a name upon creation.

```sql
CREATE SECRET postgres_secret_one (
    TYPE postgres,
    HOST '127.0.0.1',
    PORT 5432,
    DATABASE postgres,
    USER 'postgres',
    PASSWORD ''
);
```

The secret can then be explicitly referenced using the `SECRET` parameter in the `ATTACH`.

```sql
ATTACH '' AS postgres_db_one (TYPE postgres, SECRET postgres_secret_one);
```

> **Warning.** Avoid including credentials directly in the connection string. If a connection error occurs, the full connection string (including your credentials) may be printed to the terminal output. For better security, store credentials using DuckDB-managed secrets.

##### Configuring via Environment Variables {#docs:current:core_extensions:postgres::configuring-via-environment-variables}

PostgreSQL connection information can also be specified with [environment variables](https://www.postgresql.org/docs/current/libpq-envars.html).
This can be useful in a production environment where the connection information is managed externally
and passed in to the environment.

```bash
export PGPASSWORD="secret"
export PGHOST=localhost
export PGUSER=owner
export PGDATABASE=mydatabase
```

Then, to connect, start the `duckdb` process and run:

```sql
ATTACH '' AS p (TYPE postgres);
```

#### Usage {#docs:current:core_extensions:postgres::usage}

The tables in the PostgreSQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from PostgreSQL at query time.

```sql
SHOW ALL TABLES;
```



| name  |
| ----- |
| uuids |

```sql
SELECT * FROM uuids;
```



| u                                    |
| ------------------------------------ |
| 6d3d2541-710b-4bde-b3af-4711738636bf |
| NULL                                 |
| 00000000-0000-0000-0000-000000000001 |
| ffffffff-ffff-ffff-ffff-ffffffffffff |

It might be desirable to create a copy of the PostgreSQL databases in DuckDB to prevent the system from re-reading the tables from PostgreSQL continuously, particularly for large tables.

Data can be copied over from PostgreSQL to DuckDB using standard SQL, for example:

```sql
CREATE TABLE duckdb_table AS FROM postgres_db.postgres_tbl;
```

#### Writing Data to PostgreSQL {#docs:current:core_extensions:postgres::writing-data-to-postgresql}

In addition to reading data from PostgreSQL, the extension allows you to create tables, ingest data into PostgreSQL and make other modifications to a PostgreSQL database using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a PostgreSQL database to Parquet, or read data from a Parquet file into PostgreSQL.

Below is a brief example of how to create a new table in PostgreSQL and load data into it.

```sql
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres);
CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');
```

Many operations on PostgreSQL tables are supported. All these operations directly modify the PostgreSQL database, and the result of subsequent operations can then be read using PostgreSQL.
Note that if modifications are not desired, `ATTACH` can be run with the `READ_ONLY` property which prevents making modifications to the underlying database. For example:

```sql
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres, READ_ONLY);
```

Below is a list of supported operations.

##### `CREATE TABLE` {#docs:current:core_extensions:postgres::create-table}

```sql
CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);
```

##### `INSERT INTO` {#docs:current:core_extensions:postgres::insert-into}

```sql
INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');
```

##### `SELECT` {#docs:current:core_extensions:postgres::select}

```sql
SELECT * FROM postgres_db.tbl;
```

|   id | name   |
| ---: | ------ |
|   42 | DuckDB |

##### `COPY` {#docs:current:core_extensions:postgres::copy}

You can copy tables back and forth between PostgreSQL and DuckDB:

```sql
COPY postgres_db.tbl TO 'data.parquet';
COPY postgres_db.tbl FROM 'data.parquet';
```

These copies use [PostgreSQL binary wire encoding](https://www.postgresql.org/docs/current/sql-copy.html).
DuckDB can also write data using this encoding to a file which you can then load into PostgreSQL using a client of your choosing if you would like to do your own connection management:

```sql
COPY 'data.parquet' TO 'pg.bin' WITH (FORMAT postgres_binary);
```

The file produced will be the equivalent of copying the file to PostgreSQL using DuckDB and then dumping it from PostgreSQL using `psql` or another client:

DuckDB:

```sql
COPY postgres_db.tbl FROM 'data.parquet';
```

PostgreSQL:

```sql
\copy tbl TO 'data.bin' WITH (FORMAT BINARY);
```

You may also create a full copy of the database using the [`COPY FROM DATABASE` statement](#docs:current:sql:statements:copy::copy-from-database--to):

```sql
COPY FROM DATABASE postgres_db TO my_duckdb_db;
```

##### `UPDATE` {#docs:current:core_extensions:postgres::update}

```sql
UPDATE postgres_db.tbl
SET name = 'Woohoo'
WHERE id = 42;
```

##### `DELETE` {#docs:current:core_extensions:postgres::delete}

```sql
DELETE FROM postgres_db.tbl
WHERE id = 42;
```

##### `ALTER TABLE` {#docs:current:core_extensions:postgres::alter-table}

```sql
ALTER TABLE postgres_db.tbl
ADD COLUMN k INTEGER;
```

##### `DROP TABLE` {#docs:current:core_extensions:postgres::drop-table}

```sql
DROP TABLE postgres_db.tbl;
```

##### `CREATE VIEW` {#docs:current:core_extensions:postgres::create-view}

```sql
CREATE VIEW postgres_db.v1 AS SELECT 42;
```

##### `CREATE SCHEMA` / `DROP SCHEMA` {#docs:current:core_extensions:postgres::create-schema--drop-schema}

```sql
CREATE SCHEMA postgres_db.s1;
CREATE TABLE postgres_db.s1.integers (i INTEGER);
INSERT INTO postgres_db.s1.integers VALUES (42);
SELECT * FROM postgres_db.s1.integers;
```

|    i |
| ---: |
|   42 |

```sql
DROP SCHEMA postgres_db.s1;
```

#### `DETACH` {#docs:current:core_extensions:postgres::detach}

```sql
DETACH postgres_db;
```

##### Transactions {#docs:current:core_extensions:postgres::transactions}

```sql
CREATE TABLE postgres_db.tmp (i INTEGER);
BEGIN;
INSERT INTO postgres_db.tmp VALUES (42);
SELECT * FROM postgres_db.tmp;
```

This returns:

|    i |
| ---: |
|   42 |

```sql
ROLLBACK;
SELECT * FROM postgres_db.tmp;
```

This returns an empty table.
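
For contrast, a sketch of the committing counterpart, reusing the `postgres_db.tmp` table from above. After `COMMIT`, the inserted row is expected to remain visible to later queries:

```sql
BEGIN;
INSERT INTO postgres_db.tmp VALUES (42);
-- Make the insert permanent in PostgreSQL.
COMMIT;
SELECT * FROM postgres_db.tmp;
```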

#### Running SQL Queries in PostgreSQL {#docs:current:core_extensions:postgres::running-sql-queries-in-postgresql}

##### The `postgres_query` Table Function {#docs:current:core_extensions:postgres::the-postgres_query-table-function}

The `postgres_query` table function allows you to run arbitrary read queries within an attached database. It takes the name of the attached PostgreSQL database in which to execute the query, as well as the SQL query to execute, and returns the result of the query. Single quotes inside the query string are escaped by doubling them.

```sql
postgres_query(attached_database::VARCHAR, query::VARCHAR)
```

For example:

```sql
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres);
SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM cars LIMIT 3');
```



| brand        | model      | color |
| ------------ | ---------- | ----- |
| Ferrari      | Testarossa | red   |
| Aston Martin | DB2        | blue  |
| Bentley      | Mulsanne   | gray  |
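
The quote escaping rule can be sketched with a filter on a string literal. The query reuses the `cars` table from the example above; the inner literal `'red'` is written with doubled single quotes because it sits inside the outer string:

```sql
SELECT *
FROM postgres_query('postgres_db', 'SELECT * FROM cars WHERE color = ''red''');
```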

##### The `postgres_execute` Function {#docs:current:core_extensions:postgres::the-postgres_execute-function}

The `postgres_execute` function allows running arbitrary queries within PostgreSQL, including statements that update the schema and content of the database.

```sql
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres);
CALL postgres_execute('postgres_db', 'CREATE TABLE my_table (i INTEGER)');
```

#### Settings {#docs:current:core_extensions:postgres::settings}

The extension exposes the following configuration parameters.

| Name                              | Description                                                                  | Default |
| --------------------------------- | ---------------------------------------------------------------------------- | ------- |
| `pg_array_as_varchar`             | Read PostgreSQL arrays as varchar - enables reading mixed dimensional arrays | `false` |
| `pg_connection_cache`             | Whether or not to use the connection cache                                   | `true`  |
| `pg_connection_limit`             | The maximum number of concurrent PostgreSQL connections                      | `64`    |
| `pg_debug_show_queries`           | DEBUG SETTING: print all queries sent to PostgreSQL to stdout                | `false` |
| `pg_experimental_filter_pushdown` | Whether or not to use filter pushdown (currently experimental)               | `true`  |
| `pg_pages_per_task`               | The number of pages per task                                                 | `1000`  |
| `pg_use_binary_copy`              | Whether or not to use BINARY copy to read data                               | `true`  |
| `pg_null_byte_replacement`        | When writing NULL bytes to Postgres, replace them with the given character   | `NULL`  |
| `pg_use_ctid_scan`                | Whether or not to parallelize scanning using table ctids                     | `true`  |
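
These parameters are changed with the regular `SET` statement. A brief sketch using two of the settings listed above; the values are chosen only for illustration:

```sql
-- Print every query sent to PostgreSQL, useful when debugging pushdown.
SET pg_debug_show_queries = true;
-- Lower the cap on concurrent PostgreSQL connections.
SET pg_connection_limit = 8;
```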

#### Schema Cache {#docs:current:core_extensions:postgres::schema-cache}

To avoid having to continuously fetch schema data from PostgreSQL, DuckDB keeps schema information – such as the names of tables, their columns, etc. – cached. If changes are made to the schema through a different connection to the PostgreSQL instance, such as new columns being added to a table, the cached schema information might be outdated. In this case, the function `pg_clear_cache` can be executed to clear the internal caches.

```sql
CALL pg_clear_cache();
```

> **Deprecated.** The old `postgres_attach` function is deprecated. It is recommended to switch over to the new `ATTACH` syntax.

## Spatial {#core_extensions:spatial}

### Spatial Extension {#docs:current:core_extensions:spatial:overview}

The `spatial` extension provides support for geospatial data processing in DuckDB.
For an overview of the extension, see our [blog post](https://duckdb.org/2023/04/28/spatial).

#### Installing and Loading {#docs:current:core_extensions:spatial:overview::installing-and-loading}

To install the `spatial` extension, run:

```sql
INSTALL spatial;
```

Note that the `spatial` extension is not autoloadable.
Therefore, you need to load it before using it:

```sql
LOAD spatial;
```

#### The `GEOMETRY` Type {#docs:current:core_extensions:spatial:overview::the-geometry-type}

The core of the spatial extension is the [`GEOMETRY` type](#docs:current:sql:data_types:geometry), which is a flexible and extensible data type for representing geometric objects. The `GEOMETRY` type used to be provided by the `spatial` extension, but it became a built-in data type in DuckDB v1.5. However, almost all of the associated functions for working with geometries (e.g., calculating distances, areas, intersections) are still part of `spatial`.

Besides operating on `GEOMETRY`, the spatial extension also includes a couple of experimental non-standard explicit geometry types, such as `POINT_2D`, `LINESTRING_2D`, `POLYGON_2D` and `BOX_2D` that are based on DuckDB's native nested types, such as `STRUCT` and `LIST`. Since these have a fixed and predictable internal memory layout, it is theoretically possible to optimize a lot of geospatial algorithms to be much faster when operating on these types than on the `GEOMETRY` type. However, only a couple of functions in the spatial extension have been explicitly specialized for these types so far. All of these new types are implicitly castable to `GEOMETRY`, but with a small conversion cost, so the `GEOMETRY` type is still the recommended type to use if you are planning to work with a lot of different spatial functions.
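
As a small sketch of the cast described above, the query below builds a `POINT_2D` and converts it to `GEOMETRY` explicitly before printing it as WKT. It assumes the `ST_Point2D` constructor is available in the installed version of the extension:

```sql
LOAD spatial;
-- Construct a POINT_2D, cast it to GEOMETRY, and render it as text.
SELECT ST_AsText(ST_Point2D(1, 2)::GEOMETRY) AS wkt;
```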

### Spatial Functions {#docs:current:core_extensions:spatial:functions}

#### Function Index  {#docs:current:core_extensions:spatial:functions::function-index-}

**[Scalar Functions](#::scalar-functions)**

| Function | Summary |
| --- | --- |
| [`DuckDB_PROJ_Compiled_Version`](#::duckdb_proj_compiled_version) | Returns a text description of the PROJ library version that this instance of DuckDB was compiled against. |
| [`DuckDB_Proj_Version`](#::duckdb_proj_version) | Returns a text description of the PROJ library version that is being used by this instance of DuckDB. |
| [`ST_Affine`](#::st_affine) | Applies an affine transformation to a geometry. |
| [`ST_Area`](#::st_area) | Compute the area of a geometry. |
| [`ST_Area_Spheroid`](#::st_area_spheroid) | Returns the area of a geometry in meters, using an ellipsoidal model of the earth |
| [`ST_AsGeoJSON`](#::st_asgeojson) | Returns the geometry as a GeoJSON fragment |
| [`ST_AsHEXWKB`](#::st_ashexwkb) | Returns the geometry as a HEXWKB string |
| [`ST_AsMVTGeom`](#::st_asmvtgeom) | Transform and clip geometry to a tile boundary |
| [`ST_AsSVG`](#::st_assvg) | Convert the geometry into a SVG fragment or path |
| [`ST_AsText`](#::st_astext) | Returns the geometry as a WKT string |
| [`ST_AsWKB`](#::st_aswkb) | Returns the geometry as a WKB (Well-Known-Binary) blob |
| [`ST_Azimuth`](#::st_azimuth) | Returns the azimuth (a clockwise angle measured from north) of two points in radians. |
| [`ST_Boundary`](#::st_boundary) | Returns the "boundary" of a geometry |
| [`ST_Buffer`](#::st_buffer) | Returns a buffer around the input geometry at the target distance |
| [`ST_BuildArea`](#::st_buildarea) | Creates a polygonal geometry by attempting to "fill in" the input geometry. |
| [`ST_Centroid`](#::st_centroid) | Returns the centroid of a geometry |
| [`ST_Collect`](#::st_collect) | Collects a list of geometries into a collection geometry. |
| [`ST_CollectionExtract`](#::st_collectionextract) | Extracts geometries from a GeometryCollection into a typed multi geometry. |
| [`ST_ConcaveHull`](#::st_concavehull) | Returns the 'concave' hull of the input geometry, containing all of the source input's points, and which can be used to create polygons from points. The ratio parameter dictates the level of concavity; 1.0 returns the convex hull; and 0 indicates to return the most concave hull possible. Set allowHoles to a non-zero value to allow output containing holes. |
| [`ST_Contains`](#::st_contains) | Returns true if the first geometry contains the second geometry |
| [`ST_ContainsProperly`](#::st_containsproperly) | Returns true if the first geometry "properly" contains the second geometry |
| [`ST_ConvexHull`](#::st_convexhull) | Returns the convex hull enclosing the geometry |
| [`ST_CoverageInvalidEdges`](#::st_coverageinvalidedges) | Returns the invalid edges in a polygonal coverage, which are edges that are not shared by two polygons. |
| [`ST_CoverageSimplify`](#::st_coveragesimplify) | Simplify the edges in a polygonal coverage, preserving the coverage by ensuring that there are no seams between the resulting simplified polygons. |
| [`ST_CoverageUnion`](#::st_coverageunion) | Union all geometries in a polygonal coverage into a single geometry. |
| [`ST_CoveredBy`](#::st_coveredby) | Returns true if geom1 is "covered by" geom2 |
| [`ST_Covers`](#::st_covers) | Returns true if the geom1 "covers" geom2 |
| [`ST_Crosses`](#::st_crosses) | Returns true if geom1 "crosses" geom2 |
| [`ST_DWithin`](#::st_dwithin) | Returns true if two geometries are within a target distance of each other |
| [`ST_DWithin_GEOS`](#::st_dwithin_geos) | Returns true if two geometries are within a target distance of each other |
| [`ST_DWithin_Spheroid`](#::st_dwithin_spheroid) | Returns true if two POINT_2Ds are within a target distance in meters, using an ellipsoidal model of the earth's surface |
| [`ST_Difference`](#::st_difference) | Returns the "difference" between two geometries |
| [`ST_Dimension`](#::st_dimension) | Returns the "topological dimension" of a geometry. |
| [`ST_Disjoint`](#::st_disjoint) | Returns true if the geometries are disjoint |
| [`ST_Distance`](#::st_distance) | Returns the planar distance between two geometries |
| [`ST_Distance_GEOS`](#::st_distance_geos) | Returns the planar distance between two geometries |
| [`ST_Distance_Sphere`](#::st_distance_sphere) | Returns the haversine (great circle) distance between two geometries. |
| [`ST_Distance_Spheroid`](#::st_distance_spheroid) | Returns the distance between two geometries in meters using an ellipsoidal model of the Earth's surface |
| [`ST_Dump`](#::st_dump) | Dumps a geometry into a list of sub-geometries and their "path" in the original geometry. |
| [`ST_EndPoint`](#::st_endpoint) | Returns the end point of a LINESTRING. |
| [`ST_Envelope`](#::st_envelope) | Returns the minimum bounding rectangle of a geometry as a polygon geometry |
| [`ST_Equals`](#::st_equals) | Returns true if the geometries are "equal" |
| [`ST_Extent`](#::st_extent) | Returns the minimal bounding box enclosing the input geometry |
| [`ST_Extent_Approx`](#::st_extent_approx) | Returns the approximate bounding box of a geometry, if available. |
| [`ST_ExteriorRing`](#::st_exteriorring) | Returns the exterior ring (shell) of a polygon geometry. |
| [`ST_FlipCoordinates`](#::st_flipcoordinates) | Returns a new geometry with the coordinates of the input geometry "flipped", i.e., with the X and Y values swapped |
| [`ST_Force2D`](#::st_force2d) | Forces the vertices of a geometry to have X and Y components |
| [`ST_Force3DM`](#::st_force3dm) | Forces the vertices of a geometry to have X, Y and M components |
| [`ST_Force3DZ`](#::st_force3dz) | Forces the vertices of a geometry to have X, Y and Z components |
| [`ST_Force4D`](#::st_force4d) | Forces the vertices of a geometry to have X, Y, Z and M components |
| [`ST_GeomFromGeoJSON`](#::st_geomfromgeojson) | Deserializes a GEOMETRY from a GeoJSON fragment. |
| [`ST_GeomFromHEXEWKB`](#::st_geomfromhexewkb) | Deserialize a GEOMETRY from a HEX(E)WKB encoded string |
| [`ST_GeomFromHEXWKB`](#::st_geomfromhexwkb) | Deserialize a GEOMETRY from a HEX(E)WKB encoded string |
| [`ST_GeomFromText`](#::st_geomfromtext) | Deserialize a GEOMETRY from a WKT encoded string |
| [`ST_GeomFromWKB`](#::st_geomfromwkb) | Deserializes a GEOMETRY from a WKB encoded blob |
| [`ST_GeometryType`](#::st_geometrytype) | Returns a 'GEOMETRY_TYPE' enum identifying the input geometry type. Possible enum return types are: `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON` and `GEOMETRYCOLLECTION`. |
| [`ST_HasM`](#::st_hasm) | Check if the input geometry has M values. |
| [`ST_HasZ`](#::st_hasz) | Check if the input geometry has Z values. |
| [`ST_Hilbert`](#::st_hilbert) | Encodes the X and Y values as the Hilbert curve index for a curve covering the given bounding box. |
| [`ST_InterpolatePoint`](#::st_interpolatepoint) | Computes the closest point on a LINESTRING to a given POINT and returns the interpolated M value of that point. |
| [`ST_Intersection`](#::st_intersection) | Returns the intersection of two geometries |
| [`ST_Intersects`](#::st_intersects) | Returns true if the geometries intersect |
| [`ST_Intersects_Extent`](#::st_intersects_extent) | Returns true if the extent of two geometries intersects |
| [`ST_IsClosed`](#::st_isclosed) | Check if a geometry is 'closed' |
| [`ST_IsEmpty`](#::st_isempty) | Returns true if the geometry is "empty". |
| [`ST_IsRing`](#::st_isring) | Returns true if the geometry is a ring (both ST_IsClosed and ST_IsSimple). |
| [`ST_IsSimple`](#::st_issimple) | Returns true if the geometry is simple |
| [`ST_IsValid`](#::st_isvalid) | Returns true if the geometry is valid |
| [`ST_Length`](#::st_length) | Returns the length of the input line geometry |
| [`ST_Length_Spheroid`](#::st_length_spheroid) | Returns the length of the input geometry in meters, using an ellipsoidal model of the earth |
| [`ST_LineInterpolatePoint`](#::st_lineinterpolatepoint) | Returns a point interpolated along a line at a fraction of total 2D length. |
| [`ST_LineInterpolatePoints`](#::st_lineinterpolatepoints) | Returns a multi-point interpolated along a line at a fraction of total 2D length. |
| [`ST_LineLocatePoint`](#::st_linelocatepoint) | Returns the location on a line closest to a point as a fraction of the total 2D length of the line. |
| [`ST_LineMerge`](#::st_linemerge) | "Merges" the input line geometry, optionally taking direction into account. |
| [`ST_LineString2DFromWKB`](#::st_linestring2dfromwkb) | Deserialize a LINESTRING_2D from a WKB encoded blob |
| [`ST_LineSubstring`](#::st_linesubstring) | Returns a substring of a line between two fractions of total 2D length. |
| [`ST_LocateAlong`](#::st_locatealong) | Returns a point or multi-point, containing the point(s) at the geometry with the given measure |
| [`ST_LocateBetween`](#::st_locatebetween) | Returns a geometry or geometry collection created by filtering and interpolating vertices within a range of "M" values |
| [`ST_M`](#::st_m) | Returns the M coordinate of a point geometry |
| [`ST_MMax`](#::st_mmax) | Returns the maximum M coordinate of a geometry |
| [`ST_MMin`](#::st_mmin) | Returns the minimum M coordinate of a geometry |
| [`ST_MakeBox2D`](#::st_makebox2d) | Create a BOX2D from two POINT geometries |
| [`ST_MakeEnvelope`](#::st_makeenvelope) | Create a rectangular polygon from min/max coordinates |
| [`ST_MakeLine`](#::st_makeline) | Create a LINESTRING from a list of POINT geometries |
| [`ST_MakePoint`](#::st_makepoint) | Creates a GEOMETRY point from a pair of floating point numbers. |
| [`ST_MakePolygon`](#::st_makepolygon) | Create a POLYGON from a LINESTRING shell |
| [`ST_MakeValid`](#::st_makevalid) | Returns a valid representation of the geometry |
| [`ST_MaximumInscribedCircle`](#::st_maximuminscribedcircle) | Returns the maximum inscribed circle of the input geometry, optionally with a tolerance. |
| [`ST_MinimumRotatedRectangle`](#::st_minimumrotatedrectangle) | Returns the minimum rotated rectangle that bounds the input geometry, finding the surrounding box that has the lowest area by using a rotated rectangle, rather than taking the lowest and highest coordinate values as per ST_Envelope(). |
| [`ST_Multi`](#::st_multi) | Turns a single geometry into a multi geometry. |
| [`ST_NGeometries`](#::st_ngeometries) | Returns the number of component geometries in a collection geometry. |
| [`ST_NInteriorRings`](#::st_ninteriorrings) | Returns the number of interior rings of a polygon |
| [`ST_NPoints`](#::st_npoints) | Returns the number of vertices within a geometry |
| [`ST_Node`](#::st_node) | Returns a "noded" MultiLinestring, produced by combining a collection of input linestrings and adding additional vertices where they intersect. |
| [`ST_Normalize`](#::st_normalize) | Returns the "normalized" representation of the geometry |
| [`ST_NumGeometries`](#::st_numgeometries) | Returns the number of component geometries in a collection geometry. |
| [`ST_NumInteriorRings`](#::st_numinteriorrings) | Returns the number of interior rings of a polygon |
| [`ST_NumPoints`](#::st_numpoints) | Returns the number of vertices within a geometry |
| [`ST_Overlaps`](#::st_overlaps) | Returns true if the geometries overlap |
| [`ST_Perimeter`](#::st_perimeter) | Returns the length of the perimeter of the geometry |
| [`ST_Perimeter_Spheroid`](#::st_perimeter_spheroid) | Returns the length of the perimeter in meters using an ellipsoidal model of the Earth's surface |
| [`ST_Point`](#::st_point) | Creates a GEOMETRY point |
| [`ST_Point2D`](#::st_point2d) | Creates a POINT_2D |
| [`ST_Point2DFromWKB`](#::st_point2dfromwkb) | Deserialize a POINT_2D from a WKB encoded blob |
| [`ST_Point3D`](#::st_point3d) | Creates a POINT_3D |
| [`ST_Point4D`](#::st_point4d) | Creates a POINT_4D |
| [`ST_PointN`](#::st_pointn) | Returns the n'th vertex from the input geometry as a point geometry |
| [`ST_PointOnSurface`](#::st_pointonsurface) | Returns a point guaranteed to lie on the surface of the geometry |
| [`ST_Points`](#::st_points) | Collects all the vertices in the geometry into a MULTIPOINT |
| [`ST_Polygon2DFromWKB`](#::st_polygon2dfromwkb) | Deserialize a POLYGON_2D from a WKB encoded blob |
| [`ST_Polygonize`](#::st_polygonize) | Returns a polygonized representation of the input geometries |
| [`ST_QuadKey`](#::st_quadkey) | Compute the [quadkey](https://learn.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system) for a given lon/lat point at a given level. |
| [`ST_ReducePrecision`](#::st_reduceprecision) | Returns the geometry with all vertices reduced to the given precision |
| [`ST_RemoveRepeatedPoints`](#::st_removerepeatedpoints) | Remove repeated points from a LINESTRING. |
| [`ST_Reverse`](#::st_reverse) | Returns the geometry with the order of its vertices reversed |
| [`ST_ShortestLine`](#::st_shortestline) | Returns the shortest line between two geometries |
| [`ST_Simplify`](#::st_simplify) | Returns a simplified version of the geometry |
| [`ST_SimplifyPreserveTopology`](#::st_simplifypreservetopology) | Returns a simplified version of the geometry that preserves topology |
| [`ST_StartPoint`](#::st_startpoint) | Returns the start point of a LINESTRING. |
| [`ST_TileEnvelope`](#::st_tileenvelope) | The `ST_TileEnvelope` scalar function generates tile envelope rectangular polygons from specified zoom level and tile indices. |
| [`ST_Touches`](#::st_touches) | Returns true if the geometries touch |
| [`ST_Transform`](#::st_transform) | Transforms a geometry between two coordinate systems |
| [`ST_Union`](#::st_union) | Returns the union of two geometries |
| [`ST_VoronoiDiagram`](#::st_voronoidiagram) | Returns the Voronoi diagram of the supplied MultiPoint geometry |
| [`ST_Within`](#::st_within) | Returns true if the first geometry is within the second |
| [`ST_WithinProperly`](#::st_withinproperly) | Returns true if the first geometry is "properly" contained by the second geometry |
| [`ST_X`](#::st_x) | Returns the X coordinate of a point geometry |
| [`ST_XMax`](#::st_xmax) | Returns the maximum X coordinate of a geometry |
| [`ST_XMin`](#::st_xmin) | Returns the minimum X coordinate of a geometry |
| [`ST_Y`](#::st_y) | Returns the Y coordinate of a point geometry |
| [`ST_YMax`](#::st_ymax) | Returns the maximum Y coordinate of a geometry |
| [`ST_YMin`](#::st_ymin) | Returns the minimum Y coordinate of a geometry |
| [`ST_Z`](#::st_z) | Returns the Z coordinate of a point geometry |
| [`ST_ZMFlag`](#::st_zmflag) | Returns a flag indicating the presence of Z and M values in the input geometry. |
| [`ST_ZMax`](#::st_zmax) | Returns the maximum Z coordinate of a geometry |
| [`ST_ZMin`](#::st_zmin) | Returns the minimum Z coordinate of a geometry |

**[Aggregate Functions](#::aggregate-functions)**

| Function | Summary |
| --- | --- |
| [`ST_AsMVT`](#::st_asmvt) | Make a Mapbox Vector Tile from a set of geometries and properties |
| [`ST_CoverageInvalidEdges_Agg`](#::st_coverageinvalidedges_agg) | Returns the invalid edges of a coverage geometry |
| [`ST_CoverageSimplify_Agg`](#::st_coveragesimplify_agg) | Simplifies a set of geometries while maintaining coverage |
| [`ST_CoverageUnion_Agg`](#::st_coverageunion_agg) | Unions a set of geometries while maintaining coverage |
| [`ST_Envelope_Agg`](#::st_envelope_agg) | Alias for [ST_Extent_Agg](#::st_extent_agg). |
| [`ST_Extent_Agg`](#::st_extent_agg) | Computes the minimal-bounding-box polygon containing the set of input geometries |
| [`ST_Intersection_Agg`](#::st_intersection_agg) | Computes the intersection of a set of geometries |
| [`ST_MemUnion_Agg`](#::st_memunion_agg) | Computes the union of a set of input geometries. |
| [`ST_Union_Agg`](#::st_union_agg) | Computes the union of a set of input geometries |

**[Macro Functions](#::macro-functions)**

| Function | Summary |
| --- | --- |
| [`ST_Rotate`](#::st_rotate) | Alias of ST_RotateZ |
| [`ST_RotateX`](#::st_rotatex) | Rotates a geometry around the X axis. This is a shorthand macro for calling ST_Affine. |
| [`ST_RotateY`](#::st_rotatey) | Rotates a geometry around the Y axis. This is a shorthand macro for calling ST_Affine. |
| [`ST_RotateZ`](#::st_rotatez) | Rotates a geometry around the Z axis. This is a shorthand macro for calling ST_Affine. |
| [`ST_Scale`](#::st_scale) |  |
| [`ST_TransScale`](#::st_transscale) | Translates and then scales a geometry in X and Y direction. This is a shorthand macro for calling ST_Affine. |
| [`ST_Translate`](#::st_translate) |  |

**[Table Functions](#::table-functions)**

| Function | Summary |
| --- | --- |
| [`ST_Drivers`](#::st_drivers) | Returns the list of supported GDAL drivers and file formats |
| [`ST_GeneratePoints`](#::st_generatepoints) | Generates a set of random points within the specified bounding box. |
| [`ST_Read`](#::st_read) | Read and import a variety of geospatial file formats using the GDAL library. |
| [`ST_ReadOSM`](#::st_readosm) | The `ST_ReadOSM()` table function enables reading compressed OpenStreetMap data directly from a `.osm.pbf` file. |
| [`ST_ReadSHP`](#::st_readshp) | Read a Shapefile without relying on the GDAL library |
| [`ST_Read_Meta`](#::st_read_meta) | Read the metadata from a variety of geospatial file formats using the GDAL library. |

----

#### Scalar Functions {#docs:current:core_extensions:spatial:functions::scalar-functions}

##### DuckDB_PROJ_Compiled_Version {#docs:current:core_extensions:spatial:functions::duckdb_proj_compiled_version}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
VARCHAR DuckDB_PROJ_Compiled_Version ()
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a text description of the PROJ library version that this instance of DuckDB was compiled against.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT duckdb_proj_compiled_version();
┌────────────────────────────────┐
│ duckdb_proj_compiled_version() │
│            varchar             │
├────────────────────────────────┤
│ Rel. 9.1.1, December 1st, 2022 │
└────────────────────────────────┘
```

----

##### DuckDB_Proj_Version {#docs:current:core_extensions:spatial:functions::duckdb_proj_version}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
VARCHAR DuckDB_Proj_Version ()
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a text description of the PROJ library version that is being used by this instance of DuckDB.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT duckdb_proj_version();
┌───────────────────────┐
│ duckdb_proj_version() │
│        varchar        │
├───────────────────────┤
│ 9.1.1                 │
└───────────────────────┘
```

----

##### ST_Affine {#docs:current:core_extensions:spatial:functions::st_affine}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_Affine (geom GEOMETRY, a DOUBLE, b DOUBLE, c DOUBLE, d DOUBLE, e DOUBLE, f DOUBLE, g DOUBLE, h DOUBLE, i DOUBLE, xoff DOUBLE, yoff DOUBLE, zoff DOUBLE)
GEOMETRY ST_Affine (geom GEOMETRY, a DOUBLE, b DOUBLE, d DOUBLE, e DOUBLE, xoff DOUBLE, yoff DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Applies an affine transformation to a geometry.

For the 2D variant, the transformation matrix is defined as follows:

```text
| a b xoff |
| d e yoff |
| 0 0 1    |
```

For the 3D variant, the transformation matrix is defined as follows:

```text
| a b c xoff |
| d e f yoff |
| g h i zoff |
| 0 0 0 1    |
```

The transformation is applied to all vertices of the geometry.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Translate a point by (2, 3)
SELECT ST_Affine(ST_Point(1, 1),
                 1, 0,   -- a, b
                 0, 1,   -- d, e
                 2, 3);  -- xoff, yoff
----
POINT (3 4)

-- Scale a geometry by factor 2 in X and Y
SELECT ST_Affine(ST_Point(1, 1),
                 2, 0, 0,   -- a, b, c
                 0, 2, 0,   -- d, e, f
                 0, 0, 1,   -- g, h, i
                 0, 0, 0);  -- xoff, yoff, zoff
----
POINT (2 2)
```
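
To make the matrix layout concrete, the following is a minimal pure-Python sketch of how the 2D variant maps a single vertex. The helper name `affine_2d` is hypothetical and not part of DuckDB; it only mirrors the 2D matrix shown in the description and reproduces the translation case from the example.

```python
def affine_2d(x, y, a, b, d, e, xoff, yoff):
    """Apply the matrix [[a, b, xoff], [d, e, yoff], [0, 0, 1]] to the vertex (x, y).

    Hypothetical helper for illustration only, not a DuckDB API.
    """
    new_x = a * x + b * y + xoff
    new_y = d * x + e * y + yoff
    return (new_x, new_y)


# Identity matrix plus an offset of (2, 3): a pure translation
translated = affine_2d(1, 1, 1, 0, 0, 1, 2, 3)

# 2 on the diagonal and no offset: uniform scaling by a factor of 2
scaled = affine_2d(1, 1, 2, 0, 0, 2, 0, 0)

print(translated, scaled)
```

Each output coordinate is a row of the matrix multiplied by the column vector `(x, y, 1)`, which is why the offsets sit in the last column.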

----

##### ST_Area {#docs:current:core_extensions:spatial:functions::st_area}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Area (geom GEOMETRY)
DOUBLE ST_Area (polygon POLYGON_2D)
DOUBLE ST_Area (linestring LINESTRING_2D)
DOUBLE ST_Area (point POINT_2D)
DOUBLE ST_Area (box BOX_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Compute the area of a geometry.

Returns `0.0` for any geometry that is not a `POLYGON`, `MULTIPOLYGON` or `GEOMETRYCOLLECTION` containing polygon
geometries.

The area is in the same units as the spatial reference system of the geometry.

The `POINT_2D` and `LINESTRING_2D` overloads of this function always return `0.0` but are included for completeness.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
select ST_Area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
-- 1.0
```
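
As a rough illustration of what a planar area computation involves, the sketch below applies the shoelace formula to a single closed ring. This is an assumption chosen for illustration, not a statement about how the extension computes areas; the helper name `ring_area` is hypothetical, and interior rings (holes) are not handled.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring, computed with the shoelace formula.

    `ring` is a list of (x, y) tuples whose first and last points coincide.
    Hypothetical helper for illustration only.
    """
    twice_area = 0.0
    # Sum the cross products of consecutive vertex pairs
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


unit_square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
print(ring_area(unit_square))
```

The result is expressed in the square of whatever unit the coordinates use, which matches the note that the area is in the units of the geometry's spatial reference system.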

----

##### ST_Area_Spheroid {#docs:current:core_extensions:spatial:functions::st_area_spheroid}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Area_Spheroid (geom GEOMETRY)
DOUBLE ST_Area_Spheroid (poly POLYGON_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the area of a geometry in square meters, using an ellipsoidal model of the Earth.

The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order and the area is returned in square meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library, calculating the area using an ellipsoidal model of the earth. This is a highly accurate method for calculating the area of a polygon taking the curvature of the earth into account, but is also the slowest.

Returns `0.0` for any geometry that is not a `POLYGON`, `MULTIPOLYGON` or `GEOMETRYCOLLECTION` containing polygon geometries.

----

##### ST_AsGeoJSON {#docs:current:core_extensions:spatial:functions::st_asgeojson}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
JSON ST_AsGeoJSON (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the geometry as a GeoJSON fragment

This does not return a complete GeoJSON document, only the geometry fragment.
To construct a complete GeoJSON document or feature, look into using the DuckDB JSON extension in conjunction with this function.
This function supports geometries with Z values, but not M values. M values are ignored.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
select ST_AsGeoJSON('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
----
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}

-- Convert a geometry into a full GeoJSON feature (requires the JSON extension to be loaded)
SELECT CAST({
    type: 'Feature',
    geometry: ST_AsGeoJSON(ST_Point(1,2)),
    properties: {
        name: 'my_point'
    }
} AS JSON);
----
{"type":"Feature","geometry":{"type":"Point","coordinates":[1.0,2.0]},"properties":{"name":"my_point"}}
```

----

##### ST_AsHEXWKB {#docs:current:core_extensions:spatial:functions::st_ashexwkb}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
VARCHAR ST_AsHEXWKB (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the geometry as a HEXWKB string

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_AsHEXWKB('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY);
----
01030000000100000005000000000000000000000000000...
```
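
The start of the output above can be read as a WKB header. The following pure-Python sketch builds the hex string by hand, assuming the standard little-endian 2D WKB layout (byte-order flag, geometry type, ring count, then point counts and coordinates). The helper name `polygon_to_hex_wkb` is hypothetical and does not describe the extension's internal encoder.

```python
import struct


def polygon_to_hex_wkb(rings):
    """Encode polygon rings (lists of (x, y) tuples) as little-endian hex WKB.

    Hypothetical helper for illustration only.
    """
    # Byte-order flag (1 = little-endian), geometry type (3 = Polygon), ring count
    wkb = struct.pack("<BII", 1, 3, len(rings))
    for ring in rings:
        wkb += struct.pack("<I", len(ring))  # number of points in this ring
        for x, y in ring:
            wkb += struct.pack("<dd", x, y)  # each coordinate as a 64-bit float
    return wkb.hex().upper()


hex_wkb = polygon_to_hex_wkb([[(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]])
print(hex_wkb[:26])
```

The first 26 characters decode as `01` (little-endian), `03000000` (polygon), `01000000` (one ring) and `05000000` (five points), followed by the coordinate values.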

----

##### ST_AsMVTGeom {#docs:current:core_extensions:spatial:functions::st_asmvtgeom}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_AsMVTGeom (geom GEOMETRY, bounds BOX_2D, extent BIGINT, buffer BIGINT, clip_geom BOOLEAN)
GEOMETRY ST_AsMVTGeom (geom GEOMETRY, bounds BOX_2D, extent BIGINT, buffer BIGINT)
GEOMETRY ST_AsMVTGeom (geom GEOMETRY, bounds BOX_2D, extent BIGINT)
GEOMETRY ST_AsMVTGeom (geom GEOMETRY, bounds BOX_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Transform and clip geometry to a tile boundary

See [`ST_AsMVT`](#::st_asmvt) for more details.

----

##### ST_AsSVG {#docs:current:core_extensions:spatial:functions::st_assvg}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
VARCHAR ST_AsSVG (geom GEOMETRY, relative BOOLEAN, precision INTEGER)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Convert the geometry into an SVG fragment or path

The SVG fragment is returned as a string. The fragment is a path element that can be used in an SVG document.
The second boolean argument specifies whether the path should be relative or absolute.
The third argument specifies the maximum number of digits to use for the coordinates.

Points are formatted as cx/cy using absolute coordinates or x/y using relative coordinates.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_AsSVG('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY, false, 15);
----
M 0 0 L 0 -1 1 -1 1 0 Z
```

----

##### ST_AsText {#docs:current:core_extensions:spatial:functions::st_astext}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
VARCHAR ST_AsText (geom GEOMETRY)
VARCHAR ST_AsText (point POINT_2D)
VARCHAR ST_AsText (linestring LINESTRING_2D)
VARCHAR ST_AsText (polygon POLYGON_2D)
VARCHAR ST_AsText (box BOX_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the geometry as a WKT string

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_AsText(ST_MakeEnvelope(0, 0, 1, 1));
----
POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
```

----

##### ST_AsWKB {#docs:current:core_extensions:spatial:functions::st_aswkb}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
WKB_BLOB ST_AsWKB (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the geometry as a WKB (Well-Known-Binary) blob

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_AsWKB('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY)::BLOB;
----
\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05...
```

----

##### ST_Azimuth {#docs:current:core_extensions:spatial:functions::st_azimuth}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Azimuth (origin GEOMETRY, target GEOMETRY)
DOUBLE ST_Azimuth (origin POINT_2D, target POINT_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the azimuth (a clockwise angle measured from north) from the origin point to the target point, in radians.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- The target lies due east of the origin, a quarter turn clockwise from north
SELECT degrees(ST_Azimuth(ST_Point(0, 0), ST_Point(1, 0)));
----
90.0
```
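
The definition in the description (a clockwise angle measured from north) can be sketched in pure Python as follows. The helper name `azimuth` is hypothetical; it follows the stated definition and is not taken from the extension's source, so treat it as an assumption about the convention rather than a reference implementation.

```python
import math


def azimuth(origin, target):
    """Clockwise angle from north (the positive Y axis), in radians, in [0, 2*pi).

    Hypothetical helper for illustration only.
    """
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    # atan2(dx, dy) rather than atan2(dy, dx): the angle is measured from the Y axis
    return math.atan2(dx, dy) % (2 * math.pi)


north = math.degrees(azimuth((0, 0), (0, 1)))
east = math.degrees(azimuth((0, 0), (1, 0)))
west = math.degrees(azimuth((0, 0), (-1, 0)))
print(north, east, west)
```

Under this convention a target due north gives 0 degrees, due east gives 90 degrees and due west gives 270 degrees.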

----

##### ST_Boundary {#docs:current:core_extensions:spatial:functions::st_boundary}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Boundary (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the "boundary" of a geometry

----

##### ST_Buffer {#docs:current:core_extensions:spatial:functions::st_buffer}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_Buffer (geom GEOMETRY, distance DOUBLE)
GEOMETRY ST_Buffer (geom GEOMETRY, distance DOUBLE, num_triangles INTEGER)
GEOMETRY ST_Buffer (geom GEOMETRY, distance DOUBLE, num_triangles INTEGER, cap_style VARCHAR, join_style VARCHAR, mitre_limit DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a buffer around the input geometry at the target distance

`geom` is the input geometry.

`distance` is the target distance for the buffer, using the same units as the input geometry.

`num_triangles` is the number of triangles used to approximate a quarter circle. The larger the number, the smoother the resulting geometry. The default value is 8.

`cap_style` must be one of "CAP_ROUND", "CAP_FLAT", "CAP_SQUARE". This parameter is case-insensitive.

`join_style` must be one of "JOIN_ROUND", "JOIN_MITRE", "JOIN_BEVEL". This parameter is case-insensitive.

`mitre_limit` only applies when `join_style` is "JOIN_MITRE". It is the ratio of the distance from the corner to the mitre point to the corner radius. The default value is 1.0.

This is a planar operation and will not take into account the curvature of the earth.

----

##### ST_BuildArea {#docs:current:core_extensions:spatial:functions::st_buildarea}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_BuildArea (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a polygonal geometry by attempting to "fill in" the input geometry.

Unlike ST_Polygonize, this function does not fill in holes.

----

##### ST_Centroid {#docs:current:core_extensions:spatial:functions::st_centroid}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_Centroid (geom GEOMETRY)
POINT_2D ST_Centroid (point POINT_2D)
POINT_2D ST_Centroid (linestring LINESTRING_2D)
POINT_2D ST_Centroid (polygon POLYGON_2D)
POINT_2D ST_Centroid (box BOX_2D)
POINT_2D ST_Centroid (box BOX_2DF)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the centroid of a geometry

----

##### ST_Collect {#docs:current:core_extensions:spatial:functions::st_collect}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Collect (geoms GEOMETRY[])
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Collects a list of geometries into a collection geometry.
* If all geometries are `POINT`s, a `MULTIPOINT` is returned.
* If all geometries are `LINESTRING`s, a `MULTILINESTRING` is returned.
* If all geometries are `POLYGON`s, a `MULTIPOLYGON` is returned.
* Otherwise if the input collection contains a mix of geometry types, a `GEOMETRYCOLLECTION` is returned.

Empty and `NULL` geometries are ignored. If all geometries are empty or `NULL`, a `GEOMETRYCOLLECTION EMPTY` is returned.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- With all POINTs, a MULTIPOINT is returned
SELECT ST_Collect([ST_Point(1, 2), ST_Point(3, 4)]);
----
MULTIPOINT (1 2, 3 4)

-- With mixed geometry types, a GEOMETRYCOLLECTION is returned
SELECT ST_Collect([ST_Point(1, 2), ST_GeomFromText('LINESTRING(3 4, 5 6)')]);
----
GEOMETRYCOLLECTION (POINT (1 2), LINESTRING (3 4, 5 6))

-- Note that the empty geometry is ignored, so the result is a MULTIPOINT
SELECT ST_Collect([ST_Point(1, 2), NULL, ST_GeomFromText('GEOMETRYCOLLECTION EMPTY')]);
----
MULTIPOINT (1 2)

-- If all geometries are empty or NULL, a GEOMETRYCOLLECTION EMPTY is returned
SELECT ST_Collect([NULL, ST_GeomFromText('GEOMETRYCOLLECTION EMPTY')]);
----
GEOMETRYCOLLECTION EMPTY

-- Tip: You can use the `ST_Collect` function together with the `list()` aggregate function to collect multiple rows of geometries into a single geometry collection:

CREATE TABLE points (geom GEOMETRY);

INSERT INTO points VALUES (ST_Point(1, 2)), (ST_Point(3, 4));

SELECT ST_Collect(list(geom)) FROM points;
----
MULTIPOINT (1 2, 3 4)
```
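
The rules for choosing the result type can be summarized in a short pure-Python sketch. The function name `collected_type` is hypothetical, and `None` stands in for both `NULL` and empty inputs. Inputs that are already multi geometries are not covered by the rules above; this sketch treats them as a mix, which is an assumption.

```python
def collected_type(geometry_types):
    """Return the result type name for a list of input geometry type names.

    `None` entries model NULL or empty geometries, which are ignored.
    Hypothetical helper for illustration only.
    """
    present = [t for t in geometry_types if t is not None]
    if not present:
        return "GEOMETRYCOLLECTION EMPTY"
    multi = {
        "POINT": "MULTIPOINT",
        "LINESTRING": "MULTILINESTRING",
        "POLYGON": "MULTIPOLYGON",
    }
    first = present[0]
    # A homogeneous list of simple geometries maps to the matching multi type
    if first in multi and all(t == first for t in present):
        return multi[first]
    # Anything else is treated as a mix of types
    return "GEOMETRYCOLLECTION"


print(collected_type(["POINT", "POINT"]))
print(collected_type(["POINT", "LINESTRING"]))
print(collected_type(["POINT", None, None]))
print(collected_type([None, None]))
```

The four calls mirror the four SQL examples above, in the same order.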

----

##### ST_CollectionExtract {#docs:current:core_extensions:spatial:functions::st_collectionextract}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_CollectionExtract (geom GEOMETRY, type INTEGER)
GEOMETRY ST_CollectionExtract (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Extracts geometries from a GeometryCollection into a typed multi geometry.

If the input geometry is a GeometryCollection, the function will return a multi geometry, determined by the `type` parameter.
* If `type` = 1, returns a MultiPoint containing all the Points in the collection.
* If `type` = 2, returns a MultiLineString containing all the LineStrings in the collection.
* If `type` = 3, returns a MultiPolygon containing all the Polygons in the collection.

If no `type` parameter is provided, the function will return a multi geometry matching the highest "surface dimension"
of the contained geometries. E.g. if the collection contains only Points, a MultiPoint will be returned. But if the
collection contains both Points and LineStrings, a MultiLineString will be returned. Similarly, if the collection
contains Polygons, a MultiPolygon will be returned. Contained geometries of a lower surface dimension will be ignored.

If the input geometry contains nested GeometryCollections, their geometries will be extracted recursively and included
into the final multi geometry as well.

If the input geometry is not a GeometryCollection, the function will return the input geometry as is.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
select st_collectionextract('MULTIPOINT(1 2,3 4)'::geometry, 1);
-- MULTIPOINT (1 2, 3 4)
```
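
The selection logic described above, including the recursive handling of nested collections, can be sketched in pure Python. Geometries are modeled as hypothetical `(type_name, payload)` tuples, and the function names are illustrative only. The sketch assumes collections contain only points, linestrings, polygons and nested collections. Returning an empty collection unchanged when no `type` is given is also an assumption, since the description does not cover that case.

```python
# Type codes as used by the `type` parameter
TYPE_CODE = {"POINT": 1, "LINESTRING": 2, "POLYGON": 3}
MULTI_NAME = {1: "MULTIPOINT", 2: "MULTILINESTRING", 3: "MULTIPOLYGON"}


def flatten(geom):
    """Yield every non-collection geometry, descending into nested collections."""
    kind, payload = geom
    if kind == "GEOMETRYCOLLECTION":
        for child in payload:
            yield from flatten(child)
    else:
        yield geom


def collection_extract(geom, type_code=None):
    """Sketch of the extraction rules; hypothetical helper, not a DuckDB API."""
    kind, _ = geom
    if kind != "GEOMETRYCOLLECTION":
        return geom  # non-collections are returned as is
    parts = list(flatten(geom))
    if type_code is None:
        if not parts:
            return geom  # assumption: nothing to extract from an empty collection
        # Default: keep only the highest dimension present
        type_code = max(TYPE_CODE[k] for k, _ in parts)
    kept = [payload for k, payload in parts if TYPE_CODE[k] == type_code]
    return (MULTI_NAME[type_code], kept)


nested = ("GEOMETRYCOLLECTION", [
    ("POINT", (1, 2)),
    ("GEOMETRYCOLLECTION", [
        ("LINESTRING", [(3, 4), (5, 6)]),
        ("POINT", (7, 8)),
    ]),
])

print(collection_extract(nested))
print(collection_extract(nested, 1))
```

Without a `type`, the points are dropped because a linestring is present. With `type` = 1, both points are collected, including the one inside the nested collection.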

----

##### ST_ConcaveHull {#docs:current:core_extensions:spatial:functions::st_concavehull}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_ConcaveHull (geom GEOMETRY, ratio DOUBLE, allowHoles BOOLEAN)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the concave hull of the input geometry, containing all of the source input's points, which can be used to create polygons from points. The `ratio` parameter controls the level of concavity: 1.0 returns the convex hull and 0 returns the most concave hull possible. Set `allowHoles` to `true` to allow output containing holes.

----

##### ST_Contains {#docs:current:core_extensions:spatial:functions::st_contains}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BOOLEAN ST_Contains (geom1 POLYGON_2D, geom2 POINT_2D)
BOOLEAN ST_Contains (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the first geometry contains the second geometry

In contrast to `ST_ContainsProperly`, this function will also return true if `geom2` is contained strictly on the boundary of `geom1`.
A geometry always `ST_Contains` itself, but does not `ST_ContainsProperly` itself.

----

##### ST_ContainsProperly {#docs:current:core_extensions:spatial:functions::st_containsproperly}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_ContainsProperly (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the first geometry "properly" contains the second geometry

In contrast to `ST_Contains`, this function does not return true if `geom2` is contained strictly on the boundary of `geom1`.
A geometry always `ST_Contains` itself, but does not `ST_ContainsProperly` itself.

----

##### ST_ConvexHull {#docs:current:core_extensions:spatial:functions::st_convexhull}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_ConvexHull (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the convex hull enclosing the geometry.

----

##### ST_CoverageInvalidEdges {#docs:current:core_extensions:spatial:functions::st_coverageinvalidedges}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_CoverageInvalidEdges (geoms GEOMETRY[], tolerance DOUBLE)
GEOMETRY ST_CoverageInvalidEdges (geoms GEOMETRY[])
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the invalid edges in a polygonal coverage, which are edges that are not shared by two polygons.
Returns NULL if the input is not a polygonal coverage, or if the input is valid.
Tolerance is 0 by default.

----

##### ST_CoverageSimplify {#docs:current:core_extensions:spatial:functions::st_coveragesimplify}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_CoverageSimplify (geoms GEOMETRY[], tolerance DOUBLE, simplify_boundary BOOLEAN)
GEOMETRY ST_CoverageSimplify (geoms GEOMETRY[], tolerance DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Simplify the edges in a polygonal coverage, preserving the coverage by ensuring that there are no seams between the resulting simplified polygons.

By default, the boundary of the coverage is also simplified, but this can be controlled with the optional third `simplify_boundary` parameter.

----

##### ST_CoverageUnion {#docs:current:core_extensions:spatial:functions::st_coverageunion}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_CoverageUnion (geoms GEOMETRY[])
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Union all geometries in a polygonal coverage into a single geometry.
This may be faster than using `ST_Union`, but may use more memory.

----

##### ST_CoveredBy {#docs:current:core_extensions:spatial:functions::st_coveredby}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_CoveredBy (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if `geom1` is "covered by" `geom2`.

----

##### ST_Covers {#docs:current:core_extensions:spatial:functions::st_covers}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_Covers (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if `geom1` "covers" `geom2`.

----

##### ST_Crosses {#docs:current:core_extensions:spatial:functions::st_crosses}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_Crosses (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if `geom1` "crosses" `geom2`.

----

##### ST_DWithin {#docs:current:core_extensions:spatial:functions::st_dwithin}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_DWithin (geom1 GEOMETRY, geom2 GEOMETRY, distance DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if two geometries are within a target distance of each other.

----

##### ST_DWithin_GEOS {#docs:current:core_extensions:spatial:functions::st_dwithin_geos}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_DWithin_GEOS (geom1 GEOMETRY, geom2 GEOMETRY, distance DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if two geometries are within a target distance of each other.

----

##### ST_DWithin_Spheroid {#docs:current:core_extensions:spatial:functions::st_dwithin_spheroid}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_DWithin_Spheroid (p1 POINT_2D, p2 POINT_2D, distance DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if two `POINT_2D`s are within a target distance in meters, using an ellipsoidal model of the earth's surface.

The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order, and the distance limit is expected to be in meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library to solve the [inverse geodesic problem](https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid#Solution_of_the_direct_and_inverse_problems), calculating the distance between two points using an ellipsoidal model of the earth. This is a highly accurate method for calculating the distance between two arbitrary points taking the curvature of the earth's surface into account, but is also the slowest.

----

##### ST_Difference {#docs:current:core_extensions:spatial:functions::st_difference}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Difference (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the "difference" between two geometries.

----

##### ST_Dimension {#docs:current:core_extensions:spatial:functions::st_dimension}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
INTEGER ST_Dimension (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the "topological dimension" of a geometry.

* For POINT and MULTIPOINT geometries, returns `0`.
* For LINESTRING and MULTILINESTRING, returns `1`.
* For POLYGON and MULTIPOLYGON, returns `2`.
* For GEOMETRYCOLLECTION, returns the maximum dimension of the contained geometries, or 0 if the collection is empty.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
select st_dimension('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
----
2
```
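The dimension rules listed above can be sketched in a few lines of plain Python. This is only an illustration of the rule, not the extension's implementation; the `(type_name, children)` tuple used to represent a geometry is a hypothetical stand-in.

```python
# Dimension of each simple geometry type, as listed in the description.
BASE_DIMENSION = {
    "POINT": 0, "MULTIPOINT": 0,
    "LINESTRING": 1, "MULTILINESTRING": 1,
    "POLYGON": 2, "MULTIPOLYGON": 2,
}

def topological_dimension(geom):
    """geom is a hypothetical (type_name, children) tuple.

    children is only inspected for a GEOMETRYCOLLECTION.
    """
    type_name, children = geom
    if type_name == "GEOMETRYCOLLECTION":
        # Maximum over the members, or 0 for an empty collection.
        return max((topological_dimension(c) for c in children), default=0)
    return BASE_DIMENSION[type_name]

collection = ("GEOMETRYCOLLECTION", [("POINT", []), ("LINESTRING", [])])
print(topological_dimension(collection))  # 1
```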

----

##### ST_Disjoint {#docs:current:core_extensions:spatial:functions::st_disjoint}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_Disjoint (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometries are disjoint.

----

##### ST_Distance {#docs:current:core_extensions:spatial:functions::st_distance}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Distance (point1 POINT_2D, point2 POINT_2D)
DOUBLE ST_Distance (point POINT_2D, linestring LINESTRING_2D)
DOUBLE ST_Distance (linestring LINESTRING_2D, point POINT_2D)
DOUBLE ST_Distance (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the planar distance between two geometries.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_Distance('POINT (0 0)'::GEOMETRY, 'POINT (3 4)'::GEOMETRY);
----
5.0

-- Z coordinates are ignored
SELECT ST_Distance('POINT Z (0 0 0)'::GEOMETRY, 'POINT Z (3 4 5)'::GEOMETRY);
----
5.0
```
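For two points, the planar distance is the ordinary Euclidean distance over the X and Y coordinates. The following pure-Python sketch mirrors the two examples above; it is an illustration under that assumption and does not cover distances between lines or polygons.

```python
import math

def planar_distance(p, q):
    # Only the first two coordinates (x, y) take part; any Z value is ignored.
    return math.hypot(q[0] - p[0], q[1] - p[1])

print(planar_distance((0, 0), (3, 4)))        # 5.0
print(planar_distance((0, 0, 0), (3, 4, 5)))  # 5.0
```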

----

##### ST_Distance_GEOS {#docs:current:core_extensions:spatial:functions::st_distance_geos}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_Distance_GEOS (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the planar distance between two geometries.

----

##### ST_Distance_Sphere {#docs:current:core_extensions:spatial:functions::st_distance_sphere}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Distance_Sphere (geom1 GEOMETRY, geom2 GEOMETRY)
DOUBLE ST_Distance_Sphere (point1 POINT_2D, point2 POINT_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the haversine (great circle) distance between two geometries.

* Only supports POINT geometries.
* Returns the distance in meters.
* The input is expected to be in WGS84 (EPSG:4326) coordinates, using a [latitude, longitude] axis order.
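To make the haversine formula concrete, here is a minimal pure-Python sketch. The function name `haversine_m` and the mean earth radius of 6,371,000 m are assumptions made for illustration; the radius used by the extension may differ, so results can deviate slightly from `ST_Distance_Sphere`.

```python
import math

# Assumed mean earth radius in meters; the extension's constant may differ.
EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

# JFK and AMS airports, [latitude, longitude] order as in the notes above.
print(haversine_m(40.6446, -73.7797, 52.3130, 4.7725))
```

Because a sphere only approximates the earth, the result differs by a small fraction from the ellipsoidal distance computed by `ST_Distance_Spheroid`.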

----

##### ST_Distance_Spheroid {#docs:current:core_extensions:spatial:functions::st_distance_spheroid}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_Distance_Spheroid (p1 POINT_2D, p2 POINT_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the distance between two geometries in meters, using an ellipsoidal model of the earth's surface.

The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order, and the distance is returned in meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library to solve the [inverse geodesic problem](https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid#Solution_of_the_direct_and_inverse_problems), calculating the distance between two points using an ellipsoidal model of the earth. This is a highly accurate method for calculating the distance between two arbitrary points taking the curvature of the earth's surface into account, but is also the slowest.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Note: the coordinates are in WGS84 and [latitude, longitude] axis order
-- What's the distance between New York and Amsterdam (JFK and AMS airport)?
SELECT st_distance_spheroid(
st_point(40.6446, -73.7797),
st_point(52.3130, 4.7725)
);
----
5863418.7459356235
-- Roughly 5863km!
```

----

##### ST_Dump {#docs:current:core_extensions:spatial:functions::st_dump}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
STRUCT(geom GEOMETRY, path INTEGER[])[] ST_Dump (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Dumps a geometry into a list of sub-geometries and their "path" in the original geometry.

You can use the `UNNEST(res, recursive := true)` function to explode the resulting list of structs into multiple rows.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
select st_dump('MULTIPOINT(1 2,3 4)'::geometry);
----
[{'geom': 'POINT(1 2)', 'path': [0]}, {'geom': 'POINT(3 4)', 'path': [1]}]

select unnest(st_dump('MULTIPOINT(1 2,3 4)'::geometry), recursive := true);
-- ┌─────────────┬─────────┐
-- │    geom     │  path   │
-- │  geometry   │ int32[] │
-- ├─────────────┼─────────┤
-- │ POINT (1 2) │ [1]     │
-- │ POINT (3 4) │ [2]     │
-- └─────────────┴─────────┘
```

----

##### ST_EndPoint {#docs:current:core_extensions:spatial:functions::st_endpoint}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_EndPoint (geom GEOMETRY)
POINT_2D ST_EndPoint (line LINESTRING_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the end point of a LINESTRING.

----

##### ST_Envelope {#docs:current:core_extensions:spatial:functions::st_envelope}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Envelope (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the minimum bounding rectangle of a geometry as a polygon geometry.

----

##### ST_Equals {#docs:current:core_extensions:spatial:functions::st_equals}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_Equals (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometries are "equal".

----

##### ST_Extent {#docs:current:core_extensions:spatial:functions::st_extent}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BOX_2D ST_Extent (geom GEOMETRY)
BOX_2D ST_Extent (wkb WKB_BLOB)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the minimal bounding box enclosing the input geometry.

----

##### ST_Extent_Approx {#docs:current:core_extensions:spatial:functions::st_extent_approx}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOX_2DF ST_Extent_Approx (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the approximate bounding box of a geometry, if available.

This function is only really used internally, and returns the cached bounding box of the geometry if it exists.
This function may be removed or renamed in the future.

----

##### ST_ExteriorRing {#docs:current:core_extensions:spatial:functions::st_exteriorring}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_ExteriorRing (geom GEOMETRY)
LINESTRING_2D ST_ExteriorRing (polygon POLYGON_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the exterior ring (shell) of a polygon geometry.

----

##### ST_FlipCoordinates {#docs:current:core_extensions:spatial:functions::st_flipcoordinates}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_FlipCoordinates (geom GEOMETRY)
POINT_2D ST_FlipCoordinates (point POINT_2D)
LINESTRING_2D ST_FlipCoordinates (linestring LINESTRING_2D)
POLYGON_2D ST_FlipCoordinates (polygon POLYGON_2D)
BOX_2D ST_FlipCoordinates (box BOX_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a new geometry with the coordinates of the input geometry "flipped", so that the x and y values of each vertex are swapped.

----

##### ST_Force2D {#docs:current:core_extensions:spatial:functions::st_force2d}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Force2D (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Forces the vertices of a geometry to have X and Y components.

This function will drop any Z and M values from the input geometry, if present. If the input geometry is already 2D, it will be returned as is.

----

##### ST_Force3DM {#docs:current:core_extensions:spatial:functions::st_force3dm}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Force3DM (geom GEOMETRY, m DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Forces the vertices of a geometry to have X, Y and M components.

The following cases apply:

* If the input geometry has a Z component but no M component, the Z component will be replaced with the new M value.
* If the input geometry has an M component but no Z component, it will be returned as is.
* If the input geometry has both a Z component and an M component, the Z component will be removed.
* Otherwise, if the input geometry has neither a Z nor an M component, the new M value will be added to the vertices of the input geometry.

----

##### ST_Force3DZ {#docs:current:core_extensions:spatial:functions::st_force3dz}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Force3DZ (geom GEOMETRY, z DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Forces the vertices of a geometry to have X, Y and Z components.

The following cases apply:

* If the input geometry has an M component but no Z component, the M component will be replaced with the new Z value.
* If the input geometry has a Z component but no M component, it will be returned as is.
* If the input geometry has both a Z component and an M component, the M component will be removed.
* Otherwise, if the input geometry has neither a Z nor an M component, the new Z value will be added to the vertices of the input geometry.
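The four cases above collapse into one rule per vertex: keep an existing Z, otherwise use the supplied value, and drop any M. The following pure-Python sketch shows this; the dict-based vertex representation and the name `force_3dz` are hypothetical and only serve to illustrate the documented behavior.

```python
def force_3dz(vertex, z):
    """vertex is a hypothetical dict with 'x', 'y' and optional 'z' / 'm' keys."""
    out = {"x": vertex["x"], "y": vertex["y"]}
    # An existing Z is kept; otherwise the supplied z fills the slot.
    # This also covers the "M is replaced" and "neither Z nor M" cases.
    out["z"] = vertex["z"] if "z" in vertex else z
    return out  # any M component is not carried over

print(force_3dz({"x": 1, "y": 2, "m": 9}, 5))  # {'x': 1, 'y': 2, 'z': 5}
```

`ST_Force3DM` follows the same pattern with the roles of Z and M exchanged.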

----

##### ST_Force4D {#docs:current:core_extensions:spatial:functions::st_force4d}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Force4D (geom GEOMETRY, z DOUBLE, m DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Forces the vertices of a geometry to have X, Y, Z and M components.

The following cases apply:

* If the input geometry has a Z component but no M component, the new M value will be added to the vertices of the input geometry.
* If the input geometry has an M component but no Z component, the new Z value will be added to the vertices of the input geometry.
* If the input geometry has both a Z component and an M component, the geometry will be returned as is.
* Otherwise, if the input geometry has neither a Z nor an M component, the new Z and M values will be added to the vertices of the input geometry.

----

##### ST_GeomFromGeoJSON {#docs:current:core_extensions:spatial:functions::st_geomfromgeojson}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_GeomFromGeoJSON (geojson JSON)
GEOMETRY ST_GeomFromGeoJSON (geojson VARCHAR)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Deserializes a GEOMETRY from a GeoJSON fragment.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_GeomFromGeoJSON('{"type":"Point","coordinates":[1.0,2.0]}');
----
POINT (1 2)
```

----

##### ST_GeomFromHEXEWKB {#docs:current:core_extensions:spatial:functions::st_geomfromhexewkb}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_GeomFromHEXEWKB (hexwkb VARCHAR)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Deserializes a GEOMETRY from a HEX(E)WKB encoded string.

DuckDB spatial doesn't currently differentiate between `WKB` and `EWKB`, so `ST_GeomFromHEXWKB` and `ST_GeomFromHEXEWKB` are just aliases of each other.

----

##### ST_GeomFromHEXWKB {#docs:current:core_extensions:spatial:functions::st_geomfromhexwkb}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_GeomFromHEXWKB (hexwkb VARCHAR)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Deserializes a GEOMETRY from a HEX(E)WKB encoded string.

DuckDB spatial doesn't currently differentiate between `WKB` and `EWKB`, so `ST_GeomFromHEXWKB` and `ST_GeomFromHEXEWKB` are just aliases of each other.

----

##### ST_GeomFromText {#docs:current:core_extensions:spatial:functions::st_geomfromtext}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_GeomFromText (wkt VARCHAR)
GEOMETRY ST_GeomFromText (wkt VARCHAR, ignore_invalid BOOLEAN)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Deserializes a GEOMETRY from a WKT encoded string.

----

##### ST_GeomFromWKB {#docs:current:core_extensions:spatial:functions::st_geomfromwkb}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_GeomFromWKB (wkb WKB_BLOB)
GEOMETRY ST_GeomFromWKB (blob BLOB)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Deserializes a GEOMETRY from a WKB encoded blob.

----

##### ST_GeometryType {#docs:current:core_extensions:spatial:functions::st_geometrytype}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
ANY ST_GeometryType (geom GEOMETRY)
ANY ST_GeometryType (point POINT_2D)
ANY ST_GeometryType (linestring LINESTRING_2D)
ANY ST_GeometryType (polygon POLYGON_2D)
ANY ST_GeometryType (wkb WKB_BLOB)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a `GEOMETRY_TYPE` enum identifying the input geometry type. Possible enum values are: `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON` and `GEOMETRYCOLLECTION`.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT DISTINCT ST_GeometryType(ST_GeomFromText('POINT(1 1)'));
----
POINT
```

----

##### ST_HasM {#docs:current:core_extensions:spatial:functions::st_hasm}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BOOLEAN ST_HasM (geom GEOMETRY)
BOOLEAN ST_HasM (wkb WKB_BLOB)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Check if the input geometry has M values.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- HasM for a 2D geometry
SELECT ST_HasM(ST_GeomFromText('POINT(1 1)'));
----
false

-- HasM for a 3DZ geometry
SELECT ST_HasM(ST_GeomFromText('POINT Z(1 1 1)'));
----
false

-- HasM for a 3DM geometry
SELECT ST_HasM(ST_GeomFromText('POINT M(1 1 1)'));
----
true

-- HasM for a 4D geometry
SELECT ST_HasM(ST_GeomFromText('POINT ZM(1 1 1 1)'));
----
true
```

----

##### ST_HasZ {#docs:current:core_extensions:spatial:functions::st_hasz}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BOOLEAN ST_HasZ (geom GEOMETRY)
BOOLEAN ST_HasZ (wkb WKB_BLOB)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Check if the input geometry has Z values.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- HasZ for a 2D geometry
SELECT ST_HasZ(ST_GeomFromText('POINT(1 1)'));
----
false

-- HasZ for a 3DZ geometry
SELECT ST_HasZ(ST_GeomFromText('POINT Z(1 1 1)'));
----
true

-- HasZ for a 3DM geometry
SELECT ST_HasZ(ST_GeomFromText('POINT M(1 1 1)'));
----
false

-- HasZ for a 4D geometry
SELECT ST_HasZ(ST_GeomFromText('POINT ZM(1 1 1 1)'));
----
true
```

----

##### ST_Hilbert {#docs:current:core_extensions:spatial:functions::st_hilbert}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
UINTEGER ST_Hilbert (x DOUBLE, y DOUBLE, bounds BOX_2D)
UINTEGER ST_Hilbert (geom GEOMETRY, bounds BOX_2D)
UINTEGER ST_Hilbert (geom GEOMETRY)
UINTEGER ST_Hilbert (box BOX_2D, bounds BOX_2D)
UINTEGER ST_Hilbert (box BOX_2DF, bounds BOX_2DF)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Encodes the X and Y values as the Hilbert curve index for a curve covering the given bounding box.
If a geometry is provided, the center of the approximate bounding box is used as the point to encode.
If no bounding box is provided, the Hilbert curve index is mapped to the full range of a single-precision float.
For the BOX_2D and BOX_2DF variants, the center of the box is used as the point to encode.
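To illustrate how a point can be mapped to a Hilbert curve index within a bounding box, the sketch below uses the classic iterative grid algorithm in pure Python. The function names, the 16-bits-per-axis grid (chosen so the index fits an unsigned 32-bit integer), and the way coordinates are scaled into the grid are all assumptions; the extension's exact bit layout and normalization are not described here and may differ.

```python
def xy_to_hilbert(n, x, y):
    """Classic Hilbert index of cell (x, y) in an n-by-n grid; n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive indexes stay adjacent.
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_index(x, y, bounds, order=16):
    """bounds = (min_x, min_y, max_x, max_y); order=16 keeps the index below 2**32."""
    min_x, min_y, max_x, max_y = bounds
    n = 1 << order
    # Scale into grid cells 0..n-1, clamping points that lie on or outside the edges.
    cx = min(n - 1, max(0, int((x - min_x) / (max_x - min_x) * n)))
    cy = min(n - 1, max(0, int((y - min_y) / (max_y - min_y) * n)))
    return xy_to_hilbert(n, cx, cy)

print([xy_to_hilbert(2, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])  # [0, 1, 2, 3]
```

Sorting rows by such an index tends to place spatially nearby points close together, which is the usual reason for computing it.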

----

##### ST_InterpolatePoint {#docs:current:core_extensions:spatial:functions::st_interpolatepoint}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_InterpolatePoint (line GEOMETRY, point GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Computes the closest point on a LINESTRING to a given POINT and returns the interpolated M value of that point.

The first argument must be a linestring with an M dimension. The second argument must be a point.
Neither argument can be empty.

----

##### ST_Intersection {#docs:current:core_extensions:spatial:functions::st_intersection}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Intersection (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the intersection of two geometries.

----

##### ST_Intersects {#docs:current:core_extensions:spatial:functions::st_intersects}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BOOLEAN ST_Intersects (box1 BOX_2D, box2 BOX_2D)
BOOLEAN ST_Intersects (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometries intersect.

----

##### ST_Intersects_Extent {#docs:current:core_extensions:spatial:functions::st_intersects_extent}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_Intersects_Extent (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the extents of two geometries intersect.

----

##### ST_IsClosed {#docs:current:core_extensions:spatial:functions::st_isclosed}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_IsClosed (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Checks if a geometry is "closed".

----

##### ST_IsEmpty {#docs:current:core_extensions:spatial:functions::st_isempty}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BOOLEAN ST_IsEmpty (geom GEOMETRY)
BOOLEAN ST_IsEmpty (linestring LINESTRING_2D)
BOOLEAN ST_IsEmpty (polygon POLYGON_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometry is "empty".

----

##### ST_IsRing {#docs:current:core_extensions:spatial:functions::st_isring}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_IsRing (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometry is a ring (both `ST_IsClosed` and `ST_IsSimple`).

----

##### ST_IsSimple {#docs:current:core_extensions:spatial:functions::st_issimple}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_IsSimple (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometry is simple.

----

##### ST_IsValid {#docs:current:core_extensions:spatial:functions::st_isvalid}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_IsValid (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometry is valid.

----

##### ST_Length {#docs:current:core_extensions:spatial:functions::st_length}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Length (geom GEOMETRY)
DOUBLE ST_Length (linestring LINESTRING_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the length of the input line geometry.

----

##### ST_Length_Spheroid {#docs:current:core_extensions:spatial:functions::st_length_spheroid}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Length_Spheroid (geom GEOMETRY)
DOUBLE ST_Length_Spheroid (line LINESTRING_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the length of the input geometry in meters, using an ellipsoidal model of the earth.

The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order and the length is returned in meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library, calculating the length using an ellipsoidal model of the earth. This is a highly accurate method for calculating the length of a line geometry taking the curvature of the earth into account, but is also the slowest.

Returns `0.0` for any geometry that is not a `LINESTRING`, `MULTILINESTRING` or `GEOMETRYCOLLECTION` containing line geometries.

----

##### ST_LineInterpolatePoint {#docs:current:core_extensions:spatial:functions::st_lineinterpolatepoint}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_LineInterpolatePoint (line GEOMETRY, fraction DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a point interpolated along a line at a fraction of total 2D length.
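The idea of interpolating at a fraction of the total 2D length can be sketched in pure Python as follows. The function name and the list-of-tuples line representation are hypothetical, and the fraction is assumed to lie in `[0, 1]`; this is an illustration of the technique rather than the extension's code.

```python
import math

def line_interpolate_point(coords, fraction):
    """coords: list of (x, y) vertices; fraction: position in [0, 1] along the 2D length."""
    seg_lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    target = fraction * sum(seg_lengths)  # distance to walk from the start
    for (a, b), seg in zip(zip(coords, coords[1:]), seg_lengths):
        if seg > 0 and target <= seg:
            t = target / seg  # position within this segment
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= seg
    return coords[-1]  # fraction == 1, or rounding pushed us past the end

# Total length is 3 + 4 = 7, so halfway is 0.5 units up the second segment.
print(line_interpolate_point([(0, 0), (3, 0), (3, 4)], 0.5))  # (3.0, 0.5)
```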

----

##### ST_LineInterpolatePoints {#docs:current:core_extensions:spatial:functions::st_lineinterpolatepoints}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_LineInterpolatePoints (line GEOMETRY, fraction DOUBLE, repeat BOOLEAN)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a multi-point interpolated along a line at a fraction of total 2D length.

If `repeat` is false, the result is a single point (equivalent to `ST_LineInterpolatePoint`);
otherwise, the result is a multi-point with points repeated at the fraction interval.

----

##### ST_LineLocatePoint {#docs:current:core_extensions:spatial:functions::st_linelocatepoint}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_LineLocatePoint (line GEOMETRY, point GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the location on a line closest to a point as a fraction of the total 2D length of the line.
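As a rough illustration, the location can be found by projecting the point onto every segment, keeping the nearest projection, and dividing the distance walked to it by the total length. The pure-Python sketch below assumes a list-of-tuples line and a hypothetical function name; it is not the extension's implementation.

```python
import math

def line_locate_point(coords, point):
    """Fraction of the 2D length of coords at which the line is closest to point."""
    seg_lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    total = sum(seg_lengths)
    best_dist, best_along, walked = math.inf, 0.0, 0.0
    for (a, b), seg in zip(zip(coords, coords[1:]), seg_lengths):
        if seg > 0:
            # Parameter of the orthogonal projection onto the segment, clamped to [0, 1].
            t = ((point[0] - a[0]) * (b[0] - a[0]) + (point[1] - a[1]) * (b[1] - a[1])) / seg ** 2
            t = max(0.0, min(1.0, t))
            closest = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            d = math.dist(point, closest)
            if d < best_dist:
                best_dist, best_along = d, walked + t * seg
        walked += seg
    return best_along / total if total > 0 else 0.0

print(line_locate_point([(0, 0), (10, 0)], (5, 3)))  # 0.5
```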

----

##### ST_LineMerge {#docs:current:core_extensions:spatial:functions::st_linemerge}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_LineMerge (geom GEOMETRY)
GEOMETRY ST_LineMerge (geom GEOMETRY, preserve_direction BOOLEAN)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

"Merges" the input line geometry, optionally taking direction into account.

----

##### ST_LineString2DFromWKB {#docs:current:core_extensions:spatial:functions::st_linestring2dfromwkb}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_LineString2DFromWKB (linestring LINESTRING_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Deserializes a LINESTRING_2D from a WKB encoded blob.

----

##### ST_LineSubstring {#docs:current:core_extensions:spatial:functions::st_linesubstring}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_LineSubstring (line GEOMETRY, start_fraction DOUBLE, end_fraction DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a substring of a line between two fractions of total 2D length.

----

##### ST_LocateAlong {#docs:current:core_extensions:spatial:functions::st_locatealong}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_LocateAlong (line GEOMETRY, measure DOUBLE, offset DOUBLE)
GEOMETRY ST_LocateAlong (line GEOMETRY, measure DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a point or multi-point containing the point(s) on the geometry at the given measure.

* For a LINESTRING or MULTILINESTRING, the location is determined by interpolating between M values.
* For a POINT or MULTIPOINT, the point is returned if the measure matches the M value of the vertex; otherwise, an empty geometry is returned.
* For a POLYGON, only the exterior ring is considered, and it is treated as a LINESTRING.

If `offset` is provided, the resulting point(s) are offset by the given amount perpendicular to the line direction.
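For the LINESTRING case, interpolating between M values can be sketched in pure Python as below. The `(x, y, m)` tuple representation and the function name are hypothetical, the offset parameter is left out, and the handling of segments with equal M values at both ends is an assumption made for this sketch.

```python
def locate_along(vertices, measure):
    """vertices: list of (x, y, m) tuples of a hypothetical LINESTRING M."""
    points = []
    for (x1, y1, m1), (x2, y2, m2) in zip(vertices, vertices[1:]):
        lo, hi = min(m1, m2), max(m1, m2)
        if lo <= measure <= hi:
            # Linear interpolation by M; a flat segment (m1 == m2) yields its start vertex.
            t = 0.0 if m1 == m2 else (measure - m1) / (m2 - m1)
            p = (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
            if p not in points:  # a vertex shared by two segments is reported once
                points.append(p)
    return points

print(locate_along([(0, 0, 0), (10, 0, 10)], 5))  # [(5.0, 0.0)]
```

An empty list here corresponds to the empty geometry returned when no part of the line carries the requested measure.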

----

##### ST_LocateBetween {#docs:current:core_extensions:spatial:functions::st_locatebetween}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_LocateBetween (line GEOMETRY, start_measure DOUBLE, end_measure DOUBLE, offset DOUBLE)
GEOMETRY ST_LocateBetween (line GEOMETRY, start_measure DOUBLE, end_measure DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a geometry or geometry collection created by filtering and interpolating vertices within a range of "M" values.

The result contains the parts formed by vertices that have an "M" value within the `start_measure` to `end_measure` range.

* For a LINESTRING or MULTILINESTRING, if a line segment crosses either the upper or lower bound, a vertex is added by interpolating the coordinates at the "intersection".
* For a POINT or MULTIPOINT, the point is added to the collection if its vertex has an "M" value within the range; otherwise it is skipped.
* For a POLYGON, only the exterior ring is considered, and it is treated like a LINESTRING.

If `offset` is provided, the resulting vertices are offset by the given amount perpendicular to the line direction.
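
###### Example {#docs:current:core_extensions:spatial:functions::example}

A hedged sketch of the range filter, assuming the spatial extension is loaded and that the WKT reader accepts the `LINESTRING M` form. Both bounds fall strictly inside the single segment of the illustrative line, so both end vertices of the result should be interpolated.

```sql
-- Keep only the part of the line whose measure lies between 2 and 6
SELECT ST_AsText(
    ST_LocateBetween(ST_GeomFromText('LINESTRING M (0 0 0, 10 0 10)'), 2.0, 6.0)
);
```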

----

##### ST_M {#docs:current:core_extensions:spatial:functions::st_m}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_M (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the M coordinate of a point geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_M(ST_GeomFromText('POINT ZM (1 2 3 4)'));
```

----

##### ST_MMax {#docs:current:core_extensions:spatial:functions::st_mmax}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_MMax (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the maximum M coordinate of a geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_MMax(ST_GeomFromText('POINT ZM (1 2 3 4)'));
```

----

##### ST_MMin {#docs:current:core_extensions:spatial:functions::st_mmin}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_MMin (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the minimum M coordinate of a geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_MMin(ST_GeomFromText('POINT ZM (1 2 3 4)'));
```

----

##### ST_MakeBox2D {#docs:current:core_extensions:spatial:functions::st_makebox2d}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOX_2D ST_MakeBox2D (point1 GEOMETRY, point2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a `BOX_2D` from two POINT geometries.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_MakeBox2D(ST_Point(0, 0), ST_Point(1, 1));
----
BOX(0 0, 1 1)
```

----

##### ST_MakeEnvelope {#docs:current:core_extensions:spatial:functions::st_makeenvelope}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_MakeEnvelope (min_x DOUBLE, min_y DOUBLE, max_x DOUBLE, max_y DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Create a rectangular polygon from min/max coordinates
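
###### Example {#docs:current:core_extensions:spatial:functions::example}

A minimal sketch, assuming the spatial extension is loaded. The bounds are illustrative; the arguments follow the signature order `min_x`, `min_y`, `max_x`, `max_y`.

```sql
-- Build a 2-by-1 rectangle with its lower-left corner at the origin
SELECT ST_AsText(ST_MakeEnvelope(0, 0, 2, 1));
```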

----

##### ST_MakeLine {#docs:current:core_extensions:spatial:functions::st_makeline}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_MakeLine (geoms GEOMETRY[])
GEOMETRY ST_MakeLine (start GEOMETRY, end GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a LINESTRING from a list of POINT geometries, or from a start and an end geometry.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_MakeLine([ST_Point(0, 0), ST_Point(1, 1)]);
----
LINESTRING(0 0, 1 1)
```

----

##### ST_MakePoint {#docs:current:core_extensions:spatial:functions::st_makepoint}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
POINT_2D ST_MakePoint (x DOUBLE, y DOUBLE)
POINT_3D ST_MakePoint (x DOUBLE, y DOUBLE, z DOUBLE)
POINT_4D ST_MakePoint (x DOUBLE, y DOUBLE, z DOUBLE, m DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a GEOMETRY point from a pair of floating point numbers.

For geodetic coordinate systems, x is typically the longitude value and y is the latitude value.

Note that ST_Point is equivalent. ST_MakePoint is provided for PostGIS compatibility.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_AsText(ST_MakePoint(143.3, -24.2));
----
POINT (143.3 -24.2)
```

----

##### ST_MakePolygon {#docs:current:core_extensions:spatial:functions::st_makepolygon}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_MakePolygon (shell GEOMETRY)
GEOMETRY ST_MakePolygon (shell GEOMETRY, holes GEOMETRY[])
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a POLYGON from a LINESTRING shell, optionally with a list of LINESTRING holes.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_MakePolygon(ST_MakeLine([ST_Point(0, 0), ST_Point(1, 0), ST_Point(1, 1), ST_Point(0, 0)]));
```

----

##### ST_MakeValid {#docs:current:core_extensions:spatial:functions::st_makevalid}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_MakeValid (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a valid representation of the geometry
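
###### Example {#docs:current:core_extensions:spatial:functions::example}

A hedged sketch, assuming the spatial extension is loaded. The input is an illustrative self-intersecting "bowtie" polygon, and `ST_IsValid` is used only to compare validity before and after the repair.

```sql
-- Check validity of a self-intersecting polygon before and after ST_MakeValid
SELECT
    ST_IsValid(geom) AS valid_before,
    ST_IsValid(ST_MakeValid(geom)) AS valid_after
FROM (
    SELECT ST_GeomFromText('POLYGON((0 0, 2 2, 2 0, 0 2, 0 0))') AS geom
);
```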

----

##### ST_MaximumInscribedCircle {#docs:current:core_extensions:spatial:functions::st_maximuminscribedcircle}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
STRUCT(center GEOMETRY, nearest GEOMETRY, radius DOUBLE) ST_MaximumInscribedCircle (geom GEOMETRY)
STRUCT(center GEOMETRY, nearest GEOMETRY, radius DOUBLE) ST_MaximumInscribedCircle (geom GEOMETRY, tolerance DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the maximum inscribed circle of the input geometry, optionally with a tolerance.

By default, the tolerance is computed as `max(width, height) / 1000`.
The return value is a struct with the center of the circle, the nearest point to the center on the boundary of the geometry and the radius of the circle.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Find the maximum inscribed circle of a square
SELECT ST_MaximumInscribedCircle(
    ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')
);
----
{'center': POINT (5 5), 'nearest': POINT (5 0), 'radius': 5.0}
```

----

##### ST_MinimumRotatedRectangle {#docs:current:core_extensions:spatial:functions::st_minimumrotatedrectangle}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_MinimumRotatedRectangle (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the minimum rotated rectangle that bounds the input geometry, finding the surrounding box that has the lowest area by using a rotated rectangle, rather than taking the lowest and highest coordinate values as per ST_Envelope().
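
###### Example {#docs:current:core_extensions:spatial:functions::example}

A hedged sketch contrasting the two bounding shapes, assuming the spatial extension is loaded. The input is an illustrative square rotated by 45 degrees, so the axis-aligned envelope should cover a larger area than the rotated rectangle. `ST_Area` is used only to make the difference visible.

```sql
-- Compare the area of the axis-aligned envelope with that of the rotated rectangle
SELECT
    ST_Area(ST_Envelope(geom)) AS envelope_area,
    ST_Area(ST_MinimumRotatedRectangle(geom)) AS rotated_area
FROM (
    SELECT ST_GeomFromText('POLYGON((0 1, 1 2, 2 1, 1 0, 0 1))') AS geom
);
```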

----

##### ST_Multi {#docs:current:core_extensions:spatial:functions::st_multi}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Multi (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Turns a single geometry into a multi geometry.

If the geometry is already a multi geometry, it is returned as is.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_Multi(ST_GeomFromText('POINT(1 2)'));
----
MULTIPOINT (1 2)

SELECT ST_Multi(ST_GeomFromText('LINESTRING(1 1, 2 2)'));
----
MULTILINESTRING ((1 1, 2 2))

SELECT ST_Multi(ST_GeomFromText('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'));
----
MULTIPOLYGON (((0 0, 0 1, 1 1, 1 0, 0 0)))
```

----

##### ST_NGeometries {#docs:current:core_extensions:spatial:functions::st_ngeometries}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
INTEGER ST_NGeometries (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the number of component geometries in a collection geometry.
If the input geometry is not a collection, this function returns 0 if the geometry is empty and 1 otherwise.
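
###### Example {#docs:current:core_extensions:spatial:functions::example}

A minimal sketch of the three cases described above, assuming the spatial extension is loaded. The geometries and column aliases are illustrative; per the description, the counts should come out as 3, 1 and 0 respectively.

```sql
-- Count components of a collection, a single geometry, and an empty geometry
SELECT
    ST_NGeometries(ST_GeomFromText('MULTIPOINT(0 0, 1 1, 2 2)')) AS multi_point,
    ST_NGeometries(ST_GeomFromText('POINT(0 0)')) AS single_point,
    ST_NGeometries(ST_GeomFromText('POINT EMPTY')) AS empty_point;
```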

----

##### ST_NInteriorRings {#docs:current:core_extensions:spatial:functions::st_ninteriorrings}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
INTEGER ST_NInteriorRings (geom GEOMETRY)
INTEGER ST_NInteriorRings (polygon POLYGON_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the number of interior rings of a polygon

----

##### ST_NPoints {#docs:current:core_extensions:spatial:functions::st_npoints}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
UINTEGER ST_NPoints (geom GEOMETRY)
UBIGINT ST_NPoints (point POINT_2D)
UBIGINT ST_NPoints (linestring LINESTRING_2D)
UBIGINT ST_NPoints (polygon POLYGON_2D)
UBIGINT ST_NPoints (box BOX_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the number of vertices within a geometry

----

##### ST_Node {#docs:current:core_extensions:spatial:functions::st_node}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Node (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a "noded" MultiLinestring, produced by combining a collection of input linestrings and adding additional vertices where they intersect.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Create a noded multilinestring from two intersecting lines
SELECT ST_Node(
    ST_GeomFromText('MULTILINESTRING((0 0, 2 2), (0 2, 2 0))')
);
----
MULTILINESTRING ((0 0, 1 1), (1 1, 2 2), (0 2, 1 1), (1 1, 2 0))
```

----

##### ST_Normalize {#docs:current:core_extensions:spatial:functions::st_normalize}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Normalize (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the "normalized" representation of the geometry

----

##### ST_NumGeometries {#docs:current:core_extensions:spatial:functions::st_numgeometries}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
INTEGER ST_NumGeometries (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the number of component geometries in a collection geometry.
If the input geometry is not a collection, this function returns 0 if the geometry is empty and 1 otherwise.

----

##### ST_NumInteriorRings {#docs:current:core_extensions:spatial:functions::st_numinteriorrings}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
INTEGER ST_NumInteriorRings (geom GEOMETRY)
INTEGER ST_NumInteriorRings (polygon POLYGON_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the number of interior rings of a polygon

----

##### ST_NumPoints {#docs:current:core_extensions:spatial:functions::st_numpoints}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
UINTEGER ST_NumPoints (geom GEOMETRY)
UBIGINT ST_NumPoints (point POINT_2D)
UBIGINT ST_NumPoints (linestring LINESTRING_2D)
UBIGINT ST_NumPoints (polygon POLYGON_2D)
UBIGINT ST_NumPoints (box BOX_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the number of vertices within a geometry

----

##### ST_Overlaps {#docs:current:core_extensions:spatial:functions::st_overlaps}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_Overlaps (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometries overlap

----

##### ST_Perimeter {#docs:current:core_extensions:spatial:functions::st_perimeter}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Perimeter (geom GEOMETRY)
DOUBLE ST_Perimeter (polygon POLYGON_2D)
DOUBLE ST_Perimeter (box BOX_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the length of the perimeter of the geometry
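
###### Example {#docs:current:core_extensions:spatial:functions::example}

A minimal sketch, assuming the spatial extension is loaded. The input is an illustrative 10-by-10 square, whose four sides add up to 40 units.

```sql
SELECT ST_Perimeter(ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'));
----
40.0
```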

----

##### ST_Perimeter_Spheroid {#docs:current:core_extensions:spatial:functions::st_perimeter_spheroid}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Perimeter_Spheroid (geom GEOMETRY)
DOUBLE ST_Perimeter_Spheroid (poly POLYGON_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the length of the perimeter in meters using an ellipsoidal model of the earth's surface.

The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order and the length is returned in meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library, calculating the perimeter using an ellipsoidal model of the earth. This is a highly accurate method for calculating the perimeter of a polygon taking the curvature of the earth into account, but is also the slowest.

Returns `0.0` for any geometry that is not a `POLYGON`, `MULTIPOLYGON` or `GEOMETRYCOLLECTION` containing polygon geometries.
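
###### Example {#docs:current:core_extensions:spatial:functions::example}

A hedged sketch, assuming the spatial extension is loaded. The polygon is an illustrative one-degree cell written in [latitude, longitude] order as the description requires; the division by 1000 only converts the result from meters to kilometers.

```sql
-- Perimeter of a 1-degree by 1-degree cell, with coordinates as (latitude longitude)
SELECT ST_Perimeter_Spheroid(
    ST_GeomFromText('POLYGON((52 4, 52 5, 53 5, 53 4, 52 4))')
) / 1000 AS perimeter_km;
```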

----

##### ST_Point {#docs:current:core_extensions:spatial:functions::st_point}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Point (x DOUBLE, y DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a GEOMETRY point

----

##### ST_Point2D {#docs:current:core_extensions:spatial:functions::st_point2d}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
POINT_2D ST_Point2D (x DOUBLE, y DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a POINT_2D

----

##### ST_Point2DFromWKB {#docs:current:core_extensions:spatial:functions::st_point2dfromwkb}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Point2DFromWKB (point POINT_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Deserialize a POINT_2D from a WKB encoded blob

----

##### ST_Point3D {#docs:current:core_extensions:spatial:functions::st_point3d}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
POINT_3D ST_Point3D (x DOUBLE, y DOUBLE, z DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a POINT_3D

----

##### ST_Point4D {#docs:current:core_extensions:spatial:functions::st_point4d}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
POINT_4D ST_Point4D (x DOUBLE, y DOUBLE, z DOUBLE, m DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Creates a POINT_4D

----

##### ST_PointN {#docs:current:core_extensions:spatial:functions::st_pointn}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_PointN (geom GEOMETRY, index INTEGER)
POINT_2D ST_PointN (linestring LINESTRING_2D, index INTEGER)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the n'th vertex from the input geometry as a point geometry

----

##### ST_PointOnSurface {#docs:current:core_extensions:spatial:functions::st_pointonsurface}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_PointOnSurface (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a point guaranteed to lie on the surface of the geometry

----

##### ST_Points {#docs:current:core_extensions:spatial:functions::st_points}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Points (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Collects all the vertices in the geometry into a MULTIPOINT

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
select st_points('LINESTRING(1 1, 2 2)'::geometry);
----
MULTIPOINT (1 1, 2 2)

select st_points('MULTIPOLYGON Z EMPTY'::geometry);
----
MULTIPOINT Z EMPTY
```

----

##### ST_Polygon2DFromWKB {#docs:current:core_extensions:spatial:functions::st_polygon2dfromwkb}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Polygon2DFromWKB (polygon POLYGON_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Deserialize a POLYGON_2D from a WKB encoded blob

----

##### ST_Polygonize {#docs:current:core_extensions:spatial:functions::st_polygonize}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Polygonize (geometries GEOMETRY[])
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a polygonized representation of the input geometries

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Create a polygon from a closed linestring ring
SELECT ST_Polygonize([
    ST_GeomFromText('LINESTRING(0 0, 0 10, 10 10, 10 0, 0 0)')
]);
----
GEOMETRYCOLLECTION (POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0)))
```

----

##### ST_QuadKey {#docs:current:core_extensions:spatial:functions::st_quadkey}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
VARCHAR ST_QuadKey (longitude DOUBLE, latitude DOUBLE, level INTEGER)
VARCHAR ST_QuadKey (point GEOMETRY, level INTEGER)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Compute the [quadkey](https://learn.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system) for a given lon/lat point at a given level.
Note that the parameter order is __longitude__, __latitude__.

`level` has to be between 1 and 23, inclusive.

The input coordinates will be clamped to the lon/lat bounds of the earth (longitude between -180 and 180, latitude between -85.05112878 and 85.05112878).

The geometry overload throws an error if the input geometry is not a `POINT`

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_QuadKey(st_point(11.08, 49.45), 10);
----
1333203202
```

----

##### ST_ReducePrecision {#docs:current:core_extensions:spatial:functions::st_reduceprecision}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_ReducePrecision (geom GEOMETRY, precision DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the geometry with all vertices reduced to the given precision

----

##### ST_RemoveRepeatedPoints {#docs:current:core_extensions:spatial:functions::st_removerepeatedpoints}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
LINESTRING_2D ST_RemoveRepeatedPoints (line LINESTRING_2D)
LINESTRING_2D ST_RemoveRepeatedPoints (line LINESTRING_2D, tolerance DOUBLE)
GEOMETRY ST_RemoveRepeatedPoints (geom GEOMETRY)
GEOMETRY ST_RemoveRepeatedPoints (geom GEOMETRY, tolerance DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Removes repeated points from a LINESTRING or other geometry, optionally using a tolerance.
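
###### Example {#docs:current:core_extensions:spatial:functions::example}

A minimal sketch of the overload without a tolerance, assuming the spatial extension is loaded. The illustrative line repeats two of its vertices back to back, so the result should keep one copy of each.

```sql
-- Collapse consecutive duplicate vertices in a line
SELECT ST_AsText(
    ST_RemoveRepeatedPoints(
        ST_GeomFromText('LINESTRING(0 0, 0 0, 1 1, 1 1, 2 2)')
    )
);
```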

----

##### ST_Reverse {#docs:current:core_extensions:spatial:functions::st_reverse}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Reverse (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the geometry with the order of its vertices reversed

----

##### ST_ShortestLine {#docs:current:core_extensions:spatial:functions::st_shortestline}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_ShortestLine (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the shortest line between two geometries
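
###### Example {#docs:current:core_extensions:spatial:functions::example}

A minimal sketch, assuming the spatial extension is loaded. The geometries are illustrative: a point at the origin and a vertical line at x = 5, so the shortest connection should run horizontally between them.

```sql
-- Shortest connecting line between a point and a vertical line
SELECT ST_AsText(
    ST_ShortestLine(
        ST_Point(0, 0),
        ST_GeomFromText('LINESTRING(5 -5, 5 5)')
    )
);
```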

----

##### ST_Simplify {#docs:current:core_extensions:spatial:functions::st_simplify}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Simplify (geom GEOMETRY, tolerance DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a simplified version of the geometry

----

##### ST_SimplifyPreserveTopology {#docs:current:core_extensions:spatial:functions::st_simplifypreservetopology}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_SimplifyPreserveTopology (geom GEOMETRY, tolerance DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a simplified version of the geometry that preserves topology

----

##### ST_StartPoint {#docs:current:core_extensions:spatial:functions::st_startpoint}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_StartPoint (geom GEOMETRY)
POINT_2D ST_StartPoint (line LINESTRING_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the start point of a LINESTRING.

----

##### ST_TileEnvelope {#docs:current:core_extensions:spatial:functions::st_tileenvelope}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_TileEnvelope (tile_zoom INTEGER, tile_x INTEGER, tile_y INTEGER)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

The `ST_TileEnvelope` scalar function generates tile envelope rectangular polygons from specified zoom level and tile indices.

This is used in MVT generation to select the features corresponding to the tile extent. The envelope is in the Web Mercator
coordinate reference system (EPSG:3857). The tile pyramid starts at zoom level 0, corresponding to a single tile for the
world. Each zoom level doubles the number of tiles in each direction, such that zoom level 1 is 2 tiles wide by 2 tiles high,
zoom level 2 is 4 tiles wide by 4 tiles high, and so on. Tile indices start at `[x=0, y=0]` at the top left, and increase
down and right. For example, at zoom level 2, the top right tile is `[x=3, y=0]`, the bottom left tile is `[x=0, y=3]`, and
the bottom right is `[x=3, y=3]`.


###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_TileEnvelope(2, 3, 1);
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                         st_tileenvelope(2, 3, 1)                                          │
│                                                 geometry                                                  │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ POLYGON ((1.00188E+07 0, 1.00188E+07 1.00188E+07, 2.00375E+07 1.00188E+07, 2.00375E+07 0, 1.00188E+07 0)) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
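
The tile pyramid described above can also be explored by listing every tile of a zoom level. The following is a sketch under the assumption that the spatial extension is loaded; the aliases `xs` and `ys` are illustrative, and the explicit casts are there because `range` produces `BIGINT` values while the function takes `INTEGER` arguments.

```sql
-- List the envelopes of all four tiles at zoom level 1 (2 tiles wide by 2 tiles high)
SELECT
    x,
    y,
    ST_AsText(ST_TileEnvelope(1, x::INTEGER, y::INTEGER)) AS envelope
FROM range(2) AS xs(x), range(2) AS ys(y)
ORDER BY y, x;
```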

----

##### ST_Touches {#docs:current:core_extensions:spatial:functions::st_touches}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_Touches (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the geometries touch

----

##### ST_Transform {#docs:current:core_extensions:spatial:functions::st_transform}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BOX_2D ST_Transform (box BOX_2D, source_crs VARCHAR, target_crs VARCHAR)
BOX_2D ST_Transform (box BOX_2D, source_crs VARCHAR, target_crs VARCHAR, always_xy BOOLEAN)
POINT_2D ST_Transform (point POINT_2D, source_crs VARCHAR, target_crs VARCHAR)
POINT_2D ST_Transform (point POINT_2D, source_crs VARCHAR, target_crs VARCHAR, always_xy BOOLEAN)
GEOMETRY ST_Transform (geom GEOMETRY, source_crs VARCHAR, target_crs VARCHAR)
GEOMETRY ST_Transform (geom GEOMETRY, source_crs VARCHAR, target_crs VARCHAR, always_xy BOOLEAN)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Transforms a geometry between two coordinate systems

The source and target coordinate systems can be specified using any format that the [PROJ library](https://proj.org) supports.

The optional `always_xy` parameter can be used to force the input and output geometries to be interpreted as having an [easting, northing] coordinate axis order regardless of what the source and target coordinate system definitions say. This is particularly useful when transforming to/from the [WGS84/EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (what most people think of when they hear "longitude"/"latitude" or "GPS coordinates"), which is defined as having a [latitude, longitude] axis order even though [longitude, latitude] is commonly used in practice (e.g., in [GeoJSON](https://tools.ietf.org/html/rfc7946)). More details are available in the [PROJ documentation](https://proj.org/en/9.3/faq.html#why-is-the-axis-ordering-in-proj-not-consistent).

DuckDB spatial vendors its own static copy of the PROJ database of coordinate systems, so if you have your own installation of PROJ on your system the available coordinate systems may differ to what's available in other GIS software.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Transform a geometry from EPSG:4326 to EPSG:3857 (WGS84 to WebMercator)
-- Note that since WGS84 is defined as having a [latitude, longitude] axis order
-- we follow the standard and provide the input geometry using that axis order,
-- but the output will be [easting, northing] because that is what's defined by
-- WebMercator.

SELECT
    ST_Transform(
        st_point(52.373123, 4.892360),
        'EPSG:4326',
        'EPSG:3857'
    );
----
POINT (544615.0239773799 6867874.103539125)

-- Alternatively, let's say we got our input point from e.g. a GeoJSON file,
-- which uses WGS84 but with [longitude, latitude] axis order. We can use the
-- `always_xy` parameter to force the input geometry to be interpreted as having
-- an [easting, northing] axis order instead, even though the source coordinate
-- reference system definition (WGS84) says otherwise.

SELECT 
    ST_Transform(
        -- note the axis order is reversed here
        st_point(4.892360, 52.373123),
        'EPSG:4326',
        'EPSG:3857',
        always_xy := true
    );
----
POINT (544615.0239773799 6867874.103539125)

-- Transform a geometry from OSGB36 British National Grid (EPSG:27700) to WGS84 (EPSG:4326).
-- The standard transform is usually accurate only to the first few decimal places,
-- which can leave a positional error of about 10m, and possibly much more.
SELECT ST_Transform(bng, 'EPSG:27700', 'EPSG:4326', always_xy := true) AS without_grid_file
FROM (SELECT ST_GeomFromText('POINT( 170370.718 11572.405 )') AS bng);
----
POINT (-5.202992651563592 49.96007490162923)

-- By using an official NTv2 grid file, we can push the error down to around the 9th decimal place,
-- which in theory is below a millimetre; in practice, your coordinates are unlikely to be that precise.
-- British National Grid "NTv2 format files" are available for download here:
-- https://www.ordnancesurvey.co.uk/products/os-net/for-developers
SELECT ST_Transform(bng
    , '+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs +nadgrids=/full/path/to/OSTN15-NTv2/OSTN15_NTv2_OSGBtoETRS.gsb +type=crs'
    , 'EPSG:4326', always_xy := true) AS with_grid_file
FROM (SELECT ST_GeomFromText('POINT( 170370.718 11572.405 )') AS bng) t;
----
POINT (-5.203046090608746 49.96006137018598)
```

----

##### ST_Union {#docs:current:core_extensions:spatial:functions::st_union}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Union (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the union of two geometries
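
###### Example {#docs:current:core_extensions:spatial:functions::example}

A hedged sketch, assuming the spatial extension is loaded. The two illustrative 2-by-2 squares overlap in a 1-by-2 strip, so the area of their union should be 4 + 4 - 2 = 6. `ST_Area` is used only to make the result easy to check.

```sql
-- Union of two overlapping squares, summarized by its area
SELECT ST_Area(
    ST_Union(
        ST_GeomFromText('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
        ST_GeomFromText('POLYGON((1 0, 3 0, 3 2, 1 2, 1 0))')
    )
) AS union_area;
```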

----

##### ST_VoronoiDiagram {#docs:current:core_extensions:spatial:functions::st_voronoidiagram}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_VoronoiDiagram (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the Voronoi diagram of the supplied MultiPoint geometry

----

##### ST_Within {#docs:current:core_extensions:spatial:functions::st_within}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BOOLEAN ST_Within (geom1 POINT_2D, geom2 POLYGON_2D)
BOOLEAN ST_Within (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the first geometry is within the second

----

##### ST_WithinProperly {#docs:current:core_extensions:spatial:functions::st_withinproperly}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
BOOLEAN ST_WithinProperly (geom1 GEOMETRY, geom2 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns true if the first geometry is "properly" contained by the second geometry.

This function behaves the same as `ST_ContainsProperly`, but with the arguments swapped.
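
###### Example {#docs:current:core_extensions:spatial:functions::example}

A hedged sketch comparing this function with `ST_Within`, assuming the spatial extension is loaded. It also assumes the usual meaning of "properly", namely that the first geometry may not touch the boundary of the second. The illustrative inner square shares part of its boundary with the outer one, so under that assumption the two predicates should disagree.

```sql
-- The inner square lies inside the outer square but touches its boundary
SELECT
    ST_Within(inner_geom, outer_geom) AS within,
    ST_WithinProperly(inner_geom, outer_geom) AS within_properly
FROM (
    SELECT
        ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))') AS inner_geom,
        ST_GeomFromText('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))') AS outer_geom
);
```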

----

##### ST_X {#docs:current:core_extensions:spatial:functions::st_x}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_X (geom GEOMETRY)
DOUBLE ST_X (point POINT_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the X coordinate of a point geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_X(ST_Point(1, 2))
```

----

##### ST_XMax {#docs:current:core_extensions:spatial:functions::st_xmax}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_XMax (geom GEOMETRY)
DOUBLE ST_XMax (point POINT_2D)
DOUBLE ST_XMax (line LINESTRING_2D)
DOUBLE ST_XMax (polygon POLYGON_2D)
DOUBLE ST_XMax (box BOX_2D)
FLOAT ST_XMax (box BOX_2DF)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the maximum X coordinate of a geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_XMax(ST_Point(1, 2))
```

----

##### ST_XMin {#docs:current:core_extensions:spatial:functions::st_xmin}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_XMin (geom GEOMETRY)
DOUBLE ST_XMin (point POINT_2D)
DOUBLE ST_XMin (line LINESTRING_2D)
DOUBLE ST_XMin (polygon POLYGON_2D)
DOUBLE ST_XMin (box BOX_2D)
FLOAT ST_XMin (box BOX_2DF)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the minimum X coordinate of a geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_XMin(ST_Point(1, 2))
```

----

##### ST_Y {#docs:current:core_extensions:spatial:functions::st_y}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_Y (geom GEOMETRY)
DOUBLE ST_Y (point POINT_2D)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the Y coordinate of a point geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_Y(ST_Point(1, 2))
```

----

##### ST_YMax {#docs:current:core_extensions:spatial:functions::st_ymax}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_YMax (geom GEOMETRY)
DOUBLE ST_YMax (point POINT_2D)
DOUBLE ST_YMax (line LINESTRING_2D)
DOUBLE ST_YMax (polygon POLYGON_2D)
DOUBLE ST_YMax (box BOX_2D)
FLOAT ST_YMax (box BOX_2DF)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the maximum Y coordinate of a geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_YMax(ST_Point(1, 2))
```

----

##### ST_YMin {#docs:current:core_extensions:spatial:functions::st_ymin}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
DOUBLE ST_YMin (geom GEOMETRY)
DOUBLE ST_YMin (point POINT_2D)
DOUBLE ST_YMin (line LINESTRING_2D)
DOUBLE ST_YMin (polygon POLYGON_2D)
DOUBLE ST_YMin (box BOX_2D)
FLOAT ST_YMin (box BOX_2DF)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the minimum Y coordinate of a geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_YMin(ST_Point(1, 2))
```

----

##### ST_Z {#docs:current:core_extensions:spatial:functions::st_z}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_Z (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the Z coordinate of a point geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_Z(ST_GeomFromText('POINT Z(1 2 3)'));
```

----

##### ST_ZMFlag {#docs:current:core_extensions:spatial:functions::st_zmflag}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
UTINYINT ST_ZMFlag (geom GEOMETRY)
UTINYINT ST_ZMFlag (wkb WKB_BLOB)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns a flag indicating the presence of Z and M values in the input geometry:

- 0 = No Z or M values
- 1 = M values only
- 2 = Z values only
- 3 = Z and M values

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- ZMFlag for a 2D geometry
SELECT ST_ZMFlag(ST_GeomFromText('POINT(1 1)'));
----
0

-- ZMFlag for a 3DZ geometry
SELECT ST_ZMFlag(ST_GeomFromText('POINT Z(1 1 1)'));
----
2

-- ZMFlag for a 3DM geometry
SELECT ST_ZMFlag(ST_GeomFromText('POINT M(1 1 1)'));
----
1

-- ZMFlag for a 4D geometry
SELECT ST_ZMFlag(ST_GeomFromText('POINT ZM(1 1 1 1)'));
----
3
```

----

##### ST_ZMax {#docs:current:core_extensions:spatial:functions::st_zmax}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_ZMax (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the maximum Z coordinate of a geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_ZMax(ST_GeomFromText('POINT Z(1 2 3)'));
```

----

##### ST_ZMin {#docs:current:core_extensions:spatial:functions::st_zmin}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
DOUBLE ST_ZMin (geom GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the minimum Z coordinate of a geometry

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_ZMin(ST_GeomFromText('POINT Z(1 2 3)'));
```

----

#### Aggregate Functions {#docs:current:core_extensions:spatial:functions::aggregate-functions}

##### ST_AsMVT {#docs:current:core_extensions:spatial:functions::st_asmvt}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
BLOB ST_AsMVT (col0 ANY)
BLOB ST_AsMVT (col0 ANY, col1 VARCHAR)
BLOB ST_AsMVT (col0 ANY, col1 VARCHAR, col2 INTEGER)
BLOB ST_AsMVT (col0 ANY, col1 VARCHAR, col2 INTEGER, col3 VARCHAR)
BLOB ST_AsMVT (col0 ANY, col1 VARCHAR, col2 INTEGER, col3 VARCHAR, col4 VARCHAR)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Makes a Mapbox Vector Tile from a set of geometries and properties.

The function takes as input a row type (`STRUCT`) containing a geometry column and any number of property columns.
It returns a single binary `BLOB` containing the Mapbox Vector Tile.

The function has the following signature:

`ST_AsMVT(row STRUCT, layer_name VARCHAR DEFAULT 'layer', extent INTEGER DEFAULT 4096, geom_column_name VARCHAR DEFAULT NULL, feature_id_column_name VARCHAR DEFAULT NULL) -> BLOB`

- The first argument is a struct containing the geometry and properties.
- The second argument is the name of the layer in the vector tile. This argument is optional and defaults to 'layer'.
- The third argument is the extent of the tile. This argument is optional and defaults to 4096.
- The fourth argument is the name of the geometry column in the input row. This argument is optional. If not provided, the first geometry column in the input row will be used. If multiple geometry columns are present, an error will be raised.
- The fifth argument is the name of the feature id column in the input row. This argument is optional. If provided, the values in this column will be used as feature ids in the vector tile. The column must be of type INTEGER or BIGINT. If set to negative or NULL, a feature id will not be assigned to the corresponding feature.

The input struct must contain exactly one geometry column of type GEOMETRY. It can contain any number of property columns of types VARCHAR, FLOAT, DOUBLE, INTEGER, BIGINT, or BOOLEAN.

Example:

```sql
SELECT ST_AsMVT({'geom': geom, 'id': id, 'name': name}, 'cities', 4096, 'geom', 'id') AS tile
FROM cities;
```

This example creates a vector tile with a layer named `cities` and an extent of 4096 from the `cities` table, using `geom` as the geometry column and `id` as the feature id column.

However, you probably want to use the `ST_AsMVTGeom` function to first transform and clip your geometries to the tile extent.
The following example assumes the geometry is in Web Mercator (`EPSG:3857`) coordinates.
Replace `⟨z⟩`, `⟨x⟩`, and `⟨y⟩` with the appropriate tile coordinates, `⟨your_table⟩` with your table name, and `⟨tile_path⟩` with the path to write the tile to.

```sql
COPY (
    SELECT ST_AsMVT({
        "geometry": ST_AsMVTGeom(
            geometry,
            ST_Extent(ST_TileEnvelope(⟨z⟩, ⟨x⟩, ⟨y⟩)),
            4096,
            256,
            false
        )
    })
    FROM ⟨your_table⟩ WHERE ST_Intersects(geometry, ST_TileEnvelope(⟨z⟩, ⟨x⟩, ⟨y⟩))
) to ⟨tile_path⟩ (FORMAT 'BLOB');
```

----

##### ST_CoverageInvalidEdges_Agg {#docs:current:core_extensions:spatial:functions::st_coverageinvalidedges_agg}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_CoverageInvalidEdges_Agg (col0 GEOMETRY)
GEOMETRY ST_CoverageInvalidEdges_Agg (col0 GEOMETRY, col1 DOUBLE)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the invalid edges of a coverage geometry

----

##### ST_CoverageSimplify_Agg {#docs:current:core_extensions:spatial:functions::st_coveragesimplify_agg}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_CoverageSimplify_Agg (col0 GEOMETRY, col1 DOUBLE)
GEOMETRY ST_CoverageSimplify_Agg (col0 GEOMETRY, col1 DOUBLE, col2 BOOLEAN)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Simplifies a set of geometries while maintaining coverage

----

##### ST_CoverageUnion_Agg {#docs:current:core_extensions:spatial:functions::st_coverageunion_agg}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_CoverageUnion_Agg (col0 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Unions a set of geometries while maintaining coverage

----

##### ST_Envelope_Agg {#docs:current:core_extensions:spatial:functions::st_envelope_agg}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Envelope_Agg (col0 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Alias for [ST_Extent_Agg](#docs:current:core_extensions:spatial:functions::st_extent_agg).

Computes the minimal-bounding-box polygon containing the set of input geometries.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_Envelope_Agg(geom) FROM UNNEST([ST_Point(1,1), ST_Point(5,5)]) AS _(geom);
-- POLYGON ((1 1, 1 5, 5 5, 5 1, 1 1))
```

----

##### ST_Extent_Agg {#docs:current:core_extensions:spatial:functions::st_extent_agg}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Extent_Agg (col0 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Computes the minimal-bounding-box polygon containing the set of input geometries

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT ST_Extent_Agg(geom) FROM UNNEST([ST_Point(1,1), ST_Point(5,5)]) AS _(geom);
-- POLYGON ((1 1, 1 5, 5 5, 5 1, 1 1))
```

----

##### ST_Intersection_Agg {#docs:current:core_extensions:spatial:functions::st_intersection_agg}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Intersection_Agg (col0 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Computes the intersection of a set of geometries
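
A minimal sketch, following the pattern of the `ST_Extent_Agg` example. The two overlapping boxes are hypothetical inputs, and the result should cover only the area they share:

```sql
-- Intersect two overlapping boxes built with ST_MakeEnvelope
SELECT ST_Intersection_Agg(geom)
FROM UNNEST([ST_MakeEnvelope(0, 0, 2, 2), ST_MakeEnvelope(1, 1, 3, 3)]) AS _(geom);
```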

----

##### ST_MemUnion_Agg {#docs:current:core_extensions:spatial:functions::st_memunion_agg}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_MemUnion_Agg (col0 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Computes the union of a set of input geometries.

Slower, but might be more memory efficient than `ST_Union_Agg`, as each geometry is merged into the union individually rather than all at once.

----

##### ST_Union_Agg {#docs:current:core_extensions:spatial:functions::st_union_agg}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Union_Agg (col0 GEOMETRY)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Computes the union of a set of input geometries
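
A minimal sketch, again following the pattern of the `ST_Extent_Agg` example. The two overlapping boxes are hypothetical inputs, and the result should be a single geometry covering both of them:

```sql
-- Union two overlapping boxes built with ST_MakeEnvelope
SELECT ST_Union_Agg(geom)
FROM UNNEST([ST_MakeEnvelope(0, 0, 2, 2), ST_MakeEnvelope(1, 1, 3, 3)]) AS _(geom);
```

According to the signatures in this reference, `ST_MemUnion_Agg` takes the same input, so it could be swapped in here when memory usage matters more than speed.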

----

#### Macro Functions {#docs:current:core_extensions:spatial:functions::macro-functions}

##### ST_Rotate {#docs:current:core_extensions:spatial:functions::st_rotate}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_Rotate (geom GEOMETRY, radians double)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Alias of ST_RotateZ

----

##### ST_RotateX {#docs:current:core_extensions:spatial:functions::st_rotatex}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_RotateX (geom GEOMETRY, radians double)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Rotates a geometry around the X axis. This is a shorthand macro for calling ST_Affine.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Rotate a 3D point 90 degrees (π/2 radians) around the X-axis
SELECT ST_RotateX(ST_GeomFromText('POINT Z(0 1 0)'), pi()/2);
----
POINT Z (0 0 1)
```

----

##### ST_RotateY {#docs:current:core_extensions:spatial:functions::st_rotatey}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_RotateY (geom GEOMETRY, radians double)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Rotates a geometry around the Y axis. This is a shorthand macro for calling ST_Affine.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Rotate a 3D point 90 degrees (π/2 radians) around the Y-axis
SELECT ST_RotateY(ST_GeomFromText('POINT Z(1 0 0)'), pi()/2);
----
POINT Z (0 0 -1)
```

----

##### ST_RotateZ {#docs:current:core_extensions:spatial:functions::st_rotatez}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_RotateZ (geom GEOMETRY, radians double)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Rotates a geometry around the Z axis. This is a shorthand macro for calling ST_Affine.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Rotate a point 90 degrees (π/2 radians) around the Z-axis
SELECT ST_RotateZ(ST_Point(1, 0), pi()/2);
----
POINT (0 1)
```

----

##### ST_Scale {#docs:current:core_extensions:spatial:functions::st_scale}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_Scale (geom GEOMETRY, xs double, ys double, zs double)
GEOMETRY ST_Scale (geom GEOMETRY, xs double, ys double)
```
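
The reference gives no description for this macro. As a hedged sketch, assuming it multiplies each coordinate by the corresponding factor (as the scaling step of the related `ST_TransScale` macro does), a call might look like this:

```sql
-- Assumed behavior: scale X by 2 and Y by 3
SELECT ST_Scale(ST_Point(1, 2), 2, 3);
```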

----

##### ST_TransScale {#docs:current:core_extensions:spatial:functions::st_transscale}


###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
GEOMETRY ST_TransScale (geom GEOMETRY, dx double, dy double, xs double, ys double)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Translates and then scales a geometry in X and Y direction. This is a shorthand macro for calling ST_Affine.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Translate by (1, 2) then scale by (2, 3)
SELECT ST_TransScale(ST_Point(1, 1), 1, 2, 2, 3);
----
POINT (4 9)
```

----

##### ST_Translate {#docs:current:core_extensions:spatial:functions::st_translate}


###### Signatures {#docs:current:core_extensions:spatial:functions::signatures}

```sql
GEOMETRY ST_Translate (geom GEOMETRY, dx double, dy double, dz double)
GEOMETRY ST_Translate (geom GEOMETRY, dx double, dy double)
```
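
The reference gives no description for this macro. As a hedged sketch, assuming it offsets each coordinate by the corresponding delta (as the translation step of the related `ST_TransScale` macro does), a call might look like this:

```sql
-- Assumed behavior: shift X by 10 and Y by 20
SELECT ST_Translate(ST_Point(1, 2), 10, 20);
```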

----

#### Table Functions {#docs:current:core_extensions:spatial:functions::table-functions}

##### ST_Drivers {#docs:current:core_extensions:spatial:functions::st_drivers}

###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
ST_Drivers ()
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Returns the list of supported GDAL drivers and file formats

Note that far from all of these drivers have been tested properly.
Some may require additional options to be passed to work as expected.
If you run into any issues, please first consult the [GDAL docs](https://gdal.org/drivers/vector/index.html).

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT * FROM ST_Drivers();
```

----

##### ST_GeneratePoints {#docs:current:core_extensions:spatial:functions::st_generatepoints}

###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
ST_GeneratePoints (col0 BOX_2D, col1 BIGINT)
ST_GeneratePoints (col0 BOX_2D, col1 BIGINT, col2 BIGINT)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Generates a set of random points within the specified bounding box.

Takes a bounding box (min_x, min_y, max_x, max_y), a count of points to generate and optionally a seed for the random number generator.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT * FROM ST_GeneratePoints({min_x: 0, min_y:0, max_x:10, max_y:10}::BOX_2D, 5, 42);
```

----

##### ST_Read {#docs:current:core_extensions:spatial:functions::st_read}

###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
ST_Read (col0 VARCHAR, keep_wkb BOOLEAN, max_batch_size INTEGER, sequential_layer_scan BOOLEAN, layer VARCHAR, sibling_files VARCHAR[], spatial_filter WKB_BLOB, spatial_filter_box BOX_2D, allowed_drivers VARCHAR[], open_options VARCHAR[])
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Read and import a variety of geospatial file formats using the GDAL library.

The `ST_Read` table function is based on the [GDAL](https://gdal.org/index.html) translator library and enables reading spatial data from a variety of geospatial vector file formats as if they were DuckDB tables.

> See [ST_Drivers](#docs:current:core_extensions:spatial:functions::st_drivers) for a list of supported file formats and drivers.

Except for the `path` parameter, all parameters are optional.

| Parameter | Type | Description |
| --------- | -----| ----------- |
| `path` | VARCHAR | The path to the file to read. Mandatory |
| `sequential_layer_scan` | BOOLEAN | If set to true, the table function will scan through all layers sequentially and return the first layer that matches the given layer name. This is required for some drivers to work properly, e.g., the OSM driver. |
| `spatial_filter` | WKB_BLOB | If set to a WKB blob, the table function will only return rows that intersect with the given WKB geometry. Some drivers may support efficient spatial filtering natively, in which case it will be pushed down. Otherwise the filtering is done by GDAL which may be much slower. |
| `open_options` | VARCHAR[] | A list of key-value pairs that are passed to the GDAL driver to control the opening of the file. E.g., the GeoJSON driver supports a FLATTEN_NESTED_ATTRIBUTES=YES option to flatten nested attributes. |
| `layer` | VARCHAR | The name of the layer to read from the file. If NULL, the first layer is returned. Can also be a layer index (starting at 0). |
| `allowed_drivers` | VARCHAR[] | A list of GDAL driver names that are allowed to be used to open the file. If empty, all drivers are allowed. |
| `sibling_files` | VARCHAR[] | A list of sibling files that are required to open the file. E.g., the ESRI Shapefile driver requires a .shx file to be present, although most of the time these can be discovered automatically. |
| `spatial_filter_box` | BOX_2D | If set to a BOX_2D, the table function will only return rows that intersect with the given bounding box. Similar to spatial_filter. |
| `keep_wkb` | BOOLEAN | If set, the table function will return geometries in a wkb_geometry column with the type WKB_BLOB (which can be cast to BLOB) instead of GEOMETRY. This is useful if you want to use DuckDB with more exotic geometry subtypes that DuckDB spatial doesn't support representing in the GEOMETRY type yet. |

Note that GDAL is single-threaded, so this table function will not be able to make full use of parallelism.
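
As a sketch of how the optional parameters above can be supplied, DuckDB table functions accept them as named arguments. The file path, layer name, and bounding box below are placeholders rather than values from a real dataset:

```sql
SELECT *
FROM ST_Read(
    'some/file/path/filename.json',
    -- hypothetical layer name
    layer = 'my_layer',
    -- GeoJSON driver option mentioned in the table above
    open_options = ['FLATTEN_NESTED_ATTRIBUTES=YES'],
    -- only return rows intersecting this (arbitrary) bounding box
    spatial_filter_box = {min_x: 0, min_y: 0, max_x: 10, max_y: 10}::BOX_2D
);
```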

By using `ST_Read`, the spatial extension also provides “replacement scans” for common geospatial file formats, allowing you to query files of these formats as if they were tables directly.

```sql
SELECT * FROM './path/to/some/shapefile/dataset.shp';
```

In practice, this is just syntactic sugar for calling `ST_Read`, so there is no difference in performance. If you want to pass additional options, you should use the `ST_Read` table function directly.

The following formats are currently recognized by their file extension:

| Format | Extension |
| ------ | --------- |
| ESRI ShapeFile | .shp |
| GeoPackage | .gpkg |
| FlatGeoBuf | .fgb |

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Read a Shapefile
SELECT * FROM ST_Read('some/file/path/filename.shp');

-- Read a GeoJSON file
CREATE TABLE my_geojson_table AS SELECT * FROM ST_Read('some/file/path/filename.json');
```

----

##### ST_ReadOSM {#docs:current:core_extensions:spatial:functions::st_readosm}

###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
ST_ReadOSM (col0 VARCHAR)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

The `ST_ReadOSM()` table function enables reading compressed OpenStreetMap data directly from a `.osm.pbf` file.

This function uses multithreading and zero-copy protobuf parsing, which makes it a lot faster than using the `ST_Read()` OSM driver. However, it only outputs the raw OSM data (nodes, ways, relations) without constructing any geometries. For simple node entities (like POIs) you can trivially construct `POINT` geometries. It is also possible to construct `LINESTRING` and `POLYGON` geometries by manually joining refs and nodes together in SQL, although available memory is usually a limiting factor.

The `ST_ReadOSM()` function also provides a "replacement scan" to enable reading from a file directly as if it were a table. This is just syntactic sugar for calling `ST_ReadOSM()` though. Example:

```sql
SELECT * FROM 'tmp/data/germany.osm.pbf' LIMIT 5;
```
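
For the simple node case described above, a sketch of constructing `POINT` geometries could look like the following. It relies on the `kind`, `id`, `tags`, `lon`, and `lat` columns shown in the function's output and reuses the same sample file path:

```sql
-- Build POINT geometries from raw OSM nodes (lon is X, lat is Y)
SELECT id, tags, ST_Point(lon, lat) AS geom
FROM ST_ReadOSM('tmp/data/germany.osm.pbf')
WHERE kind = 'node'
LIMIT 5;
```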

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
SELECT *
FROM ST_ReadOSM('tmp/data/germany.osm.pbf')
WHERE tags['highway'] != []
LIMIT 5;
----
┌──────────────────────┬────────┬──────────────────────┬─────────┬────────────────────┬────────────┬───────────┬────────────────────────┐
│         kind         │   id   │         tags         │  refs   │        lat         │    lon     │ ref_roles │       ref_types        │
│ enum('node', 'way'…  │ int64  │ map(varchar, varch…  │ int64[] │       double       │   double   │ varchar[] │ enum('node', 'way', …  │
├──────────────────────┼────────┼──────────────────────┼─────────┼────────────────────┼────────────┼───────────┼────────────────────────┤
│ node                 │ 122351 │ {bicycle=yes, butt…  │         │         53.5492951 │   9.977553 │           │                        │
│ node                 │ 122397 │ {crossing=no, high…  │         │ 53.520990100000006 │ 10.0156924 │           │                        │
│ node                 │ 122493 │ {TMC:cid_58:tabcd_…  │         │ 53.129614600000004 │  8.1970173 │           │                        │
│ node                 │ 123566 │ {highway=traffic_s…  │         │ 54.617268200000005 │  8.9718171 │           │                        │
│ node                 │ 125801 │ {TMC:cid_58:tabcd_…  │         │ 53.070685000000005 │  8.7819939 │           │                        │
└──────────────────────┴────────┴──────────────────────┴─────────┴────────────────────┴────────────┴───────────┴────────────────────────┘
```

----

##### ST_ReadSHP {#docs:current:core_extensions:spatial:functions::st_readshp}

###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
ST_ReadSHP (col0 VARCHAR, encoding VARCHAR)
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Read a Shapefile without relying on the GDAL library
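
A minimal usage sketch based on the signature above. The path is a placeholder, and `'utf-8'` is an assumed value for the `encoding` parameter:

```sql
-- Placeholder path; 'utf-8' is an assumed encoding value
SELECT * FROM ST_ReadSHP('some/file/path/filename.shp', encoding = 'utf-8');
```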

----

##### ST_Read_Meta {#docs:current:core_extensions:spatial:functions::st_read_meta}

###### Signature {#docs:current:core_extensions:spatial:functions::signature}

```sql
ST_Read_Meta (col0 VARCHAR)
ST_Read_Meta (col0 VARCHAR[])
```

###### Description {#docs:current:core_extensions:spatial:functions::description}

Read the metadata from a variety of geospatial file formats using the GDAL library.

The `ST_Read_Meta` table function accompanies the `ST_Read` table function, but instead of reading the contents of a file, this function scans the metadata instead.
Since the data model of the underlying GDAL library is quite flexible, most of the interesting metadata is within the returned `layers` column, which is a somewhat complex nested structure of DuckDB `STRUCT` and `LIST` types.

###### Example {#docs:current:core_extensions:spatial:functions::example}

```sql
-- Find the coordinate reference system authority name and code for the first layer's first geometry column in the file
SELECT
    layers[1].geometry_fields[1].crs.auth_name as name,
    layers[1].geometry_fields[1].crs.auth_code as code
FROM st_read_meta('../../tmp/data/amsterdam_roads.fgb');
```

----

### R-Tree Indexes {#docs:current:core_extensions:spatial:r-tree_indexes}

The [`spatial` extension](#docs:current:core_extensions:spatial:overview) provides support for spatial indexing through the R-tree extension index type.

#### Why Should I Use an R-Tree Index? {#docs:current:core_extensions:spatial:r-tree_indexes::why-should-i-use-an-r-tree-index}

When working with geospatial datasets, it is very common that you want to filter rows based on their spatial relationship with a specific region of interest. Unfortunately, even though DuckDB's vectorized execution engine is pretty fast, this sort of operation does not scale very well to large datasets as it always requires a full table scan to check every row in the table. However, by indexing a table with an R-tree, it is possible to accelerate these types of queries significantly.

#### How Do R-Tree Indexes Work? {#docs:current:core_extensions:spatial:r-tree_indexes::how-do-r-tree-indexes-work}

An R-tree is a balanced tree data structure that stores the approximate _minimum bounding rectangle_ of each geometry (and the internal ID of the corresponding row) in the leaf nodes, and the bounding rectangle enclosing all of the child nodes in each internal node.

> The _minimum bounding rectangle_ (MBR) of a geometry is the smallest rectangle that completely encloses the geometry. Usually when we talk about the bounding rectangle of a geometry (or the bounding "box" in the context of 2D geometry), we mean the minimum bounding rectangle. Additionally, we tend to assume that bounding boxes/rectangles are _axis-aligned,_ i.e., the rectangle is **not** rotated – the sides are always parallel to the coordinate axes. The MBR of a point is the point itself.

By traversing the R-tree from top to bottom, it is possible to very quickly search an R-tree-indexed table for only those rows where the indexed geometry column intersects a specific region of interest, as you can skip searching entire sub-trees if the bounding rectangles of their parent nodes don't intersect the query region at all. Once the leaf nodes are reached, only the specific rows whose bounding rectangles intersect the query region have to be fetched from disk, and the often much more expensive exact spatial predicate check (and any other filters) only has to be executed for these rows.
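
To make the bounding-rectangle idea concrete, the sketch below computes the bounding box of a single hypothetical line string with `ST_Extent`. It only illustrates what a minimum bounding rectangle is, not how the index stores it internally:

```sql
-- The result is a BOX_2D spanning the line's minimum and maximum X and Y values
SELECT ST_Extent(ST_GeomFromText('LINESTRING (0 0, 2 5, 4 1)')) AS mbr;
```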

#### What Are the Limitations of R-Tree Indexes in DuckDB? {#docs:current:core_extensions:spatial:r-tree_indexes::what-are-the-limitations-of-r-tree-indexes-in-duckdb}

Before you get started using the R-tree index, there are some limitations to be aware of:

* The R-tree index is only supported for the `GEOMETRY` data type.
* The R-tree index will only be used to perform "index scans" when the table is filtered (using a `WHERE` clause) with one of the following spatial predicate functions (as they all imply intersection): `ST_Equals`, `ST_Intersects`, `ST_Touches`, `ST_Crosses`, `ST_Within`, `ST_Contains`, `ST_Overlaps`, `ST_Covers`, `ST_CoveredBy`, `ST_ContainsProperly`.
* One of the arguments to the spatial predicate function must be a "constant" (i.e., an expression whose result is known at query planning time). This is because the query planner needs to know the bounding box of the query region _before_ the query itself is executed to use the R-tree index scan.

In the future, we want to enable R-tree indexes to be used to accelerate additional predicate functions and more complex queries such as spatial joins.
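
The constant-argument limitation can be sketched as follows. The `places` and `regions` tables are hypothetical and exist only for this illustration, and the spatial extension is assumed to be loaded:

```sql
-- Hypothetical tables for illustration
CREATE TABLE places (geom GEOMETRY);
CREATE TABLE regions (region GEOMETRY);
CREATE INDEX places_idx ON places USING RTREE (geom);

-- Constant query region: eligible for an R-tree index scan
EXPLAIN SELECT count(*) FROM places
WHERE ST_Intersects(geom, ST_MakeEnvelope(0, 0, 10, 10));

-- Query region read from another table: not a constant, so per the
-- limitations listed above the R-tree index scan is not expected to apply
EXPLAIN SELECT count(*) FROM places, regions
WHERE ST_Intersects(places.geom, regions.region);
```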

#### How to Use R-Tree Indexes in DuckDB {#docs:current:core_extensions:spatial:r-tree_indexes::how-to-use-r-tree-indexes-in-duckdb}

To create an R-tree index, simply use the `CREATE INDEX` statement with the `USING RTREE` clause, passing the geometry column to index within the parentheses. For example:

```sql
-- Create a table with a geometry column
CREATE TABLE my_table (geom GEOMETRY);

-- Create an R-tree index on the geometry column
CREATE INDEX my_idx ON my_table USING RTREE (geom);
```

You can also pass in additional options when creating an R-tree index using the `WITH` clause to control the behavior of the R-tree index. For example, to specify the maximum number of entries per node in the R-tree, you can use the `max_node_capacity` option:

```sql
CREATE INDEX my_idx ON my_table USING RTREE (geom) WITH (max_node_capacity = 16);
```

The impact that tweaking these options has on performance depends heavily on the system DuckDB is running on, the spatial distribution of the dataset, and the query patterns of your specific workload. The defaults should be good enough, but if you want to experiment with different parameters, see the [full list of options here](#docs:current:core_extensions:spatial:r-tree_indexes::options).

#### Example {#docs:current:core_extensions:spatial:r-tree_indexes::example}

The following example creates an R-tree index on a geometry column and shows that the `RTREE_INDEX_SCAN` operator is used when the table is filtered with a spatial predicate:

```sql
INSTALL spatial;
LOAD spatial;

-- Create a table with 10_000 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 10_000, 1337);

-- Create an index on the table.
CREATE INDEX my_idx ON t1 USING RTREE (geom);

-- Perform a query with a "spatial predicate" on the indexed geometry column
-- Note how the second argument (in this case, the ST_MakeEnvelope call) is a "constant"
SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65, 65));
```

```text
390
```

We can check for ourselves that an R-tree index scan is used by using the `EXPLAIN` statement:

```sql
EXPLAIN SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65, 65));
```

```text
┌───────────────────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│        count_star()       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│ ST_Within(geom, '...')    │ 
│                           │
│         ~2000 Rows        │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│     RTREE_INDEX_SCAN      │
│    ────────────────────   │
│   t1 (RTREE INDEX SCAN :  │
│           my_idx)         │
│                           │
│     Projections: geom     │
│                           │
│        ~10000 Rows        │
└───────────────────────────┘
```

#### Performance Considerations {#docs:current:core_extensions:spatial:r-tree_indexes::performance-considerations}

##### Bulk Loading & Maintenance {#docs:current:core_extensions:spatial:r-tree_indexes::bulk-loading--maintenance}

Creating R-trees on top of an already populated table is much faster than first creating the index and then inserting the data. This is because the R-tree will have to periodically rebalance itself and perform a somewhat costly splitting operation when a node reaches max capacity after an insert, potentially causing additional splits to cascade up the tree. However, when the R-tree index is created on an already populated table, a special bottom-up "bulk loading algorithm" (Sort-Tile-Recursive) is used, which divides all entries into an already balanced tree as the total number of required nodes can be computed from the beginning.

Additionally, using the bulk loading algorithm tends to create an R-tree with a better structure (less overlap between bounding boxes), which usually leads to better query performance. If you find that the performance of querying the R-tree starts to deteriorate after a large number of updates or deletions, dropping and re-creating the index might produce a higher quality R-tree.
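
The recommended ordering can be sketched as below. The `pts` table and `pts_idx` index are hypothetical names, and the point count and seed are arbitrary:

```sql
-- Populate the table first (10_000 random points, arbitrary seed) ...
CREATE TABLE pts AS
SELECT point::GEOMETRY AS geom
FROM ST_GeneratePoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 10_000, 42);

-- ... then create the index, so that it is built on already populated data
CREATE INDEX pts_idx ON pts USING RTREE (geom);

-- After a large number of updates or deletions, rebuild the index from scratch
DROP INDEX pts_idx;
CREATE INDEX pts_idx ON pts USING RTREE (geom);
```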

##### Memory Usage {#docs:current:core_extensions:spatial:r-tree_indexes::memory-usage}

Like DuckDB's built-in ART index, all the associated buffers containing the R-tree are lazily loaded from disk (when running DuckDB in disk-backed mode), but they are currently never unloaded unless the index is dropped. This means that if you end up scanning the entire index, the entire index will be loaded into memory and stay there for the duration of the database connection. However, all memory used by the R-tree index (even during bulk loading) is tracked by DuckDB and counts towards the memory limit set by the `memory_limit` configuration parameter.

##### Tuning {#docs:current:core_extensions:spatial:r-tree_indexes::tuning}

Depending on your specific workload, you might want to experiment with the `max_node_capacity` and `min_node_capacity` options to change the structure of the R-tree and how it responds to insertions and deletions (see the [full list of options](#docs:current:core_extensions:spatial:r-tree_indexes::options)). In general, a tree with a higher total number of nodes (i.e., a lower `max_node_capacity`) _may_ result in a more granular structure that enables more aggressive pruning of sub-trees during query execution. However, it will also require more memory to store the tree itself, and it will be more punishing when querying larger regions, as more internal nodes have to be traversed.
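As a rough back-of-the-envelope aid for this trade-off, the Python sketch below counts the nodes per level of a tree in which every node is filled to `max_node_capacity`. This is an idealized assumption: real R-trees are rarely packed perfectly, and the helper name `packed_node_counts` is hypothetical and not part of DuckDB.

```python
def packed_node_counts(entry_count, max_node_capacity):
    """Nodes per level (leaf level first) for an ideally packed tree."""
    if entry_count < 1 or max_node_capacity < 2:
        raise ValueError("need at least one entry and a capacity of 2 or more")
    counts = []
    remaining = entry_count
    while True:
        # Integer ceiling division: how many nodes are needed to hold `remaining` entries.
        remaining = -(-remaining // max_node_capacity)
        counts.append(remaining)
        if remaining == 1:
            return counts

for capacity in (128, 16, 4):
    counts = packed_node_counts(10_000, capacity)
    print(capacity, counts, "total nodes:", sum(counts))
```

Lower capacities yield more levels and more nodes overall, which matches the qualitative statement above: finer pruning is possible, at the cost of more memory and more internal nodes to traverse.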

#### Options {#docs:current:core_extensions:spatial:r-tree_indexes::options}

The following options can be passed to the `WITH` clause when creating an R-tree index (e.g., `CREATE INDEX my_idx ON my_table USING RTREE (geom) WITH (⟨option⟩ = ⟨value⟩);`{:.language-sql .highlight}):

| Option              | Description                                          |  Default                  |
|---------------------|------------------------------------------------------|---------------------------|
| `max_node_capacity` | The maximum number of entries per node in the R-tree | `128`                     |
| `min_node_capacity` | The minimum number of entries per node in the R-tree | `0.4 * max_node_capacity` |

Should a node fall below the minimum number of entries after a deletion, the node is dissolved and all of its entries are reinserted from the top of the tree. This is a common operation in R-tree implementations to prevent the tree from becoming too unbalanced.

#### R-Tree Table Functions {#docs:current:core_extensions:spatial:r-tree_indexes::r-tree-table-functions}

The `rtree_index_dump(VARCHAR)` table function returns all the nodes within an R-tree index, which might come in handy when debugging, profiling or otherwise inspecting the structure of the index. The function takes the name of the R-tree index as an argument and returns a table with the following columns:

| Column name | Type       | Description                                                                   |
|-------------|------------|-------------------------------------------------------------------------------|
| `level`     | `INTEGER`  | The level of the node in the R-tree. The root node has level 0                |
| `bounds`    | `BOX_2DF`  | The bounding box of the node                                                  |
| `row_id`    | `ROW_TYPE` | If this is a leaf node, the `rowid` of the row in the table, otherwise `NULL` |

Example:

```sql
-- Create a table with 64 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 64, 1337);

-- Create an R-tree index on the geometry column (with a low max_node_capacity for demonstration purposes)
CREATE INDEX my_idx ON t1 USING RTREE (geom) WITH (max_node_capacity = 4);

-- Inspect the R-tree index. Notice how the area of the bounding boxes of the branch nodes 
-- decreases as we go deeper into the tree.
SELECT 
  level, 
  bounds::GEOMETRY AS geom, 
  CASE WHEN row_id IS NULL THEN st_area(geom) ELSE NULL END AS area, 
  row_id, 
  CASE WHEN row_id IS NULL THEN 'branch' ELSE 'leaf' END AS kind 
FROM rtree_index_dump('my_idx') 
ORDER BY area DESC;
```

```text
┌───────┬──────────────────────────────┬────────────────────┬────────┬─────────┐
│ level │             geom             │        area        │ row_id │  kind   │
│ int32 │           geometry           │       double       │ int64  │ varchar │
├───────┼──────────────────────────────┼────────────────────┼────────┼─────────┤
│     0 │ POLYGON ((2.17285037040710…  │  3286.396482226409 │        │ branch  │
│     0 │ POLYGON ((6.00962591171264…  │  3193.725100864862 │        │ branch  │
│     0 │ POLYGON ((0.74995160102844…  │  3099.921458393704 │        │ branch  │
│     0 │ POLYGON ((14.6168870925903…  │ 2322.2760491675654 │        │ branch  │
│     1 │ POLYGON ((2.17285037040710…  │  604.1520104388514 │        │ branch  │
│     1 │ POLYGON ((26.6022186279296…  │  569.1665467030252 │        │ branch  │
│     1 │ POLYGON ((35.7942314147949…  │ 435.24662436250037 │        │ branch  │
│     1 │ POLYGON ((62.2643051147460…  │ 396.39027683023596 │        │ branch  │
│     1 │ POLYGON ((59.5225715637207…  │ 386.09153403820187 │        │ branch  │
│     1 │ POLYGON ((82.3060836791992…  │ 369.15115640929434 │        │ branch  │
│     · │              ·               │          ·         │      · │  ·      │
│     · │              ·               │          ·         │      · │  ·      │
│     · │              ·               │          ·         │      · │  ·      │
│     2 │ POLYGON ((20.5411434173584…  │                    │     35 │ leaf    │
│     2 │ POLYGON ((14.6168870925903…  │                    │     36 │ leaf    │
│     2 │ POLYGON ((43.7271652221679…  │                    │     39 │ leaf    │
│     2 │ POLYGON ((53.4629211425781…  │                    │     44 │ leaf    │
│     2 │ POLYGON ((26.6022186279296…  │                    │     62 │ leaf    │
│     2 │ POLYGON ((53.1732063293457…  │                    │     63 │ leaf    │
│     2 │ POLYGON ((78.1427154541015…  │                    │     10 │ leaf    │
│     2 │ POLYGON ((75.1728591918945…  │                    │     15 │ leaf    │
│     2 │ POLYGON ((62.2643051147460…  │                    │     42 │ leaf    │
│     2 │ POLYGON ((80.5032577514648…  │                    │     49 │ leaf    │
├───────┴──────────────────────────────┴────────────────────┴────────┴─────────┤
│ 84 rows (20 shown)                                                 5 columns │
└──────────────────────────────────────────────────────────────────────────────┘
```

### GDAL Integration {#docs:current:core_extensions:spatial:gdal}

The spatial extension integrates the [GDAL](https://gdal.org/en/latest/) translator library to read and write spatial data from a variety of geospatial vector file formats. See the documentation for the [`st_read` table function](#docs:current:core_extensions:spatial:functions::st_read) for how to make use of this in practice.

To spare users from having to set up and install additional dependencies on their system, the spatial extension bundles its own copy of the GDAL library. This also means that spatial's version of GDAL may not be the latest version available or provide support for all of the file formats that a system-wide GDAL installation otherwise would. Refer to the section on the [`st_drivers` table function](#docs:current:core_extensions:spatial:functions::st_drivers) to inspect which GDAL drivers are currently available.

#### GDAL Based `COPY` Function {#docs:current:core_extensions:spatial:gdal::gdal-based-copy-function}

The spatial extension does not only enable _importing_ geospatial file formats (through the `ST_Read` function), it also enables _exporting_ DuckDB tables to different geospatial vector formats through a GDAL based `COPY` function.

For example, to export a table to a GeoJSON file, with generated bounding boxes, you can use the following query:

```sql
COPY ⟨table⟩ TO 'some/file/path/filename.geojson'
WITH (FORMAT gdal, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES', SRS 'EPSG:4326');
```

Available options:

* `FORMAT`: is the only required option and must be set to `GDAL` to use the GDAL based copy function.
* `DRIVER`: is the GDAL driver to use for the export. Use `ST_Drivers()` to list the names of all available drivers.
* `LAYER_CREATION_OPTIONS`: list of options to pass to the GDAL driver. See the GDAL docs for the driver you are using for a list of available options.
* `SRS`: Set a spatial reference system as metadata to use for the export. This can be a WKT string, an EPSG code or a proj-string, basically anything you would normally be able to pass to GDAL. Note that this will **not** perform any reprojection of the input geometry, it just sets the metadata if the target driver supports it.

#### Limitations {#docs:current:core_extensions:spatial:gdal::limitations}

Note that only vector-based drivers are supported by the GDAL integration. Reading and writing raster formats is not supported.

## SQLite Extension {#docs:current:core_extensions:sqlite}

The SQLite extension allows DuckDB to directly read and write data from a SQLite database file. The data can be queried directly from the underlying SQLite tables. Data can be loaded from SQLite tables into DuckDB tables, or vice versa.

#### Installing and Loading {#docs:current:core_extensions:sqlite::installing-and-loading}

The `sqlite` extension will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL sqlite;
LOAD sqlite;
```

#### Usage {#docs:current:core_extensions:sqlite::usage}

To make a SQLite file accessible to DuckDB, use the `ATTACH` statement with the `sqlite` or `sqlite_scanner` type. Attached SQLite databases support both read and write operations.

For example, to attach to the [`sakila.db` file](https://github.com/duckdb/sqlite_scanner/raw/main/data/db/sakila.db), run:

```sql
ATTACH 'sakila.db' (TYPE sqlite);
USE sakila;
```

The tables in the file can be read as if they were normal DuckDB tables, but the underlying data is read directly from the SQLite tables in the file at query time.

```sql
SHOW TABLES;
```



|          name          |
|------------------------|
| actor                  |
| address                |
| category               |
| city                   |
| country                |
| customer               |
| customer_list          |
| film                   |
| film_actor             |
| film_category          |
| film_list              |
| film_text              |
| inventory              |
| language               |
| payment                |
| rental                 |
| sales_by_film_category |
| sales_by_store         |
| staff                  |
| staff_list             |
| store                  |

You can query the tables using SQL, e.g., using the example queries from [`sakila-examples.sql`](https://github.com/duckdb/sqlite_scanner/blob/main/data/sql/sakila-examples.sql):

```sql
SELECT
    cat.name AS category_name,
    sum(ifnull(pay.amount, 0)) AS revenue
FROM category cat
LEFT JOIN film_category flm_cat
       ON cat.category_id = flm_cat.category_id
LEFT JOIN film fil
       ON flm_cat.film_id = fil.film_id
LEFT JOIN inventory inv
       ON fil.film_id = inv.film_id
LEFT JOIN rental ren
       ON inv.inventory_id = ren.inventory_id
LEFT JOIN payment pay
       ON ren.rental_id = pay.rental_id
GROUP BY cat.name
ORDER BY revenue DESC
LIMIT 5;
```

#### Data Types {#docs:current:core_extensions:sqlite::data-types}

SQLite is a [weakly typed database system](https://www.sqlite.org/datatype3.html). As such, when storing data in a SQLite table, types are not enforced. The following is valid SQL in SQLite:

```sql
CREATE TABLE numbers (i INTEGER);
INSERT INTO numbers VALUES ('hello');
```
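To observe this behavior independently of DuckDB, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The use of Python here is an assumption for illustration; the statements mirror the SQL above, and SQLite's `typeof` function reports the storage class that was actually used.

```python
import sqlite3

# An in-memory SQLite database; no files are created.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE numbers (i INTEGER)")

# SQLite accepts the string even though the column is declared as INTEGER.
con.execute("INSERT INTO numbers VALUES ('hello')")

# typeof() shows the storage class SQLite chose for the stored value.
rows = con.execute("SELECT i, typeof(i) FROM numbers").fetchall()
print(rows)
con.close()
```

The value is stored with the `text` storage class, which is exactly the kind of data that a strongly typed reader cannot place into an integer column.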

DuckDB is a strongly typed database system. As such, it requires all columns to have defined types, and the system rigorously checks data for correctness.

When querying SQLite, DuckDB must deduce a specific column type mapping. DuckDB follows SQLite's [type affinity rules](https://www.sqlite.org/datatype3.html#type_affinity) with a few extensions.

1. If the declared type contains the string `INT`, then it is translated into the type `BIGINT`.
2. If the declared type of the column contains any of the strings `CHAR`, `CLOB`, or `TEXT` then it is translated into `VARCHAR`.
3. If the declared type for a column contains the string `BLOB` or if no type is specified then it is translated into `BLOB`.
4. If the declared type for a column contains any of the strings `REAL`, `FLOA`, `DOUB`, `DEC` or `NUM` then it is translated into `DOUBLE`.
5. If the declared type is `DATE`, then it is translated into `DATE`.
6. If the declared type contains the string `TIME`, then it is translated into `TIMESTAMP`.
7. If none of the above apply, then it is translated into `VARCHAR`.
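The rules above can be read as a simple decision chain. The Python sketch below encodes them under two assumptions that the list does not state explicitly: that the rules are tried in the order listed and that matching is case-insensitive. The function name `duckdb_type_for` is hypothetical; this is not the extension's actual code.

```python
def duckdb_type_for(declared_type):
    """Map a SQLite declared column type to a DuckDB type name (illustrative only)."""
    declared = (declared_type or "").strip().upper()
    if "INT" in declared:                                        # rule 1
        return "BIGINT"
    if any(s in declared for s in ("CHAR", "CLOB", "TEXT")):     # rule 2
        return "VARCHAR"
    if "BLOB" in declared or declared == "":                     # rule 3
        return "BLOB"
    if any(s in declared for s in ("REAL", "FLOA", "DOUB", "DEC", "NUM")):  # rule 4
        return "DOUBLE"
    if declared == "DATE":                                       # rule 5
        return "DATE"
    if "TIME" in declared:                                       # rule 6
        return "TIMESTAMP"
    return "VARCHAR"                                             # rule 7

for declared in ("INTEGER", "VARCHAR(20)", "", "DECIMAL(10,2)", "DATE", "DATETIME", "BOOLEAN"):
    print(repr(declared), "->", duckdb_type_for(declared))
```

Because the first rule is a substring match, a declared type such as `POINT` would map to `BIGINT` under this reading, which mirrors a well-known quirk of SQLite's own affinity rules.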

Because DuckDB requires each column to contain only correctly typed values, the string “hello” cannot be loaded into a column of type `BIGINT`. As such, an error is thrown when reading from the “numbers” table above:

```console
Mismatch Type Error: Invalid type in column "i": column was declared as integer, found "hello" of type "text" instead.
```

This error can be avoided by setting the `sqlite_all_varchar` option:

```sql
SET GLOBAL sqlite_all_varchar = true;
```

When set, this option overrides the type conversion rules described above, and instead always converts the SQLite columns into a `VARCHAR` column. Note that this setting must be set *before* `sqlite_attach` is called.

#### Opening SQLite Databases Directly {#docs:current:core_extensions:sqlite::opening-sqlite-databases-directly}

SQLite databases can also be opened directly and can be used transparently instead of a DuckDB database file. In any client, when connecting, a path to a SQLite database file can be provided and the SQLite database will be opened instead.

For example, with the shell, a SQLite database can be opened as follows:

```batch
duckdb sakila.db
```

```sql
SELECT first_name
FROM actor
LIMIT 3;
```

| first_name |
|------------|
| PENELOPE   |
| NICK       |
| ED         |

#### Writing Data to SQLite {#docs:current:core_extensions:sqlite::writing-data-to-sqlite}

In addition to reading data from SQLite, the extension also allows you to create new SQLite database files, create tables, ingest data into SQLite and make other modifications to SQLite database files using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a SQLite database to Parquet, or read data from a Parquet file into SQLite.

Below is a brief example of how to create a new SQLite database and load data into it.

```sql
ATTACH 'new_sqlite_database.db' AS sqlite_db (TYPE sqlite);
CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');
```

The resulting SQLite database can then be read from SQLite.

```batch
sqlite3 new_sqlite_database.db
```

```sql
SQLite version 3.39.5 2022-10-14 20:58:05
sqlite> SELECT * FROM tbl;
```

```text
id  name  
--  ------
42  DuckDB
```

Many operations on SQLite tables are supported. All these operations directly modify the SQLite database, and the result of subsequent operations can then be read using SQLite.

#### Concurrency {#docs:current:core_extensions:sqlite::concurrency}

DuckDB can read or modify a SQLite database while DuckDB or SQLite reads or modifies the same database from a different thread or a separate process. More than one thread or process can read the SQLite database at the same time, but only a single thread or process can write to the database at one time. Database locking is handled by the SQLite library, not DuckDB. Within the same process, SQLite uses mutexes. When accessed from different processes, SQLite uses file system locks. The locking mechanisms also depend on SQLite configuration, like WAL mode. Refer to the [SQLite documentation on locking](https://www.sqlite.org/lockingv3.html) for more information.

> **Warning.** Linking multiple copies of the SQLite library into the same application can lead to application errors. See [sqlite_scanner Issue #82](https://github.com/duckdb/sqlite_scanner/issues/82) for more information.
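The single-writer rule can be observed with SQLite alone. The sketch below uses Python's built-in `sqlite3` module with two connections to the same file; it does not involve DuckDB and assumes SQLite's default rollback-journal mode. The file name and variable names are made up for this example.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "locking_demo.db")

# isolation_level=None lets us issue BEGIN/COMMIT ourselves;
# timeout=0 makes a blocked connection fail immediately instead of waiting.
writer = sqlite3.connect(path, isolation_level=None, timeout=0)
other = sqlite3.connect(path, isolation_level=None, timeout=0)

writer.execute("CREATE TABLE tbl (id INTEGER)")

# The first connection starts a write transaction and takes the write lock.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO tbl VALUES (42)")

# A second writer cannot start while the first one holds the lock.
try:
    other.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError as error:
    second_writer_blocked = True
    print("second writer rejected:", error)

# Once the first writer commits, the second one can proceed.
writer.execute("COMMIT")
other.execute("BEGIN IMMEDIATE")
other.execute("INSERT INTO tbl VALUES (43)")
other.execute("COMMIT")

row_count = other.execute("SELECT count(*) FROM tbl").fetchone()[0]
print("rows:", row_count)

writer.close()
other.close()
```

In practice, a non-zero busy timeout is usually preferable so that a second writer waits briefly rather than failing at once.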

#### Settings {#docs:current:core_extensions:sqlite::settings}

The extension exposes the following configuration parameters.

| Name                              | Description                                                                  | Default |
| --------------------------------- | ---------------------------------------------------------------------------- | ------- |
| `sqlite_debug_show_queries`       | DEBUG SETTING: print all queries sent to SQLite to stdout                    | `false` |

#### Supported Operations {#docs:current:core_extensions:sqlite::supported-operations}

Below is a list of supported operations.

##### `CREATE TABLE` {#docs:current:core_extensions:sqlite::create-table}

```sql
CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);
```

##### `INSERT INTO` {#docs:current:core_extensions:sqlite::insert-into}

```sql
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');
```

##### `SELECT` {#docs:current:core_extensions:sqlite::select}

```sql
SELECT * FROM sqlite_db.tbl;
```

| id |  name  |
|---:|--------|
| 42 | DuckDB |

##### `COPY` {#docs:current:core_extensions:sqlite::copy}

```sql
COPY sqlite_db.tbl TO 'data.parquet';
COPY sqlite_db.tbl FROM 'data.parquet';
```

##### `UPDATE` {#docs:current:core_extensions:sqlite::update}

```sql
UPDATE sqlite_db.tbl SET name = 'Woohoo' WHERE id = 42;
```

##### `DELETE` {#docs:current:core_extensions:sqlite::delete}

```sql
DELETE FROM sqlite_db.tbl WHERE id = 42;
```

##### `ALTER TABLE` {#docs:current:core_extensions:sqlite::alter-table}

```sql
ALTER TABLE sqlite_db.tbl ADD COLUMN k INTEGER;
```

##### `DROP TABLE` {#docs:current:core_extensions:sqlite::drop-table}

```sql
DROP TABLE sqlite_db.tbl;
```

##### `CREATE VIEW` {#docs:current:core_extensions:sqlite::create-view}

```sql
CREATE VIEW sqlite_db.v1 AS SELECT 42;
```

##### Transactions {#docs:current:core_extensions:sqlite::transactions}

```sql
CREATE TABLE sqlite_db.tmp (i INTEGER);
```

```sql
BEGIN;
INSERT INTO sqlite_db.tmp VALUES (42);
SELECT * FROM sqlite_db.tmp;
```

| i  |
|---:|
| 42 |

```sql
ROLLBACK;
SELECT * FROM sqlite_db.tmp;
```

| i |
|--:|
|   |

> **Deprecated.** The old `sqlite_attach` function is deprecated. It is recommended to switch over to the new [`ATTACH` syntax](#docs:current:sql:statements:attach).

#### Compatibility {#docs:current:core_extensions:sqlite::compatibility}

The SQLite extension can read databases written by [Turso](https://turso.tech/), a Rust rewrite of SQLite.

## TPC-DS Extension {#docs:current:core_extensions:tpcds}

The `tpcds` extension implements the data generator and queries for the [TPC-DS benchmark](https://www.tpc.org/tpcds/).

#### Installing and Loading {#docs:current:core_extensions:tpcds::installing-and-loading}

The `tpcds` extension will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:

```sql
INSTALL tpcds;
LOAD tpcds;
```

#### Usage {#docs:current:core_extensions:tpcds::usage}

To generate data for scale factor 1, use:

```sql
CALL dsdgen(sf = 1);
```

To run a query, e.g., query 8, use:

```sql
PRAGMA tpcds(8);
```

| s_store_name | sum(ss_net_profit) |
|--------------|-------------------:|
| able         | -10354620.18       |
| ation        | -10576395.52       |
| bar          | -10625236.01       |
| ese          | -10076698.16       |
| ought        | -10994052.78       |

#### Generating the Schema {#docs:current:core_extensions:tpcds::generating-the-schema}

It's possible to generate the schema of TPC-DS without any data by setting the scale factor to 0:

```sql
CALL dsdgen(sf = 0);
```

#### Pre-Generated Datasets {#docs:current:core_extensions:tpcds::pre-generated-datasets}

Pre-generated DuckDB databases for TPC-DS are available for download:

* [`tpcds-sf10.db`](https://blobs.duckdb.org/data/tpcds-sf10.db) (2.9 GB)
* [`tpcds-sf30.db`](https://blobs.duckdb.org/data/tpcds-sf30.db) (7.7 GB)
* [`tpcds-sf100.db`](https://blobs.duckdb.org/data/tpcds-sf100.db) (26.6 GB)
* [`tpcds-sf300.db`](https://blobs.duckdb.org/data/tpcds-sf300.db) (79.3 GB)

#### Limitations {#docs:current:core_extensions:tpcds::limitations}

The `tpcds(⟨query_id⟩)`{:.language-sql .highlight} function runs a fixed TPC-DS query with pre-defined bind parameters (a.k.a. substitution parameters).
It is not possible to change the query parameters using the `tpcds` extension.

## TPC-H Extension {#docs:current:core_extensions:tpch}

The `tpch` extension implements the data generator and queries for the [TPC-H benchmark](https://www.tpc.org/tpch/).

#### Installing and Loading {#docs:current:core_extensions:tpch::installing-and-loading}

The `tpch` extension is shipped by default in some DuckDB builds; otherwise, it will be transparently [autoloaded](#docs:current:extensions:overview::autoloading-extensions) on first use.
If you would like to install and load it manually, run:

```sql
INSTALL tpch;
LOAD tpch;
```

#### Benchmarking with the TPC-H Workload {#docs:current:core_extensions:tpch::benchmarking-with-the-tpc-h-workload}

To run the full TPC-H workload with DuckDB, use the [standalone DuckDB TPC-H implementation project](https://github.com/duckdb/duckdb-tpch-power-test).

#### Usage {#docs:current:core_extensions:tpch::usage}

##### Generating Data {#docs:current:core_extensions:tpch::generating-data}

To generate data for scale factor 1, use:

```sql
CALL dbgen(sf = 1);
```

Calling `dbgen` does not clean up existing TPC-H tables.
To clean up existing tables, use `DROP TABLE` before running `dbgen`:

```sql
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS lineitem;
DROP TABLE IF EXISTS nation;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS part;
DROP TABLE IF EXISTS partsupp;
DROP TABLE IF EXISTS region;
DROP TABLE IF EXISTS supplier;
```

##### Running a Query {#docs:current:core_extensions:tpch::running-a-query}

To run a query, e.g., query 4, use:

```sql
PRAGMA tpch(4);
```

| o_orderpriority | order_count |
| --------------- | ----------: |
| 1-URGENT        |       10594 |
| 2-HIGH          |       10476 |
| 3-MEDIUM        |       10410 |
| 4-NOT SPECIFIED |       10556 |
| 5-LOW           |       10487 |

##### Listing Queries {#docs:current:core_extensions:tpch::listing-queries}

To list all 22 queries, run:

```sql
FROM tpch_queries();
```

This function returns a table with columns `query_nr` and `query`.

##### Listing Expected Answers {#docs:current:core_extensions:tpch::listing-expected-answers}

To produce the expected results for all queries on scale factors 0.01, 0.1 and 1, run:

```sql
FROM tpch_answers();
```

This function returns a table with columns `query_nr`, `scale_factor` and `answer`.

#### Generating the Schema {#docs:current:core_extensions:tpch::generating-the-schema}

It's possible to generate the schema of TPC-H without any data by setting the scale factor to 0:

```sql
CALL dbgen(sf = 0);
```

#### Data Generator Parameters {#docs:current:core_extensions:tpch::data-generator-parameters}

The data generator function `dbgen` has the following parameters:

| Name        | Type       | Description                                                                                                                       |
| ----------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `catalog`   | `VARCHAR`  | Target catalog                                                                                                                    |
| `children`  | `UINTEGER` | Number of partitions                                                                                                              |
| `overwrite` | `BOOLEAN`  | (Not used)                                                                                                                        |
| `sf`        | `DOUBLE`   | Scale factor                                                                                                                      |
| `step`      | `UINTEGER` | Defines the partition to be generated, indexed from 0 to `children` - 1. Must be defined when the `children` argument is defined  |
| `suffix`    | `VARCHAR`  | Append the `suffix` to table names                                                                                                |

#### Pre-Generated Datasets {#docs:current:core_extensions:tpch::pre-generated-datasets}

Pre-generated DuckDB databases for TPC-H are available for download:

* [`tpch-sf1.db`](https://blobs.duckdb.org/data/tpch-sf1.db) (250 MB)
* [`tpch-sf3.db`](https://blobs.duckdb.org/data/tpch-sf3.db) (754 MB)
* [`tpch-sf10.db`](https://blobs.duckdb.org/data/tpch-sf10.db) (2.5 GB)
* [`tpch-sf30.db`](https://blobs.duckdb.org/data/tpch-sf30.db) (7.6 GB)
* [`tpch-sf100.db`](https://blobs.duckdb.org/data/tpch-sf100.db) (26 GB)
* [`tpch-sf300.db`](https://blobs.duckdb.org/data/tpch-sf300.db) (78 GB)
* [`tpch-sf1000.db`](https://blobs.duckdb.org/data/tpch-sf1000.db) (265 GB)
* [`tpch-sf3000.db`](https://blobs.duckdb.org/data/tpch-sf3000.db) (796 GB)

#### Resource Usage of the Data Generator {#docs:current:core_extensions:tpch::resource-usage-of-the-data-generator}

Generating TPC-H datasets for large scale factors takes a significant amount of time.
Additionally, _if the generation is performed in a single step,_ it requires a large amount of memory.
The following table gives an estimate of the resources required to produce DuckDB database files containing the generated TPC-H dataset using 128 threads.

| Scale factor | Database size | Generation time | Single-step generation's memory usage |
| -----------: | ------------: | --------------: | ------------------------------------: |
|          100 |         26 GB |      17 minutes |                                 71 GB |
|          300 |         78 GB |      51 minutes |                                211 GB |
|        1,000 |        265 GB |  2 h 53 minutes |                                647 GB |
|        3,000 |        796 GB |  8 h 30 minutes |                               1799 GB |

The numbers shown above were achieved by running the `dbgen` function in a single step, for example:

```sql
CALL dbgen(sf = 300);
```

If you have a limited amount of memory available, you can run the `dbgen` function in steps.
For example, you may generate SF300 in 10 steps:

```sql
CALL dbgen(sf = 300, children = 10, step = 0);
CALL dbgen(sf = 300, children = 10, step = 1);
...
CALL dbgen(sf = 300, children = 10, step = 9);
```
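When scripting the stepwise generation, the individual statements follow a regular pattern: one call per partition, with `step` running from 0 to `children` - 1. The Python sketch below only builds the statement strings; the helper name `dbgen_step_statements` is hypothetical, and executing the statements against a DuckDB connection is left to the caller.

```python
def dbgen_step_statements(scale_factor, children):
    """Return one CALL dbgen(...) statement per partition, steps 0 to children - 1."""
    if children < 1:
        raise ValueError("children must be at least 1")
    return [
        f"CALL dbgen(sf = {scale_factor}, children = {children}, step = {step});"
        for step in range(children)
    ]

statements = dbgen_step_statements(300, 10)
for statement in statements:
    print(statement)
```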

#### Limitation {#docs:current:core_extensions:tpch::limitation}

The `tpch(⟨query_id⟩)`{:.language-sql .highlight} function runs a fixed TPC-H query with pre-defined bind parameters (a.k.a. substitution parameters). It is not possible to change the query parameters using the `tpch` extension. To run the queries with the parameters prescribed by the TPC-H benchmark, use a TPC-H framework implementation.

## UI Extension {#docs:current:core_extensions:ui}

The `ui` extension adds a user interface for your local DuckDB instance.

The UI is built and maintained by [MotherDuck](https://motherduck.com/).
An overview of its features can be found
in the [MotherDuck documentation](https://motherduck.com/docs/getting-started/motherduck-quick-tour/).

#### Usage {#docs:current:core_extensions:ui::usage}

To start the UI from the command line:

```batch
duckdb -ui
```

To start the UI from SQL:

```sql
CALL start_ui();
```

Running either of these will open the UI in your default browser.

The UI connects to the DuckDB instance it was started from,
so any data you’ve already loaded will be available.
Since this instance is a native process (not Wasm), it can leverage all
the resources of your local environment: all cores, memory and files.
Closing this instance will cause the UI to stop working.

The UI is served from an HTTP server embedded in DuckDB.
To start this server without launching the browser, run:

```sql
CALL start_ui_server();
```

You can then load the UI in your browser by navigating to
`http://localhost:4213`.

To stop the HTTP server, run:

```sql
CALL stop_ui_server();
```

#### Local Query Execution {#docs:current:core_extensions:ui::local-query-execution}

By default, the DuckDB UI runs your queries fully locally: your queries and data never leave your computer.
If you would like to use [MotherDuck](https://motherduck.com/) through the UI, you have to opt in explicitly and sign in to MotherDuck.

#### Configuration {#docs:current:core_extensions:ui::configuration}

##### Local Port {#docs:current:core_extensions:ui::local-port}

The local port of the HTTP server can be configured with a SQL command like:

```sql
SET ui_local_port = 4213;
```

The environment variable `ui_local_port` can also be used.

The default port is 4213. (Why? 4 = D, 21 = U, 3 = C)

##### Remote URL {#docs:current:core_extensions:ui::remote-url}

The local HTTP server fetches the files for the UI from a remote HTTP
server so they can be kept up-to-date.

The default URL for the remote server is <https://ui.duckdb.org>.

An alternate remote URL can be configured with a SQL command like:

```sql
SET ui_remote_url = 'https://ui.duckdb.org';
```

The environment variable `ui_remote_url` can also be used.

This setting is available mainly for testing purposes.

Be sure you trust any URL you configure, as the application can access
the data you load into DuckDB.

Because of this risk, the setting is only respected
if `allow_unsigned_extensions` is enabled.

##### Polling Interval {#docs:current:core_extensions:ui::polling-interval}

The UI extension polls for some information on a background thread.
It watches for changes to the list of attached databases,
and it detects when you connect to MotherDuck.

These checks take very little time to complete, so the default polling
interval is short (284 milliseconds).
You can configure it with a SQL command like:

```sql
SET ui_polling_interval = 284;
```

The environment variable `ui_polling_interval` can also be used.

Setting the polling interval to 0 will disable polling entirely.
This is not recommended, as the list of databases in the UI could get
out of date, and some ways of connecting to MotherDuck will not work
properly.

#### Tips {#docs:current:core_extensions:ui::tips}

##### Opening a CSV File with the DuckDB UI {#docs:current:core_extensions:ui::opening-a-csv-file-with-the-duckdb-ui}

Using the [DuckDB CLI client](#docs:current:clients:cli:overview),
you can start the UI with a CSV available as a view using the [`-cmd` argument](#docs:current:clients:cli:arguments):

```batch
duckdb -cmd "CREATE VIEW ⟨view_name⟩ AS FROM '⟨filename⟩.csv';" -ui
```

##### Running the UI in Read-Only Mode {#docs:current:core_extensions:ui::running-the-ui-in-read-only-mode}

The DuckDB UI uses DuckDB tables as storage internally (e.g., for saving notebooks).
Therefore, running the UI directly on a read-only database [is not supported](https://github.com/duckdb/duckdb-ui/issues/61):

```batch
duckdb -ui -readonly read_only_test.db
```

In the UI, this results in:

```console
Catalog Error: SET schema: No catalog + schema named "memory.main" found.
```

To work around this, run the UI on another database file:

```batch
duckdb -ui ui_catalog.db
```

Then, open a notebook and attach to the database:

```sql
ATTACH 'test.db' AS my_db (READ_ONLY);
USE my_db;
```

#### Limitations {#docs:current:core_extensions:ui::limitations}

* The UI currently does not support `windows_arm64`.

## Unity Catalog Extension {#docs:current:core_extensions:unity_catalog}

The `unity_catalog` extension adds support for the [`Unity Catalog`](https://www.unitycatalog.io/) atop the
[`Delta Lake`](https://delta.io/) format and [DuckDB Delta extension](#docs:current:core_extensions:delta).

The `delta` extension adds support for the [Delta Lake open-source storage format](https://delta.io/). It is built using the [Delta Kernel](https://github.com/delta-incubator/delta-kernel-rs). The extension offers **read support** for Delta tables, both local and remote.

For implementation details, see the [announcement blog post](https://duckdb.org/2024/06/10/delta).

> **Warning.** Both the `unity_catalog` and `delta` extensions are currently experimental and [only supported on certain platforms](#docs:current:core_extensions:unity_catalog::supported-duckdb-versions-and-platforms).

#### Installing and Loading {#docs:current:core_extensions:unity_catalog::installing-and-loading}

To install and load, run:

```sql
INSTALL unity_catalog;
LOAD unity_catalog;
```

#### Usage {#docs:current:core_extensions:unity_catalog::usage}

Assuming you already have a Unity Catalog set up with either Databricks or Unity Catalog OSS, you need to
configure your secret token, endpoint, and region, and then attach your catalog. For example, an AWS configuration
looks like this:

```sql
CREATE SECRET uc (
    TYPE unity_catalog,
    TOKEN '⟨token⟩',
    ENDPOINT '⟨endpoint⟩',
    AWS_REGION '⟨region⟩'
);
ATTACH 'test_catalog' AS test_catalog (TYPE unity_catalog, DEFAULT_SCHEMA 'main');
```

Where `token` comes from your Databricks or OSS Unity Catalog deployment, and `endpoint` is your Unity Catalog REST API endpoint.

For more details on these deployments see [Databricks Unity Catalog Docs](https://docs.databricks.com/aws/en/data-governance/unity-catalog) and [OSS Unity Catalog Docs](https://docs.unitycatalog.io/).

To confirm correct attachment, try something like:

```sql
SHOW ALL TABLES;
SELECT * FROM test_catalog.test_schema.test_table LIMIT 10;
```

#### Features {#docs:current:core_extensions:unity_catalog::features}

This extension is still experimental and work-in-progress; it supports:

- Listing available tables (`SHOW ALL TABLES;`)
- Interacting with tables using standard SQL (`SELECT * FROM <catalog>.<schema>.<table>;`)
- Time travel (`SELECT * FROM .. AT (VERSION => ..);`)
- Inserts (`INSERT INTO .. VALUES (..);`)

It does not currently support:

- `DELETE` or `UPDATE`
- Creation or manipulation of `TABLE`s, `VIEW`s, or `SCHEMA`s
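
The supported operations could look like the following sketch. It reuses the `test_catalog.test_schema.test_table` name from the usage example above. The version number and the two-column layout of the inserted row are assumptions made for illustration and need to be adapted to your actual table:

```sql
-- List the tables visible through the attached catalog.
SHOW ALL TABLES;

-- Read the current state of a table.
SELECT * FROM test_catalog.test_schema.test_table LIMIT 10;

-- Time travel: read the table as of an earlier version
-- (version 1 is a placeholder).
SELECT * FROM test_catalog.test_schema.test_table AT (VERSION => 1);

-- Append a row (assumes, hypothetically, that the table has
-- an integer column followed by a string column).
INSERT INTO test_catalog.test_schema.test_table VALUES (42, 'example');
```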

#### Supported DuckDB Versions and Platforms {#docs:current:core_extensions:unity_catalog::supported-duckdb-versions-and-platforms}

The `unity_catalog` (and `delta`) extension currently supports the following platforms:

- Linux (x86_64 and ARM64): `linux_amd64` and `linux_arm64`
- macOS Intel and Apple Silicon: `osx_amd64` and `osx_arm64`
- Windows AMD64: `windows_amd64`

Support for the [other DuckDB platforms](#docs:current:extensions:extension_distribution::platforms) is work-in-progress.

## Vortex Extension {#docs:current:core_extensions:vortex}

The `vortex` extension allows you to read and write files using the [Vortex file format](https://vortex.dev/). It is currently available for the Linux (`linux_amd64`, `linux_arm64`) and macOS (`osx_amd64`, `osx_arm64`) distributions.

#### Installing and Loading {#docs:current:core_extensions:vortex::installing-and-loading}

To install and load the extension, run:

```sql
INSTALL vortex;
LOAD vortex;
```

#### Reading Vortex Files {#docs:current:core_extensions:vortex::reading-vortex-files}

Use the `read_vortex` function to read Vortex files:

```sql
SELECT * FROM read_vortex('my.vortex');
```

```text
┌───────┐
│   i   │
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
└───────┘
```

#### Writing Vortex Files {#docs:current:core_extensions:vortex::writing-vortex-files}

You can write Vortex files as follows:

```sql
COPY (SELECT * FROM generate_series(0, 3) t(i))
TO 'my.vortex' (FORMAT vortex);
```

> **Warning.** Make sure to add the `FORMAT vortex` option. If the `vortex` extension is not loaded, using `COPY ... TO 'my.vortex'` without the `FORMAT vortex` specifier will result in a CSV file.
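
To check that a file really was written in the Vortex format, one option is a small round trip: write a few rows, then read them back with `read_vortex`. The following is a minimal sketch. The file name `roundtrip.vortex` is hypothetical, and the query assumes that the column name `i` is preserved in the file:

```sql
-- Write three rows (0, 1, 2) to a Vortex file;
-- range() excludes its upper bound.
COPY (SELECT * FROM range(0, 3) t(i))
TO 'roundtrip.vortex' (FORMAT vortex);

-- Read the file back and check the row count and value range.
SELECT count(*) AS row_count, min(i) AS min_i, max(i) AS max_i
FROM read_vortex('roundtrip.vortex');
```

If the file had been written as CSV by mistake, reading it with `read_vortex` would be expected to fail rather than return rows.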

## Vector Similarity Search Extension {#docs:current:core_extensions:vss}

The `vss` extension is an experimental extension for DuckDB that adds indexing support to accelerate vector similarity search queries using DuckDB's new fixed-size `ARRAY` type.

See the [announcement blog post](https://duckdb.org/2024/05/03/vector-similarity-search-vss) and the [“What's New in the Vector Similarity Search Extension?” post](https://duckdb.org/2024/10/23/whats-new-in-the-vss-extension).

#### Usage {#docs:current:core_extensions:vss::usage}

To create a new HNSW (Hierarchical Navigable Small Worlds) index on a table with an `ARRAY` column, use the `CREATE INDEX` statement with the `USING HNSW` clause. For example:

```sql
INSTALL vss;
LOAD vss;

CREATE TABLE my_vector_table (vec FLOAT[3]);
INSERT INTO my_vector_table
    SELECT array_value(a, b, c)
    FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);
CREATE INDEX my_hnsw_index ON my_vector_table USING HNSW (vec);
```

The index will then be used to accelerate queries that use an `ORDER BY` clause evaluating one of the supported distance metric functions against the indexed columns and a constant vector, followed by a `LIMIT` clause. For example:

```sql
SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```

Additionally, the overloaded `min_by(col, arg, n)` can also be accelerated with the `HNSW` index if the `arg` argument is a matching distance metric function. This can be used to do quick one-shot nearest neighbor searches. For example, to get the top 3 rows with the closest vectors to `[1, 2, 3]`:

```sql
SELECT min_by(my_vector_table, array_distance(vec, [1, 2, 3]::FLOAT[3]), 3 ORDER BY vec) AS result
FROM my_vector_table;
```

```text
[{'vec': [1.0, 2.0, 3.0]}, {'vec': [2.0, 2.0, 3.0]}, {'vec': [1.0, 2.0, 4.0]}]
```

Note how we pass the table name as the first argument to [`min_by`](#docs:current:sql:functions:aggregates::min_byarg-val-n) to return a struct containing the entire matched row.

We can verify that the index is being used by checking the `EXPLAIN` output and looking for the `HNSW_INDEX_SCAN` node in the plan:

```sql
EXPLAIN
SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```

```text
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            vec            │
│array_distance(vec, [1.0, 2│
│         .0, 3.0])         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      HNSW_INDEX_SCAN      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   t1 (HNSW INDEX SCAN :   │
│           my_idx)         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            vec            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 3           │
└───────────────────────────┘
```

By default, the HNSW index is created using the Euclidean distance `l2sq` (L2-norm squared) metric, matching DuckDB's `array_distance` function, but other distance metrics can be used by specifying the `metric` option during index creation. For example:

```sql
CREATE INDEX my_hnsw_cosine_index
ON my_vector_table
USING HNSW (vec)
WITH (metric = 'cosine');
```

The following table shows the supported distance metrics and their corresponding DuckDB functions:

| Metric   | Function                       | Description                |
|----------|--------------------------------|----------------------------|
| `l2sq`   | `array_distance`               | Euclidean distance         |
| `cosine` | `array_cosine_distance`        | Cosine similarity distance |
| `ip`     | `array_negative_inner_product` | Negative inner product     |

Note that while each `HNSW` index only applies to a single column, you can create multiple `HNSW` indexes on the same table, each individually indexing a different column. Additionally, you can create multiple `HNSW` indexes on the same column, each supporting a different distance metric.
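
As an illustration of several indexes on one column, the sketch below adds a negative inner product index next to the cosine index created earlier. The index name `my_hnsw_ip_index` is made up for this example. Each query orders by the function that corresponds to the metric of the index it is meant to use, following the table above:

```sql
-- A second index on the same column, this time for the negative inner product.
CREATE INDEX my_hnsw_ip_index
ON my_vector_table
USING HNSW (vec)
WITH (metric = 'ip');

-- Orders by the function matching the 'cosine' metric.
SELECT *
FROM my_vector_table
ORDER BY array_cosine_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;

-- Orders by the function matching the 'ip' metric.
SELECT *
FROM my_vector_table
ORDER BY array_negative_inner_product(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```

Prefixing either query with `EXPLAIN` should show whether an `HNSW_INDEX_SCAN` node is chosen.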

#### Index Options {#docs:current:core_extensions:vss::index-options}

Besides the `metric` option, the `HNSW` index creation statement also supports the following options to control the hyperparameters of the index construction and search process:

| Option | Default | Description |
|-------|--:|----------------------------|
| `ef_construction` | 128     | The number of candidate vertices to consider during the construction of the index. A higher value will result in a more accurate index, but will also increase the time it takes to build the index.                           |
| `ef_search`       | 64      | The number of candidate vertices to consider during the search phase of the index. A higher value will result in more accurate search results, but will also increase the time it takes to perform a search.                   |
| `M`               | 16      | The maximum number of neighbors to keep for each vertex in the graph. A higher value will result in a more accurate index, but will also increase the time it takes to build the index.                                        |
| `M0`              | 2 * `M` | The base connectivity, or the number of neighbors to keep for each vertex in the zero-th level of the graph. A higher value will result in a more accurate index, but will also increase the time it takes to build the index. |
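
These options are passed in the same `WITH` clause as `metric`. The following sketch shows what such a statement might look like. The index name and the chosen values are arbitrary illustrations rather than recommendations, and the option spelling follows the table above:

```sql
-- Build a denser, more accurate graph at the cost of a slower index build.
-- All values below are example values, not tuning advice.
CREATE INDEX my_hnsw_tuned_index
ON my_vector_table
USING HNSW (vec)
WITH (
    metric = 'l2sq',
    ef_construction = 256,
    ef_search = 128,
    M = 32,
    M0 = 64
);
```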

Additionally, you can override the `ef_search` parameter set at index construction time by setting the `SET hnsw_ef_search = ⟨int⟩` configuration option at runtime. This can be useful if you want to trade search performance for accuracy or vice versa on a per-connection basis. You can also unset the override by calling `RESET hnsw_ef_search`.
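
A minimal sketch of such a runtime override, using the `my_vector_table` example from above (the value 200 is an arbitrary illustration):

```sql
-- Consider more candidates per search for this connection
-- (200 is an example value).
SET hnsw_ef_search = 200;

SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;

-- Fall back to the ef_search value the index was created with.
RESET hnsw_ef_search;
```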

#### Persistence {#docs:current:core_extensions:vss::persistence}

Due to some known issues related to the persistence of custom extension indexes, the `HNSW` index can only be created on tables in in-memory databases by default, unless the `SET hnsw_enable_experimental_persistence = ⟨bool⟩` configuration option is set to `true`.

The reason for locking this feature behind an experimental flag is that write-ahead log (WAL) recovery is not yet properly implemented for custom indexes, meaning that if a crash occurs or the database is shut down unexpectedly while there are uncommitted changes to an `HNSW`-indexed table, you can end up with **data loss or corruption of the index**.

If you enable this option and experience an unexpected shutdown, you can try to recover the index by first starting DuckDB separately, loading the `vss` extension and then `ATTACH`ing the database file, which ensures that the `HNSW` index functionality is available during WAL-playback, allowing DuckDB's recovery process to proceed without issues. But we still recommend that you do not use this feature in production environments.

With the `hnsw_enable_experimental_persistence` option enabled, the index will be persisted into the DuckDB database file (if you run DuckDB with a disk-backed database file), which means that after a database restart, the index can be loaded back into memory from disk instead of having to be re-created. Note, however, that there are no incremental updates to persistent index storage, so every time DuckDB performs a checkpoint, the entire index will be serialized to disk and overwrite itself. Similarly, after a restart of the database, the index will be deserialized back into main memory in its entirety, although this is deferred until you first access the table associated with the index. Depending on how large the index is, the deserialization process may take some time, but it should still be faster than simply dropping and re-creating the index.
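
The following sketch shows one way the pieces described in this section might fit together when starting from an in-memory session. The file name `my_vectors.db`, the alias `vdb` and the table and index names are hypothetical. Loading `vss` before `ATTACH` mirrors the recovery order recommended above:

```sql
-- Load the extension first so HNSW index support is available
-- before the database file is opened.
LOAD vss;

-- Opt in to the experimental persistence feature.
SET hnsw_enable_experimental_persistence = true;

-- Attach a disk-backed database (hypothetical file name) and switch to it.
ATTACH 'my_vectors.db' AS vdb;
USE vdb;

CREATE TABLE persisted_vectors (vec FLOAT[3]);
INSERT INTO persisted_vectors
    SELECT array_value(a, b, c)
    FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);

-- Create the index after the bulk load.
CREATE INDEX persisted_hnsw_index ON persisted_vectors USING HNSW (vec);

-- A checkpoint serializes the entire index into the database file.
CHECKPOINT;
```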

#### Inserts, Updates, Deletes and Re-Compaction {#docs:current:core_extensions:vss::inserts-updates-deletes-and-re-compaction}

The HNSW index does support inserting, updating and deleting rows from the table after index creation. However, there are two things to keep in mind:

* It's faster to create the index after the table has been populated with data as the initial bulk load can make better use of parallelism on large tables.
* Deletes are not immediately reflected in the index, but are instead “marked” as deleted, which can cause the index to grow stale over time and negatively impact query quality and performance.

To remedy the last point, you can call the `PRAGMA hnsw_compact_index('⟨index_name⟩')` pragma function to trigger a re-compaction of the index that prunes deleted items, or re-create the index after a significant number of updates.
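
For instance, with the `my_vector_table` table and `my_hnsw_index` index from the usage example, a delete followed by a compaction might look like this sketch (the deleted row is chosen arbitrarily):

```sql
-- Delete a row; the corresponding index entry is only marked as deleted.
DELETE FROM my_vector_table
WHERE vec = [1, 2, 3]::FLOAT[3];

-- Prune the entries marked as deleted from the index.
PRAGMA hnsw_compact_index('my_hnsw_index');
```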

#### Bonus: Vector Similarity Search Joins {#docs:current:core_extensions:vss::bonus-vector-similarity-search-joins}

The `vss` extension also provides a couple of table macros to simplify matching multiple vectors against each other, so-called “fuzzy joins”. These are:

* `vss_join(left_table, right_table, left_col, right_col, k, metric := 'l2sq')`
* `vss_match(right_table, left_col, right_col, k, metric := 'l2sq')`

These **do not** currently make use of the `HNSW` index but are provided as convenience utility functions for users who are ok with performing brute-force vector similarity searches without having to write out the join logic themselves. In the future these might become targets for index-based optimizations as well.

These functions can be used as follows:

```sql
CREATE TABLE haystack (id int, vec FLOAT[3]);
CREATE TABLE needle (search_vec FLOAT[3]);

INSERT INTO haystack
    SELECT row_number() OVER (), array_value(a, b, c)
    FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);

INSERT INTO needle
    VALUES ([5, 5, 5]), ([1, 1, 1]);

SELECT *
FROM vss_join(needle, haystack, search_vec, vec, 3) res;
```

```text
┌───────┬─────────────────────────────────┬─────────────────────────────────────┐
│ score │            left_tbl             │              right_tbl              │
│ float │   struct(search_vec float[3])   │  struct(id integer, vec float[3])   │
├───────┼─────────────────────────────────┼─────────────────────────────────────┤
│   0.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 365, 'vec': [5.0, 5.0, 5.0]} │
│   1.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 364, 'vec': [5.0, 4.0, 5.0]} │
│   1.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 356, 'vec': [4.0, 5.0, 5.0]} │
│   0.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 1, 'vec': [1.0, 1.0, 1.0]}   │
│   1.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 10, 'vec': [2.0, 1.0, 1.0]}  │
│   1.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 2, 'vec': [1.0, 2.0, 1.0]}   │
└───────┴─────────────────────────────────┴─────────────────────────────────────┘
```

Alternatively, we can use the `vss_match` macro as a “lateral join” to get the matches already grouped by the left table.
Note that this requires us to specify the left table first, and then the `vss_match` macro which references the search column from the left
table (in this case, `search_vec`):

```sql
SELECT *
FROM needle, vss_match(haystack, search_vec, vec, 3) res;
```

```text
┌─────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   search_vec    │                                                                                       matches                                                                                        │
│    float[3]     │                                                            struct(score float, "row" struct(id integer, vec float[3]))[]                                                             │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ [5.0, 5.0, 5.0] │ [{'score': 0.0, 'row': {'id': 365, 'vec': [5.0, 5.0, 5.0]}}, {'score': 1.0, 'row': {'id': 364, 'vec': [5.0, 4.0, 5.0]}}, {'score': 1.0, 'row': {'id': 356, 'vec': [4.0, 5.0, 5.0]}}] │
│ [1.0, 1.0, 1.0] │ [{'score': 0.0, 'row': {'id': 1, 'vec': [1.0, 1.0, 1.0]}}, {'score': 1.0, 'row': {'id': 10, 'vec': [2.0, 1.0, 1.0]}}, {'score': 1.0, 'row': {'id': 2, 'vec': [1.0, 2.0, 1.0]}}]      │
└─────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
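
Both macros take an optional `metric` parameter, as listed in their signatures above. As a sketch, the same lookups with the cosine metric instead of the default `l2sq` could be written like this (the scores will differ from the outputs shown above):

```sql
-- Top 3 haystack rows per needle, ranked with the 'cosine' metric.
SELECT *
FROM vss_join(needle, haystack, search_vec, vec, 3, metric := 'cosine') res;

-- The lateral variant with the same metric.
SELECT *
FROM needle, vss_match(haystack, search_vec, vec, 3, metric := 'cosine') res;
```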

#### Limitations {#docs:current:core_extensions:vss::limitations}

* Only vectors consisting of `FLOAT`s (32-bit, single precision) are supported at the moment.
* The index itself is not buffer-managed and must be able to fit into RAM.
* The size of the index in memory does not count towards DuckDB's `memory_limit` configuration parameter.
* `HNSW` indexes can only be created on tables in in-memory databases, unless the `SET hnsw_enable_experimental_persistence = ⟨bool⟩` configuration option is set to `true`; see [Persistence](#docs:current:core_extensions:vss::persistence) for more information.
* The vector join table macros (`vss_join` and `vss_match`) do not require or make use of the `HNSW` index.

# Guides {#guides}

## Guides {#docs:current:guides:overview}

The guides section contains compact how-to guides that are focused on achieving a single goal.
For API references and examples, see the rest of the documentation.

Note that there are many tools using DuckDB that are not covered in the official guides.
To find a list of these tools, check out the [Awesome DuckDB repository](https://github.com/davidgasquez/awesome-duckdb).

> **Tip.** For a short introductory tutorial, check out the [“Analyzing Railway Traffic in the Netherlands”](https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands) tutorial.

#### Data Import and Export {#docs:current:guides:overview::data-import-and-export}

* [Data import overview](#docs:current:guides:file_formats:overview)
* [File access with the `file:` protocol](#docs:current:guides:file_formats:file_access)
* [Reading DuckDB databases](#docs:current:guides:file_formats:read_duckdb)

##### CSV Files {#docs:current:guides:overview::csv-files}

* [How to load a CSV file into a table](#docs:current:guides:file_formats:csv_import)
* [How to export a table to a CSV file](#docs:current:guides:file_formats:csv_export)

##### Parquet Files {#docs:current:guides:overview::parquet-files}

* [How to load a Parquet file into a table](#docs:current:guides:file_formats:parquet_import)
* [How to export a table to a Parquet file](#docs:current:guides:file_formats:parquet_export)
* [How to run a query directly on a Parquet file](#docs:current:guides:file_formats:query_parquet)

##### HTTP(S), S3 and GCP {#docs:current:guides:overview::https-s3-and-gcp}

* [How to load a Parquet file directly from HTTP(S)](#docs:current:guides:network_cloud_storage:http_import)
* [How to load a Parquet file directly from S3](#docs:current:guides:network_cloud_storage:s3_import)
* [How to export a Parquet file to S3](#docs:current:guides:network_cloud_storage:s3_export)
* [How to load a Parquet file from S3 Express One](#docs:current:guides:network_cloud_storage:s3_express_one)
* [How to load a Parquet file directly from GCS](#docs:current:guides:network_cloud_storage:gcs_import)
* [How to load a Parquet file directly from Cloudflare R2](#docs:current:guides:network_cloud_storage:cloudflare_r2_import)
* [How to load an Iceberg table directly from S3](#docs:current:guides:network_cloud_storage:s3_iceberg_import)

##### JSON Files {#docs:current:guides:overview::json-files}

* [How to load a JSON file into a table](#docs:current:guides:file_formats:json_import)
* [How to export a table to a JSON file](#docs:current:guides:file_formats:json_export)

##### Excel Files with the Spatial Extension {#docs:current:guides:overview::excel-files-with-the-spatial-extension}

* [How to load an Excel file into a table](#docs:current:guides:file_formats:excel_import)
* [How to export a table to an Excel file](#docs:current:guides:file_formats:excel_export)

##### Querying Other Database Systems {#docs:current:guides:overview::querying-other-database-systems}

* [How to directly query a MySQL database](#docs:current:guides:database_integration:mysql)
* [How to directly query a PostgreSQL database](#docs:current:guides:database_integration:postgres)
* [How to directly query a SQLite database](#docs:current:guides:database_integration:sqlite)

##### Directly Reading Files {#docs:current:guides:overview::directly-reading-files}

* [How to directly read a binary file](#docs:current:guides:file_formats:read_file::read_blob)
* [How to directly read a text file](#docs:current:guides:file_formats:read_file::read_text)

#### Performance {#docs:current:guides:overview::performance}

* [My workload is slow (troubleshooting guide)](#docs:current:guides:performance:my_workload_is_slow)
* [How to design the schema for optimal performance](#docs:current:guides:performance:schema)
* [What is the ideal hardware environment for DuckDB](#docs:current:guides:performance:environment)
* [What performance implications do Parquet files and (compressed) CSV files have](#docs:current:guides:performance:file_formats)
* [How to tune workloads](#docs:current:guides:performance:how_to_tune_workloads)
* [Benchmarks](#docs:current:guides:performance:benchmarks)

#### Meta Queries {#docs:current:guides:overview::meta-queries}

* [How to list all tables](#docs:current:guides:meta:list_tables)
* [How to view the schema of the result of a query](#docs:current:guides:meta:describe)
* [How to quickly get a feel for a dataset using summarize](#docs:current:guides:meta:summarize)
* [How to view the query plan of a query](#docs:current:guides:meta:explain)
* [How to profile a query](#docs:current:guides:meta:explain_analyze)

#### ODBC {#docs:current:guides:overview::odbc}

* [How to set up an ODBC application (and more!)](#docs:current:guides:odbc:general)

#### Python Client {#docs:current:guides:overview::python-client}

* [How to install the Python client](#docs:current:guides:python:install)
* [How to execute SQL queries](#docs:current:guides:python:execute_sql)
* [How to easily query DuckDB in Jupyter Notebooks](#docs:current:guides:python:jupyter)
* [How to easily query DuckDB in marimo Notebooks](#docs:current:guides:python:marimo)
* [How to use Multiple Python Threads with DuckDB](#docs:current:guides:python:multiple_threads)
* [How to use fsspec filesystems with DuckDB](#docs:current:guides:python:filesystems)

##### Pandas {#docs:current:guides:overview::pandas}

* [How to execute SQL on a Pandas DataFrame](#docs:current:guides:python:sql_on_pandas)
* [How to create a table from a Pandas DataFrame](#docs:current:guides:python:import_pandas)
* [How to export data to a Pandas DataFrame](#docs:current:guides:python:export_pandas)

##### Apache Arrow {#docs:current:guides:overview::apache-arrow}

* [How to execute SQL on Apache Arrow](#docs:current:guides:python:sql_on_arrow)
* [How to create a DuckDB table from Apache Arrow](#docs:current:guides:python:import_arrow)
* [How to export data to Apache Arrow](#docs:current:guides:python:export_arrow)

##### Relational API {#docs:current:guides:overview::relational-api}

* [How to query Pandas DataFrames with the Relational API](#docs:current:guides:python:relational_api_pandas)

##### Python Library Integrations {#docs:current:guides:overview::python-library-integrations}

* [How to use Ibis to query DuckDB with or without SQL](#docs:current:guides:python:ibis)
* [How to use DuckDB with Polars DataFrames via Apache Arrow](#docs:current:guides:python:polars)

#### SQL Features {#docs:current:guides:overview::sql-features}

* [Friendly SQL](#docs:current:sql:dialect:friendly_sql)
* [As-of join](#docs:current:guides:sql_features:asof_join)
* [Full-text search](#docs:current:guides:sql_features:full_text_search)
* [Graph queries](#docs:current:guides:sql_features:graph_queries)
* [`query` and `query_table` functions](#docs:current:guides:sql_features:query_and_query_table_functions)

#### SQL Editors and IDEs {#docs:current:guides:overview::sql-editors-and-ides}

* [How to set up the DBeaver SQL IDE](#docs:current:guides:sql_editors:dbeaver)

#### Data Viewers {#docs:current:guides:overview::data-viewers}

* [How to visualize DuckDB databases with Tableau](#docs:current:guides:data_viewers:tableau)
* [How to draw command-line plots with DuckDB and YouPlot](#docs:current:guides:data_viewers:youplot)

## Data Viewers {#guides:data_viewers}

### Tableau – A Data Visualization Tool {#docs:current:guides:data_viewers:tableau}

[Tableau](https://www.tableau.com/) is a popular commercial data visualization tool.
In addition to a large number of built-in connectors,
it also provides generic database connectivity via ODBC and JDBC connectors.

Tableau has two main versions: Desktop and Online (Server).

* For Desktop, connecting to a DuckDB database is similar to working in an embedded environment like Python.
* For Online, since DuckDB is in-process, the data needs to be either on the server itself or in a remote data bucket that is accessible from the server.

#### Database Creation {#docs:current:guides:data_viewers:tableau::database-creation}

When using a DuckDB database file,
the datasets do not actually need to be imported into DuckDB tables;
it suffices to create views of the data.
For example, this will create a view of the `h2oai` Parquet test file in the current DuckDB code base:

```sql
CREATE VIEW h2oai AS (
    FROM read_parquet('/Users/username/duckdb/data/parquet-testing/h2oai/h2oai_group_small.parquet')
);
```

Note that you should use full path names to local files so that they can be found from inside Tableau.
Also note that you will need to use a version of the driver that is compatible with (i.e., from the same release as)
the database format used by the DuckDB tool (e.g., Python module, command line) that was used to create the file.

#### Installing the JDBC Driver {#docs:current:guides:data_viewers:tableau::installing-the-jdbc-driver}

Tableau provides documentation on how to [install a JDBC driver](https://help.tableau.com/current/pro/desktop/en-gb/jdbc_tableau.htm)
for Tableau to use.

> Both Tableau Desktop and Server need to be restarted any time you add or modify drivers.

##### Driver Links {#docs:current:guides:data_viewers:tableau::driver-links}

The link here is for a recent version of the JDBC driver that is compatible with Tableau.
If you wish to connect to a database file,
you will need to make sure the file was created with a file-compatible version of DuckDB.
Also, check that there is only one version of the driver installed as there are multiple filenames in use.


Download the [JAR file](https://repo1.maven.org/maven2/org/duckdb/duckdb_jdbc/1.5.2.0/duckdb_jdbc-1.5.2.0.jar).


* macOS: Copy it to `~/Library/Tableau/Drivers/`.
* Windows: Copy it to `C:\Program Files\Tableau\Drivers`.
* Linux: Copy it to `/opt/tableau/tableau_driver/jdbc`.

#### Using the PostgreSQL Dialect {#docs:current:guides:data_viewers:tableau::using-the-postgresql-dialect}

If you just want to do something simple, you can try connecting directly to the JDBC driver
and using the Tableau-provided PostgreSQL dialect.

1. Create a DuckDB file containing your views and/or data.
2. Launch Tableau.
3. Under Connect > To a Server > More…, click on “Other Databases (JDBC)”. This brings up the connection dialog box. For the URL, enter `jdbc:duckdb:/Users/username/path/to/database.db`. For the Dialect, choose PostgreSQL. The rest of the fields can be ignored:

![Tableau PostgreSQL](../images/guides/tableau/tableau-osx-jdbc.png)

However, some functionality will be missing, such as the `median` and `percentile` aggregate functions.
For better compatibility than the PostgreSQL dialect offers,
please use the DuckDB Taco connector as described below.

#### Installing the Tableau DuckDB Connector {#docs:current:guides:data_viewers:tableau::installing-the-tableau-duckdb-connector}

While it is possible to use the Tableau-provided PostgreSQL dialect to communicate with the DuckDB JDBC driver,
we strongly recommend using the [DuckDB Taco connector](https://github.com/motherduckdb/duckdb-tableau-connector).
This connector has been fully tested against the Tableau dialect generator
and [is more compatible](https://github.com/motherduckdb/duckdb-tableau-connector/blob/main/tableau_connectors/duckdb_jdbc/dialect.tdd)
than the provided PostgreSQL dialect.

The documentation on how to install and use the connector is in its repository,
but essentially you will need the
[`duckdb_jdbc.taco`](https://github.com/motherduckdb/duckdb-tableau-connector/raw/main/packaged-connector/duckdb_jdbc-v1.0.0-signed.taco) file.
(Despite what the Tableau documentation says, the real security risk is in the JDBC driver code,
not the small amount of JavaScript in the Taco file.)

##### Server (Online) {#docs:current:guides:data_viewers:tableau::server-online}

On Linux, copy the Taco file to `/opt/tableau/connectors`.
On Windows, copy the Taco file to `C:\Program Files\Tableau\Connectors`.
Then issue these commands to disable signature validation:

```batch
tsm configuration set -k native_api.disable_verify_connector_plugin_signature -v true
```

```batch
tsm pending-changes apply
```

The last command will restart the server with the new settings.

##### macOS {#docs:current:guides:data_viewers:tableau::macos}

Copy the Taco file to the `/Users/[User]/Documents/My Tableau Repository/Connectors` folder.
Then launch Tableau Desktop from the Terminal with the command line argument to disable signature validation:

```batch
/Applications/Tableau\ Desktop\ ⟨year⟩.⟨quarter⟩.app/Contents/MacOS/Tableau -DDisableVerifyConnectorPluginSignature=true
```

You can also package this up with AppleScript by using the following script:

```applescript
do shell script "\"/Applications/Tableau Desktop 2023.2.app/Contents/MacOS/Tableau\" -DDisableVerifyConnectorPluginSignature=true"
quit
```

Create this file with the [Script Editor](https://support.apple.com/guide/script-editor/welcome/mac)
(located in `/Applications/Utilities`)
and [save it as a packaged application](https://support.apple.com/guide/script-editor/save-a-script-as-an-app-scpedt1072/mac):

![tableau-applescript](../images/guides/tableau/applescript.png)

You can then double-click it to launch Tableau.
You will need to change the application name in the script whenever you upgrade Tableau.

##### Windows Desktop {#docs:current:guides:data_viewers:tableau::windows-desktop}

Copy the Taco file to the `C:\Users\[Windows User]\Documents\My Tableau Repository\Connectors` directory.
Then launch Tableau Desktop from a shell with the `-DDisableVerifyConnectorPluginSignature=true` argument
to disable signature validation.

#### Output {#docs:current:guides:data_viewers:tableau::output}

Once loaded, you can run queries against your data!
Here is the result of the first H2O.ai benchmark query from the Parquet test file:

![tableau-parquet](../images/guides/tableau/h2oai-group-by-1.png)

### CLI Charting with YouPlot {#docs:current:guides:data_viewers:youplot}

DuckDB can write query results to stdout, so you can pipe them to a CLI graphing tool and plot your data in one line.

[YouPlot](https://github.com/red-data-tools/YouPlot) is a Ruby-based CLI tool for drawing visually pleasing plots on the terminal. It can accept input from other programs by piping data from `stdin`. It takes tab-separated (or delimiter of your choice) data and can easily generate various types of plots including bar, line, histogram and scatter.

With DuckDB, you can write to the console (`stdout`) by using the `TO '/dev/stdout'` clause of the `COPY` statement. You can also write comma-separated values by using `WITH (FORMAT csv, HEADER)`.

#### Installing YouPlot {#docs:current:guides:data_viewers:youplot::installing-youplot}

Installation instructions for YouPlot can be found on the main [YouPlot repository](https://github.com/red-data-tools/YouPlot#installation). If you're on a Mac, you can use:

```batch
brew install youplot
```

Run `uplot --help` to ensure you've installed it successfully!

#### Piping DuckDB Queries to stdout {#docs:current:guides:data_viewers:youplot::piping-duckdb-queries-to-stdout}

By combining the [`COPY ... TO`](#docs:current:sql:statements:copy::copy-to) statement with CSV output, you can read data from any format DuckDB supports and pipe it to YouPlot. Follow these three steps:

1. First, read all data from `input.json`:

   ```batch
   duckdb -s "SELECT * FROM read_json_auto('input.json')"
   ```

2. To prepare the data for YouPlot, write a simple aggregate:

   ```batch
   duckdb -s "SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10"
   ```

3. Finally, wrap the `SELECT` in a `COPY ... TO` statement with an output location of `/dev/stdout`.

   The syntax looks like this:

   ```sql
   COPY (⟨query⟩) TO '/dev/stdout' WITH (FORMAT csv, HEADER);
   ```

   The full DuckDB command below outputs the query in CSV format with a header:

   ```batch
   duckdb -s "COPY (SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT csv, HEADER)"
   ```

#### Connecting DuckDB to YouPlot {#docs:current:guides:data_viewers:youplot::connecting-duckdb-to-youplot}

The data can now be piped to YouPlot! Let's assume we have an `input.json` file containing dates and the number of purchases made on each date. Using the query above, we'll pipe the data to the `uplot` command to draw a plot of the Top 10 Purchase Dates.

```batch
duckdb -s "COPY (SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT csv, HEADER)" \
     | uplot bar -d, -H -t "Top 10 Purchase Dates"
```

This tells `uplot` to draw a bar plot, use a comma as the delimiter (`-d,`), treat the first row as a header (`-H`), and give the plot a title (`-t`).

![youplot-top-10](../images/guides/youplot/top-10-plot.png)
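
To try the pipeline without real data, a small script can generate a stand-in `input.json`. The sketch below uses Python's standard library only. The field names `date` and `purchases` match the query above, but the rows themselves are made up for illustration and the helper name `make_sample_rows` is hypothetical.

```python
import json
from datetime import date, timedelta

def make_sample_rows(start, days):
    """Build one made-up record per day; the purchase counts are arbitrary."""
    rows = []
    for i in range(days):
        day = start + timedelta(days=i)
        rows.append({"date": day.isoformat(), "purchases": (i * 7) % 13 + 1})
    return rows

rows = make_sample_rows(date(2024, 1, 1), 30)

# Write a single JSON array of objects to the file name used in the commands above.
with open("input.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2)
```

After running the script, the `duckdb ... | uplot bar ...` command above can be executed in the same directory.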

#### Additional Example: stdin + stdout {#docs:current:guides:data_viewers:youplot::additional-example-stdin--stdout}

You might be piping data through `jq` or downloading a JSON file from somewhere. You can also tell DuckDB to read data from another process by changing the filename to `/dev/stdin`.

Let's combine this with a quick `curl` from GitHub to see what a certain user has been up to lately.

```batch
curl -sL "https://api.github.com/users/dacort/events?per_page=100" \
     | duckdb -s "COPY (SELECT type, count(*) AS event_count FROM read_json_auto('/dev/stdin') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT csv, HEADER)" \
     | uplot bar -d, -H -t "GitHub Events for @dacort"
```

![github-events](../images/guides/youplot/github-events.png)

## Database Integration {#guides:database_integration}

### Database Integration {#docs:current:guides:database_integration:overview}


### MySQL Import {#docs:current:guides:database_integration:mysql}

To run a query directly on a running MySQL database, the [`mysql` extension](#docs:current:core_extensions:mysql) is required.

#### Installation and Loading {#docs:current:guides:database_integration:mysql::installation-and-loading}

The extension can be installed using the `INSTALL` SQL command. This only needs to be run once.

```sql
INSTALL mysql;
```

To load the `mysql` extension for usage, use the `LOAD` SQL command:

```sql
LOAD mysql;
```

#### Usage {#docs:current:guides:database_integration:mysql::usage}

Once the `mysql` extension is installed and loaded, you can attach to a MySQL database using the following command:

```sql
ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql, READ_ONLY);
USE mysql_db;
```

The string used by `ATTACH` is a PostgreSQL-style connection string (_not_ a MySQL connection string!). It is a list of connection arguments provided in `{key}={value}` format. Below is a list of valid arguments. Any options not provided are replaced by their default values.

|  Setting   |   Default    |
|------------|--------------|
| `database` | `NULL`       |
| `host`     | `localhost`  |
| `password` |              |
| `port`     | `0`          |
| `socket`   | `NULL`       |
| `user`     | current user |

You can directly read and write the MySQL database:

```sql
CREATE TABLE tbl (id INTEGER, name VARCHAR);
INSERT INTO tbl VALUES (42, 'DuckDB');
```

For a list of supported operations, see the [MySQL extension documentation](#docs:current:core_extensions:mysql::supported-operations).

### PostgreSQL Import {#docs:current:guides:database_integration:postgres}

To run a query directly on a running PostgreSQL database, the [`postgres` extension](#docs:current:core_extensions:postgres) is required.

#### Installation and Loading {#docs:current:guides:database_integration:postgres::installation-and-loading}

The extension can be installed using the `INSTALL` SQL command. This only needs to be run once.

```sql
INSTALL postgres;
```

To load the `postgres` extension for usage, use the `LOAD` SQL command:

```sql
LOAD postgres;
```

#### Usage {#docs:current:guides:database_integration:postgres::usage}

After the `postgres` extension is installed, tables can be queried from PostgreSQL using the `postgres_scan` function:

```sql
-- Scan the table "mytable" from the schema "public" in the database "mydb"
SELECT * FROM postgres_scan('host=localhost port=5432 dbname=mydb', 'public', 'mytable');
```

The first parameter to the `postgres_scan` function is the [PostgreSQL connection string](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING), a list of connection arguments provided in `{key}={value}` format. Below is a list of valid arguments.

| Name       | Description                          | Default        |
| ---------- | ------------------------------------ | -------------- |
| `host`     | Name of host to connect to           | `localhost`    |
| `hostaddr` | Host IP address                      | `localhost`    |
| `port`     | Port number                          | `5432`         |
| `user`     | PostgreSQL user name                 | [OS user name] |
| `password` | PostgreSQL password                  |                |
| `dbname`   | Database name                        | [user]         |
| `passfile` | Name of file passwords are stored in | `~/.pgpass`    |

Alternatively, the entire database can be attached using the `ATTACH` command. This allows you to query all tables stored within the PostgreSQL database as if it was a regular database.

```sql
-- Attach the PostgreSQL database using the given connection string
ATTACH 'host=localhost port=5432 dbname=mydb' AS test (TYPE postgres);
-- The table "tbl_name" can now be queried as if it is a regular table
SELECT * FROM test.tbl_name;
-- Switch the active database to "test"
USE test;
-- List all tables in the database
SHOW TABLES;
```
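
Both `postgres_scan` and `ATTACH` take the same kind of connection string, so it can be convenient to assemble it programmatically. The following Python sketch joins keyword/value pairs and applies the libpq quoting rule (values that are empty or contain whitespace are wrapped in single quotes, with backslashes and single quotes escaped by a backslash). The helper name `conninfo` is hypothetical and not part of DuckDB or the `postgres` extension.

```python
def conninfo(**params):
    """Join keyword/value pairs into a libpq-style connection string."""
    parts = []
    for key, value in params.items():
        text = str(value)
        # Quote empty values and values containing whitespace, quotes, or backslashes;
        # inside the quotes, backslashes and single quotes are backslash-escaped.
        needs_quotes = (
            text == ""
            or any(ch.isspace() for ch in text)
            or "'" in text
            or "\\" in text
        )
        if needs_quotes:
            text = "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
        parts.append(f"{key}={text}")
    return " ".join(parts)

print(conninfo(host="localhost", port=5432, dbname="mydb"))
```

The resulting string can be pasted into the `ATTACH` statement or passed as the first argument of `postgres_scan`.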

For more information see the [PostgreSQL extension documentation](#docs:current:core_extensions:postgres).

### SQLite Import {#docs:current:guides:database_integration:sqlite}

To run a query directly on a SQLite file, the `sqlite` extension is required.

#### Installation and Loading {#docs:current:guides:database_integration:sqlite::installation-and-loading}

The extension can be installed using the `INSTALL` SQL command. This only needs to be run once.

```sql
INSTALL sqlite;
```

To load the `sqlite` extension for usage, use the `LOAD` SQL command:

```sql
LOAD sqlite;
```

#### Usage {#docs:current:guides:database_integration:sqlite::usage}

After the SQLite extension is installed, tables can be queried from SQLite using the `sqlite_scan` function:

```sql
-- Scan the table "tbl_name" from the SQLite file "test.db"
SELECT * FROM sqlite_scan('test.db', 'tbl_name');
```

Alternatively, the entire file can be attached using the `ATTACH` command. This allows you to query all tables stored within a SQLite database file as if they were a regular database.

```sql
-- Attach the SQLite file "test.db"
ATTACH 'test.db' AS test (TYPE sqlite);
-- The table "tbl_name" can now be queried as if it is a regular table
SELECT * FROM test.tbl_name;
-- Switch the active database to "test"
USE test;
-- List all tables in the file
SHOW TABLES;
```
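
If you do not have a SQLite file at hand, one can be created with Python's built-in `sqlite3` module. This is a minimal sketch: the file name `test.db` and the table name `tbl_name` follow the examples above, while the column layout and the two rows are placeholders chosen for illustration.

```python
import sqlite3

# Create (or open) the SQLite file used in the examples above.
con = sqlite3.connect("test.db")
con.execute("CREATE TABLE IF NOT EXISTS tbl_name (id INTEGER, name TEXT)")
con.execute("DELETE FROM tbl_name")  # keep the script repeatable
con.executemany(
    "INSERT INTO tbl_name VALUES (?, ?)",
    [(1, "duck"), (2, "goose")],  # placeholder rows
)
con.commit()

# Read the rows back to confirm what was written.
rows = con.execute("SELECT id, name FROM tbl_name ORDER BY id").fetchall()
con.close()
print(rows)
```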

For more information see the [SQLite extension documentation](#docs:current:core_extensions:sqlite).

## File Formats {#guides:file_formats}

### File Formats {#docs:current:guides:file_formats:overview}


### CSV Import {#docs:current:guides:file_formats:csv_import}

To read data from a CSV file, use the `read_csv` function in the `FROM` clause of a query:

```sql
SELECT * FROM read_csv('input.csv');
```

Alternatively, you can omit the `read_csv` function and let DuckDB infer it from the extension:

```sql
SELECT * FROM 'input.csv';
```

To create a new table using the result from a query, use the [`CREATE TABLE ... AS SELECT` statement](#docs:current:sql:statements:create_table::create-table--as-select-ctas):

```sql
CREATE TABLE new_tbl AS
    SELECT * FROM read_csv('input.csv');
```

We can use DuckDB's [optional `FROM`-first syntax](#docs:current:sql:query_syntax:from) to omit `SELECT *`:

```sql
CREATE TABLE new_tbl AS
    FROM read_csv('input.csv');
```

To load data into an existing table from a query, use `INSERT INTO` from a `SELECT` statement:

```sql
INSERT INTO tbl
    SELECT * FROM read_csv('input.csv');
```

Alternatively, the `COPY` statement can also be used to load data from a CSV file into an existing table:

```sql
COPY tbl FROM 'input.csv';
```

For additional options, see the [CSV import reference](#docs:current:data:csv:overview) and the [`COPY` statement documentation](#docs:current:sql:statements:copy).

### CSV Export {#docs:current:guides:file_formats:csv_export}

To export the data from a table to a CSV file, use the `COPY` statement:

```sql
COPY tbl TO 'output.csv' (HEADER, DELIMITER ',');
```

The result of queries can also be directly exported to a CSV file:

```sql
COPY (SELECT * FROM tbl) TO 'output.csv' (HEADER, DELIMITER ',');
```

For additional options, see the [`COPY` statement documentation](#docs:current:sql:statements:copy::csv-options).

### Directly Reading Files {#docs:current:guides:file_formats:read_file}

DuckDB allows directly reading files via the [`read_text`](#docs:current:guides:file_formats:read_file::read_text) and [`read_blob`](#docs:current:guides:file_formats:read_file::read_blob) functions.
These functions accept a filename, a list of filenames, or a glob pattern. They output the content of each file as a `VARCHAR` or `BLOB`, respectively, along with metadata such as the file size and last modified time.

#### `read_text` {#docs:current:guides:file_formats:read_file::read_text}

The `read_text` table function reads from the selected source(s) to a `VARCHAR`. Each file results in a single row with the `content` field holding the entire content of the respective file.

```sql
SELECT size, parse_path(filename), content
FROM read_text('test/sql/table_function/files/*.txt');
```



| size |             parse_path(filename)              |      content     |
|-----:|-----------------------------------------------|------------------|
| 12   | [test, sql, table_function, files, one.txt]   | Hello World!     |
| 2    | [test, sql, table_function, files, three.txt] | 42               |
| 10   | [test, sql, table_function, files, two.txt]   | Foo Bar\nFöö Bär |

DuckDB first validates the file content as valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8, DuckDB throws an error suggesting to use [`read_blob`](#docs:current:guides:file_formats:read_file::read_blob) instead.

`read_text` also supports reading from pipes (e.g., `/dev/stdin`).

> The maximum allowed file size for `read_text` is 3.9 GiB.
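
The UTF-8 check described above can be approximated outside DuckDB to decide in advance which function suits a file. The Python sketch below is only an analogy, not DuckDB's implementation: it decodes strictly as UTF-8 and, unlike `read_text` (which raises an error), falls back to returning raw bytes. The function name `read_text_or_blob` is hypothetical.

```python
def read_text_or_blob(path):
    """Return (kind, content): decoded text if the file is valid UTF-8, else raw bytes."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid byte sequence.
        return "text", raw.decode("utf-8")
    except UnicodeDecodeError:
        return "blob", raw
```

A result of `"text"` suggests the file is a candidate for `read_text`; `"blob"` suggests `read_blob`.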

#### `read_blob` {#docs:current:guides:file_formats:read_file::read_blob}

The `read_blob` table function reads from the selected source(s) to a `BLOB`:

```sql
SELECT size, content, filename
FROM read_blob('test/sql/table_function/files/*');
```



| size |                              content                         |                filename                 |
|-----:|--------------------------------------------------------------|-----------------------------------------|
| 178  |  PK\x03\x04\x0A\x00\x00\x00\x00\x00\xACi=X\x14t\xCE\xC7\x0A… | test/sql/table_function/files/four.blob |
| 12   | Hello World!                                                 | test/sql/table_function/files/one.txt   |
| 2    | 42                                                           | test/sql/table_function/files/three.txt |
| 10   | F\xC3\xB6\xC3\xB6 B\xC3\xA4r                                 | test/sql/table_function/files/two.txt   |

> The maximum allowed file size for `read_blob` is 3.9 GiB.

#### Schema {#docs:current:guides:file_formats:read_file::schema}

The tables returned by `read_text` and `read_blob` have the same columns; only the type of the `content` column differs (`VARCHAR` for `read_text`, `BLOB` for `read_blob`):

```sql
DESCRIBE FROM read_text('README.md');
```



|  column_name  | column_type | null | key  | default | extra |
|---------------|-------------|------|------|---------|-------|
| filename      | VARCHAR     | YES  | NULL | NULL    | NULL  |
| content       | VARCHAR     | YES  | NULL | NULL    | NULL  |
| size          | BIGINT      | YES  | NULL | NULL    | NULL  |
| last_modified | TIMESTAMP   | YES  | NULL | NULL    | NULL  |

#### Hive Partitioning {#docs:current:guides:file_formats:read_file::hive-partitioning}

Data can be read from [Hive partitioned](#docs:current:data:partitioning:hive_partitioning) datasets.

```sql
SELECT *
FROM read_blob('data/parquet-testing/hive-partitioning/simple/**/*.parquet')
WHERE part IN ('a', 'b') AND date >= '2012-01-01';
```



|             filename                  |           content             | size |      last_modified     |    date    |  part   |
|---------------------------------------|-------------------------------|------|------------------------|------------|---------|
| …/part=a/date=2012-01-01/test.parquet | PAR1\x15\x00\x15\x14\x15\x18… | 266  | 2024-11-12 02:23:20+00 | 2012-01-01 | a       |
| …/part=b/date=2013-01-01/test.parquet | PAR1\x15\x00\x15\x14\x15\x18… | 266  | 2024-11-12 02:23:20+00 | 2013-01-01 | b       |
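
The `date` and `part` columns in the result come from the `key=value` directory names in each file path. As a rough illustration of that convention (not of DuckDB's own parser), the Python sketch below extracts the pairs from a path. The example path is hypothetical, and values are kept as strings here without any type inference.

```python
from pathlib import PurePosixPath

def hive_partitions(path):
    """Extract key=value directory components from a Hive-style path."""
    parts = {}
    for component in PurePosixPath(path).parts[:-1]:  # skip the file name
        key, sep, value = component.partition("=")
        if sep and key:
            parts[key] = value
    return parts

# Hypothetical path following the part=.../date=... layout shown above.
print(hive_partitions("dataset/part=a/date=2012-01-01/test.parquet"))
```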


#### Handling Missing Metadata {#docs:current:guides:file_formats:read_file::handling-missing-metadata}

When the underlying filesystem cannot provide a piece of metadata (e.g., HTTPFS may not always return a valid timestamp), the corresponding cell is set to `NULL` instead.

#### Support for Projection Pushdown {#docs:current:guides:file_formats:read_file::support-for-projection-pushdown}

These table functions also use projection pushdown to avoid computing properties unnecessarily. For example, you can glob a directory of large files to get file sizes in the `size` column. As long as you omit the `content` column, DuckDB won't read the file data.
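
The benefit is the same as asking the filesystem for metadata instead of opening each file. The Python sketch below is an analogy for that idea rather than a description of DuckDB internals: it lists file sizes for a glob pattern using `os.stat`, so no file content is read. The function name `file_sizes` is hypothetical.

```python
import glob
import os

def file_sizes(pattern):
    """Return (filename, size) pairs using only filesystem metadata."""
    # os.stat queries file metadata; the file contents are never opened.
    return [(name, os.stat(name).st_size) for name in sorted(glob.glob(pattern))]
```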

### Directly Read DuckDB Databases {#docs:current:guides:file_formats:read_duckdb}

DuckDB allows directly reading DuckDB files through the `read_duckdb` function:

```sql
read_duckdb(⟨'path_to_database'⟩, table_name = ⟨'table_to_read'⟩);
```

Using this function is equivalent to performing the following steps:

* Attaching to the database using a read-only connection.
* Querying the table specified through the `table_name` argument.
* Closing the connection to the database.

#### Examples {#docs:current:guides:file_formats:read_duckdb::examples}

##### Reading a Specific Table {#docs:current:guides:file_formats:read_duckdb::reading-a-specific-table}

To read the `region` table from the TPC-H dataset, run:

```sql
SELECT r_regionkey, r_name
FROM read_duckdb('https://blobs.duckdb.org/data/tpch-sf10.db', table_name = 'region');
```

```text
┌─────────────┬─────────────┐
│ r_regionkey │   r_name    │
│    int32    │   varchar   │
├─────────────┼─────────────┤
│           0 │ AFRICA      │
│           1 │ AMERICA     │
│           2 │ ASIA        │
│           3 │ EUROPE      │
│           4 │ MIDDLE EAST │
└─────────────┴─────────────┘
```

##### Reading from Multiple Databases {#docs:current:guides:file_formats:read_duckdb::reading-from-multiple-databases}

You can use [globbing](#docs:current:sql:functions:pattern_matching::globbing) to read from multiple databases.
To illustrate this, let's create two databases:

```bash
duckdb my-1.duckdb \
    -c "CREATE TABLE numbers AS SELECT 42 AS x;" \
    -c "CREATE TABLE letters AS SELECT 'm' AS a;"

duckdb my-2.duckdb \
    -c "CREATE TABLE numbers AS SELECT 43 AS x;"
```

Then, in DuckDB, you can run:

```sql
SELECT x FROM read_duckdb('my-*.duckdb', table_name = 'numbers');
```

```text
┌───────┐
│   x   │
│ int32 │
├───────┤
│    42 │
│    43 │
└───────┘
```

##### Reading from Databases with a Single Table {#docs:current:guides:file_formats:read_duckdb::reading-from-databases-with-a-single-table}

If all databases in `read_duckdb`'s argument have a single table, the `table_name` argument is optional:

```sql
FROM read_duckdb('my-2.duckdb');
```

```text
┌───────┐
│   x   │
│ int32 │
├───────┤
│    43 │
└───────┘
```

If the extension is `.db` or `.duckdb`, you can also omit the `read_duckdb` call (similarly to how you can omit `read_csv` and `read_parquet`):

```sql
FROM 'my-2.duckdb';
```

#### Limitations {#docs:current:guides:file_formats:read_duckdb::limitations}

`read_duckdb` currently only supports reading from tables.
Reading from views is not yet supported.

### Excel Import {#docs:current:guides:file_formats:excel_import}

DuckDB supports reading Excel `.xlsx` files. However, `.xls` files are not supported.

#### Importing Excel Sheets {#docs:current:guides:file_formats:excel_import::importing-excel-sheets}

Use the `read_xlsx` function in the `FROM` clause of a query:

```sql
SELECT * FROM read_xlsx('test_excel.xlsx');
```

Alternatively, you can omit the `read_xlsx` function and let DuckDB infer it from the extension:

```sql
SELECT * FROM 'test_excel.xlsx';
```

However, if you want to be able to pass options to control the import behavior, you should use the `read_xlsx` function.

One such option is the `sheet` parameter, which allows specifying the name of the Excel worksheet:

```sql
SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
```

By default, the first sheet is loaded if no sheet is specified.

#### Importing a Specific Range {#docs:current:guides:file_formats:excel_import::importing-a-specific-range}

To select a specific range of cells, use the `range` parameter with a string in the format `A1:B2`, where `A1` is the top-left cell and `B2` is the bottom-right cell:

```sql
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'A1:B2');
```

For example, to start reading at row 5 (skipping the first 4 rows):

```sql
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'A5:Z');
```

To start reading at column E (skipping the first 4 columns):

```sql
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'E:Z');
```
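
To see which cell a range string starts at, the usual spreadsheet convention can be worked out by hand: column letters form a base-26 label and the digits give the row. The Python sketch below follows that convention; how DuckDB parses the `range` parameter internally is not shown in this guide, so treat the helper names `column_index` and `range_start` as hypothetical illustrations.

```python
import re

def column_index(letters):
    """Convert a column label (A, B, ..., Z, AA, ...) to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def range_start(cell_range):
    """Return (first_column, first_row) of a range such as 'A5:Z' or 'E:Z'."""
    start = cell_range.split(":")[0]
    letters, digits = re.fullmatch(r"([A-Za-z]*)(\d*)", start).groups()
    # A missing part means the range is open on that axis, so start from 1.
    return (column_index(letters) if letters else 1, int(digits) if digits else 1)

print(range_start("A5:Z"))  # first column 1, first row 5
print(range_start("E:Z"))   # first column 5, first row 1
```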

If no range parameter is provided, DuckDB automatically infers the range as the rectangular region of cells between the first row of consecutive non-empty cells and the first empty row spanning the same columns.

By default, if no range is provided, DuckDB will stop reading the Excel file when encountering an empty row. But when a range is provided, the default is to read until the end of the range. This behavior can be controlled with the `stop_at_empty` parameter:

```sql
-- Read the first 100 rows, or until the first empty row, whichever comes first
SELECT * FROM read_xlsx('test_excel.xlsx', range = '1:100', stop_at_empty = true);

-- Always read the whole sheet, even if it contains empty rows
SELECT * FROM read_xlsx('test_excel.xlsx', stop_at_empty = false);
```

#### Creating a New Table {#docs:current:guides:file_formats:excel_import::creating-a-new-table}

To create a new table using the result from a query, use `CREATE TABLE ... AS` from a `SELECT` statement:

```sql
CREATE TABLE new_tbl AS
    SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
```

#### Loading to an Existing Table {#docs:current:guides:file_formats:excel_import::loading-to-an-existing-table}

To load data into an existing table from a query, use `INSERT INTO` from a `SELECT` statement:

```sql
INSERT INTO tbl
    SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
```

Alternatively, you can use the `COPY` statement with the `XLSX` format option to import an Excel file into an existing table:

```sql
COPY tbl FROM 'test_excel.xlsx' (FORMAT xlsx, SHEET 'Sheet1');
```

When using the `COPY` statement to load an Excel file into an existing table, the types of the columns in the target table will be used to coerce the types of the cells in the Excel sheet.

#### Importing a Sheet with/without a Header {#docs:current:guides:file_formats:excel_import::importing-a-sheet-withwithout-a-header}

To treat the first row as containing the names of the resulting columns, use the `header` parameter:

```sql
SELECT * FROM read_xlsx('test_excel.xlsx', header = true);
```

By default, the first row is treated as a header if all the cells in the first row (within the inferred or supplied range) are non-empty strings. To disable this behavior, set `header` to `false`.

#### Detecting Types {#docs:current:guides:file_formats:excel_import::detecting-types}

When not importing into an existing table, DuckDB will attempt to infer the types of the columns in the Excel sheet based on their contents and/or "number format".

- `TIMESTAMP`, `TIME`, `DATE` and `BOOLEAN` types are inferred when possible based on the "number format" applied to the cell.
- Text cells containing `TRUE` and `FALSE` are inferred as `BOOLEAN`.
- Empty cells are considered to be of type `DOUBLE` by default.
- Otherwise cells are inferred as `VARCHAR` or `DOUBLE` based on their contents.

You can adjust this behavior in several ways.

To treat all empty cells as `VARCHAR` instead of `DOUBLE`, set `empty_as_varchar` to `true`:

```sql
SELECT * FROM read_xlsx('test_excel.xlsx', empty_as_varchar = true);
```

To disable type inference completely and treat all cells as `VARCHAR`, set `all_varchar` to `true`:

```sql
SELECT * FROM read_xlsx('test_excel.xlsx', all_varchar = true);
```

Additionally, if the `ignore_errors` parameter is set to `true`, DuckDB will silently replace cells that can't be cast to the corresponding inferred column type with `NULL`s.

```sql
SELECT * FROM read_xlsx('test_excel.xlsx', ignore_errors = true);
```

#### See Also {#docs:current:guides:file_formats:excel_import::see-also}

DuckDB can also [export Excel files](#docs:current:guides:file_formats:excel_export).
For additional details on Excel support, see the [excel extension page](#docs:current:core_extensions:excel).

### Excel Export {#docs:current:guides:file_formats:excel_export}

DuckDB supports exporting data to Excel `.xlsx` files via the `excel` extension. Please note that `.xls` files are not supported.

To install and load the extension, run:

```sql
INSTALL excel;
LOAD excel;
```

#### Exporting Excel Sheets {#docs:current:guides:file_formats:excel_export::exporting-excel-sheets}

To export a table to an Excel file, use the `COPY` statement with the `FORMAT xlsx` option:

```sql
COPY tbl TO 'output.xlsx' WITH (FORMAT xlsx);
```

The result of a query can also be directly exported to an Excel file:

```sql
COPY (SELECT * FROM tbl) TO 'output.xlsx' WITH (FORMAT xlsx);
```

Or:

```sql
COPY (SELECT * FROM tbl) TO 'output.xlsx';
```

To write the column names as the first row in the Excel file, use the `HEADER` option:

```sql
COPY tbl TO 'output.xlsx' WITH (FORMAT xlsx, HEADER true);
```

To name the worksheet in the resulting Excel file, use the `SHEET` option:

```sql
COPY tbl TO 'output.xlsx' WITH (FORMAT xlsx, SHEET 'Sheet1');
```

#### Type Conversions {#docs:current:guides:file_formats:excel_export::type-conversions}

Because Excel only really supports storing numbers or strings (the equivalent of `DOUBLE` and `VARCHAR`), the following type conversions are automatically applied when writing XLSX files:

* Numeric types are cast to `DOUBLE`.
* Temporal types (`TIMESTAMP`, `DATE`, `TIME`, etc.) are converted to Excel "serial" numbers, that is, the number of days since 1900-01-01 for dates and the fraction of a day for times. These are then styled with a "number format" so that they appear as dates or times when opened in Excel.
* `TIMESTAMP_TZ` and `TIME_TZ` are cast to UTC `TIMESTAMP` and `TIME` respectively, with the timezone information being lost.
* `BOOLEAN`s are converted to `1` and `0`, with a "number format" applied to make them appear as `TRUE` and `FALSE` in Excel.
* All other types are cast to `VARCHAR` and then written as text cells.

You can also explicitly cast columns to a different type before exporting them to Excel:

```sql
COPY (SELECT CAST(a AS VARCHAR), b FROM tbl) TO 'output.xlsx' WITH (FORMAT xlsx);
```
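
The serial-number conversion listed above can be reproduced with simple date arithmetic. The Python sketch below assumes Excel's default 1900 date system, in which serial 1 is 1900-01-01 and 1900 is treated as a leap year; for dates on or after 1900-03-01 this is equivalent to counting days from 1899-12-30. Whether DuckDB performs exactly this arithmetic is an assumption, and the function names are hypothetical.

```python
from datetime import date, datetime, time

# Effective day 0 for dates on or after 1900-03-01 in the 1900 date system.
EXCEL_EPOCH = date(1899, 12, 30)

def date_to_serial(d):
    """Whole days between the effective epoch and the given date."""
    return (d - EXCEL_EPOCH).days

def time_to_fraction(t):
    """Time of day expressed as a fraction of a 24-hour day."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds / 86400

def datetime_to_serial(dt):
    """Date part as whole days plus time part as a day fraction."""
    return date_to_serial(dt.date()) + time_to_fraction(dt.time())

print(date_to_serial(date(2024, 1, 1)))                # 45292
print(time_to_fraction(time(12, 0)))                   # 0.5
print(datetime_to_serial(datetime(2024, 1, 1, 6, 0)))  # 45292.25
```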

#### See Also {#docs:current:guides:file_formats:excel_export::see-also}

DuckDB can also [import Excel files](#docs:current:guides:file_formats:excel_import).
For additional details on Excel support, see the [`excel` extension page](#docs:current:core_extensions:excel).

### JSON Import {#docs:current:guides:file_formats:json_import}

To read data from a JSON file, use the `read_json_auto` function in the `FROM` clause of a query:

```sql
SELECT *
FROM read_json_auto('input.json');
```

To create a new table using the result from a query, use `CREATE TABLE AS` from a `SELECT` statement:

```sql
CREATE TABLE new_tbl AS
    SELECT *
    FROM read_json_auto('input.json');
```

To load data into an existing table from a query, use `INSERT INTO` from a `SELECT` statement:

```sql
INSERT INTO tbl
    SELECT *
    FROM read_json_auto('input.json');
```

Alternatively, the `COPY` statement can also be used to load data from a JSON file into an existing table:

```sql
COPY tbl FROM 'input.json';
```

For additional options, see the [JSON Loading reference](#docs:current:data:json:overview) and the [`COPY` statement documentation](#docs:current:sql:statements:copy).

### JSON Export {#docs:current:guides:file_formats:json_export}

To export the data from a table to a JSON file, use the `COPY` statement:

```sql
COPY tbl TO 'output.json';
```

The result of queries can also be directly exported to a JSON file:

```sql
COPY (SELECT * FROM range(3) tbl(n)) TO 'output.json';
```

```text
{"n":0}
{"n":1}
{"n":2}
```

The JSON export writes JSON lines by default, standardized as [Newline-delimited JSON](https://en.wikipedia.org/wiki/JSON_streaming#NDJSON).
The `ARRAY` option can be used to write a single JSON array object instead.

```sql
COPY (SELECT * FROM range(3) tbl(n)) TO 'output.json' (ARRAY);
```

```text
[
        {"n":0},
        {"n":1},
        {"n":2}
]
```

For additional options, see the [`COPY` statement documentation](#docs:current:sql:statements:copy).

### Parquet Import {#docs:current:guides:file_formats:parquet_import}

To read data from a Parquet file, use the `read_parquet` function in the `FROM` clause of a query:

```sql
SELECT * FROM read_parquet('input.parquet');
```

Alternatively, you can omit the `read_parquet` function and let DuckDB infer it from the extension:

```sql
SELECT * FROM 'input.parquet';
```

To create a new table using the result from a query, use the [`CREATE TABLE ... AS SELECT` statement](#docs:current:sql:statements:create_table::create-table--as-select-ctas):

```sql
CREATE TABLE new_tbl AS
    SELECT * FROM read_parquet('input.parquet');
```

To load data into an existing table from a query, use `INSERT INTO` from a `SELECT` statement:

```sql
INSERT INTO tbl
    SELECT * FROM read_parquet('input.parquet');
```

Alternatively, use the `COPY` statement to load data from a Parquet file into an existing table:

```sql
COPY tbl FROM 'input.parquet' (FORMAT parquet);
```

#### Adjusting the Schema on the Fly {#docs:current:guides:file_formats:parquet_import::adjusting-the-schema-on-the-fly}

You can load a Parquet file into a slightly different schema (e.g., different number of columns, more relaxed types) using the following trick.

Suppose you have a Parquet file with two columns, `c1` and `c2`:

```sql
COPY (FROM (VALUES (42, 43)) t(c1, c2))
TO 'f.parquet';
```

To add another column `c3` that is not present in the file, run:

```sql
FROM (VALUES (NULL::VARCHAR, NULL, NULL)) t(c1, c2, c3)
WHERE false
UNION ALL BY NAME
FROM 'f.parquet';
```

The first `FROM` clause generates an empty table with *three* columns where `c1` is a `VARCHAR`.
`UNION ALL BY NAME` then appends the rows of the Parquet file, matching columns by name. The result is:

```text
┌─────────┬───────┬───────┐
│   c1    │  c2   │  c3   │
│ varchar │ int32 │ int32 │
├─────────┼───────┼───────┤
│ 42      │  43   │ NULL  │
└─────────┴───────┴───────┘
```

For additional options, see the [Parquet loading reference](#docs:current:data:parquet:overview).

### Parquet Export {#docs:current:guides:file_formats:parquet_export}

To export the data from a table to a Parquet file, use the `COPY` statement:

```sql
COPY tbl TO 'output.parquet' (FORMAT parquet);
```

The result of queries can also be directly exported to a Parquet file:

```sql
COPY (SELECT * FROM tbl) TO 'output.parquet' (FORMAT parquet);
```

The flags for setting compression, row group size, etc. are listed in the [Reading and Writing Parquet files](#docs:current:data:parquet:overview) page.

### Querying Parquet Files {#docs:current:guides:file_formats:query_parquet}

To run a query directly on a Parquet file, use the `read_parquet` function in the `FROM` clause of a query.

```sql
SELECT * FROM read_parquet('input.parquet');
```

The Parquet file will be processed in parallel. Filters will be automatically pushed down into the Parquet scan, and only the relevant columns will be read automatically.

For more information see the blog post [“Querying Parquet with Precision using DuckDB”](https://duckdb.org/2021/06/25/querying-parquet).

### File Access with the file: Protocol {#docs:current:guides:file_formats:file_access}

DuckDB supports using the `file:` protocol. It currently supports the following formats:

* `file:/some/path` (host omitted completely)
* `file:///some/path` (empty host)
* `file://localhost/some/path` (`localhost` as host)

Note that the following formats are *not* supported because they are non-standard:

* `file:some/relative/path` (relative path)
* `file://some/path` (double-slash path)

Additionally, the `file:` protocol currently does not support remote (non-localhost) hosts.
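
The rules above can be summarized as: the host must be absent, empty, or `localhost`, and the path must be absolute. The Python sketch below encodes that summary with `urllib.parse.urlsplit` to check a URI before handing it to DuckDB. It is an illustration of the listed rules, not DuckDB's own parser, and the function name `is_supported_file_uri` is hypothetical.

```python
from urllib.parse import urlsplit

def is_supported_file_uri(uri):
    """Check a URI against the supported `file:` forms listed above."""
    parts = urlsplit(uri)
    return (
        parts.scheme == "file"
        and parts.netloc in ("", "localhost")  # no host, empty host, or localhost
        and parts.path.startswith("/")         # absolute path only
    )

for uri in (
    "file:/some/path",
    "file:///some/path",
    "file://localhost/some/path",
    "file:some/relative/path",
    "file://some/path",
):
    print(uri, is_supported_file_uri(uri))
```

Note that `file://some/path` is rejected because `some` is parsed as a host name, which matches the restriction on non-localhost hosts.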

## Network and Cloud Storage {#guides:network_cloud_storage}

### Network and Cloud Storage {#docs:current:guides:network_cloud_storage:overview}


### HTTP Parquet Import {#docs:current:guides:network_cloud_storage:http_import}

To load a Parquet file over HTTP(S), the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) is required. This can be installed using the `INSTALL` SQL command. This only needs to be run once.

```sql
INSTALL httpfs;
```

To load the `httpfs` extension for usage, use the `LOAD` SQL command:

```sql
LOAD httpfs;
```

After the `httpfs` extension is set up, Parquet files can be read over `http(s)`:

```sql
SELECT * FROM read_parquet('https://⟨domain⟩/path/to/file.parquet');
```

For example:

```sql
SELECT * FROM read_parquet('https://duckdb.org/data/prices.parquet');
```

Moreover, the `read_parquet` function itself can also be omitted thanks to DuckDB's [replacement scan mechanism](#docs:current:clients:c:replacement_scans):

```sql
SELECT * FROM 'https://duckdb.org/data/holdings.parquet';
```

### S3 Parquet Import {#docs:current:guides:network_cloud_storage:s3_import}

#### Prerequisites {#docs:current:guides:network_cloud_storage:s3_import::prerequisites}

To load a Parquet file from S3, the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) is required. This can be installed using the `INSTALL` SQL command. This only needs to be run once.

```sql
INSTALL httpfs;
```

To load the `httpfs` extension for usage, use the `LOAD` SQL command:

```sql
LOAD httpfs;
```

#### Credentials and Configuration {#docs:current:guides:network_cloud_storage:s3_import::credentials-and-configuration}

After loading the `httpfs` extension, set up the credentials and S3 region to read data:

```sql
CREATE SECRET (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩'
);
```

> **Tip.** If you get an IO Error (`Connection error for HTTP HEAD`), configure the endpoint explicitly via `ENDPOINT 's3.⟨your-region⟩.amazonaws.com'`{:.language-sql .highlight}.

Alternatively, use the [`aws` extension](#docs:current:core_extensions:aws) to retrieve the credentials automatically:

```sql
CREATE SECRET (
    TYPE s3,
    PROVIDER credential_chain
);
```
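
When different buckets need different credentials, a secret can be given a name and restricted to a path prefix. The following is a sketch under assumptions: the secret name `my_bucket_secret` and the bucket placeholder `⟨my-bucket⟩` are hypothetical.

```sql
CREATE SECRET my_bucket_secret (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩',
    SCOPE 's3://⟨my-bucket⟩'  -- only applies to paths under this prefix
);

-- List the configured secrets.
SELECT name, type, provider, scope FROM duckdb_secrets();

-- Show which secret would be selected for a given path.
FROM which_secret('s3://⟨my-bucket⟩/⟨file⟩', 's3');
```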

#### Querying {#docs:current:guides:network_cloud_storage:s3_import::querying}

After the `httpfs` extension is set up and the S3 configuration is set correctly, Parquet files can be read from S3 using the following command:

```sql
SELECT * FROM read_parquet('s3://⟨bucket⟩/⟨file⟩');
```
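
Several files can be read at once with a glob pattern. A sketch, assuming a hypothetical `⟨prefix⟩` under which all Parquet files share the same schema:

```sql
-- Read every Parquet file under the prefix; `filename = true` adds a
-- column recording which file each row came from.
SELECT *
FROM read_parquet('s3://⟨bucket⟩/⟨prefix⟩/*.parquet', filename = true);
```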

#### Google Cloud Storage (GCS) and Cloudflare R2 {#docs:current:guides:network_cloud_storage:s3_import::google-cloud-storage-gcs-and-cloudflare-r2}

DuckDB can also handle [Google Cloud Storage (GCS)](#docs:current:guides:network_cloud_storage:gcs_import) and [Cloudflare R2](#docs:current:guides:network_cloud_storage:cloudflare_r2_import) via the S3 API.
See the relevant guides for details.

### S3 Parquet Export {#docs:current:guides:network_cloud_storage:s3_export}

To write a Parquet file to S3, the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) is required. This can be installed using the `INSTALL` SQL command. This only needs to be run once.

```sql
INSTALL httpfs;
```

To load the `httpfs` extension for usage, use the `LOAD` SQL command:

```sql
LOAD httpfs;
```

After loading the `httpfs` extension, set up the credentials to write data. Note that the `region` parameter should match the region of the bucket you want to access.

```sql
CREATE SECRET (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩'
);
```

> **Tip.** If you get an IO Error (`Connection error for HTTP HEAD`), configure the endpoint explicitly via `ENDPOINT 's3.⟨your-region⟩.amazonaws.com'`{:.language-sql .highlight}.

Alternatively, use the [`aws` extension](#docs:current:core_extensions:aws) to retrieve the credentials automatically:

```sql
CREATE SECRET (
    TYPE s3,
    PROVIDER credential_chain
);
```

After the `httpfs` extension is set up and the S3 credentials are correctly configured, Parquet files can be written to S3 using the following command:

```sql
COPY ⟨table_name⟩ TO 's3://⟨s3-bucket⟩/⟨filename⟩.parquet';
```
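
`COPY` also accepts a query and format options. The sketch below assumes that `⟨table_name⟩` has a column named `year`, which is hypothetical; adjust the filter and options to your data.

```sql
-- Export the result of a query with an explicit format and compression codec.
COPY (SELECT * FROM ⟨table_name⟩ WHERE year >= 2024)
TO 's3://⟨s3-bucket⟩/⟨filename⟩.parquet'
(FORMAT parquet, COMPRESSION zstd);

-- Write one directory per distinct value of `year` (Hive-style partitioning).
COPY ⟨table_name⟩
TO 's3://⟨s3-bucket⟩/⟨prefix⟩'
(FORMAT parquet, PARTITION_BY (year));
```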

Similarly, Google Cloud Storage (GCS) is supported through the Interoperability API.
You need to create [HMAC keys](https://console.cloud.google.com/storage/settings;tab=interoperability) and provide the credentials as follows:

```sql
CREATE SECRET (
    TYPE gcs,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩'
);
```

After setting up the GCS credentials, you can export using:

```sql
COPY ⟨table_name⟩ TO 'gs://⟨gcs_bucket⟩/⟨filename⟩.parquet';
```

### S3 Iceberg Import {#docs:current:guides:network_cloud_storage:s3_iceberg_import}

#### Prerequisites {#docs:current:guides:network_cloud_storage:s3_iceberg_import::prerequisites}

Loading an Iceberg table from S3 requires both the [`httpfs`](#docs:current:core_extensions:httpfs:overview) and [`iceberg`](#docs:current:core_extensions:iceberg:overview) extensions. Install them using the `INSTALL` SQL command. You only need to install extensions once.

```sql
INSTALL httpfs;
INSTALL iceberg;
```

To load the extensions, use the `LOAD` command:

```sql
LOAD httpfs;
LOAD iceberg;
```

#### Credentials {#docs:current:guides:network_cloud_storage:s3_iceberg_import::credentials}

After loading the extensions, set up the credentials and S3 region to read data. You may either use an access key and secret, or a token.

```sql
CREATE SECRET (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩'
);
```

Alternatively, use the [`aws` extension](#docs:current:core_extensions:aws) to retrieve the credentials automatically:

```sql
CREATE SECRET (
    TYPE s3,
    PROVIDER credential_chain
);
```

#### Loading Iceberg Tables from S3 {#docs:current:guides:network_cloud_storage:s3_iceberg_import::loading-iceberg-tables-from-s3}

After the extensions are set up and the S3 credentials are correctly configured, Iceberg tables can be read from S3 using the following command:

```sql
SELECT *
FROM iceberg_scan('s3://⟨bucket⟩/⟨iceberg_table_folder⟩/metadata/⟨id⟩.metadata.json');
```

Note that you need to link directly to the metadata file (`⟨id⟩.metadata.json`). Otherwise, you'll get an error like this:

```console
IO Error:
Cannot open file "s3://bucket/iceberg_table_folder/metadata/version-hint.text": No such file or directory
```

### S3 Express One {#docs:current:guides:network_cloud_storage:s3_express_one}

In late 2023, AWS [announced](https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-s3-express-one-zone-storage-class/) [S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html), a high-speed variant of traditional S3 buckets.
DuckDB can read S3 Express One buckets using the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview).

#### Credentials and Configuration {#docs:current:guides:network_cloud_storage:s3_express_one::credentials-and-configuration}

The configuration of S3 Express One buckets is similar to [regular S3 buckets](#docs:current:guides:network_cloud_storage:s3_import) with one exception:
you must specify the endpoint according to the following pattern:

```sql
s3express-⟨availability_zone⟩.⟨region⟩.amazonaws.com
```

where the `⟨availability_zone⟩`{:.language-sql .highlight} (e.g., `use1-az5`) can be obtained from the S3 Express One bucket's configuration page and the `⟨region⟩`{:.language-sql .highlight} is the AWS region (e.g., `us-east-1`).

For example, to allow DuckDB to use an S3 Express One bucket, configure the [Secrets manager](#docs:current:sql:statements:create_secret) as follows:

```sql
CREATE SECRET (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩',
    ENDPOINT 's3express-⟨use1-az5⟩.⟨us-east-1⟩.amazonaws.com'
);
```

#### Instance Location {#docs:current:guides:network_cloud_storage:s3_express_one::instance-location}

For best performance, ensure the EC2 instance is in the same availability zone as the S3 Express One bucket you are querying.
To determine the mapping between zone names and zone IDs, use the `aws ec2 describe-availability-zones` command.

* Zone name to zone ID mapping:

  ```bash
  aws ec2 describe-availability-zones --output json \
      | jq -r '.AvailabilityZones[] | select(.ZoneName == "us-east-1f") | .ZoneId'
  ```

  ```text
  use1-az5
  ```

* Zone ID to zone name mapping:

  ```bash
  aws ec2 describe-availability-zones --output json \
      | jq -r '.AvailabilityZones[] | select(.ZoneId == "use1-az5") | .ZoneName'
  ```

  ```text
  us-east-1f
  ```

#### Querying {#docs:current:guides:network_cloud_storage:s3_express_one::querying}

You can query the S3 Express One bucket like any other S3 bucket:

```sql
SELECT *
FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';
```

#### Performance {#docs:current:guides:network_cloud_storage:s3_express_one::performance}

The following experiments were run on a `c7gd.12xlarge` instance using the [LDBC SF300 Comments `creationDate` Parquet file](https://blobs.duckdb.org/data/ldbc-sf300-comments-creationDate.parquet) (also used in the [microbenchmarks of the performance guide](#docs:current:guides:performance:benchmarks::data-sets)).

| Experiment | File size | Runtime |
|:-----|--:|--:|
| Loading only from Parquet | 4.1 GB | 3.5 s |
| Creating local table from Parquet | 4.1 GB | 5.1 s |

The “loading only” variant is running the load as part of an [`EXPLAIN ANALYZE`](#docs:current:guides:meta:explain_analyze) statement to measure the runtime without actually creating a local table, while the “creating local table” variant uses [`CREATE TABLE ... AS SELECT`](#docs:current:sql:statements:create_table::create-table--as-select-ctas) to create a persistent table on the local disk.
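
The exact statements used in the experiments are not reproduced here. The following is only a sketch of what the two variants might look like, reusing the bucket and file names from the querying example; the table name `comments` is hypothetical.

```sql
-- "Loading only": runs the scan and reports timings without storing the data.
EXPLAIN ANALYZE
    SELECT * FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';

-- "Creating local table": materializes the data in the current database.
CREATE TABLE comments AS
    SELECT * FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';
```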

### Google Cloud Storage Import {#docs:current:guides:network_cloud_storage:gcs_import}

#### Prerequisites {#docs:current:guides:network_cloud_storage:gcs_import::prerequisites}

Google Cloud Storage (GCS) can be used via the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview).
This can be installed with the `INSTALL httpfs` SQL command. This only needs to be run once.

#### Credentials and Configuration {#docs:current:guides:network_cloud_storage:gcs_import::credentials-and-configuration}

You need to create [HMAC keys](https://console.cloud.google.com/storage/settings;tab=interoperability) and declare them:

```sql
CREATE SECRET (
    TYPE gcs,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩'
);
```

#### Querying {#docs:current:guides:network_cloud_storage:gcs_import::querying}

After setting up the GCS credentials, you can query the GCS data using:

```sql
SELECT *
FROM read_parquet('gs://⟨gcs_bucket⟩/⟨file.parquet⟩');
```

#### Attaching to a Database {#docs:current:guides:network_cloud_storage:gcs_import::attaching-to-a-database}

You can [attach to a database file](#docs:current:guides:network_cloud_storage:duckdb_over_https_or_s3) in read-only mode:

```sql
LOAD httpfs;
ATTACH 'gs://⟨gcs_bucket⟩/⟨file.duckdb⟩' AS ⟨duckdb_database⟩ (READ_ONLY);
```

> Databases in Google Cloud Storage can only be attached in read-only mode.

### Cloudflare R2 Import {#docs:current:guides:network_cloud_storage:cloudflare_r2_import}

#### Prerequisites {#docs:current:guides:network_cloud_storage:cloudflare_r2_import::prerequisites}

For Cloudflare R2, the [S3 Compatibility API](https://developers.cloudflare.com/r2/api/s3/api/) allows you to use DuckDB's S3 support to read and write from R2 buckets.

This requires the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview), which can be installed using the `INSTALL` SQL command. This only needs to be run once.

#### Credentials and Configuration {#docs:current:guides:network_cloud_storage:cloudflare_r2_import::credentials-and-configuration}

You will need to [generate an S3 auth token](https://developers.cloudflare.com/r2/api/s3/tokens/) and create an `R2` secret in DuckDB:

```sql
CREATE SECRET (
    TYPE r2,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    ACCOUNT_ID '⟨your-32-character-hexadecimal-account-ID⟩'
);
```

#### Querying {#docs:current:guides:network_cloud_storage:cloudflare_r2_import::querying}

After setting up the R2 credentials, you can query the R2 data using DuckDB's built-in methods, such as `read_csv` or `read_parquet`:

```sql
SELECT * FROM read_parquet('r2://⟨r2-bucket-name⟩/⟨file⟩');
```

### Attach to a DuckDB Database over HTTPS or S3 {#docs:current:guides:network_cloud_storage:duckdb_over_https_or_s3}

You can establish a read-only connection to a DuckDB database file via HTTPS or the S3 API.

#### Prerequisites {#docs:current:guides:network_cloud_storage:duckdb_over_https_or_s3::prerequisites}

This guide requires the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview), which can be installed using the `INSTALL httpfs` SQL command. This only needs to be run once.

#### Attaching to a Database over HTTPS {#docs:current:guides:network_cloud_storage:duckdb_over_https_or_s3::attaching-to-a-database-over-https}

To connect to a DuckDB database via HTTPS, use the [`ATTACH` statement](#docs:current:sql:statements:attach) as follows:

```sql
ATTACH 'https://blobs.duckdb.org/databases/stations.duckdb' AS stations_db;
```


Then, the database can be queried using:

```sql
SELECT count(*) AS num_stations
FROM stations_db.stations;
```

| num_stations |
|-------------:|
| 578          |

#### Attaching to a Database over the S3 API {#docs:current:guides:network_cloud_storage:duckdb_over_https_or_s3::attaching-to-a-database-over-the-s3-api}

To connect to a DuckDB database via the S3 API, [configure the authentication](#docs:current:guides:network_cloud_storage:s3_import::credentials-and-configuration) for your bucket (if required).
Then, use the [`ATTACH` statement](#docs:current:sql:statements:attach) as follows:

```sql
ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS stations_db;
```


The database can be queried using:

```sql
SELECT count(*) AS num_stations
FROM stations_db.stations;
```

| num_stations |
|-------------:|
| 578          |

> Connecting to S3-compatible APIs such as [Google Cloud Storage (`gs://`)](#docs:current:guides:network_cloud_storage:gcs_import::attaching-to-a-database) is also supported.

#### Limitations {#docs:current:guides:network_cloud_storage:duckdb_over_https_or_s3::limitations}

* Only read-only connections are allowed; writing to the database via the HTTPS protocol or the S3 API is not possible.
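
Because the attached database is read-only, one way to work with the data in writable form is to copy it into a local table first. A sketch, assuming the `stations_db` attachment from the examples in this guide and a hypothetical local table name `local_stations`:

```sql
-- Copy the remote table into the current (writable) database.
CREATE TABLE local_stations AS
    SELECT * FROM stations_db.stations;

-- The remote database is no longer needed once the copy exists.
DETACH stations_db;
```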

### Fastly Object Storage Import {#docs:current:guides:network_cloud_storage:fastly_object_storage_import}

#### Prerequisites {#docs:current:guides:network_cloud_storage:fastly_object_storage_import::prerequisites}

For Fastly Object Storage, the [S3 Compatibility API](https://docs.fastly.com/products/object-storage) allows you to use DuckDB's S3 support to read and write from Fastly buckets.

This requires the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview), which can be installed using the `INSTALL` SQL command. This only needs to be run once.

#### Credentials and Configuration {#docs:current:guides:network_cloud_storage:fastly_object_storage_import::credentials-and-configuration}

You will need to [generate an S3 auth token](https://docs.fastly.com/en/guides/working-with-object-storage#creating-an-object-storage-access-key) and create an `S3` secret in DuckDB:

```sql
CREATE SECRET my_secret (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    URL_STYLE 'path',
    REGION '⟨us-east⟩',
    ENDPOINT '⟨us-east⟩.object.fastlystorage.app' -- see note below
);
```

* The `ENDPOINT` needs to point to the [Fastly endpoint for the region](https://docs.fastly.com/en/guides/working-with-object-storage#working-with-the-s3-compatible-api) you want to use (e.g., `eu-central.object.fastlystorage.app`).
* `REGION` must use the same region mentioned in `ENDPOINT`.
* `URL_STYLE` needs to use `path`.

#### Querying {#docs:current:guides:network_cloud_storage:fastly_object_storage_import::querying}

After setting up the Fastly Object Storage credentials, you can query the data there using DuckDB's built-in methods, such as `read_csv` or `read_parquet`:

```sql
SELECT * FROM 's3://⟨fastly-bucket-name⟩/⟨file⟩.csv';
SELECT * FROM read_parquet('s3://⟨fastly-bucket-name⟩/⟨file⟩.parquet');
```

### Tigris Import {#docs:current:guides:network_cloud_storage:tigris_import}

#### Prerequisites {#docs:current:guides:network_cloud_storage:tigris_import::prerequisites}

For [Tigris](https://www.tigrisdata.com/), the [S3-compatible API](https://www.tigrisdata.com/docs/api/s3/) allows you to use DuckDB's S3 support to read and write from Tigris buckets.

This requires the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview), which can be installed using the `INSTALL` SQL command. This only needs to be run once.

#### Credentials and Configuration {#docs:current:guides:network_cloud_storage:tigris_import::credentials-and-configuration}

You will need to [generate an access key pair](https://www.tigrisdata.com/docs/iam/) and create an `S3` secret in DuckDB:

```sql
CREATE SECRET my_secret (
    TYPE s3,
    KEY_ID '⟨tid_xxxxxxxxxxxx⟩',
    SECRET '⟨tsec_xxxxxxxxxxxxxxxxxxxxxxxx⟩',
    REGION 'auto',
    ENDPOINT 'fly.storage.tigris.dev'
);
```

* A single endpoint (`fly.storage.tigris.dev`) serves all regions; requests are routed to the Tigris region nearest the caller. `REGION` is required for request signing but is not used for routing, so set it to `auto`.
* `URL_STYLE` does not need to be set. Tigris uses virtual-hosted-style URLs, which is DuckDB's default for `TYPE s3`.

> **Tip.** When DuckDB runs on a [Fly.io](https://fly.io/) Machine, requests to `fly.storage.tigris.dev` stay on Fly's internal network and are served from the same region as the Machine when possible.

#### Querying {#docs:current:guides:network_cloud_storage:tigris_import::querying}

After setting up the Tigris credentials, you can query the data using DuckDB's built-in methods, such as `read_csv` or `read_parquet`:

```sql
SELECT * FROM 's3://⟨tigris-bucket-name⟩/⟨file⟩.csv';
SELECT * FROM read_parquet('s3://⟨tigris-bucket-name⟩/⟨file⟩.parquet');
```

## Meta Queries {#guides:meta}

### Describe {#docs:current:guides:meta:describe}

#### Describing a Table {#docs:current:guides:meta:describe::describing-a-table}

To view the schema of a table, use the `DESCRIBE` statement (or its aliases `DESC` and `SHOW`) followed by the table name.

```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j VARCHAR);
DESCRIBE tbl;
SHOW tbl; -- equivalent to DESCRIBE tbl;
```

| column_name | column_type | null | key  | default | extra |
|-------------|-------------|------|------|---------|-------|
| i           | INTEGER     | NO   | PRI  | NULL    | NULL  |
| j           | VARCHAR     | YES  | NULL | NULL    | NULL  |

#### Describing a Query {#docs:current:guides:meta:describe::describing-a-query}

To view the schema of the result of a query, prepend `DESCRIBE` to a query.

```sql
DESCRIBE SELECT * FROM tbl;
```

| column_name | column_type | null | key  | default | extra |
|-------------|-------------|------|------|---------|-------|
| i           | INTEGER     | YES  | NULL | NULL    | NULL  |
| j           | VARCHAR     | YES  | NULL | NULL    | NULL  |

Note that there are subtle differences: compared to the result when [describing a table](#::describing-a-table), nullability (`null`) and key information (`key`) are lost.

#### Using `DESCRIBE` in a Subquery {#docs:current:guides:meta:describe::using-describe-in-a-subquery}

`DESCRIBE` can be used as a subquery. This allows creating a table from the description, for example:

```sql
CREATE TABLE tbl_description AS SELECT * FROM (DESCRIBE tbl);
```
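
Since the description is an ordinary relation, it can also be filtered or projected. A small sketch using the `tbl` table defined at the start of this guide:

```sql
-- Return only the names of the VARCHAR columns.
SELECT column_name
FROM (DESCRIBE tbl)
WHERE column_type = 'VARCHAR';
```

For the `tbl` defined above, this returns the single row `j`.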

#### Describing Remote Tables {#docs:current:guides:meta:describe::describing-remote-tables}

It is possible to describe remote tables via the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) using the `DESCRIBE TABLE` statement. For example:

```sql
DESCRIBE TABLE 'https://blobs.duckdb.org/data/Star_Trek-Season_1.csv';
```

|               column_name               | column_type | null | key  | default | extra |
|-----------------------------------------|-------------|------|------|---------|-------|
| season_num                              | BIGINT      | YES  | NULL | NULL    | NULL  |
| episode_num                             | BIGINT      | YES  | NULL | NULL    | NULL  |
| aired_date                              | DATE        | YES  | NULL | NULL    | NULL  |
| cnt_kirk_hookups                        | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_downed_redshirts                    | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_aliens_almost_took_over_planet     | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_aliens_almost_took_over_enterprise | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_vulcan_nerve_pinch                  | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_warp_speed_orders                   | BIGINT      | YES  | NULL | NULL    | NULL  |
| highest_warp_speed_issued               | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_hand_phasers_fired                 | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_ship_phasers_fired                 | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_ship_photon_torpedos_fired         | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_transporter_pax                     | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_damn_it_jim_quote                   | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_im_givin_her_all_shes_got_quote     | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_highly_illogical_quote              | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_enterprise_saved_the_day           | BIGINT      | YES  | NULL | NULL    | NULL  |

### EXPLAIN: Inspect Query Plans {#docs:current:guides:meta:explain}

```sql
EXPLAIN SELECT * FROM tbl;
```

The `EXPLAIN` statement displays the physical plan, i.e., the query plan that will get executed,
and is enabled by prepending the query with `EXPLAIN`.
The physical plan is a tree of operators that are executed in a specific order to produce the result of the query.
To generate an efficient physical plan, the query optimizer transforms the existing physical plan into a better physical plan.

To demonstrate, see the below example:

```sql
CREATE TABLE students (name VARCHAR, sid INTEGER);
CREATE TABLE exams (eid INTEGER, subject VARCHAR, sid INTEGER);
INSERT INTO students VALUES ('Mark', 1), ('Joe', 2), ('Matthew', 3);
INSERT INTO exams VALUES (10, 'Physics', 1), (20, 'Chemistry', 2), (30, 'Literature', 3);

EXPLAIN
    SELECT name
    FROM students
    JOIN exams USING (sid)
    WHERE name LIKE 'Ma%';
```

```text
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            name           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         HASH_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │
│         sid = sid         ├──────────────┐
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
│           EC: 1           │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           exams           ││     prefix(name, 'Ma')    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            sid            ││           EC: 1           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
│           EC: 3           ││                           │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         SEQ_SCAN          │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │          students         │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │            sid            │
                             │            name           │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │ Filters: name>=Ma AND name│
                             │  <Mb AND name IS NOT NULL │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │           EC: 1           │
                             └───────────────────────────┘
```

Note that the query is not actually executed. Therefore, only the estimated cardinality (`EC`) of each operator is shown, which is calculated from the statistics of the base tables by applying heuristics for each operator.

Table scan operators display the fully qualified table name including catalog and schema (e.g., `memory.myschema.mytable`).

#### Additional Explain Settings {#docs:current:guides:meta:explain::additional-explain-settings}

The `EXPLAIN` statement supports additional settings that can be used to control the output. The following settings are available:

The default setting. Only shows the physical plan.

```sql
PRAGMA explain_output = 'physical_only';
```

Shows only the optimized plan.

```sql
PRAGMA explain_output = 'optimized_only';
```

Shows both the physical and optimized plans.

```sql
PRAGMA explain_output = 'all';
```
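
A sketch of how the setting might be used in practice, assuming the `students` and `exams` tables from the example above:

```sql
-- Request all available plans for subsequent EXPLAIN statements.
PRAGMA explain_output = 'all';

EXPLAIN
    SELECT name
    FROM students
    JOIN exams USING (sid)
    WHERE name LIKE 'Ma%';

-- Switch back to the default afterwards.
PRAGMA explain_output = 'physical_only';
```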

#### See Also {#docs:current:guides:meta:explain::see-also}

For more information, see the [“Profiling” page](#docs:current:dev:profiling).

### EXPLAIN ANALYZE: Profile Queries {#docs:current:guides:meta:explain_analyze}

Prepending a query with `EXPLAIN ANALYZE` both pretty-prints the query plan and executes it,
providing run-time performance numbers for every operator, as well as the estimated cardinality (`EC`) and the actual cardinality.

```sql
EXPLAIN ANALYZE SELECT * FROM tbl;
```

Note that the **cumulative** wall-clock time that is spent on every operator is shown. When multiple threads are processing the query in parallel, the total processing time of the query may be lower than the sum of all the times spent on the individual operators.

For multi-file reads (e.g., reading multiple Parquet files), the output includes the file names being read.

Below is an example of running `EXPLAIN ANALYZE` on a query:

```sql
CREATE TABLE students (name VARCHAR, sid INTEGER);
CREATE TABLE exams (eid INTEGER, subject VARCHAR, sid INTEGER);
INSERT INTO students VALUES ('Mark', 1), ('Joe', 2), ('Matthew', 3);
INSERT INTO exams VALUES (10, 'Physics', 1), (20, 'Chemistry', 2), (30, 'Literature', 3);

EXPLAIN ANALYZE
    SELECT name
    FROM students
    JOIN exams USING (sid)
    WHERE name LIKE 'Ma%';
```

```text
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││        Total Time: 0.0008s        ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│      EXPLAIN_ANALYZE      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            name           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             2             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         HASH_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │
│         sid = sid         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ├──────────────┐
│           EC: 1           │              │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
│             2             │              │
│          (0.00s)          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           exams           ││     prefix(name, 'Ma')    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            sid            ││           EC: 1           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 3           ││             2             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││          (0.00s)          │
│             3             ││                           │
│          (0.00s)          ││                           │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         SEQ_SCAN          │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │          students         │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │            sid            │
                             │            name           │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │ Filters: name>=Ma AND name│
                             │  <Mb AND name IS NOT NULL │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │           EC: 1           │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │             2             │
                             │          (0.00s)          │
                             └───────────────────────────┘
```

#### See Also {#docs:current:guides:meta:explain_analyze::see-also}

For more information, see the [“Profiling” page](#docs:current:dev:profiling).

### List Tables {#docs:current:guides:meta:list_tables}

The `SHOW TABLES` command can be used to obtain a list of all tables within the selected schema.

```sql
CREATE TABLE tbl (i INTEGER);
SHOW TABLES;
```

| name |
|------|
| tbl  |

`SHOW` or `SHOW ALL TABLES` can be used to obtain a list of all tables within **all** attached databases and schemas.

```sql
CREATE TABLE tbl (i INTEGER);
CREATE SCHEMA s1;
CREATE TABLE s1.tbl (v VARCHAR);
SHOW ALL TABLES;
```

| database | schema | table_name | column_names | column_types | temporary |
|----------|--------|------------|--------------|--------------|-----------|
| memory   | main   | tbl        | [i]          | [INTEGER]    | false     |
| memory   | s1     | tbl        | [v]          | [VARCHAR]    | false     |

`SHOW TABLES FROM db` can be used to list all tables in a given database or schema.

```sql
ATTACH 'db.duckdb';
CREATE TABLE db.main_tbl (u VARCHAR);
CREATE SCHEMA db.s1;
CREATE TABLE db.s1.schema_tbl (v VARCHAR);
SHOW TABLES FROM db;
```

| name       |
|------------|
| main_tbl   |
| schema_tbl |

To list only the tables of a specific schema, qualify the schema name with the database name:

```sql
SHOW TABLES FROM db.s1;
```

| name       |
|------------|
| schema_tbl |

To view the schema of an individual table, use the [`DESCRIBE` command](#docs:current:guides:meta:describe).

#### See Also {#docs:current:guides:meta:list_tables::see-also}

The SQL-standard [`information_schema`](#docs:current:sql:meta:information_schema) views are also defined. Moreover, DuckDB defines `sqlite_master` and many [PostgreSQL system catalog tables](https://www.postgresql.org/docs/16/catalogs.html) for compatibility with SQLite and PostgreSQL respectively.
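
As a minimal sketch, the tables created in the examples above can also be listed through these catalog views:

```sql
-- SQL-standard view: one row per table, across databases and schemas.
SELECT table_catalog, table_schema, table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE';

-- SQLite-compatible view.
SELECT name
FROM sqlite_master
WHERE type = 'table';
```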

### Summarize {#docs:current:guides:meta:summarize}

The `SUMMARIZE` command can be used to easily compute a number of aggregates over a table or a query.
The `SUMMARIZE` command launches a query that computes a number of aggregates over all columns (`min`, `max`, `approx_unique`, `avg`, `std`, `q25`, `q50`, `q75`, `count`) and returns these along with the column name, column type, and the percentage of `NULL` values in the column.
Note that the quantiles and percentiles are **approximate values**.

#### Usage {#docs:current:guides:meta:summarize::usage}

To summarize the contents of a table, use `SUMMARIZE` followed by the table name.

```sql
SUMMARIZE tbl;
```

To summarize a query, prepend `SUMMARIZE` to a query.

```sql
SUMMARIZE SELECT * FROM tbl;
```
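
Because `SUMMARIZE` accepts an arbitrary query, the summary can be restricted to selected columns or rows. A sketch, assuming that `tbl` has a numeric column named `i` (a hypothetical schema):

```sql
-- Summarize only column i, and only for rows where i is positive.
SUMMARIZE SELECT i FROM tbl WHERE i > 0;
```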

#### Example {#docs:current:guides:meta:summarize::example}

Below is an example of `SUMMARIZE` on the `lineitem` table of the TPC-H `SF1` dataset, generated using the [`tpch` extension](#docs:current:core_extensions:tpch).

```sql
INSTALL tpch;
LOAD tpch;
CALL dbgen(sf = 1);
```

```sql
SUMMARIZE lineitem;
```

|   column_name   |  column_type  |     min     |         max         | approx_unique |         avg         |         std          |   q25   |   q50   |   q75   |  count  | null_percentage |
|-----------------|---------------|-------------|---------------------|---------------|---------------------|----------------------|---------|---------|---------|---------|-----------------|
| l_orderkey      | INTEGER       | 1           | 6000000             | 1508227       | 3000279.604204982   | 1732187.8734803519   | 1509447 | 2989869 | 4485232 | 6001215 | 0.0%            |
| l_partkey       | INTEGER       | 1           | 200000              | 202598        | 100017.98932999402  | 57735.69082650496    | 49913   | 99992   | 150039  | 6001215 | 0.0%            |
| l_suppkey       | INTEGER       | 1           | 10000               | 10061         | 5000.602606138924   | 2886.9619987306114   | 2501    | 4999    | 7500    | 6001215 | 0.0%            |
| l_linenumber    | INTEGER       | 1           | 7                   | 7             | 3.0005757167506912  | 1.7324314036519328   | 2       | 3       | 4       | 6001215 | 0.0%            |
| l_quantity      | DECIMAL(15,2) | 1.00        | 50.00               | 50            | 25.507967136654827  | 14.426262537016918   | 13      | 26      | 38      | 6001215 | 0.0%            |
| l_extendedprice | DECIMAL(15,2) | 901.00      | 104949.50           | 923139        | 38255.138484656854  | 23300.43871096221    | 18756   | 36724   | 55159   | 6001215 | 0.0%            |
| l_discount      | DECIMAL(15,2) | 0.00        | 0.10                | 11            | 0.04999943011540163 | 0.03161985510812596  | 0       | 0       | 0       | 6001215 | 0.0%            |
| l_tax           | DECIMAL(15,2) | 0.00        | 0.08                | 9             | 0.04001350893110812 | 0.025816551798842728 | 0       | 0       | 0       | 6001215 | 0.0%            |
| l_returnflag    | VARCHAR       | A           | R                   | 3             | NULL                | NULL                 | NULL    | NULL    | NULL    | 6001215 | 0.0%            |
| l_linestatus    | VARCHAR       | F           | O                   | 2             | NULL                | NULL                 | NULL    | NULL    | NULL    | 6001215 | 0.0%            |
| l_shipdate      | DATE          | 1992-01-02  | 1998-12-01          | 2516          | NULL                | NULL                 | NULL    | NULL    | NULL    | 6001215 | 0.0%            |
| l_commitdate    | DATE          | 1992-01-31  | 1998-10-31          | 2460          | NULL                | NULL                 | NULL    | NULL    | NULL    | 6001215 | 0.0%            |
| l_receiptdate   | DATE          | 1992-01-04  | 1998-12-31          | 2549          | NULL                | NULL                 | NULL    | NULL    | NULL    | 6001215 | 0.0%            |
| l_shipinstruct  | VARCHAR       | COLLECT COD | TAKE BACK RETURN    | 4             | NULL                | NULL                 | NULL    | NULL    | NULL    | 6001215 | 0.0%            |
| l_shipmode      | VARCHAR       | AIR         | TRUCK               | 7             | NULL                | NULL                 | NULL    | NULL    | NULL    | 6001215 | 0.0%            |
| l_comment       | VARCHAR       |  Tiresias   | zzle? furiously iro | 3558599       | NULL                | NULL                 | NULL    | NULL    | NULL    | 6001215 | 0.0%            |

#### Using `SUMMARIZE` in a Subquery {#docs:current:guides:meta:summarize::using-summarize-in-a-subquery}

`SUMMARIZE` can be used as a subquery. This allows creating a table from the summary, for example:

```sql
CREATE TABLE tbl_summary AS SELECT * FROM (SUMMARIZE tbl);
```

#### Summarizing Remote Tables {#docs:current:guides:meta:summarize::summarizing-remote-tables}

It is possible to summarize remote tables via the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) using the `SUMMARIZE TABLE` statement. For example:

```sql
SUMMARIZE TABLE 'https://blobs.duckdb.org/data/Star_Trek-Season_1.csv';
```

### DuckDB Environment {#docs:current:guides:meta:duckdb_environment}

DuckDB provides a number of functions and `PRAGMA` options to retrieve information on the running DuckDB instance and its environment.

#### Version {#docs:current:guides:meta:duckdb_environment::version}

The `version()` function returns the version number of DuckDB.

```sql
SELECT version() AS version;
```



| version |
|---------|
| v1.5.2  |

Using a `PRAGMA`:

```sql
PRAGMA version;
```



| library_version | source_id  |
|-----------------|------------|
| v1.5.2          | 8a5851971f |

#### Platform {#docs:current:guides:meta:duckdb_environment::platform}

The platform information consists of the operating system, system architecture, and, optionally, the compiler.
The platform is used when [installing extensions](#docs:current:extensions:extension_distribution::platforms).
To retrieve the platform, use the following `PRAGMA`:

```sql
PRAGMA platform;
```

On macOS, running on Apple Silicon architecture, the result is:

| platform  |
|-----------|
| osx_arm64 |

On Windows, running on an AMD64 architecture, the platform is `windows_amd64`.
On Ubuntu Linux, running on the ARM64 architecture, the platform is `linux_arm64`.

#### Extensions {#docs:current:guides:meta:duckdb_environment::extensions}

To get a list of DuckDB extensions and their status (e.g., `loaded`, `installed`), use the [`duckdb_extensions()` function](#docs:current:extensions:overview::listing-extensions):

```sql
SELECT *
FROM duckdb_extensions();
```

#### Meta Table Functions {#docs:current:guides:meta:duckdb_environment::meta-table-functions}

DuckDB has the following built-in table functions to obtain metadata about available catalog objects:

* [`duckdb_columns()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_columns): columns
* [`duckdb_constraints()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_constraints): constraints
* [`duckdb_databases()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_databases): lists the databases that are accessible from within the current DuckDB process
* [`duckdb_dependencies()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_dependencies): dependencies between objects
* [`duckdb_extensions()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_extensions): extensions
* [`duckdb_functions()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_functions): functions
* [`duckdb_indexes()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_indexes): secondary indexes
* [`duckdb_keywords()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_keywords): DuckDB's keywords and reserved words
* [`duckdb_optimizers()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_optimizers): the available optimization rules in the DuckDB instance
* [`duckdb_schemas()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_schemas): schemas
* [`duckdb_sequences()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_sequences): sequences
* [`duckdb_settings()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_settings): settings
* [`duckdb_tables()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_tables): base tables
* [`duckdb_temporary_files()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_temporary_files): the temporary files DuckDB has written to disk, to offload data from memory
* [`duckdb_types()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_types): data types
* [`duckdb_views()`](#docs:current:sql:meta:duckdb_table_functions::duckdb_views): views

## ODBC {#guides:odbc}

### ODBC 101: A Duck Themed Guide to ODBC {#docs:current:guides:odbc:general}

#### What is ODBC? {#docs:current:guides:odbc:general::what-is-odbc}

[ODBC](https://learn.microsoft.com/en-us/sql/odbc/microsoft-open-database-connectivity-odbc?view=sql-server-ver16), which stands for Open Database Connectivity, is a standard that allows different programs to talk to different databases including, of course, DuckDB. This makes it easier to build programs that work with many different databases: developers don't have to write custom code to connect to each database. Instead, they can use the standardized ODBC interface, which reduces development time and costs and makes programs easier to maintain. However, ODBC can be slower than other methods of connecting to a database, such as using a native driver, as it adds an extra layer of abstraction between the application and the database. Furthermore, because DuckDB is column-based and ODBC is row-based, there can be some inefficiencies when using ODBC with DuckDB.

> There are links throughout this page to the official [Microsoft ODBC documentation](https://learn.microsoft.com/en-us/sql/odbc/reference/odbc-programmer-s-reference?view=sql-server-ver16), which is a great resource for learning more about ODBC.

#### General Concepts {#docs:current:guides:odbc:general::general-concepts}

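* [Handles](#docs:current:guides:odbc:general::handles)
* [Connecting](#docs:current:guides:odbc:general::connecting)
* [Error Handling and Diagnostics](#docs:current:guides:odbc:general::error-handling-and-diagnostics)
* [Buffers and Binding](#docs:current:guides:odbc:general::buffers-and-binding)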

##### Handles {#docs:current:guides:odbc:general::handles}

A [handle](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/handles?view=sql-server-ver16) is a pointer to a specific ODBC object which is used to interact with the database. There are four types of handles, each with a different purpose: the environment handle, the connection handle, the statement handle, and the descriptor handle. Handles are allocated using the [`SQLAllocHandle`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlallochandle-function?view=sql-server-ver16) function, which takes as input the type of handle to allocate, the parent (input) handle, and a pointer to the output handle. The driver then creates a new handle of the specified type and returns it to the application.

The DuckDB ODBC driver has the following handle types.

###### Environment {#docs:current:guides:odbc:general::environment}



|   |   |
|:--|:--------|
| **Handle name** | [Environment](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/environment-handles?view=sql-server-ver16) |
| **Type name** | `SQL_HANDLE_ENV` |
| **Description** | Manages the environment settings for ODBC operations, and provides a global context in which to access data. |
| **Use case** | Initializing ODBC, managing driver behavior, resource allocation. |
| **Additional information** | Must be [allocated](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/allocating-the-environment-handle?view=sql-server-ver16) once per application upon starting, and freed at the end. |

###### Connection {#docs:current:guides:odbc:general::connection}



|   |   |
|:--|:--------|
| **Handle name** | [Connection](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/connection-handles?view=sql-server-ver16) |
| **Type name** | `SQL_HANDLE_DBC` |
| **Description** | Represents a connection to a data source. Used to establish, manage, and terminate connections. Defines both the driver and the data source to use within the driver. |
| **Use case** | Establishing a connection to a database, managing the connection state. |
| **Additional information** | Multiple connection handles can be [created](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/allocating-a-connection-handle-odbc?view=sql-server-ver16) as needed, allowing simultaneous connections to multiple data sources. *Note:* Allocating a connection handle does not establish a connection. The handle must be allocated first and is then used to establish the connection. |

###### Statement {#docs:current:guides:odbc:general::statement}



|   |   |
|:--|:--------|
| **Handle name** | [Statement](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/statement-handles?view=sql-server-ver16) |
| **Type name** | `SQL_HANDLE_STMT` |
| **Description** | Handles the execution of SQL statements, as well as the returned result sets. |
| **Use case** | Executing SQL queries, fetching result sets, managing statement options. |
| **Additional information** | To facilitate the execution of concurrent queries, multiple handles can be [allocated](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/allocating-a-statement-handle-odbc?view=sql-server-ver16) per connection. |

###### Descriptor {#docs:current:guides:odbc:general::descriptor}



|   |   |
|:--|:--------|
| **Handle name** | [Descriptor](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/descriptor-handles?view=sql-server-ver16) |
| **Type name** | `SQL_HANDLE_DESC` |
| **Description** | Describes the attributes of a data structure or parameter, and allows the application to specify the structure of data to be bound/retrieved. |
| **Use case** | Describing table structures, result sets, binding columns to application buffers. |
| **Additional information** | Used in situations where data structures need to be explicitly defined, for example during parameter binding or result set fetching. They are automatically allocated when a statement is allocated, but can also be allocated explicitly. |

##### Connecting {#docs:current:guides:odbc:general::connecting}

The first step is to connect to the data source so that the application can perform database operations. First the application must allocate an environment handle, and then a connection handle. The connection handle is then used to connect to the data source. There are two functions which can be used to connect to a data source, [`SQLDriverConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqldriverconnect-function?view=sql-server-ver16) and [`SQLConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlconnect-function?view=sql-server-ver16). The former is used to connect to a data source using a connection string, while the latter is used to connect to a data source using a DSN.

###### Connection String {#docs:current:guides:odbc:general::connection-string}

A [connection string](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/connection-strings?view=sql-server-ver16) is a string which contains the information needed to connect to a data source. It is formatted as a semicolon-separated list of key-value pairs; however, DuckDB currently only utilizes the DSN and ignores the rest of the parameters.
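
To illustrate the key-value format, here is a minimal C++ sketch that splits a connection string at semicolons and then at the first `=` of each pair. The helper name `parse_connection_string` is hypothetical and not part of ODBC or the DuckDB driver. The sketch also ignores details of real connection strings, such as values quoted with braces and case-insensitive keys.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of ODBC or DuckDB): splits a connection
// string such as "DSN=DuckDB;access_mode=READ_ONLY" into key-value pairs.
std::map<std::string, std::string> parse_connection_string(const std::string &conn) {
    std::map<std::string, std::string> pairs;
    std::size_t start = 0;
    while (start < conn.size()) {
        // Each pair ends at the next semicolon (or at the end of the string).
        std::size_t end = conn.find(';', start);
        if (end == std::string::npos) {
            end = conn.size();
        }
        const std::string pair = conn.substr(start, end - start);
        // Split the pair at the first '='; pairs without '=' are skipped.
        const std::size_t eq = pair.find('=');
        if (eq != std::string::npos) {
            pairs[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
        start = end + 1;
    }
    return pairs;
}
```

For example, `parse_connection_string("DSN=DuckDB;access_mode=READ_ONLY")` yields two entries: `DSN` mapped to `DuckDB` and `access_mode` mapped to `READ_ONLY`.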

###### DSN {#docs:current:guides:odbc:general::dsn}

A DSN (_Data Source Name_) is a string that identifies a database. It can be a file path, URL, or a database name. For example, `C:\Users\me\duckdb.db` and `DuckDB` are both valid DSNs. More information on DSNs can be found on the [“Choosing a Data Source or Driver” page of the SQL Server documentation](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/choosing-a-data-source-or-driver?view=sql-server-ver16).

##### Error Handling and Diagnostics {#docs:current:guides:odbc:general::error-handling-and-diagnostics}

All functions in ODBC return a code which represents the success or failure of the function. This allows for easy error handling, as the application can simply check the return code of each function call to determine if it was successful. When unsuccessful, the application can then use the [`SQLGetDiagRec`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdiagrec-function?view=sql-server-ver16) function to retrieve the error information. The following table defines the [return codes](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/return-codes-odbc?view=sql-server-ver16):

| Return code             | Description                                        |
|-------------------------|----------------------------------------------------|
| `SQL_SUCCESS`           | The function completed successfully.                                                                                                           |
| `SQL_SUCCESS_WITH_INFO` | The function completed successfully, but additional information is available, including a warning.                                             |
| `SQL_ERROR`             | The function failed.                                                                                                                           |
| `SQL_INVALID_HANDLE`    | The handle provided was invalid, indicating a programming error, i.e., when a handle is not allocated before it is used, or is the wrong type. |
| `SQL_NO_DATA`           | The function completed successfully, but no more data is available.                                                                            |
| `SQL_NEED_DATA`         | More data is needed, such as when a parameter data is sent at execution time, or additional connection information is required.                |
| `SQL_STILL_EXECUTING`   | A function that was asynchronously executed is still executing.                                                                                |

##### Buffers and Binding {#docs:current:guides:odbc:general::buffers-and-binding}

A buffer is a block of memory used to store data. Buffers are used to store data retrieved from the database, or to send data to the database. Buffers are allocated by the application, and then bound to a column in a result set, or a parameter in a query, using the [`SQLBindCol`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlbindcol-function?view=sql-server-ver16) and [`SQLBindParameter`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlbindparameter-function?view=sql-server-ver16) functions. When the application fetches a row from the result set, the data is stored in the buffers bound to the columns. When the application executes a query with bound parameters, the data in the parameter buffers is sent to the database.

#### Setting up an Application {#docs:current:guides:odbc:general::setting-up-an-application}

The following is a step-by-step guide to setting up an application that uses ODBC to connect to a database, execute a query, and fetch the results in `C++`.

> To install the driver as well as anything else you will need, follow these [instructions](#docs:current:clients:odbc:overview).

##### 1. Include the SQL Header Files {#docs:current:guides:odbc:general::1-include-the-sql-header-files}

The first step is to include the SQL header files:

```cpp
#include <sql.h>
#include <sqlext.h>
```

These files contain the definitions of the ODBC functions, as well as the data types used by ODBC. To use these header files you have to have the `unixodbc` package installed:

On macOS:

```batch
brew install unixodbc
```

On Ubuntu and Debian:

```batch
sudo apt-get install -y unixodbc-dev
```

On Fedora, CentOS, and Red Hat:

```batch
sudo yum install -y unixODBC-devel
```

Remember to include the header file location in your `CFLAGS`.

For `MAKEFILE`:

```make
CFLAGS=-I/usr/local/include
# or
CFLAGS=-I/opt/homebrew/Cellar/unixodbc/2.3.11/include
```

For `CMAKE`:

```cmake
include_directories(/usr/local/include)
# or
include_directories(/opt/homebrew/Cellar/unixodbc/2.3.11/include)
```

You also have to link the library in your `CMAKE` or `MAKEFILE`.
For `CMAKE`:

```cmake
target_link_libraries(ODBC_application /path/to/duckdb_odbc/libduckdb_odbc.dylib)
```

For `MAKEFILE`:

```make
# -L takes the directory that contains the library, -l the library name
LDFLAGS=-L/path/to/duckdb_odbc
LDLIBS=-lduckdb_odbc
```

##### 2. Define the ODBC Handles and Connect to the Database {#docs:current:guides:odbc:general::2-define-the-odbc-handles-and-connect-to-the-database}

###### 2.a. Connecting with SQLConnect {#docs:current:guides:odbc:general::2a-connecting-with-sqlconnect}

Then set up the ODBC handles, allocate them, and connect to the database. First the environment handle is allocated, then the environment is set to ODBC version 3, then the connection handle is allocated, and finally the connection is made to the database. The following code snippet shows how to do this:

```cpp
SQLHANDLE env;
SQLHANDLE dbc;

SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);

SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);

SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

std::string dsn = "DSN=duckdbmemory";
SQLConnect(dbc, (SQLCHAR*)dsn.c_str(), SQL_NTS, NULL, 0, NULL, 0);

std::cout << "Connected!" << std::endl;
```

###### 2.b. Connecting with SQLDriverConnect {#docs:current:guides:odbc:general::2b-connecting-with-sqldriverconnect}

Alternatively, you can connect to the ODBC driver using [`SQLDriverConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqldriverconnect-function?view=sql-server-ver16).
`SQLDriverConnect` accepts a connection string in which you can configure the database using any of the available [DuckDB configuration options](#docs:current:configuration:overview).

```cpp
SQLHANDLE env;
SQLHANDLE dbc;

SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);

SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);

SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

SQLCHAR str[1024];
SQLSMALLINT strl;
std::string dsn = "DSN=DuckDB;access_mode=READ_ONLY";
SQLDriverConnect(dbc, nullptr, (SQLCHAR*)dsn.c_str(), SQL_NTS, str, sizeof(str), &strl, SQL_DRIVER_COMPLETE);

std::cout << "Connected!" << std::endl;
```

##### 3. Adding a Query {#docs:current:guides:odbc:general::3-adding-a-query}

Now that the application is set up, we can add a query to it. First, we need to allocate a statement handle:

```cpp
SQLHANDLE stmt;
SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
```

Then we can execute a query:

```cpp
SQLExecDirect(stmt, (SQLCHAR*)"SELECT * FROM integers", SQL_NTS);
```

##### 4. Fetching Results {#docs:current:guides:odbc:general::4-fetching-results}

Now that we have executed a query, we can fetch the results. First, we need to bind the columns in the result set to buffers:

```cpp
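SQLINTEGER int_val; // SQL_C_SLONG corresponds to the 32-bit SQLINTEGER type
SQLLEN null_val;    // receives the length of the value or SQL_NULL_DATA
SQLBindCol(stmt, 1, SQL_C_SLONG, &int_val, 0, &null_val);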
```

Then we can fetch the results:

```cpp
SQLFetch(stmt);
```

##### 5. Process the Results {#docs:current:guides:odbc:general::5-process-the-results}

Now that we have the results, we can do whatever we want with them. For example, we can print them:

```cpp
std::cout << "Value: " << int_val << std::endl;
```

You can also execute additional queries and perform other database operations such as inserting, updating, or deleting data.

##### 6. Free the Handles and Disconnect {#docs:current:guides:odbc:general::6-free-the-handles-and-disconnecting}

Finally, we need to free the handles and disconnect from the database. First, we need to free the statement handle:

```cpp
SQLFreeHandle(SQL_HANDLE_STMT, stmt);
```

Then we need to disconnect from the database:

```cpp
SQLDisconnect(dbc);
```

And finally, we need to free the connection handle and the environment handle:

```cpp
SQLFreeHandle(SQL_HANDLE_DBC, dbc);
SQLFreeHandle(SQL_HANDLE_ENV, env);
```

Freeing the connection and environment handles can only be done after the connection to the database has been closed. Trying to free them before disconnecting from the database will result in an error.

#### Sample Application {#docs:current:guides:odbc:general::sample-application}

The following sample application consists of two files. The `cpp` file connects to the database, executes a query, fetches and prints the results, then disconnects from the database and frees the handles. It also defines a function that checks the return value of ODBC functions. The `CMakeLists.txt` file can be used to build the application.

##### Sample `.cpp` File {#docs:current:guides:odbc:general::sample-cpp-file}

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <sql.h>
#include <sqlext.h>

void check_ret(SQLRETURN ret, std::string msg) {
    if (ret != SQL_SUCCESS && ret != SQL_SUCCESS_WITH_INFO) {
        std::cout << ret << ": " << msg << " failed" << std::endl;
        exit(1);
    }
    if (ret == SQL_SUCCESS_WITH_INFO) {
        std::cout << ret << ": " << msg << " succeeded with info" << std::endl;
    }
}

int main() {
    SQLHANDLE env;
    SQLHANDLE dbc;
    SQLRETURN ret;

    ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    check_ret(ret, "SQLAllocHandle(env)");

    ret = SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);
    check_ret(ret, "SQLSetEnvAttr");

    ret = SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
    check_ret(ret, "SQLAllocHandle(dbc)");

    std::string dsn = "DSN=duckdbmemory";
    ret = SQLConnect(dbc, (SQLCHAR*)dsn.c_str(), SQL_NTS, NULL, 0, NULL, 0);
    check_ret(ret, "SQLConnect");

    std::cout << "Connected!" << std::endl;

    SQLHANDLE stmt;
    ret = SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    check_ret(ret, "SQLAllocHandle(stmt)");

    ret = SQLExecDirect(stmt, (SQLCHAR*)"SELECT * FROM integers", SQL_NTS);
    check_ret(ret, "SQLExecDirect(SELECT * FROM integers)");

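    SQLINTEGER int_val; // SQL_C_SLONG corresponds to the 32-bit SQLINTEGER type
    SQLLEN null_val;    // receives the length of the value or SQL_NULL_DATA
    ret = SQLBindCol(stmt, 1, SQL_C_SLONG, &int_val, 0, &null_val);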
    check_ret(ret, "SQLBindCol");

    ret = SQLFetch(stmt);
    check_ret(ret, "SQLFetch");

    std::cout << "Value: " << int_val << std::endl;

    ret = SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    check_ret(ret, "SQLFreeHandle(stmt)");

    ret = SQLDisconnect(dbc);
    check_ret(ret, "SQLDisconnect");

    ret = SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    check_ret(ret, "SQLFreeHandle(dbc)");

    ret = SQLFreeHandle(SQL_HANDLE_ENV, env);
    check_ret(ret, "SQLFreeHandle(env)");
}
```

##### Sample `CMakeLists.txt` File {#docs:current:guides:odbc:general::sample-cmakeliststxt-file}

```cmake
cmake_minimum_required(VERSION 3.25)
project(ODBC_Tester_App)

set(CMAKE_CXX_STANDARD 17)
include_directories(/opt/homebrew/Cellar/unixodbc/2.3.11/include)

add_executable(ODBC_Tester_App main.cpp)
target_link_libraries(ODBC_Tester_App /duckdb_odbc/libduckdb_odbc.dylib)
```

## Performance {#guides:performance}

### Performance Guide {#docs:current:guides:performance:overview}

DuckDB aims to automatically achieve high performance by using well-chosen default configurations and having a forgiving architecture. Of course, there are still opportunities for tuning the system for specific workloads. The Performance Guide's pages contain guidelines and tips for achieving good performance when loading and processing data with DuckDB.

The guides include several microbenchmarks. You may find details about these on the [Benchmarks page](#docs:current:guides:performance:benchmarks).

### Environment {#docs:current:guides:performance:environment}

The environment where DuckDB is run has an obvious impact on performance. This page focuses on the effects of the hardware configuration and the operating system used.

#### Hardware Configuration {#docs:current:guides:performance:environment::hardware-configuration}

##### CPU {#docs:current:guides:performance:environment::cpu}

DuckDB officially supports the AMD64 (x86_64) and ARM64 (AArch64) CPU architectures and works efficiently on both.

> DuckDB can be compiled for other architectures such as [LoongArch](#_everywhere:morefine-m700s) and [RISC-V](#docs:current:dev:building:unofficial_and_unsupported_platforms::risc-v-architectures). However, there are no performance guarantees for these platforms.

##### Memory {#docs:current:guides:performance:environment::memory}

> **Best practice.** Aim for 1-4 GB memory per thread.

###### Minimum Required Memory {#docs:current:guides:performance:environment::minimum-required-memory}

As a rule of thumb, DuckDB requires a _minimum_ of 125 MB of memory per thread.
For example, if you use 8 threads, you need at least 1 GB of memory.
If you are working in a memory-constrained environment, consider [limiting the number of threads](#docs:current:configuration:pragmas::threads), e.g., by issuing:

```sql
SET threads = 4;
```

###### Memory for Ideal Performance {#docs:current:guides:performance:environment::memory-for-ideal-performance}

The amount of memory required for ideal performance depends on several factors, including the dataset size and the queries to execute.
Perhaps surprisingly, the _queries_ have a larger effect on the memory requirement.
Workloads containing large joins over many-to-many tables yield large intermediate datasets and thus require more memory for their evaluation to fit fully into memory.
As an approximation, aggregation-heavy workloads require 1-2 GB memory per thread and join-heavy workloads require 3-4 GB memory per thread.
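
The rules of thumb above can be turned into a quick back-of-the-envelope calculation. The following C++ sketch is only an illustration: the helper and constant names are hypothetical, it assumes 1 GB = 1000 MB (matching the 8 × 125 MB example), and it uses the upper end of each quoted range.

```cpp
#include <cstdint>

// Rule-of-thumb figures quoted in the text, in megabytes per thread.
// Assumption: 1 GB = 1000 MB, as in the "8 threads need 1 GB" example.
constexpr std::uint64_t kMinMbPerThread = 125;          // minimum requirement
constexpr std::uint64_t kAggregationMbPerThread = 2000; // upper end of 1-2 GB
constexpr std::uint64_t kJoinMbPerThread = 4000;        // upper end of 3-4 GB

// Hypothetical helper: memory (in MB) needed to run `threads` threads.
constexpr std::uint64_t memory_mb(std::uint64_t threads, std::uint64_t mb_per_thread) {
    return threads * mb_per_thread;
}

// Hypothetical helper: largest thread count that leaves `mb_per_thread` MB
// for each thread within `budget_mb`, but never less than one thread.
// `mb_per_thread` must be non-zero.
constexpr std::uint64_t max_threads(std::uint64_t budget_mb, std::uint64_t mb_per_thread) {
    return budget_mb / mb_per_thread > 0 ? budget_mb / mb_per_thread : 1;
}
```

For instance, `memory_mb(8, kMinMbPerThread)` evaluates to 1000 MB, and `max_threads(500, kMinMbPerThread)` evaluates to 4, a value that could then be passed to `SET threads`.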

###### Larger-than-Memory Workloads {#docs:current:guides:performance:environment::larger-than-memory-workloads}

DuckDB can process larger-than-memory workloads by spilling to disk.
This is possible thanks to _out-of-core_ support for grouping, joining, sorting and windowing operators.
Note that larger-than-memory workloads can be processed both in persistent mode and in in-memory mode as DuckDB still spills to disk in both modes.

##### Local Disk {#docs:current:guides:performance:environment::local-disk}

**Disk type.**
DuckDB's disk-based mode is designed to work best with SSD and NVMe disks. While HDDs are supported, they will result in low performance, especially for write operations.

**Disk-based vs. in-memory storage.**
Counter-intuitively, using a disk-based DuckDB instance can be faster than an in-memory instance due to compression.
Read more in the [“How to Tune Workloads” page](#docs:current:guides:performance:how_to_tune_workloads::persistent-vs-in-memory-tables).

**File systems.**
On Linux, [DuckDB performs best with the XFS file system](https://www.phoronix.com/review/linux-70-filesystems/4) but it also performs reasonably well with other file systems such as ext4.
On Windows, we recommend using NTFS and avoiding FAT32.

> **Note.** DuckDB databases have built-in checksums, so integrity checks from the file system are not required to prevent data corruption.

##### Network-Attached Disks {#docs:current:guides:performance:environment::network-attached-disks}

Special care needs to be taken when using network-attached disks:

* If you are writing to disk, it is important that the disk is reliable. As a general rule of thumb, this is true for locally attached disks and for block storage in the cloud.
* If your workload is larger than memory and/or fast data loading is important, you need fast disks, preferably SSD or NVMe with a fast connection.

With these in mind, here are two common architectures and the related considerations when you are using DuckDB's [native database format](#docs:current:internals:storage):

**Block storage in the cloud.** DuckDB runs well on network-backed cloud disks such as [AWS EBS](https://aws.amazon.com/ebs/) for both read-only and read-write workloads.

**Network-attached storage.**
Network-attached storage can serve DuckDB for read-only workloads.
However, _it is recommended to avoid using DuckDB's native database format in read-write mode on network-attached storage (NAS)._
These setups include [NFS](https://en.wikipedia.org/wiki/Network_File_System),
network drives such as [SMB](https://en.wikipedia.org/wiki/Server_Message_Block) and
[Samba](https://en.wikipedia.org/wiki/Samba_(software)).
Based on user reports, running read-write workloads on network-attached storage can result in slow and unpredictable performance,
as well as spurious errors caused by the underlying file system.
Instead of using DuckDB's native database format, consider using the [DuckLake lakehouse format](https://ducklake.select/).

#### Operating System {#docs:current:guides:performance:environment::operating-system}

We recommend using the latest stable version of operating systems: macOS, Windows, and Linux are all well-tested and DuckDB can run on them with high performance.

##### Linux {#docs:current:guides:performance:environment::linux}

DuckDB runs on all mainstream Linux distributions released in the last ≈5 years.
If you don't have a particular preference, we recommend using Ubuntu Linux LTS due to its stability and the fact that most of DuckDB’s Linux test suite jobs run on Ubuntu workers.

###### glibc vs. musl libc {#docs:current:guides:performance:environment::glibc-vs-musl-libc}

DuckDB can be built with both [glibc](https://www.gnu.org/software/libc/) (default) and [musl libc](https://www.musl-libc.org/) (see the [build guide](#docs:current:dev:building:linux)).
However, note that DuckDB binaries built with musl libc have lower performance.
In practice, this can lead to a slowdown of more than 5× on compute-intensive workloads.
Therefore, it's recommended to use a Linux distribution with glibc for performance-oriented workloads when running DuckDB.

#### Memory Allocator {#docs:current:guides:performance:environment::memory-allocator}

If you have a many-core CPU running on a system where DuckDB ships with [`jemalloc`](#docs:current:core_extensions:jemalloc) as the default memory allocator, consider [enabling the allocator's background threads](#docs:current:core_extensions:jemalloc::background-threads).

### Data Import {#docs:current:guides:performance:import}

#### Recommended Import Methods {#docs:current:guides:performance:import::recommended-import-methods}

When importing data from other systems to DuckDB, there are several considerations to take into account.
We recommend choosing an import method in the following order of preference:

1. For systems which are supported by a DuckDB scanner extension, it's preferable to use the scanner. DuckDB currently offers scanners for [MySQL](#docs:current:guides:database_integration:mysql), [PostgreSQL](#docs:current:guides:database_integration:postgres) and [SQLite](#docs:current:guides:database_integration:sqlite), as well as a generic [ODBC scanner](#docs:current:core_extensions:odbc:overview).
2. If there is a bulk export feature in the data source system, export the data to Parquet or CSV format, then load it using DuckDB's [Parquet](#docs:current:guides:file_formats:parquet_import) or [CSV loader](#docs:current:guides:file_formats:csv_import).
3. If the approaches above are not applicable, consider using the DuckDB [appender](#docs:current:data:appender), currently available in the C, C++, Go, Java, and Rust APIs.

#### Methods to Avoid {#docs:current:guides:performance:import::methods-to-avoid}

If possible, avoid looping row-by-row (tuple-at-a-time) in favor of bulk operations.
Performing row-by-row inserts (even with prepared statements) is detrimental to performance and will result in slow load times.

> **Best practice.** Unless your data is small (<100k rows), avoid using inserts in loops.

### Schema {#docs:current:guides:performance:schema}

#### Types {#docs:current:guides:performance:schema::types}

It is important to use the correct type for encoding columns (e.g., `BIGINT`, `DATE`, `DATETIME`). While it is always possible to use string types (`VARCHAR`, etc.) to encode more specific values, this is not recommended. Strings use more space and are slower to process in operations such as filtering, joins, and aggregation.

When loading CSV files, you may leverage the CSV reader's [auto-detection mechanism](#docs:current:data:csv:auto_detection) to get the correct types for CSV inputs.

If you run in a memory-constrained environment, using smaller data types (e.g., `TINYINT`) can reduce the amount of memory and disk space required to complete a query. DuckDB’s [bitpacking compression](https://duckdb.org/2022/10/28/lightweight-compression#bit-packing) means small values stored in larger data types will not take up larger sizes on disk, but they will take up more memory during processing.

> **Best practice.** Use the most restrictive types possible when creating columns. Avoid using strings for encoding more specific data items.

##### Microbenchmark: Using Timestamps {#docs:current:guides:performance:schema::microbenchmark-using-timestamps}

We illustrate the difference in aggregation speed using the [`creationDate` column of the LDBC Comment table on scale factor 300](https://blobs.duckdb.org/data/ldbc-sf300-comments-creationDate.parquet). This table has approx. 554 million unordered timestamp values. We run a simple aggregation query that returns the average day of the month from the timestamps in two configurations.

First, we use a `DATETIME` to encode the values and run the query using the [`extract` datetime function](#docs:current:sql:functions:timestamp):

```sql
SELECT avg(extract('day' FROM creationDate)) FROM Comment;
```

Second, we use the `VARCHAR` type and use string operations:

```sql
SELECT avg(CAST(creationDate[9:10] AS INTEGER)) FROM Comment;
```

The results of the microbenchmark are as follows:

| Column type | Storage size | Query time |
| ----------- | -----------: | ---------: |
| `DATETIME`  |       3.3 GB |      0.9 s |
| `VARCHAR`   |       5.2 GB |      3.9 s |

The results show that using the `DATETIME` value yields smaller storage sizes and faster processing.

##### Microbenchmark: Joining on Strings {#docs:current:guides:performance:schema::microbenchmark-joining-on-strings}

We illustrate the difference caused by joining on different types by computing a self-join on the [LDBC Comment table at scale factor 100](https://blobs.duckdb.org/data/ldbc-sf100-comments.tar.zst). The table has 64-bit integer identifiers used as the `id` attribute of each row. We perform the following join operation:

```sql
SELECT count(*) AS count
FROM Comment c1
JOIN Comment c2 ON c1.ParentCommentId = c2.id;
```

In the first experiment, we use the correct (most restrictive) types, i.e., both the `id` and the `ParentCommentId` columns are defined as `BIGINT`.
In the second experiment, we define all columns with the `VARCHAR` type.
While the results of the queries are the same for both experiments, their runtimes vary significantly.
The results below show that joining on `BIGINT` columns is approx. 1.8× faster than performing the same join on `VARCHAR`-typed columns encoding the same value.

| Join column payload type | Join column schema type | Example value      | Query time |
| ------------------------ | ----------------------- | ------------------ | ---------: |
| `BIGINT`                 | `BIGINT`                | `70368755640078`   |      1.2 s |
| `BIGINT`                 | `VARCHAR`               | `'70368755640078'` |      2.1 s |

> **Best practice.** Avoid representing numeric values as strings, especially if you intend to perform operations such as joins on them.

#### Constraints {#docs:current:guides:performance:schema::constraints}

DuckDB allows defining [constraints](#docs:current:sql:constraints) such as `UNIQUE`, `PRIMARY KEY`, and `FOREIGN KEY`. These constraints can be beneficial for ensuring data integrity but they have a negative effect on load performance as they necessitate building indexes and performing checks. Moreover, they _very rarely improve the performance of queries_ as DuckDB does not rely on these indexes for join and aggregation operators (see [indexing](#docs:current:guides:performance:indexing) for more details).

> **Best practice.** Do not define constraints unless your goal is to ensure data integrity.

##### Microbenchmark: The Effect of Primary Keys {#docs:current:guides:performance:schema::microbenchmark-the-effect-of-primary-keys}

We illustrate the effect of using primary keys with the [LDBC Comment table at scale factor 300](https://blobs.duckdb.org/data/ldbc-sf300-comments.tar.zst).
This table has approx. 554 million entries.
In the first experiment, we create the schema *with* a primary key, then load the data.
In the second experiment, we create the schema *without* a primary key, then load the data.
In the third experiment, we create the schema *without* a primary key, load the data and then add the primary key constraint.
In all cases, we take the data from `.csv.gz` files, and measure the time required to perform the loading.

|                  Operation                    | Execution time |
|-----------------------------------------------|---------------:|
| Load with primary key                         |        461.6 s |
| Load without primary key                      |        121.0 s |
| Load without primary key then add primary key |        242.0 s |

For this dataset, primary keys will only have a (small) positive effect on highly selective queries such as when filtering on a single identifier.
Defining primary keys (or indexes) will not have an effect on join and aggregation operators.

> **Best practice.** For best bulk load performance, avoid primary key constraints.
> If they are required, define them after the bulk loading step.

### Indexing {#docs:current:guides:performance:indexing}

DuckDB has two types of indexes: zonemaps and ART indexes.

#### Zonemaps {#docs:current:guides:performance:indexing::zonemaps}

DuckDB automatically creates [zonemaps](https://en.wikipedia.org/wiki/Block_Range_Index) (also known as min-max indexes) for the columns of all [general-purpose data types](#docs:current:sql:data_types:overview::general-purpose-data-types).
Operations like predicate pushdown into scan operators and computing aggregations use zonemaps.
If a filter criterion (like `WHERE column1 = 123`) is in use, DuckDB can skip any row group whose min-max range does not contain that filter value (e.g., it can omit a row group with a min-max range of 1000 to 2000 when comparing for `= 123` or `< 400`).
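
The skipping rule can be illustrated with a toy model. The following Python sketch is not DuckDB's implementation: the `ZoneMap` class and its `may_contain_*` methods are hypothetical names introduced only for this example. It shows why a min-max range of 1000 to 2000 rules out matches for both `= 123` and `< 400`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneMap:
    """Min-max summary of one column within one row group (toy model)."""
    min_value: int
    max_value: int

    def may_contain_equal(self, value: int) -> bool:
        # An equality filter can only match if the value lies inside [min, max].
        return self.min_value <= value <= self.max_value

    def may_contain_less_than(self, value: int) -> bool:
        # A `< value` filter can only match if the smallest entry is below the value.
        return self.min_value < value


zonemap = ZoneMap(min_value=1000, max_value=2000)

# Both filters from the text allow skipping this row group.
print(zonemap.may_contain_equal(123))      # False: 123 is outside [1000, 2000]
print(zonemap.may_contain_less_than(400))  # False: every value is >= 1000
# A filter value inside the range forces a scan of the row group.
print(zonemap.may_contain_equal(1500))     # True
```

Note that a `True` result only means the row group has to be scanned; it does not guarantee that a matching row exists.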

##### The Effect of Ordering on Zonemaps {#docs:current:guides:performance:indexing::the-effect-of-ordering-on-zonemaps}

The more ordered the data within a column, the more valuable the zonemap indexes will be.
In the worst case, a column contains a random number in every row.
Then, DuckDB will likely be unable to skip any row groups.
If you query specific columns with selective filters, it is best to pre-order data by those columns when inserting it.
Even an imperfect ordering will still be helpful.
The best case of ordered data commonly arises with `DATETIME` columns.
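
To get a feel for how much ordering matters, the following Python sketch counts how many row groups a min-max check would have to scan for a point lookup. It is a simplified simulation under stated assumptions: the row group size of 1,000 and the helper `row_groups_to_scan` are made up for this example and do not reflect DuckDB's internals.

```python
import random


def row_groups_to_scan(values, row_group_size, target):
    """Count row groups whose [min, max] range may contain `target`."""
    scanned = 0
    for start in range(0, len(values), row_group_size):
        group = values[start:start + row_group_size]
        if min(group) <= target <= max(group):
            scanned += 1
    return scanned


ROW_GROUP_SIZE = 1_000  # toy size chosen for this example, not DuckDB's default
ordered = list(range(100_000))  # 100 row groups of sorted values
shuffled = ordered.copy()
random.Random(42).shuffle(shuffled)  # same values in random order

target = 12_345
print(row_groups_to_scan(ordered, ROW_GROUP_SIZE, target))   # 1
print(row_groups_to_scan(shuffled, ROW_GROUP_SIZE, target))  # close to all 100
```

With sorted data, exactly one row group can contain the value. With shuffled data, nearly every row group spans almost the full value range, so almost none can be skipped.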

##### Microbenchmark: The Effect of Ordering {#docs:current:guides:performance:indexing::microbenchmark-the-effect-of-ordering}

As an example, let’s repeat the [microbenchmark for timestamps](#docs:current:guides:performance:schema::microbenchmark-using-timestamps) with a timestamp column sorted in ascending order vs. an unordered one.

| Column type | Ordered | Storage size | Query time |
|---|---|--:|--:|
| `DATETIME` | yes | 1.3 GB | 0.6 s |
| `DATETIME` | no  | 3.3 GB | 0.9 s |

The results show that simply keeping the column ordered allows for improved compression, yielding a 2.5× smaller storage size.
It also allows the computation to be 1.5× faster.

##### Ordered Integers {#docs:current:guides:performance:indexing::ordered-integers}

Another practical way to exploit ordering is to use the `INTEGER` type with automatic increments rather than `UUID` for columns queried using selective filters.
In a scenario where a table contains out-of-order `UUID`s, DuckDB has to scan many row groups to find a specific `UUID` value.
An ordered `INTEGER` column allows skipping all row groups except those containing the value.

#### ART Indexes {#docs:current:guides:performance:indexing::art-indexes}

DuckDB allows defining [Adaptive Radix Tree (ART) indexes](https://db.in.tum.de/~leis/papers/ART.pdf) in two ways.
First, such an index is created implicitly for columns with `PRIMARY KEY`, `FOREIGN KEY`, and `UNIQUE` [constraints](#docs:current:guides:performance:schema::constraints).
Second, explicitly running the [`CREATE INDEX`](#docs:current:sql:indexes) statement creates an ART index on the target column(s).

The tradeoffs of having an ART index on a column are as follows:

1. ART indexes enable constraint checking during changes (inserts, updates, and deletes).
2. Changes on indexed tables perform worse than their non-indexed counterparts because these operations must also maintain the index.
3. For some use cases, _single-column ART indexes_ improve the performance of highly selective queries using the indexed column.

An ART index does not affect the performance of join, aggregation, and sorting queries.

##### ART Index Scans {#docs:current:guides:performance:indexing::art-index-scans}

ART index scans probe a single-column ART index for the requested data instead of scanning a table sequentially.
Probing can improve the performance of some queries.
DuckDB will try to use an index scan for equality and `IN(...)` conditions.
It also pushes dynamic filters, e.g., from hash joins, into the scan, allowing dynamic index scans on these filters.

Indexes are only eligible for index scans if they index a single column without expressions.
For example, the following index is eligible for index scans:

```sql
CREATE INDEX idx ON tbl (col1);
```

By contrast, the following two indexes are **NOT** eligible for index scans:

```sql
CREATE INDEX idx_multi_column ON tbl (col1, col2);
CREATE INDEX idx_expr ON tbl (col1 + 1);
```

The default threshold for index scans is `MAX(2048, 0.001 * table_cardinality)`.
You can configure this threshold via `index_scan_percentage` and `index_scan_max_count`, or disable index scans by setting both values to zero.
When in doubt, use [`EXPLAIN ANALYZE`](#docs:current:guides:meta:explain_analyze) to verify that your query plan uses the index scan.
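
As a worked example of the threshold formula, the Python sketch below evaluates `MAX(2048, 0.001 * table_cardinality)` for a few table sizes. The function name is hypothetical, and mapping the constants 2048 and 0.001 to the defaults of `index_scan_max_count` and `index_scan_percentage` is an assumption based on the formula above. The sketch only computes the threshold value and does not model how the optimizer uses it.

```python
def index_scan_threshold(table_cardinality: int,
                         index_scan_max_count: int = 2048,
                         index_scan_percentage: float = 0.001) -> int:
    """Evaluate MAX(index_scan_max_count, index_scan_percentage * table_cardinality)."""
    return max(index_scan_max_count,
               round(index_scan_percentage * table_cardinality))


# Small table: the fixed count of 2048 dominates.
print(index_scan_threshold(1_000_000))            # 2048
# Large table: 0.1% of the rows dominates.
print(index_scan_threshold(554_000_000))          # 554000
# Setting both options to zero yields a threshold of zero.
print(index_scan_threshold(554_000_000, 0, 0.0))  # 0
```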

##### Indexes and Memory {#docs:current:guides:performance:indexing::indexes-and-memory}

DuckDB registers index memory through its buffer manager.
However, these index buffers are not yet buffer-managed.
That means DuckDB does not yet destroy any index buffers if it has to evict memory.
Thus, indexes can take up a significant portion of DuckDB's available memory, potentially affecting the performance of memory-intensive queries.
Re-attaching (`DETACH` + `ATTACH`) the database containing indexes can mitigate this effect, as we deserialize index memory lazily.
Disabling index scans and re-attaching after changes can further decrease the impact of indexes on DuckDB's available memory.

##### Indexes and Opening Databases {#docs:current:guides:performance:indexing::indexes-and-opening-databases}

Indexes are serialized to disk and deserialized lazily when the database is reopened.
Operations using the index will only load the required parts of the index.
Therefore, having an index will not cause any slowdowns when opening an existing database.

> **Best practice.** We recommend following these guidelines:
>
> * Only use primary keys, foreign keys, or unique constraints if these are necessary for enforcing constraints on your data.
> * Do not define explicit indexes unless you have highly selective queries and enough memory available.
> * If you define an ART index, do so after bulk loading the data to the table. Adding an index prior to loading, either explicitly or via primary/foreign keys, is [detrimental to load performance](#docs:current:guides:performance:schema::microbenchmark-the-effect-of-primary-keys).

### Join Operations {#docs:current:guides:performance:join_operations}

#### How to Force a Join Order {#docs:current:guides:performance:join_operations::how-to-force-a-join-order}

DuckDB has a cost-based query optimizer, which uses statistics in the base tables (stored in a DuckDB database or Parquet files) to estimate the cardinality of operations.

##### Turn off the Join Order Optimizer {#docs:current:guides:performance:join_operations::turn-off-the-join-order-optimizer}

To turn off the join order optimizer, set the following [configuration option](#docs:current:configuration:pragmas):

```sql
SET disabled_optimizers = 'join_order,build_side_probe_side';
```

This disables both the join order optimizer and left/right swapping for joins.
This way, DuckDB builds a left-deep join tree following the order of `JOIN` clauses.

```sql
SELECT ...
FROM ...
JOIN ...  -- this join is performed first
JOIN ...; -- this join is performed second
```
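
The shape of such a left-deep join tree can be pictured with a short Python sketch. This is purely illustrative: the table names `a` to `d` and both helper functions are hypothetical, and the sketch does not reproduce DuckDB's planner.

```python
from functools import reduce


def left_deep_join_tree(tables):
    """Nest the tables so that each join takes the previous result as its left input."""
    return reduce(lambda left, right: (left, right), tables)


def render(tree):
    """Render the nested tuples as a parenthesized join expression."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"({render(left)} JOIN {render(right)})"


# Tables listed in the order they appear in the FROM and JOIN clauses.
tree = left_deep_join_tree(["a", "b", "c", "d"])
print(render(tree))  # (((a JOIN b) JOIN c) JOIN d)
```

The innermost pair corresponds to the first `JOIN` clause, and every later clause joins against the accumulated result.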

Once the query in question has been executed, re-enable the optimizers with the following command:

```sql
SET disabled_optimizers = '';
```

##### Create Temporary Tables {#docs:current:guides:performance:join_operations::create-temporary-tables}

To force a particular join order, you can break up the query into multiple queries, with each creating a temporary table:

```sql
CREATE OR REPLACE TEMPORARY TABLE t1 AS
    ...;

-- join on the result of the first query, t1
CREATE OR REPLACE TEMPORARY TABLE t2 AS
    SELECT * FROM t1 ...;

-- compute the final result using t2
SELECT * FROM t2 ...;
```

To clean up, drop the interim tables:

```sql
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
```

### File Formats {#docs:current:guides:performance:file_formats}

#### Handling Parquet Files {#docs:current:guides:performance:file_formats::handling-parquet-files}

DuckDB has advanced support for Parquet files, which includes [directly querying Parquet files](https://duckdb.org/2021/06/25/querying-parquet).
When deciding on whether to query these files directly or to first load them to the database, you need to consider several factors.

##### Reasons for Querying Parquet Files {#docs:current:guides:performance:file_formats::reasons-for-querying-parquet-files}

**Availability of basic statistics:** Parquet files use a columnar storage format and contain basic statistics such as [zonemaps](#docs:current:guides:performance:indexing::zonemaps). Thanks to these features, DuckDB can leverage optimizations such as projection and filter pushdown on Parquet files. Therefore, workloads that combine projection, filtering, and aggregation tend to perform quite well when run on Parquet files.

**Storage considerations:** Loading the data from Parquet files will require approximately the same amount of space for the DuckDB database file. Therefore, if the available disk space is constrained, it is worth running the queries directly on Parquet files.

##### Reasons against Querying Parquet Files {#docs:current:guides:performance:file_formats::reasons-against-querying-parquet-files}

**Lack of advanced statistics:** The DuckDB database format maintains [HyperLogLog statistics](https://en.wikipedia.org/wiki/HyperLogLog), which Parquet files do not have. These improve the accuracy of cardinality estimates, and are especially important if the queries contain a large number of join operators.

> **Tip.** If you find that DuckDB produces a suboptimal join order on Parquet files, try loading the Parquet files to DuckDB tables. The improved statistics likely help obtain a better join order.

**Repeated queries:** If you plan to run multiple queries on the same dataset, it is worth loading the data into DuckDB. The queries will always be somewhat faster, which over time amortizes the initial load time.

**High decompression times:** Some Parquet files are compressed using heavyweight compression algorithms such as gzip. In these cases, querying the Parquet files incurs an expensive decompression step every time the file is accessed. Meanwhile, lightweight compression methods like Snappy, LZ4, and zstd are faster to decompress. You may use the [`parquet_metadata` function](#docs:current:data:parquet:metadata::parquet-metadata) to find out the compression algorithm used.

###### Microbenchmark: Running TPC-H on a DuckDB Database vs. Parquet {#docs:current:guides:performance:file_formats::microbenchmark-running-tpc-h-on-a-duckdb-database-vs-parquet}

The queries on the [TPC-H benchmark](#docs:current:core_extensions:tpch) run approximately 1.1-5.0× slower on Parquet files than on a DuckDB database.

> **Best practice.** If you have the storage space available, and have a join-heavy workload and/or plan to run many queries on the same dataset, load the Parquet files into the database first. The compression algorithm and the row group sizes in the Parquet files have a large effect on performance: study these using the [`parquet_metadata` function](#docs:current:data:parquet:metadata::parquet-metadata).

##### The Effect of Row Group Sizes {#docs:current:guides:performance:file_formats::the-effect-of-row-group-sizes}

DuckDB works best on Parquet files with row groups of 100K-1M rows each. The reason for this is that DuckDB can only [parallelize over row groups](#docs:current:guides:performance:how_to_tune_workloads::parallelism-multi-core-processing) – so if a Parquet file has a single giant row group it can only be processed by a single thread. You can use the [`parquet_metadata` function](#docs:current:data:parquet:metadata::parquet-metadata) to figure out how many row groups a Parquet file has. When writing Parquet files, use the [`row_group_size`](#docs:current:sql:statements:copy::parquet-options) option.

###### Microbenchmark: Running Aggregation Query at Different Row Group Sizes {#docs:current:guides:performance:file_formats::microbenchmark-running-aggregation-query-at-different-row-group-sizes}

We run a simple aggregation query over Parquet files using different row group sizes, selected between 960 and 1,966,080. The results are as follows.

| Row group size | Execution time |
|---------------:|---------------:|
| 960            | 8.77 s         |
| 1920           | 8.95 s         |
| 3840           | 4.33 s         |
| 7680           | 2.35 s         |
| 15360          | 1.58 s         |
| 30720          | 1.17 s         |
| 61440          | 0.94 s         |
| 122880         | 0.87 s         |
| 245760         | 0.93 s         |
| 491520         | 0.95 s         |
| 983040         | 0.97 s         |
| 1966080        | 0.88 s         |

The results show that row group sizes <5,000 have a strongly detrimental effect, making runtimes roughly 5-10× longer than with ideally sized row groups, while row group sizes between 5,000 and 20,000 are still 1.5-2.5× off from best performance. Above a row group size of 100,000, the differences are small: the gap is about 10% between the best and the worst runtime.
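
To relate these statements to the measurements, the following Python sketch computes each configuration's slowdown relative to the fastest run. The timings are copied from the table above; the variable names are only illustrative.

```python
# Row group size -> execution time in seconds, copied from the table above.
timings = {
    960: 8.77, 1920: 8.95, 3840: 4.33, 7680: 2.35,
    15360: 1.58, 30720: 1.17, 61440: 0.94, 122880: 0.87,
    245760: 0.93, 491520: 0.95, 983040: 0.97, 1966080: 0.88,
}

best = min(timings.values())

# Slowdown of each configuration relative to the fastest run.
slowdown = {size: time / best for size, time in timings.items()}

for size, factor in slowdown.items():
    print(f"{size:>9,} rows: {factor:.2f}x")

# Spread among row groups of at least 100,000 rows.
large = [time for size, time in timings.items() if size >= 100_000]
print(f"spread above 100k rows: {max(large) / min(large) - 1:.0%}")
```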

##### Parquet File Sizes {#docs:current:guides:performance:file_formats::parquet-file-sizes}

DuckDB can also parallelize across multiple Parquet files. It is advisable to have at least as many total row groups across all files as there are CPU threads. For example, on a machine with 10 threads, either 10 files with 1 row group each or 1 file with 10 row groups will achieve full parallelism. It is also beneficial to keep the size of individual Parquet files moderate.
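
The rule of thumb about row groups and threads can be written down as a one-line calculation. The Python sketch below assumes, as a simplification, that each row group is one unit of work for one thread; `usable_threads` is a hypothetical helper, not a DuckDB function.

```python
def usable_threads(row_groups_per_file, threads):
    """Upper bound on busy threads when each row group is one unit of work."""
    return min(threads, sum(row_groups_per_file))


THREADS = 10

print(usable_threads([1] * 10, THREADS))  # 10: ten files with one row group each
print(usable_threads([10], THREADS))      # 10: one file with ten row groups
print(usable_threads([1], THREADS))       # 1: a single row group keeps one thread busy
```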

> **Best practice.** The ideal range is between 100 MB and 10 GB per individual Parquet file.

##### Hive Partitioning for Filter Pushdown {#docs:current:guides:performance:file_formats::hive-partitioning-for-filter-pushdown}

When querying many files with filter conditions, performance can be improved by using a [Hive-format folder structure](#docs:current:data:partitioning:hive_partitioning) to partition the data along the columns used in the filter condition. DuckDB will only need to read the folders and files that meet the filter criteria. This can be especially helpful when querying remote files.
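
The idea behind partition pruning can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `orders/...` layout and both helper functions are hypothetical. It only shows how `key=value` folder names let a reader discard files without opening them.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Extract `key=value` folder names from a Hive-style path."""
    values = {}
    for part in PurePosixPath(path).parts[:-1]:  # skip the file name
        if "=" in part:
            key, value = part.split("=", 1)
            values[key] = value
    return values


def prune(paths, **filters):
    """Keep only the files whose partition values match every filter."""
    return [
        path for path in paths
        if all(partition_values(path).get(key) == str(value)
               for key, value in filters.items())
    ]


# Hypothetical layout, partitioned by year and month.
files = [
    "orders/year=2023/month=11/data_0.parquet",
    "orders/year=2023/month=12/data_0.parquet",
    "orders/year=2024/month=1/data_0.parquet",
    "orders/year=2024/month=2/data_0.parquet",
]

# A filter on the partition columns selects one of the four files.
print(prune(files, year=2024, month=1))
# ['orders/year=2024/month=1/data_0.parquet']
```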

##### More Tips on Reading and Writing Parquet Files {#docs:current:guides:performance:file_formats::more-tips-on-reading-and-writing-parquet-files}

For tips on reading and writing Parquet files, see the [Parquet Tips page](#docs:current:data:parquet:tips).

#### Loading CSV Files {#docs:current:guides:performance:file_formats::loading-csv-files}

CSV files are often distributed in a compressed format such as GZIP archives (`.csv.gz`). DuckDB can decompress these files on the fly. In fact, this is typically faster than decompressing the files first and then loading them, due to reduced IO.

| Method | Load time |
|---|--:|
| Load from GZIP-compressed CSV files (`.csv.gz`) | 107.1 s |
| Decompressing (using parallel `gunzip`) and loading from decompressed CSV files | 121.3 s |

##### Loading Many Small CSV Files {#docs:current:guides:performance:file_formats::loading-many-small-csv-files}

The [CSV reader](#docs:current:data:csv:overview) runs the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer) on all files. For many small files, this may cause an unnecessarily high overhead.
A potential optimization to speed this up is to turn the sniffer off. Assuming that all files have the same CSV dialect and column names/types, get the sniffer options as follows:

```sql
.mode line
SELECT Prompt FROM sniff_csv('part-0001.csv');
```

```text
Prompt = FROM read_csv('file_path.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0, header=true, columns={'hello': 'BIGINT', 'world': 'VARCHAR'});
```

Then, you can adjust the `read_csv` command, e.g., by applying [filename expansion (globbing)](#docs:current:sql:functions:pattern_matching::globbing), and run it with the rest of the options detected by the sniffer:

```sql
FROM read_csv('part-*.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0, header=true, columns={'hello': 'BIGINT', 'world': 'VARCHAR'});
```

### Tuning Workloads {#docs:current:guides:performance:how_to_tune_workloads}

#### The `preserve_insertion_order` Option {#docs:current:guides:performance:how_to_tune_workloads::the-preserve_insertion_order-option}

When importing or exporting datasets (from/to the Parquet or CSV formats) that are much larger than the available memory, an out-of-memory error may occur:

```console
Out of Memory Error: failed to allocate data of size ... (.../... used)
```

In these cases, consider setting the [`preserve_insertion_order` configuration option](#docs:current:configuration:overview) to `false`:

```sql
SET preserve_insertion_order = false;
```

This allows the system to re-order any results that do not contain `ORDER BY` clauses, potentially reducing memory usage.

#### Parallelism (Multi-Core Processing) {#docs:current:guides:performance:how_to_tune_workloads::parallelism-multi-core-processing}

##### The Effect of Row Groups on Parallelism {#docs:current:guides:performance:how_to_tune_workloads::the-effect-of-row-groups-on-parallelism}

DuckDB parallelizes the workload based on _[row groups](#docs:current:internals:storage::row-groups),_ i.e., groups of rows that are stored together at the storage level.
The default row group size in DuckDB's database format is 122,880 rows.
Parallelism starts at the level of row groups; therefore, for a query to run on _k_ threads, it needs to scan at least _k_ \* 122,880 rows.
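
The relationship between row count and thread count can be checked with simple arithmetic. The Python sketch below applies the rule exactly as stated above; the function names are hypothetical, and the actual scheduling in DuckDB may differ in detail.

```python
ROW_GROUP_SIZE = 122_880  # default row group size stated above


def max_parallel_threads(row_count, row_group_size=ROW_GROUP_SIZE):
    """Largest k such that k * row_group_size <= row_count (at least 1)."""
    return max(1, row_count // row_group_size)


def min_rows_for_threads(k, row_group_size=ROW_GROUP_SIZE):
    """Rows a scan must cover to keep k threads busy: k * row_group_size."""
    return k * row_group_size


print(min_rows_for_threads(8))          # 983040
print(max_parallel_threads(1_000_000))  # 8
print(max_parallel_threads(50_000))     # 1
```

In other words, a table with one million rows can keep about 8 threads busy, while a table with 50,000 rows fits into a single row group.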

The row group size can be specified as an option of the `ATTACH` statement: 

```sql
ATTACH '/tmp/somefile.db' AS db (ROW_GROUP_SIZE 16384);
```

The [performance considerations when choosing `ROW_GROUP_SIZE` for Parquet files](#docs:current:data:parquet:tips::selecting-a-row_group_size) apply verbatim to DuckDB's own database format.

##### Too Many Threads {#docs:current:guides:performance:how_to_tune_workloads::too-many-threads}

Note that in certain cases DuckDB may launch _too many threads_ (e.g., due to HyperThreading), which can lead to slowdowns. In these cases, it’s worth manually limiting the number of threads using [`SET threads = X`](#docs:current:configuration:pragmas::threads).

#### Larger-than-Memory Workloads (Out-of-Core Processing) {#docs:current:guides:performance:how_to_tune_workloads::larger-than-memory-workloads-out-of-core-processing}

A key strength of DuckDB is support for larger-than-memory workloads, i.e., it is able to process datasets that are larger than the available system memory (also known as _out-of-core processing_).
It can also run queries where the intermediate results cannot fit into memory.
This section explains the prerequisites, scope, and known limitations of larger-than-memory processing in DuckDB.

##### Spilling to Disk {#docs:current:guides:performance:how_to_tune_workloads::spilling-to-disk}

Larger-than-memory workloads are supported by spilling to disk.
With the default configuration, DuckDB creates the `⟨database_file_name⟩.tmp` temporary directory (in persistent mode) or the `.tmp` directory (in in-memory mode). This directory can be changed using the [`temp_directory` configuration option](#docs:current:configuration:pragmas::temp-directory-for-spilling-data-to-disk), e.g.:

```sql
SET temp_directory = '/path/to/temp_dir.tmp/';
```

##### Blocking Operators {#docs:current:guides:performance:how_to_tune_workloads::blocking-operators}

Some operators cannot output a single row until the last row of their input has been seen.
These are called _blocking operators_ as they require their entire input to be buffered,
and are the most memory-intensive operators in relational database systems.
The main blocking operators are the following:

- _grouping:_ [`GROUP BY`](#docs:current:sql:query_syntax:groupby)
- _joining:_ [`JOIN`](#docs:current:sql:query_syntax:from::joins)
- _sorting:_ [`ORDER BY`](#docs:current:sql:query_syntax:orderby)
- _windowing:_ [`OVER ... (PARTITION BY ... ORDER BY ...)`](#docs:current:sql:functions:window_functions)

DuckDB supports larger-than-memory processing for all of these operators.
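
The difference between blocking and non-blocking operators can be demonstrated with Python generators. This is a conceptual sketch only: the operator functions and the `tracked` helper are hypothetical and unrelated to DuckDB's execution engine. The sketch counts how many input rows each operator consumes before it emits its first output row.

```python
def tracked(rows, log):
    """Yield rows from the input while recording how many were consumed."""
    for row in rows:
        log.append(row)
        yield row


def streaming_filter(rows, predicate):
    """Non-blocking: emits each qualifying row as soon as it is read."""
    for row in rows:
        if predicate(row):
            yield row


def blocking_sort(rows):
    """Blocking: must buffer the entire input before emitting the first row."""
    buffered = list(rows)
    buffered.sort()
    yield from buffered


data = [5, 3, 8, 1, 9, 2]

consumed = []
first = next(streaming_filter(tracked(data, consumed), lambda x: x > 4))
print(first, len(consumed))  # 5 1 -> one input row read before the first output

consumed = []
first = next(blocking_sort(tracked(data, consumed)))
print(first, len(consumed))  # 1 6 -> all six input rows read before the first output
```

The buffered input of the sort is what may exceed the available memory and therefore has to be spilled to disk.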

##### Limitations {#docs:current:guides:performance:how_to_tune_workloads::limitations}

DuckDB strives to always complete workloads even if they are larger-than-memory.
That said, there are some limitations at the moment:

- If multiple blocking operators appear in the same query, DuckDB may still throw an out-of-memory exception due to the complex interplay of these operators.
- Some [aggregate functions](#docs:current:sql:functions:aggregates), such as `list()` and `string_agg()`, do not support offloading to disk.
- [Aggregate functions that use sorting](#docs:current:sql:functions:aggregates::order-by-clause-in-aggregate-functions) are holistic, i.e., they need all inputs before the aggregation can start. As DuckDB cannot yet offload some complex intermediate aggregate states to disk, these functions can cause an out-of-memory exception when run on large datasets.
- The `PIVOT` operation [internally uses the `list()` function](#docs:current:sql:statements:pivot::internals), therefore it is subject to the same limitation.

#### Profiling {#docs:current:guides:performance:how_to_tune_workloads::profiling}

If your queries are not performing as well as expected, it’s worth studying their query plans:

- Use [`EXPLAIN`](#docs:current:guides:meta:explain) to print the physical query plan without running the query.
- Use [`EXPLAIN ANALYZE`](#docs:current:guides:meta:explain_analyze) to run and profile the query. This will show the CPU time that each step in the query takes. Note that due to multi-threading, the sum of the individual times will be larger than the total query processing time.

Query plans can point to the root of performance issues. A few general directions:

- Avoid nested loop joins in favor of hash joins.
- A scan that does not include a filter pushdown for a filter condition that is later applied performs unnecessary IO. Try rewriting the query to apply a pushdown.
- Bad join orders where the cardinality of an operator explodes to billions of tuples should be avoided at all costs.

#### Prepared Statements {#docs:current:guides:performance:how_to_tune_workloads::prepared-statements}

[Prepared statements](#docs:current:sql:query_syntax:prepared_statements) can improve performance when running the same query many times, but with different parameters. When a statement is prepared, it completes several of the initial portions of the query execution process (parsing, planning, etc.) and caches their output. When it is executed, those steps can be skipped, improving performance. This is beneficial mostly for repeatedly running small queries (with a runtime of < 100 ms) with different sets of parameters.

Note that it is not a primary design goal for DuckDB to quickly execute many small queries concurrently. Rather, it is optimized for running larger, less frequent queries.

#### Querying Remote Files {#docs:current:guides:performance:how_to_tune_workloads::querying-remote-files}

DuckDB uses synchronous IO when reading remote files. This means that each DuckDB thread can make at most one HTTP request at a time. If a query must make many small requests over the network, increasing DuckDB's [`threads` setting](#docs:current:configuration:pragmas::threads) to a value larger than the total number of CPU cores (approx. 2-5 times the number of CPU cores) can improve parallelism and performance.
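
As a small illustration, the Python sketch below derives a `SET threads` statement from the CPU count. The helper name and the default factor of 3 are assumptions made for this example; the factor is merely one value from the approx. 2-5 range mentioned above and should be tuned per workload.

```python
import os


def remote_threads_statement(cpu_cores, factor=3):
    """Build a `SET threads` statement with `factor` threads per CPU core."""
    if not 2 <= factor <= 5:
        raise ValueError("the text suggests a factor of approx. 2-5")
    return f"SET threads = {cpu_cores * factor};"


# os.cpu_count() reports logical CPUs and may return None, so fall back to 1.
cores = os.cpu_count() or 1
print(remote_threads_statement(cores))

# Example with a fixed core count.
print(remote_threads_statement(8))  # SET threads = 24;
```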

##### Avoid Reading Unnecessary Data {#docs:current:guides:performance:how_to_tune_workloads::avoid-reading-unnecessary-data}

The main bottleneck in workloads reading remote files is likely to be the IO. This means that minimizing the amount of data read unnecessarily can be highly beneficial.

Some basic SQL tricks can help with this:

- Avoid `SELECT *`. Instead, only select columns that are actually used. DuckDB will try to only download the data it actually needs.
- Apply filters on remote Parquet files when possible. DuckDB can use these filters to reduce the amount of data that is scanned.
- Either [sort](#docs:current:sql:query_syntax:orderby) or [partition](#docs:current:data:partitioning:partitioned_writes) data by columns that are regularly used for filters: this increases the effectiveness of the filters in reducing IO.

To inspect how much remote data is transferred for a query, [`EXPLAIN ANALYZE`](#docs:current:guides:meta:explain_analyze) can be used to print out the total number of requests and total data transferred for queries on remote files.

##### Caching {#docs:current:guides:performance:how_to_tune_workloads::caching}

Starting with version 1.3.0, DuckDB supports caching remote data. To inspect the content of the external file cache, run:

```sql
FROM duckdb_external_file_cache();
```

#### Best Practices for Using Connections {#docs:current:guides:performance:how_to_tune_workloads::best-practices-for-using-connections}

DuckDB will perform best when reusing the same database connection many times. Disconnecting and reconnecting on every query will incur some overhead, which can reduce performance when running many small queries. DuckDB also caches some data and metadata in memory, and that cache is lost when the last open connection is closed. Frequently, a single connection will work best, but a connection pool may also be used.

Using multiple connections can parallelize some operations, although it is typically not necessary. DuckDB does attempt to parallelize as much as possible within each individual query, but it is not possible to parallelize in all cases. Making multiple connections can process more operations concurrently. This can be more helpful if DuckDB is not CPU limited, but instead bottlenecked by another resource like network transfer speed.

#### Persistent vs. In-Memory Tables {#docs:current:guides:performance:how_to_tune_workloads::persistent-vs-in-memory-tables}

DuckDB supports [lightweight compression techniques](https://duckdb.org/2022/10/28/lightweight-compression). By default, compression is only applied on persistent (on-disk) databases and not on in-memory tables.

In some cases, this can result in counter-intuitive performance results where queries are faster on on-disk tables compared to in-memory ones. Let's take Q1 of the [TPC-H workload](#docs:current:core_extensions:tpch) for example on the SF30 dataset:

```sql
CALL dbgen(sf = 30);
.timer on
PRAGMA tpch(1);
```

We run this script using three DuckDB prompts:

| Database setup              | DuckDB prompt                                               | Execution time |
| --------------------------- | ----------------------------------------------------------- | -------------: |
| In-memory DB (uncompressed) | `duckdb`                                                    |         4.22 s |
| In-memory DB (compressed)   | `duckdb -cmd "ATTACH ':memory:' AS db (COMPRESS); USE db;"` |         0.55 s |
| Persistent DB (compressed)  | `duckdb tpch-sf30.db`                                       |         0.56 s |

We can observe that the compressed databases are about 8× faster compared to the uncompressed in-memory database.

### My Workload Is Slow {#docs:current:guides:performance:my_workload_is_slow}

If you find that your workload in DuckDB is slow, we recommend performing the following checks. More detailed instructions are linked for each point.

1. Do you have enough memory? DuckDB works best if you have [1-4 GB memory per thread](#docs:current:guides:performance:environment::cpu-and-memory).
1. Is your system maybe overcommitting memory, forcing the operating system to swap? Try _lowering_ the amount of memory available from the default [80% of the total RAM](#docs:current:operations_manual:limits) using `SET memory_limit = '...';`. While this sounds counter-intuitive, it can sometimes improve query performance, especially in memory-constrained environments where other processes are likely using more than 20% of the total system memory.
1. Are you using a fast disk? Network-attached disks (such as cloud block storage) cause write-intensive and [larger than memory](#docs:current:guides:performance:how_to_tune_workloads::spilling-to-disk) workloads to slow down. For running such workloads in cloud environments, it is recommended to use instance-attached storage (NVMe SSDs).
1. Are you using indexes or constraints (primary key, unique, etc.)? If possible, try [disabling them](#docs:current:guides:performance:schema::indexing), which boosts load and update performance.
1. Are you using the correct types? For example, [use `TIMESTAMP` to encode datetime values](#docs:current:guides:performance:schema::types).
1. Are you reading from Parquet files? If so, do they have [row group sizes between 100k and 1M](#docs:current:guides:performance:file_formats::the-effect-of-row-group-sizes) and file sizes between 100 MB and 10 GB?
1. Does the query plan look right? Study it with [`EXPLAIN`](#docs:current:guides:performance:how_to_tune_workloads::profiling).
1. Is the workload running [in parallel](#docs:current:guides:performance:how_to_tune_workloads::parallelism)? Use `htop` or the operating system's task manager to observe this.
1. Is DuckDB using too many threads? Try [limiting the number of threads](#docs:current:guides:performance:how_to_tune_workloads::parallelism-multi-core-processing).

Are you aware of other common issues? If so, please click the _Report content issue_ link below and describe them along with their workarounds.

### Benchmarks {#docs:current:guides:performance:benchmarks}

For several of the recommendations in our performance guide, we use microbenchmarks to back up our claims. For these benchmarks, we use datasets from the [TPC-H benchmark](#docs:current:core_extensions:tpch) and the [LDBC Social Network Benchmark’s BI workload](https://github.com/ldbc/ldbc_snb_bi/blob/main/snb-bi-pre-generated-data-sets.md#compressed-csvs-in-the-composite-merged-fk-format).



#### Datasets {#docs:current:guides:performance:benchmarks::datasets}

Some benchmarks use the [LDBC BI SF300 dataset's Comment table](https://blobs.duckdb.org/data/ldbc-sf300-comments.tar.zst) (20 GB `.tar.zst` archive, 21 GB when decompressed into `.csv.gz` files),
while others use the same table's [`creationDate` column](https://blobs.duckdb.org/data/ldbc-sf300-comments-creationDate.parquet) (4 GB `.parquet` file).

The TPC datasets used in the benchmark are generated with the DuckDB [tpch extension](#docs:current:core_extensions:tpch).

#### A Note on Benchmarks {#docs:current:guides:performance:benchmarks::a-note-on-benchmarks}

Running [fair benchmarks is difficult](https://hannes.muehleisen.org/publications/DBTEST2018-performance-testing.pdf), especially when performing system-to-system comparison.
When running benchmarks on DuckDB, please make sure you are using the latest version (preferably the [preview build](https://duckdb.org/install/index.html?version=main)).
If in doubt about your benchmark results, feel free to contact us at `gabor@duckdb.org`.

#### Disclaimer on Benchmarks {#docs:current:guides:performance:benchmarks::disclaimer-on-benchmarks}

Note that the benchmark results presented in this guide do not constitute official TPC or LDBC benchmark results. Instead, they merely use the datasets and some of the queries provided by the TPC-H and the LDBC BI benchmark frameworks, and omit other parts of the workloads such as updates.

### Working with Huge Databases {#docs:current:guides:performance:working_with_huge_databases}

This page contains information for working with huge DuckDB database files.
While most DuckDB databases are well below 1 TB,
in our [2024 user survey](https://duckdb.org/2024/10/04/duckdb-user-survey-analysis#dataset-sizes), 1% of respondents used DuckDB files of 2 TB or more (corresponding to roughly 10 TB of CSV files).

DuckDB's [native database format](#docs:current:internals:storage) supports huge database files without any practical restrictions. However, there are a few things to keep in mind when working with huge database files.

1. Object storage systems have lower limits on file sizes than block-based storage systems. For example, [AWS S3 limits the file size to 5 TB](https://aws.amazon.com/s3/faqs/).

2. Checkpointing a DuckDB database can be slow. For example, checkpointing after adding a few rows to a table in the [TPC-H](#docs:current:core_extensions:tpch) SF1000 database takes approximately 5 seconds.

3. On block-based storage, the file system has a significant effect on performance when working with large files. On Linux, DuckDB performs best with XFS on large files.

For storing large amounts of data, consider using the [DuckLake lakehouse format](https://ducklake.select/).

## Python {#guides:python}

### Installing the Python Client {#docs:current:guides:python:install}

#### Installing via Pip {#docs:current:guides:python:install::installing-via-pip}

The latest release of the Python client can be installed using `pip`.

```batch
pip install duckdb
```

The pre-release Python client (known as the “preview” or “nightly” build) can be installed using `--pre`.

```batch
pip install duckdb --upgrade --pre
```

#### Installing from Source {#docs:current:guides:python:install::installing-from-source}

The latest Python client can be installed from source from the [`tools/pythonpkg` directory in the DuckDB GitHub repository](https://github.com/duckdb/duckdb/tree/main/tools/pythonpkg).

```bash
BUILD_PYTHON=1 GEN=ninja make
cd tools/pythonpkg
python setup.py install
```

For detailed instructions on how to compile DuckDB from source, see the [Building guide](#docs:current:dev:building:python).

### Executing SQL in Python {#docs:current:guides:python:execute_sql}

SQL queries can be executed using the `duckdb.sql` function.

```python
import duckdb

duckdb.sql("SELECT 42").show()
```

By default this will create a relation object. The result can be converted to various formats using the result conversion functions. For example, the `fetchall` method can be used to convert the result to Python objects.

```python
results = duckdb.sql("SELECT 42").fetchall()
print(results)
```

```text
[(42,)]
```

Several other result objects exist. For example, you can use `df` to convert the result to a Pandas DataFrame.

```python
results = duckdb.sql("SELECT 42").df()
print(results)
```

```text
   42
0  42
```

By default, a global in-memory connection will be used. Any data stored in this in-memory database will be lost after shutting down the program. A connection to a persistent database can be created using the `connect` function.

After connecting, SQL queries can be executed using the connection's `sql` method.

```python
con = duckdb.connect("file.db")
con.sql("CREATE TABLE integers (i INTEGER)")
con.sql("INSERT INTO integers VALUES (42)")
con.sql("SELECT * FROM integers").show()
```

### Jupyter Notebooks {#docs:current:guides:python:jupyter}

DuckDB's Python client can be used directly in Jupyter notebooks with no additional configuration if desired.
However, additional libraries can be used to simplify SQL query development.
This guide will describe how to utilize those additional libraries.
See other guides in the Python section for how to use DuckDB and Python together.

In this example, we use the [JupySQL](https://github.com/ploomber/jupysql) package. This example workflow is also available as a [Google Colab notebook](https://colab.research.google.com/drive/1bNfU8xRTu8MQJnCbyyDRxvptklLb0ExH?usp=sharing).

#### Library Installation {#docs:current:guides:python:jupyter::library-installation}

Four additional libraries improve the DuckDB experience in Jupyter notebooks.

1. [jupysql](https://github.com/ploomber/jupysql): Convert a Jupyter code cell into a SQL cell
2. [Pandas](https://github.com/pandas-dev/pandas): Clean table visualizations and compatibility with other analysis
3. [matplotlib](https://github.com/matplotlib/matplotlib): Plotting with Python
4. [duckdb-engine (DuckDB SQLAlchemy driver)](https://github.com/Mause/duckdb_engine): Used by SQLAlchemy to connect to DuckDB (optional)

Run these `pip install` commands from the command line if Jupyter Notebook is not yet installed. Otherwise, see the Google Colab link above for an in-notebook example:

```batch
pip install duckdb
```

Install Jupyter Notebook:

```batch
pip install notebook
```

Or JupyterLab:

```batch
pip install jupyterlab
```

Install supporting libraries:

```batch
pip install jupysql pandas matplotlib duckdb-engine
```

#### Library Import and Configuration {#docs:current:guides:python:jupyter::library-import-and-configuration}

Open a Jupyter Notebook and import the relevant libraries.

Set configurations on jupysql to directly output data to Pandas and to simplify the output that is printed to the notebook.

```python
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
```

##### Connecting to DuckDB Natively {#docs:current:guides:python:jupyter::connecting-to-duckdb-natively}

To connect to DuckDB, run:

```python
import duckdb
import pandas as pd

%load_ext sql
conn = duckdb.connect()
%sql conn --alias duckdb
```

> **Warning.** [Variables](#docs:current:sql:statements:set_variable) are not recognized within a native DuckDB connection.

##### Connecting to DuckDB via SQLAlchemy {#docs:current:guides:python:jupyter::connecting-to-duckdb-via-sqlalchemy}

Alternatively, you can connect to DuckDB via SQLAlchemy using `duckdb_engine`. See the [performance and feature differences](https://jupysql.readthedocs.io/en/latest/tutorials/duckdb-native-sqlalchemy.html).

```python
import duckdb
import pandas as pd
# No need to import duckdb_engine
#  jupysql will auto-detect the driver needed based on the connection string!

# Import jupysql Jupyter extension to create SQL cells
%load_ext sql
```

Either connect to a new [in-memory DuckDB](#docs:current:clients:python:dbapi::in-memory-connection), the [default connection](#docs:current:clients:python:dbapi::default-connection), or a file-backed database:

```sql
%sql duckdb:///:memory:
```

```sql
%sql duckdb:///:default:
```

```sql
%sql duckdb:///path/to/file.db
```

> The `%sql` command and `duckdb.sql` share the same [default connection](#docs:current:clients:python:dbapi) if you provide `duckdb:///:default:` as the SQLAlchemy connection string.

#### Querying DuckDB {#docs:current:guides:python:jupyter::querying-duckdb}

Single-line SQL queries can be run using `%sql` at the start of a line. Query results will be displayed as a Pandas DataFrame.

```sql
%sql SELECT 'Off and flying!' AS a_duckdb_column;
```

An entire Jupyter cell can be used as a SQL cell by placing `%%sql` at the start of the cell. Query results will be displayed as a Pandas DataFrame.

```sql
%%sql
SELECT
    schema_name,
    function_name
FROM duckdb_functions()
ORDER BY ALL DESC
LIMIT 5;
```

To store the query results in a Python variable, use `<<` as an assignment operator.
This can be used with both the `%sql` and `%%sql` Jupyter magics.

```sql
%sql res << SELECT 'Off and flying!' AS a_duckdb_column;
```

If the `%config SqlMagic.autopandas = True` option is set, the variable is a Pandas DataFrame. Otherwise, it is a `ResultSet` that can be converted to Pandas with the `DataFrame()` function.

#### Querying Pandas Dataframes {#docs:current:guides:python:jupyter::querying-pandas-dataframes}

DuckDB is able to find and query any dataframe stored as a variable in the Jupyter notebook.

```python
input_df = pd.DataFrame.from_dict({"i": [1, 2, 3],
                                   "j": ["one", "two", "three"]})
```

The dataframe being queried can be specified just like any other table in the `FROM` clause.

```sql
%sql output_df << SELECT sum(i) AS total_i FROM input_df;
```

> **Warning.** When using the SQLAlchemy connection, make sure to run `%sql SET python_scan_all_frames=true` to make Pandas dataframes queryable.

#### Visualizing DuckDB Data {#docs:current:guides:python:jupyter::visualizing-duckdb-data}

The most common way to plot datasets in Python is to load them using Pandas and then use matplotlib or seaborn for plotting.
This approach requires loading all data into memory which is highly inefficient.
The plotting module in JupySQL runs computations in the SQL engine.
This delegates memory management to the engine and ensures that intermediate computations do not keep eating up memory, efficiently plotting massive datasets.

##### Boxplot & Histogram {#docs:current:guides:python:jupyter::boxplot--histogram}

To create a boxplot, call `%sqlplot boxplot`, passing the name of the table and the column to plot.
In this case, the name of the table is the path of the locally stored Parquet file.

```python
from urllib.request import urlretrieve

_ = urlretrieve(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet",
    "yellow_tripdata_2021-01.parquet",
)

%sqlplot boxplot --table yellow_tripdata_2021-01.parquet --column trip_distance
```

![](../images/trip-distance-boxplot.png)


##### Install and Load DuckDB httpfs Extension {#docs:current:guides:python:jupyter::install-and-load-duckdb-httpfs-extension}

DuckDB's [httpfs extension](#docs:current:core_extensions:httpfs:overview) allows Parquet and CSV files to be queried remotely over HTTP.
These examples query a Parquet file that contains historical taxi data from NYC.
Using the Parquet format allows DuckDB to only pull the rows and columns into memory that are needed rather than downloading the entire file.
DuckDB can be used to process local [Parquet files](#docs:current:data:parquet:overview) as well, which may be desirable if querying the entire Parquet file, or running multiple queries that require large subsets of the file.

```sql
%%sql
INSTALL httpfs;
LOAD httpfs;
```

Now, create a query that filters by the 90th percentile.
Note the use of the `--save` and `--no-execute` options.
This tells JupySQL to store the query, but skips execution. It will be referenced in the next plotting call.

```sql
%%sql --save short_trips --no-execute
SELECT *
FROM 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet'
WHERE trip_distance < 6.3
```

To create a histogram, call `%sqlplot histogram` and pass the name of the table, the column to plot, and the number of bins.
This uses `--with short_trips` so JupySQL uses the query defined previously and therefore only plots a subset of the data.

```python
%sqlplot histogram --table short_trips --column trip_distance --bins 10 --with short_trips
```

![](../images/trip-distance-histogram.png)


#### Summary {#docs:current:guides:python:jupyter::summary}

You now have the ability to alternate between SQL and Pandas in a simple and highly performant way! You can plot massive datasets directly through the engine (avoiding both the download of the entire file and loading all of it into Pandas in memory). Dataframes can be read as tables in SQL, and SQL results can be output into Dataframes. Happy analyzing!

An alternative to `jupysql` is [`magic_duckdb`](https://github.com/iqmo-org/magic_duckdb).

### marimo Notebooks {#docs:current:guides:python:marimo}

[marimo](https://github.com/marimo-team/marimo) is an open-source reactive
notebook for Python and SQL that's tightly integrated with DuckDB's Python
client, letting you mix and match Python and SQL in a single git-versionable
notebook. Unlike traditional notebooks, when you run a cell or interact with a
UI element, marimo automatically (or lazily) runs affected cells, keeping code
and outputs consistent. Its integration with DuckDB makes it well-suited to
interactively working with data, and its representation as a Python file makes
it simple to run notebooks as scripts.

#### Installation {#docs:current:guides:python:marimo::installation}

To get started, install marimo and DuckDB from your terminal:

```batch
pip install "marimo[sql]" # or uv add "marimo[sql]"
```

Install supporting libraries:

```batch
pip install "polars[pyarrow]" # or uv add "polars[pyarrow]"
```

Run a tutorial:

```batch
marimo tutorial sql
```

#### SQL in marimo {#docs:current:guides:python:marimo::sql-in-marimo}

Create a notebook from your terminal with `marimo edit notebook.py`. Create SQL
cells in one of three ways:

1. Right-click the **+** button and pick **SQL cell**
2. Convert any empty cell to SQL via the cell menu
3. Hit the SQL button at the bottom of your notebook

![](../images/guides/marimo/marimo-sql-button.png)


In marimo, SQL cells give the appearance of writing SQL while being serialized as standard Python code using the `mo.sql()` function, which keeps your notebook as pure Python code without requiring special syntax or magic commands.

```python
df = mo.sql(f"SELECT 'Off and flying!' AS a_duckdb_column")
```

This is because marimo stores notebooks as pure Python, [for many reasons](https://marimo.io/blog/python-not-json), such as git-friendly diffs and running notebooks as Python scripts.

The SQL statement itself is an f-string, letting you interpolate Python values into the query with `{}` (shown later). In particular, this means your SQL queries can depend on the values of UI elements or other Python values, all part of marimo's dataflow graph.

> **Warning.** If you have user-generated content going into the SQL queries, be sure to sanitize your inputs to prevent SQL injection.

#### Connecting a Custom DuckDB Connection {#docs:current:guides:python:marimo::connecting-a-custom-duckdb-connection}

To use a custom DuckDB connection instead of the default global connection, create a cell that defines the connection as a Python variable:

```python
import duckdb

# Create a DuckDB connection
conn = duckdb.connect("path/to/my/duckdb.db")
```

marimo automatically discovers the connection and lets you select it in the SQL cell's connection dropdown.

<figure>
    ![](../images/guides/marimo/marimo-custom-connection.png)

    <figcaption>Custom connection</figcaption>
  </figure>


#### Database, Schema, and Table Auto-Discovery {#docs:current:guides:python:marimo::database-schema-and-table-auto-discovery}

marimo introspects connections and displays the database, schemas, tables, and columns in the Data Sources panel. This panel lets you quickly navigate your schemas to pull tables and columns into your SQL queries.

<figure>
    ![](../images/guides/marimo/marimo-datasource-discovery.png)

    <figcaption>Data Sources Panel</figcaption>
  </figure>


#### Reference a Local Dataframe {#docs:current:guides:python:marimo::reference-a-local-dataframe}

Reference a local dataframe in your SQL cell by using the name of the
Python variable that holds the dataframe. If you have a database connection
with a table of the same name, the database table will be used instead.

```python
import polars as pl
df = pl.DataFrame({"column": [1, 2, 3]})
```

```sql
SELECT * FROM df WHERE column > 2
```

#### Reference the Output of a SQL Cell {#docs:current:guides:python:marimo::reference-the-output-of-a-sql-cell}

Defining a non-private (non-underscored) output variable in the SQL cell allows you to reference the resulting dataframe in other Python and SQL cells.

<figure>
    ![](../images/guides/marimo/marimo-sql-result.png)

    <figcaption>Reference the SQL result in Python</figcaption>
  </figure>

#### Reactive SQL Cells {#docs:current:guides:python:marimo::reactive-sql-cells}

marimo allows you to create reactive SQL cells that automatically update when their dependencies change. **Working with expensive queries or large datasets?** You can configure marimo's runtime to be “lazy”. By doing so, dependent cells are only marked as stale, letting the user choose when they should be re-run.

```python
digits = mo.ui.slider(label="Digits", start=100, stop=10000, step=200)
digits
```

```sql
CREATE OR REPLACE TABLE random_data AS
    SELECT i AS id, random() AS random_value,
    FROM range({digits.value}) AS t(i);

SELECT * FROM random_data;
```

Interacting with UI elements, like a slider, makes your data more tangible.

![](../images/guides/marimo/marimo-reactive-sql.gif)



#### DuckDB-Powered OLAP Analytics in marimo {#docs:current:guides:python:marimo::duckdb-powered-olap-analytics-in-marimo}

marimo provides several features that work well with DuckDB for analytical workflows:

* Seamless integration between Python and SQL
* Reactive execution that automatically updates dependent cells when queries change
* Interactive UI elements that can be used to parameterize SQL queries
* Ability to export notebooks as standalone applications or Python scripts, or even run entirely in the browser [with WebAssembly](https://docs.marimo.io/guides/wasm/).

#### Next Steps {#docs:current:guides:python:marimo::next-steps}

* Read the [marimo docs](https://docs.marimo.io/).
* Try the SQL tutorial: `marimo tutorial sql`.
* The code for this guide is [available on GitHub](https://github.com/marimo-team/marimo/blob/main/examples/sql/duckdb_example.py). Run it with `marimo edit ⟨github_url⟩`.

### SQL on Pandas {#docs:current:guides:python:sql_on_pandas}

Pandas DataFrames stored in local variables can be queried as if they are regular tables within DuckDB.

```python
import duckdb
import pandas

# Create a Pandas dataframe
my_df = pandas.DataFrame.from_dict({'a': [42]})

# query the Pandas DataFrame "my_df"
# Note: duckdb.sql connects to the default in-memory database connection
results = duckdb.sql("SELECT * FROM my_df").df()
```

The seamless integration of Pandas DataFrames into DuckDB SQL queries is made possible by [replacement scans](#docs:current:clients:c:replacement_scans), which replace instances of accessing the `my_df` table (which does not exist in DuckDB) with a table function that reads the `my_df` dataframe.

### Import from Pandas {#docs:current:guides:python:import_pandas}

[`CREATE TABLE ... AS`](#docs:current:sql:statements:create_table::create-table--as-select-ctas) and [`INSERT INTO`](#docs:current:sql:statements:insert) can be used to create a table from any query.
We can then create tables or insert into existing tables by referring to the [Pandas](https://pandas.pydata.org/) DataFrame in the query.
There is no need to register the DataFrames manually –
DuckDB can find them in the Python process by name thanks to [replacement scans](#docs:current:guides:glossary::replacement-scan).

```python
import duckdb
import pandas

# Create a Pandas dataframe
my_df = pandas.DataFrame.from_dict({'a': [42]})

# create the table "my_table" from the DataFrame "my_df"
# Note: duckdb.sql connects to the default in-memory database connection
duckdb.sql("CREATE TABLE my_table AS SELECT * FROM my_df")

# insert into the table "my_table" from the DataFrame "my_df"
duckdb.sql("INSERT INTO my_table SELECT * FROM my_df")
```

If the order of columns is different or not all columns are present in the DataFrame, use [`INSERT INTO ... BY NAME`](#docs:current:sql:statements:insert::insert-into--by-name):

```python
duckdb.sql("INSERT INTO my_table BY NAME SELECT * FROM my_df")
```

#### See Also {#docs:current:guides:python:import_pandas::see-also}

DuckDB also supports [exporting to Pandas](#docs:current:guides:python:export_pandas).

### Export to Pandas {#docs:current:guides:python:export_pandas}

The result of a query can be converted to a [Pandas](https://pandas.pydata.org/) DataFrame using the `df()` function.

```python
import duckdb

# read the result of an arbitrary SQL query to a Pandas DataFrame
results = duckdb.sql("SELECT 42").df()
results
```

```text
   42
0  42
```

#### See Also {#docs:current:guides:python:export_pandas::see-also}

DuckDB also supports [importing from Pandas](#docs:current:guides:python:import_pandas).

### Import from Numpy {#docs:current:guides:python:import_numpy}

It is possible to query Numpy arrays from DuckDB.
There is no need to register the arrays manually –
DuckDB can find them in the Python process by name thanks to [replacement scans](#docs:current:guides:glossary::replacement-scan).
For example:

```python
import duckdb
import numpy as np

my_arr = np.array([(1, 9.0), (2, 8.0), (3, 7.0)])

duckdb.sql("SELECT * FROM my_arr")
```

```text
┌─────────┬─────────┬─────────┐
│ column0 │ column1 │ column2 │
│ double  │ double  │ double  │
├─────────┼─────────┼─────────┤
│     1.0 │     2.0 │     3.0 │
│     9.0 │     8.0 │     7.0 │
└─────────┴─────────┴─────────┘
```

#### See Also {#docs:current:guides:python:import_numpy::see-also}

DuckDB also supports [exporting to Numpy](#docs:current:guides:python:export_numpy).

### Export to Numpy {#docs:current:guides:python:export_numpy}

The result of a query can be converted to a Numpy array using the `fetchnumpy()` function. For example:

```python
import duckdb
import numpy as np

my_arr = duckdb.sql("SELECT unnest([1, 2, 3]) AS x, 5.0 AS y").fetchnumpy()
my_arr
```

```text
{'x': array([1, 2, 3], dtype=int32), 'y': masked_array(data=[5.0, 5.0, 5.0],
             mask=[False, False, False],
       fill_value=1e+20)}
```

Then, the array can be processed using Numpy functions, e.g.:

```python
np.sum(my_arr['x'])
```

```text
6
```

#### See Also {#docs:current:guides:python:export_numpy::see-also}

DuckDB also supports [importing from Numpy](#docs:current:guides:python:import_numpy).

### SQL on Apache Arrow {#docs:current:guides:python:sql_on_arrow}

DuckDB can query multiple different types of Apache Arrow objects.

#### Apache Arrow Tables {#docs:current:guides:python:sql_on_arrow::apache-arrow-tables}

[Arrow Tables](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) stored in local variables can be queried as if they are regular tables within DuckDB.

```python
import duckdb
import pyarrow as pa

# connect to an in-memory database
con = duckdb.connect()

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow Table
results = con.execute("SELECT * FROM my_arrow_table WHERE i = 2").to_arrow_table()
```

#### Apache Arrow Datasets {#docs:current:guides:python:sql_on_arrow::apache-arrow-datasets}

[Arrow Datasets](https://arrow.apache.org/docs/python/dataset.html) stored as variables can also be queried as if they were regular tables.
Datasets are useful to point towards directories of Parquet files to analyze large datasets.
DuckDB will push column selections and row filters down into the dataset scan operation so that only the necessary data is pulled into memory.

```python
import duckdb
import pyarrow as pa
import tempfile
import pathlib
import pyarrow.parquet as pq
import pyarrow.dataset as ds

# connect to an in-memory database
con = duckdb.connect()

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# create example Parquet files and save in a folder
base_path = pathlib.Path(tempfile.gettempdir())
(base_path / "parquet_folder").mkdir(exist_ok = True)
pq.write_to_dataset(my_arrow_table, str(base_path / "parquet_folder"))

# link to Parquet files using an Arrow Dataset
my_arrow_dataset = ds.dataset(str(base_path / 'parquet_folder/'))

# query the Apache Arrow Dataset "my_arrow_dataset" and return as an Arrow Table
results = con.execute("SELECT * FROM my_arrow_dataset WHERE i = 2").to_arrow_table()
```

#### Apache Arrow Scanners {#docs:current:guides:python:sql_on_arrow::apache-arrow-scanners}

[Arrow Scanners](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html) stored as variables can also be queried as if they were regular tables. Scanners read over a dataset and select specific columns or apply row-wise filtering. This is similar to how DuckDB pushes column selections and filters down into an Arrow Dataset, but using Arrow compute operations instead. Arrow can use asynchronous IO to quickly access files.

```python
import duckdb
import pyarrow as pa
import tempfile
import pathlib
import pyarrow.parquet as pq
import pyarrow.dataset as ds
import pyarrow.compute as pc

# connect to an in-memory database
con = duckdb.connect()

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# create example Parquet files and save in a folder
base_path = pathlib.Path(tempfile.gettempdir())
(base_path / "parquet_folder").mkdir(exist_ok = True)
pq.write_to_dataset(my_arrow_table, str(base_path / "parquet_folder"))

# link to Parquet files using an Arrow Dataset
my_arrow_dataset = ds.dataset(str(base_path / 'parquet_folder/'))

# define the filter to be applied while scanning
# equivalent to "WHERE i = 2"
scanner_filter = (pc.field("i") == pc.scalar(2))

arrow_scanner = ds.Scanner.from_dataset(my_arrow_dataset, filter = scanner_filter)

# query the Apache Arrow scanner "arrow_scanner" and return as an Arrow Table
results = con.execute("SELECT * FROM arrow_scanner").to_arrow_table()
```

#### Apache Arrow RecordBatchReaders {#docs:current:guides:python:sql_on_arrow::apache-arrow-recordbatchreaders}

[Arrow RecordBatchReaders](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html) are readers for Arrow's streaming binary format and can also be queried directly as if they were tables. This streaming format is useful when sending Arrow data for tasks like interprocess communication or communicating between language runtimes.

```python
import duckdb
import pyarrow as pa

# connect to an in-memory database
con = duckdb.connect()

my_recordbatch = pa.RecordBatch.from_pydict({'i': [1, 2, 3, 4],
                                             'j': ["one", "two", "three", "four"]})

my_recordbatchreader = pa.ipc.RecordBatchReader.from_batches(my_recordbatch.schema, [my_recordbatch])

# query the Apache Arrow RecordBatchReader "my_recordbatchreader" and return as an Arrow Table
results = con.execute("SELECT * FROM my_recordbatchreader WHERE i = 2").to_arrow_table()
```

### Import from Apache Arrow {#docs:current:guides:python:import_arrow}

`CREATE TABLE AS` and `INSERT INTO` can be used to create or populate a table from any query. This means a table can be created, or an existing table extended, by referring to the Apache Arrow object in the query. This example imports from an [Arrow Table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html), but DuckDB can query different Apache Arrow formats, as seen in the [SQL on Arrow guide](#docs:current:guides:python:sql_on_arrow).

```python
import duckdb
import pyarrow as pa

# create an example Arrow Table
my_arrow = pa.Table.from_pydict({'a': [42]})

# create the table "my_table" from the Arrow Table "my_arrow"
duckdb.sql("CREATE TABLE my_table AS SELECT * FROM my_arrow")

# insert into the table "my_table" from the Arrow Table "my_arrow"
duckdb.sql("INSERT INTO my_table SELECT * FROM my_arrow")
```

### Export to Apache Arrow {#docs:current:guides:python:export_arrow}

All results of a query can be exported to an [Apache Arrow Table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) using the `to_arrow_table` function. Alternatively, results can be returned as a [RecordBatchReader](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.RecordBatchStreamReader.html) using the `to_arrow_reader` function, which allows them to be read one batch at a time. In addition, relations built using DuckDB's [Relational API](#docs:current:guides:python:relational_api_pandas) can also be exported.

> **Deprecated.** The `fetch_arrow_table`, `fetch_record_batch`, and `fetch_arrow_reader` functions are deprecated. Use `to_arrow_table` and `to_arrow_reader` instead.

#### Export to an Arrow Table {#docs:current:guides:python:export_arrow::export-to-an-arrow-table}

```python
import duckdb
import pyarrow as pa

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow Table
results = duckdb.sql("SELECT * FROM my_arrow_table").to_arrow_table()
```

#### Export as a RecordBatchReader {#docs:current:guides:python:export_arrow::export-as-a-recordbatchreader}

```python
import duckdb
import pyarrow as pa

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow RecordBatchReader
chunk_size = 1_000_000
result = duckdb.sql("SELECT * FROM my_arrow_table").to_arrow_reader(chunk_size)

# Loop through the results one batch at a time. Iterating with "for" stops
# cleanly when the RecordBatchReader is exhausted, whereas calling
# read_next_batch() directly raises StopIteration at the end of the stream
for batch in result:
    # Process a single chunk here
    print(batch.to_pandas())
```

#### Export from Relational API {#docs:current:guides:python:export_arrow::export-from-relational-api}

Arrow objects can also be exported from the Relational API. A relation can be converted to an Arrow table using `DuckDBPyRelation.to_arrow_table`, and to an Arrow record batch reader using `DuckDBPyRelation.to_arrow_reader`.

```python
import duckdb

# connect to an in-memory database
con = duckdb.connect()

con.execute('CREATE TABLE integers (i integer)')
con.execute('INSERT INTO integers VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (NULL)')

# Create a relation from the table and export the entire relation as Arrow
rel = con.table("integers")
relation_as_arrow = rel.to_arrow_table()

# Calculate a result using that relation and export that result to Arrow
res = rel.aggregate("sum(i)").execute()
arrow_table = res.to_arrow_table()

# You can also create an Arrow record batch reader from a relation.
# Iterating with "for" stops cleanly when the reader is exhausted
arrow_batch_reader = res.to_arrow_reader()
for batch in arrow_batch_reader:
    # Process a single chunk here
    print(batch.to_pandas())
```

### Relational API on Pandas {#docs:current:guides:python:relational_api_pandas}

DuckDB offers a relational API that can be used to chain together query operations. These are lazily evaluated so that DuckDB can optimize their execution. These operators can act on Pandas DataFrames, DuckDB tables or views (which can point to any underlying storage format that DuckDB can read, such as CSV or Parquet files). Here we show a simple example of reading from a Pandas DataFrame and returning a DataFrame.

```python
import duckdb
import pandas

# connect to an in-memory database
con = duckdb.connect()

input_df = pandas.DataFrame.from_dict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# create a DuckDB relation from a dataframe
rel = con.from_df(input_df)

# chain together relational operators (this is a lazy operation, so the operations are not yet executed)
# equivalent to: SELECT i, j, i*2 AS two_i FROM input_df WHERE i >= 2 ORDER BY i DESC LIMIT 2
transformed_rel = rel.filter('i >= 2').project('i, j, i*2 AS two_i').order('i DESC').limit(2)

# trigger execution by requesting .df() of the relation
# .df() could have been added to the end of the chain above - it was separated for clarity
output_df = transformed_rel.df()
```

Relational operators can also be used to group rows, aggregate, find distinct combinations of values, join, union, and more. They are also able to directly insert results into a DuckDB table or write to a CSV.

Please see [these additional examples](https://github.com/duckdb/duckdb/blob/main/examples/python/duckdb-python.py) and the [available relational methods on the `DuckDBPyRelation` class](#docs:current:clients:python:reference:index::duckdb.DuckDBPyRelation).

### Multiple Python Threads {#docs:current:guides:python:multiple_threads}

This page demonstrates how to simultaneously insert into and read from a DuckDB database across multiple Python threads.
This could be useful in scenarios where new data is flowing in and an analysis should be periodically re-run.
Note that this is all within a single Python process (see the [FAQ](#faq) for details on DuckDB concurrency).
Feel free to follow along in this [Google Colab notebook](https://colab.research.google.com/drive/190NB2m-LIfDcMamCY5lIzaD2OTMnYclB?usp=sharing).

#### Setup {#docs:current:guides:python:multiple_threads::setup}

First, import DuckDB and several modules from the Python standard library.
Note: if using Pandas, add `import pandas` at the top of the script as well (as it must be imported prior to the multi-threading).
Then connect to a file-backed DuckDB database and create an example table to store inserted data.
This table will track the name of the thread that completed the insert and automatically insert the timestamp when that insert occurred using the [`DEFAULT` expression](#docs:current:sql:statements:create_table::syntax).

```python
import duckdb
from threading import Thread, current_thread
import random

duckdb_con = duckdb.connect('my_persistent_db.duckdb')
# Use connect without parameters for an in-memory database
# duckdb_con = duckdb.connect()
duckdb_con.execute("""
    CREATE OR REPLACE TABLE my_inserts (
        thread_name VARCHAR,
        insert_time TIMESTAMP DEFAULT current_timestamp
    )
""")
```

#### Reader and Writer Functions {#docs:current:guides:python:multiple_threads::reader-and-writer-functions}

Next, define functions to be executed by the writer and reader threads.
Each thread must use the `.cursor()` method to create a thread-local connection to the same DuckDB file based on the original connection.
This approach also works with in-memory DuckDB databases.

```python
def write_from_thread(duckdb_con):
    # Create a DuckDB connection specifically for this thread
    local_con = duckdb_con.cursor()
    # Insert a row with the name of the thread. insert_time is auto-generated.
    thread_name = str(current_thread().name)
    result = local_con.execute("""
        INSERT INTO my_inserts (thread_name)
        VALUES (?)
    """, (thread_name,)).fetchall()

def read_from_thread(duckdb_con):
    # Create a DuckDB connection specifically for this thread
    local_con = duckdb_con.cursor()
    # Query the current row count
    thread_name = str(current_thread().name)
    results = local_con.execute("""
        SELECT
            ? AS thread_name,
            count(*) AS row_counter,
            current_timestamp
        FROM my_inserts
    """, (thread_name,)).fetchall()
    print(results)
```

#### Create Threads {#docs:current:guides:python:multiple_threads::create-threads}

Define how many writers and readers to use, and create a list to track all of the threads that will be created.
Then, create the writer threads, followed by the reader threads.
Next, shuffle them so that they will be kicked off in a random order to simulate simultaneous writers and readers.
Note that the threads have not yet been started, only defined.

```python
write_thread_count = 50
read_thread_count = 5
threads = []

# Create multiple writer and reader threads (in the same process)
# Pass in the same connection as an argument
for i in range(write_thread_count):
    threads.append(Thread(target = write_from_thread,
                            args = (duckdb_con,),
                            name = 'write_thread_' + str(i)))

for j in range(read_thread_count):
    threads.append(Thread(target = read_from_thread,
                            args = (duckdb_con,),
                            name = 'read_thread_' + str(j)))

# Shuffle the threads to simulate a mix of readers and writers
random.seed(6) # Set the seed to ensure consistent results when testing
random.shuffle(threads)
```

#### Run Threads and Show Results {#docs:current:guides:python:multiple_threads::run-threads-and-show-results}

Now, kick off all threads to run in parallel, then wait for all of them to finish before printing out the results.
Note that the timestamps of readers and writers are interspersed as expected due to the randomization.

```python
# Kick off all threads in parallel
for thread in threads:
    thread.start()

# Ensure all threads complete before printing final results
for thread in threads:
    thread.join()

print(duckdb_con.execute("""
    SELECT *
    FROM my_inserts
    ORDER BY
        insert_time
""").df())
```

### Integration with Ibis {#docs:current:guides:python:ibis}

[Ibis](https://ibis-project.org) is a Python dataframe library that supports 20+ backends, with DuckDB as the default. Ibis with DuckDB provides a Pythonic interface for SQL with great performance.

#### Installation {#docs:current:guides:python:ibis::installation}

You can pip install Ibis with the DuckDB backend:

```batch
pip install 'ibis-framework[duckdb,examples]' # examples is only required to access the sample data Ibis provides
```

or use conda:

```batch
conda install ibis-framework
```

or use mamba:

```batch
mamba install ibis-framework
```

#### Create a Database File {#docs:current:guides:python:ibis::create-a-database-file}

Ibis can work with several file types, but at its core, it connects to existing databases and interacts with the data there. You can get started with your own DuckDB databases or create a new one with example data.

```python
import ibis

con = ibis.connect("duckdb://penguins.ddb")
con.create_table(
    "penguins", ibis.examples.penguins.fetch().to_pyarrow(), overwrite = True
)
```

```text
# Output:
DatabaseTable: penguins
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm int64
  body_mass_g       int64
  sex               string
  year              int64
```

You can now see the example dataset copied over to the database:

```python
# reconnect to the persisted database (dropping temp tables)
con = ibis.connect("duckdb://penguins.ddb")
con.list_tables()
```

```python
# Output:
['penguins']
```

There's one table, called `penguins`. We can ask Ibis to give us an object that we can interact with.

```python
penguins = con.table("penguins")
penguins
```

```text
# Output:
DatabaseTable: penguins
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm int64
  body_mass_g       int64
  sex               string
  year              int64
```

Ibis is lazily evaluated, so instead of seeing the data, we see the schema of the table. To peek at the data, we can call `head` and then `to_pandas` to get the first few rows of the table as a pandas DataFrame.

```python
penguins.head().to_pandas()
```

```text
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3  Adelie  Torgersen             NaN            NaN                NaN          NaN    None  2007
4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007
```

`to_pandas` takes the existing lazy table expression and evaluates it. If you leave it off, you'll see the Ibis representation of the table expression that `to_pandas` will evaluate (when you're ready!).

```python
penguins.head()
```

```text
# Output:
r0 := DatabaseTable: penguins
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm int64
  body_mass_g       int64
  sex               string
  year              int64

Limit[r0, n=5]
```

Ibis returns results as a pandas DataFrame using `to_pandas`, but isn't using pandas to perform any of the computation. The query is executed by DuckDB. Only when `to_pandas` is called does Ibis then pull back the results and convert them into a DataFrame.

#### Interactive Mode {#docs:current:guides:python:ibis::interactive-mode}

For the rest of this intro, we'll turn on interactive mode, which partially executes queries to give users a preview of the results. There is a small difference in the way the output is formatted, but otherwise this is the same as calling `to_pandas` on the table expression with a limit of 10 result rows returned.

```python
ibis.options.interactive = True
penguins.head()
```

```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │            nan │           nan │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
```

#### Common Operations {#docs:current:guides:python:ibis::common-operations}

Ibis has a collection of useful table methods to manipulate and query the data in a table.

##### filter {#docs:current:guides:python:ibis::filter}

`filter` allows you to select rows based on a condition or set of conditions.

We can filter so we only have penguins of the species Gentoo:

```python
penguins.filter(penguins.species == "Gentoo")
```

```text
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Gentoo  │ Biscoe │           46.1 │          13.2 │               211 │        4500 │ female │  2007 │
│ Gentoo  │ Biscoe │           50.0 │          16.3 │               230 │        5700 │ male   │  2007 │
│ Gentoo  │ Biscoe │           48.7 │          14.1 │               210 │        4450 │ female │  2007 │
│ Gentoo  │ Biscoe │           50.0 │          15.2 │               218 │        5700 │ male   │  2007 │
│ Gentoo  │ Biscoe │           47.6 │          14.5 │               215 │        5400 │ male   │  2007 │
│ Gentoo  │ Biscoe │           46.5 │          13.5 │               210 │        4550 │ female │  2007 │
│ Gentoo  │ Biscoe │           45.4 │          14.6 │               211 │        4800 │ female │  2007 │
│ Gentoo  │ Biscoe │           46.7 │          15.3 │               219 │        5200 │ male   │  2007 │
│ Gentoo  │ Biscoe │           43.3 │          13.4 │               209 │        4400 │ female │  2007 │
│ Gentoo  │ Biscoe │           46.8 │          15.4 │               215 │        5150 │ male   │  2007 │
│ …       │ …      │              … │             … │                 … │           … │ …      │     … │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
```

Or filter for Gentoo penguins that have a body mass larger than 6 kg.

```python
penguins.filter((penguins.species == "Gentoo") & (penguins.body_mass_g > 6000))
```

```text
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Gentoo  │ Biscoe │           49.2 │          15.2 │               221 │        6300 │ male   │  2007 │
│ Gentoo  │ Biscoe │           59.6 │          17.0 │               230 │        6050 │ male   │  2007 │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
```

You can use any Boolean comparison in a filter (although if you try to do something like use `<` on a string, Ibis will yell at you).

##### select {#docs:current:guides:python:ibis::select}

Your data analysis might not require all the columns present in a given table. `select` lets you pick out only those columns that you want to work with.

To select a column you can use the name of the column as a string:

```python
penguins.select("species", "island", "year").limit(3)
```

```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ …       │ …         │     … │
└─────────┴───────────┴───────┘
```

Or you can use column objects directly (this can be convenient when paired with tab-completion):

```python
penguins.select(penguins.species, penguins.island, penguins.year).limit(3)
```

```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ …       │ …         │     … │
└─────────┴───────────┴───────┘
```

Or you can mix-and-match:

```python
penguins.select("species", "island", penguins.year).limit(3)
```

```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ …       │ …         │     … │
└─────────┴───────────┴───────┘
```

##### mutate {#docs:current:guides:python:ibis::mutate}

`mutate` lets you add new columns to your table, derived from the values of existing columns.

```python
penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10)
```

```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │ float64        │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │           3.91 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │           3.95 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │           4.03 │
│ Adelie  │ Torgersen │            nan │           nan │              NULL │        NULL │ NULL   │  2007 │            nan │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │           3.67 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │           3.93 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │           3.89 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │           3.92 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │           3.41 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │           4.20 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │              … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘
```

Notice that the table is now a little too wide to display all the columns (depending on your screen size). `bill_length` is now present in millimeters _and_ centimeters. Use a `select` to trim down the number of columns we're looking at.

```python
penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10).select(
    "species",
    "island",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
    "sex",
    "year",
    "bill_length_cm",
)
```

```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64       │ int64             │ int64       │ string │ int64 │ float64        │
├─────────┼───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie  │ Torgersen │          18.7 │               181 │        3750 │ male   │  2007 │           3.91 │
│ Adelie  │ Torgersen │          17.4 │               186 │        3800 │ female │  2007 │           3.95 │
│ Adelie  │ Torgersen │          18.0 │               195 │        3250 │ female │  2007 │           4.03 │
│ Adelie  │ Torgersen │           nan │              NULL │        NULL │ NULL   │  2007 │            nan │
│ Adelie  │ Torgersen │          19.3 │               193 │        3450 │ female │  2007 │           3.67 │
│ Adelie  │ Torgersen │          20.6 │               190 │        3650 │ male   │  2007 │           3.93 │
│ Adelie  │ Torgersen │          17.8 │               181 │        3625 │ female │  2007 │           3.89 │
│ Adelie  │ Torgersen │          19.6 │               195 │        4675 │ male   │  2007 │           3.92 │
│ Adelie  │ Torgersen │          18.1 │               193 │        3475 │ NULL   │  2007 │           3.41 │
│ Adelie  │ Torgersen │          20.2 │               190 │        4250 │ NULL   │  2007 │           4.20 │
│ …       │ …         │             … │                 … │           … │ …      │     … │              … │
└─────────┴───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘
```

##### selectors {#docs:current:guides:python:ibis::selectors}

Typing out _all_ of the column names _except_ one is a little annoying. Instead of doing that again, we can use a `selector` to quickly select or deselect groups of columns.

```python
import ibis.selectors as s

penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10).select(
    ~s.matches("bill_length_mm")
    # match every column except `bill_length_mm`
)
```

```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64       │ int64             │ int64       │ string │ int64 │ float64        │
├─────────┼───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie  │ Torgersen │          18.7 │               181 │        3750 │ male   │  2007 │           3.91 │
│ Adelie  │ Torgersen │          17.4 │               186 │        3800 │ female │  2007 │           3.95 │
│ Adelie  │ Torgersen │          18.0 │               195 │        3250 │ female │  2007 │           4.03 │
│ Adelie  │ Torgersen │           nan │              NULL │        NULL │ NULL   │  2007 │            nan │
│ Adelie  │ Torgersen │          19.3 │               193 │        3450 │ female │  2007 │           3.67 │
│ Adelie  │ Torgersen │          20.6 │               190 │        3650 │ male   │  2007 │           3.93 │
│ Adelie  │ Torgersen │          17.8 │               181 │        3625 │ female │  2007 │           3.89 │
│ Adelie  │ Torgersen │          19.6 │               195 │        4675 │ male   │  2007 │           3.92 │
│ Adelie  │ Torgersen │          18.1 │               193 │        3475 │ NULL   │  2007 │           3.41 │
│ Adelie  │ Torgersen │          20.2 │               190 │        4250 │ NULL   │  2007 │           4.20 │
│ …       │ …         │             … │                 … │           … │ …      │     … │              … │
└─────────┴───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘
```

You can also use a `selector` alongside a column name.

```python
penguins.select("island", s.numeric())
```

```text
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┓
┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━┩
│ string    │ float64        │ float64       │ int64             │ int64       │ int64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼───────┤
│ Torgersen │           39.1 │          18.7 │               181 │        3750 │  2007 │
│ Torgersen │           39.5 │          17.4 │               186 │        3800 │  2007 │
│ Torgersen │           40.3 │          18.0 │               195 │        3250 │  2007 │
│ Torgersen │            nan │           nan │              NULL │        NULL │  2007 │
│ Torgersen │           36.7 │          19.3 │               193 │        3450 │  2007 │
│ Torgersen │           39.3 │          20.6 │               190 │        3650 │  2007 │
│ Torgersen │           38.9 │          17.8 │               181 │        3625 │  2007 │
│ Torgersen │           39.2 │          19.6 │               195 │        4675 │  2007 │
│ Torgersen │           34.1 │          18.1 │               193 │        3475 │  2007 │
│ Torgersen │           42.0 │          20.2 │               190 │        4250 │  2007 │
│ …         │              … │             … │                 … │           … │     … │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴───────┘
```

You can read more about [`selectors`](https://ibis-project.org/reference/selectors/) in the docs!

##### `order_by` {#docs:current:guides:python:ibis::order_by}

`order_by` arranges the values of one or more columns in ascending or descending order.

By default, `ibis` sorts in ascending order:

```python
penguins.order_by(penguins.flipper_length_mm).select(
    "species", "island", "flipper_length_mm"
)
```

```text
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ island    ┃ flipper_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string    │ string    │ int64             │
├───────────┼───────────┼───────────────────┤
│ Adelie    │ Biscoe    │               172 │
│ Adelie    │ Biscoe    │               174 │
│ Adelie    │ Torgersen │               176 │
│ Adelie    │ Dream     │               178 │
│ Adelie    │ Dream     │               178 │
│ Adelie    │ Dream     │               178 │
│ Chinstrap │ Dream     │               178 │
│ Adelie    │ Dream     │               179 │
│ Adelie    │ Torgersen │               180 │
│ Adelie    │ Biscoe    │               180 │
│ …         │ …         │                 … │
└───────────┴───────────┴───────────────────┘
```

You can sort in descending order using the `desc` method of a column:

```python
penguins.order_by(penguins.flipper_length_mm.desc()).select(
    "species", "island", "flipper_length_mm"
)
```

```text
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string  │ string │ int64             │
├─────────┼────────┼───────────────────┤
│ Gentoo  │ Biscoe │               231 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               229 │
│ Gentoo  │ Biscoe │               229 │
│ …       │ …      │                 … │
└─────────┴────────┴───────────────────┘
```

Or you can use `ibis.desc`:

```python
penguins.order_by(ibis.desc("flipper_length_mm")).select(
    "species", "island", "flipper_length_mm"
)
```

```text
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string  │ string │ int64             │
├─────────┼────────┼───────────────────┤
│ Gentoo  │ Biscoe │               231 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               229 │
│ Gentoo  │ Biscoe │               229 │
│ …       │ …      │                 … │
└─────────┴────────┴───────────────────┘
```

##### `aggregate` {#docs:current:guides:python:ibis::aggregate}

Ibis has several aggregate functions available to help summarize data, including `mean`, `max`, `min`, `count`, and `sum`.

To aggregate an entire column, call the corresponding method on that column.

```python
penguins.flipper_length_mm.mean()
```

```python
# Output:
200.91520467836258
```

You can compute multiple aggregates at once using the `aggregate` method:

```python
penguins.aggregate([penguins.flipper_length_mm.mean(), penguins.bill_depth_mm.max()])
```

```text
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Mean(flipper_length_mm) ┃ Max(bill_depth_mm) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ float64                 │ float64            │
├─────────────────────────┼────────────────────┤
│              200.915205 │               21.5 │
└─────────────────────────┴────────────────────┘
```

But `aggregate` _really_ shines when it's paired with `group_by`.

##### `group_by` {#docs:current:guides:python:ibis::group_by}

`group_by` creates groupings of rows that have the same value for one or more columns.

But it doesn't do much on its own -- you can pair it with `aggregate` to get a result.

```python
penguins.group_by("species").aggregate()
```

```text
┏━━━━━━━━━━━┓
┃ species   ┃
┡━━━━━━━━━━━┩
│ string    │
├───────────┤
│ Adelie    │
│ Gentoo    │
│ Chinstrap │
└───────────┘
```

We grouped by the `species` column and handed it an “empty” aggregate command. The result of that is a column of the unique values in the `species` column.

If we add a second column to the `group_by`, we'll get each unique pairing of the values in those columns.

```python
penguins.group_by(["species", "island"]).aggregate()
```

```text
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species   ┃ island    ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string    │ string    │
├───────────┼───────────┤
│ Adelie    │ Torgersen │
│ Adelie    │ Biscoe    │
│ Adelie    │ Dream     │
│ Gentoo    │ Biscoe    │
│ Chinstrap │ Dream     │
└───────────┴───────────┘
```

Now, if we add an aggregation function to that, we start to really open things up.

```python
penguins.group_by(["species", "island"]).aggregate(penguins.bill_length_mm.mean())
```

```text
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ island    ┃ Mean(bill_length_mm) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ string    │ string    │ float64              │
├───────────┼───────────┼──────────────────────┤
│ Adelie    │ Torgersen │            38.950980 │
│ Adelie    │ Biscoe    │            38.975000 │
│ Adelie    │ Dream     │            38.501786 │
│ Gentoo    │ Biscoe    │            47.504878 │
│ Chinstrap │ Dream     │            48.833824 │
└───────────┴───────────┴──────────────────────┘
```

By adding that `mean` to the `aggregate`, we now have a concise way to calculate aggregates over each of the distinct groups in the `group_by`. And we can calculate as many aggregates as we need.

```python
penguins.group_by(["species", "island"]).aggregate(
    [penguins.bill_length_mm.mean(), penguins.flipper_length_mm.max()]
)
```

```text
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ island    ┃ Mean(bill_length_mm) ┃ Max(flipper_length_mm) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string    │ string    │ float64              │ int64                  │
├───────────┼───────────┼──────────────────────┼────────────────────────┤
│ Adelie    │ Torgersen │            38.950980 │                    210 │
│ Adelie    │ Biscoe    │            38.975000 │                    203 │
│ Adelie    │ Dream     │            38.501786 │                    208 │
│ Gentoo    │ Biscoe    │            47.504878 │                    231 │
│ Chinstrap │ Dream     │            48.833824 │                    212 │
└───────────┴───────────┴──────────────────────┴────────────────────────┘
```

If we need more specific groups, we can add to the `group_by`.

```python
penguins.group_by(["species", "island", "sex"]).aggregate(
    [penguins.bill_length_mm.mean(), penguins.flipper_length_mm.max()]
)
```

```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ sex    ┃ Mean(bill_length_mm) ┃ Max(flipper_length_mm) ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string  │ string    │ string │ float64              │ int64                  │
├─────────┼───────────┼────────┼──────────────────────┼────────────────────────┤
│ Adelie  │ Torgersen │ male   │            40.586957 │                    210 │
│ Adelie  │ Torgersen │ female │            37.554167 │                    196 │
│ Adelie  │ Torgersen │ NULL   │            37.925000 │                    193 │
│ Adelie  │ Biscoe    │ female │            37.359091 │                    199 │
│ Adelie  │ Biscoe    │ male   │            40.590909 │                    203 │
│ Adelie  │ Dream     │ female │            36.911111 │                    202 │
│ Adelie  │ Dream     │ male   │            40.071429 │                    208 │
│ Adelie  │ Dream     │ NULL   │            37.500000 │                    179 │
│ Gentoo  │ Biscoe    │ female │            45.563793 │                    222 │
│ Gentoo  │ Biscoe    │ male   │            49.473770 │                    231 │
│ …       │ …         │ …      │                    … │                      … │
└─────────┴───────────┴────────┴──────────────────────┴────────────────────────┘
```
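
To make the semantics concrete, here is a minimal plain-Python sketch of what `group_by` followed by `aggregate` computes. It does not use Ibis: the `group_aggregate` helper and the four sample rows are hypothetical and only illustrate the idea of collecting rows per distinct key and then reducing each group, skipping missing values the way SQL aggregates do.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows (not the real penguins data).
rows = [
    {"species": "Adelie", "island": "Dream", "bill_length_mm": 38.0, "flipper_length_mm": 180},
    {"species": "Adelie", "island": "Dream", "bill_length_mm": 40.0, "flipper_length_mm": 190},
    {"species": "Gentoo", "island": "Biscoe", "bill_length_mm": 47.0, "flipper_length_mm": 220},
    {"species": "Gentoo", "island": "Biscoe", "bill_length_mm": None, "flipper_length_mm": 231},
]


def group_aggregate(rows, keys, aggregates):
    """Group rows by `keys`, then apply each (column, function) pair per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(keys, key))
        for name, (column, func) in aggregates.items():
            # Skip missing values before aggregating.
            values = [m[column] for m in members if m[column] is not None]
            out[name] = func(values) if values else None
        result.append(out)
    return result


summary = group_aggregate(
    rows,
    keys=["species", "island"],
    aggregates={
        "mean_bill_length": ("bill_length_mm", mean),
        "max_flipper_length": ("flipper_length_mm", max),
    },
)
for line in summary:
    print(line)
```

Adding a column to `keys` produces more, smaller groups, which is what happens when `sex` is added to the `group_by` above.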

#### Chaining It All Together {#docs:current:guides:python:ibis::chaining-it-all-together}

We've already chained some Ibis calls together. We used `mutate` to create a new column and then `select` to only view a subset of the new table. We also just chained `group_by` with `aggregate`.

There's nothing stopping us from putting all of these concepts together to ask questions of the data.

How about:

* What was the largest female penguin (by body mass) on each island in the year 2008?

```python
penguins.filter((penguins.sex == "female") & (penguins.year == 2008)).group_by(
    ["island"]
).aggregate(penguins.body_mass_g.max())
```

```text
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ island    ┃ Max(body_mass_g) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ string    │ int64            │
├───────────┼──────────────────┤
│ Biscoe    │             5200 │
│ Torgersen │             3800 │
│ Dream     │             3900 │
└───────────┴──────────────────┘
```

* What about the largest male penguin (by body mass) on each island for each year of data collection?

```python
penguins.filter(penguins.sex == "male").group_by(["island", "year"]).aggregate(
    penguins.body_mass_g.max().name("max_body_mass")
).order_by(["year", "max_body_mass"])
```

```text
┏━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ island    ┃ year  ┃ max_body_mass ┃
┡━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string    │ int64 │ int64         │
├───────────┼───────┼───────────────┤
│ Dream     │  2007 │          4650 │
│ Torgersen │  2007 │          4675 │
│ Biscoe    │  2007 │          6300 │
│ Torgersen │  2008 │          4700 │
│ Dream     │  2008 │          4800 │
│ Biscoe    │  2008 │          6000 │
│ Torgersen │  2009 │          4300 │
│ Dream     │  2009 │          4475 │
│ Biscoe    │  2009 │          6000 │
└───────────┴───────┴───────────────┘
```

#### Learn More {#docs:current:guides:python:ibis::learn-more}

That's all for this quick-start guide. If you want to learn more, check out the [Ibis documentation](https://ibis-project.org).

### Integration with Polars {#docs:current:guides:python:polars}

[Polars](https://github.com/pola-rs/polars) is a DataFrame library built in Rust with bindings for Python and Node.js. It uses [Apache Arrow's columnar format](https://arrow.apache.org/docs/format/Columnar.html) as its memory model. DuckDB can read Polars DataFrames and convert query results to Polars DataFrames. It does this internally using the efficient Apache Arrow integration. Note that the `pyarrow` library must be installed for the integration to work.

#### Installation {#docs:current:guides:python:polars::installation}

```batch
pip install -U duckdb 'polars[pyarrow]'
```

#### Polars to DuckDB {#docs:current:guides:python:polars::polars-to-duckdb}

DuckDB can natively query Polars DataFrames by referring to the name of Polars DataFrames as they exist in the current scope.

```python
import duckdb
import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)
duckdb.sql("SELECT * FROM df").show()
```

#### DuckDB to Polars {#docs:current:guides:python:polars::duckdb-to-polars}

DuckDB can output results as Polars DataFrames using the `.pl()` result-conversion method.

```python
df = duckdb.sql("""
    SELECT 1 AS id, 'banana' AS fruit
    UNION ALL
    SELECT 2, 'apple'
    UNION ALL
    SELECT 3, 'mango'"""
).pl()
print(df)
```

```text
shape: (3, 2)
┌─────┬────────┐
│ id  ┆ fruit  │
│ --- ┆ ---    │
│ i32 ┆ str    │
╞═════╪════════╡
│ 1   ┆ banana │
│ 2   ┆ apple  │
│ 3   ┆ mango  │
└─────┴────────┘
```

The optional `lazy` parameter allows returning Polars LazyFrames.

```python
df = duckdb.sql("""
    SELECT 1 AS id, 'banana' AS fruit
    UNION ALL
    SELECT 2, 'apple'
    UNION ALL
    SELECT 3, 'mango'"""
).pl(lazy=True)
print(df)
```

```text
naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)

PYTHON SCAN []
PROJECT */2 COLUMNS
```

To learn more about Polars, feel free to explore their [Python API Reference](https://pola-rs.github.io/polars/py-polars/html/reference/index.html).

### Using fsspec Filesystems {#docs:current:guides:python:filesystems}

DuckDB support for [`fsspec`](https://filesystem-spec.readthedocs.io) filesystems allows querying data in filesystems that DuckDB's [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) does not support. `fsspec` has a large number of [inbuilt filesystems](https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations), and there are also many [external implementations](https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations). This capability is only available in DuckDB's Python client because `fsspec` is a Python library, while the `httpfs` extension is available in many DuckDB clients.

#### Example {#docs:current:guides:python:filesystems::example}

The following is an example of using `fsspec` to query a file in Google Cloud Storage (instead of using their S3-compatible API).

First, install `duckdb`, `fsspec`, and a filesystem interface of your choice.

```batch
pip install duckdb fsspec gcsfs
```

Then, you can register whichever filesystem you'd like to query:

```python
import duckdb
from fsspec import filesystem

# this line will throw an exception if the appropriate filesystem interface is not installed
duckdb.register_filesystem(filesystem('gcs'))

duckdb.sql("SELECT * FROM read_csv('gcs:///bucket/file.csv')")
```

> These filesystems are not implemented in C++, hence, their performance may not be comparable to the ones provided by the `httpfs` extension.
> It is also worth noting that as they are third-party libraries, they may contain bugs that are beyond our control.

## SQL Editors {#guides:sql_editors}

### DBeaver SQL IDE {#docs:current:guides:sql_editors:dbeaver}

[DBeaver](https://dbeaver.io/) is a powerful and popular desktop SQL editor and integrated development environment (IDE). It has both an open source and enterprise version. DBeaver is useful for visually inspecting the available tables in DuckDB and for quickly building complex queries. DuckDB's [JDBC connector](https://search.maven.org/artifact/org.duckdb/duckdb_jdbc) allows DBeaver to query DuckDB files, and by extension, any other files that DuckDB can access (like [Parquet files](#docs:current:guides:file_formats:query_parquet)).

#### Installing DBeaver {#docs:current:guides:sql_editors:dbeaver::installing-dbeaver}

1. Install DBeaver using the download links and instructions found at their [download page](https://dbeaver.io/download/).

2. Open DBeaver and create a new connection. Either click on the “New Database Connection” button or go to Database > New Database Connection in the menu bar.

    ![](../images/guides/DBeaver_new_database_connection.png)

    ![](../images/guides/DBeaver_new_database_connection_menu.png)


3. Search for DuckDB, select it, and click Next.

    ![](../images/guides/DBeaver_select_database_driver.png)


4. Enter the path or browse to the DuckDB database file you wish to query. To use an in-memory DuckDB database (useful primarily for querying Parquet files or for testing), enter `:memory:` as the path.

    ![](../images/guides/DBeaver_connection_settings_path.png)


5. Click “Test Connection”. This will then prompt you to install the DuckDB JDBC driver. If you are not prompted, see alternative driver installation instructions below.

    ![](../images/guides/DBeaver_connection_settings_test_connection.png)


6. Click “Download” to download DuckDB's JDBC driver from Maven. Once download is complete, click “OK”, then click “Finish”.

    * Note: If you are in a corporate environment or behind a firewall, before clicking download, click the “Download Configuration” link to configure your proxy settings.

    ![](../images/guides/DBeaver_download_driver_files.png)


7. You should now see a database connection to your DuckDB database in the left-hand “Database Navigator” pane. Expand it to see the tables and views in your database. Right-click on that connection and create a new SQL script.

    ![](../images/guides/DBeaver_new_sql_script.png)


8. Write some SQL and click the “Execute” button.

    ![](../images/guides/DBeaver_execute_query.png)


9. Now you're ready to fly with DuckDB and DBeaver!

    ![](../images/guides/DBeaver_query_results.png)


#### Alternative Driver Installation {#docs:current:guides:sql_editors:dbeaver::alternative-driver-installation}

1. If not prompted to install the DuckDB driver when testing your connection, return to the “Connect to a database” dialog and click “Edit Driver Settings”.

    ![](../images/guides/DBeaver_edit_driver_settings.png)


2. Alternatively, you can access the driver settings menu by returning to the main DBeaver window and clicking Database > Driver Manager in the menu bar. Then select DuckDB, then click Edit.

    ![](../images/guides/DBeaver_driver_manager.png)

    ![](../images/guides/DBeaver_driver_manager_edit.png)


3. Go to the “Libraries” tab, then click on the DuckDB driver and click “Download/Update”. If you do not see the DuckDB driver, first click on “Reset to Defaults”.

    ![](../images/guides/DBeaver_edit_driver_duckdb.png)


4. Click “Download” to download DuckDB's JDBC driver from Maven. Once download is complete, click “OK”, then return to the main DBeaver window and continue with step 7 above.

    * Note: If you are in a corporate environment or behind a firewall, before clicking download, click the “Download Configuration” link to configure your proxy settings.

    ![](../images/guides/DBeaver_download_driver_files_from_driver_settings.png)

## SQL Features {#guides:sql_features}

### AsOf Join {#docs:current:guides:sql_features:asof_join}

#### What is an AsOf Join? {#docs:current:guides:sql_features:asof_join::what-is-an-asof-join}

Time series data is not always perfectly aligned.
Clocks may be slightly off, or there may be a delay between cause and effect.
This can make connecting two sets of ordered data challenging.
AsOf joins are a tool for solving this and other similar problems.

One of the problems that AsOf joins are used to solve is
finding the value of a varying property at a specific point in time.
This use case is so common that it is where the name came from:

_Give me the value of the property **as of this time**_.

More generally, however, AsOf joins embody some common temporal analytic semantics,
which can be cumbersome and slow to implement in standard SQL.

#### Portfolio Example Dataset {#docs:current:guides:sql_features:asof_join::portfolio-example-dataset}

Let's start with a concrete example.
Suppose we have a table of stock [`prices`](https://duckdb.org/data/prices.csv) with timestamps:

| ticker | when | price |
| :----- | :--- | ----: |
| APPL   | 2001-01-01 00:00:00 | 1 |
| APPL   | 2001-01-01 00:01:00 | 2 |
| APPL   | 2001-01-01 00:02:00 | 3 |
| MSFT   | 2001-01-01 00:00:00 | 1 |
| MSFT   | 2001-01-01 00:01:00 | 2 |
| MSFT   | 2001-01-01 00:02:00 | 3 |
| GOOG   | 2001-01-01 00:00:00 | 1 |
| GOOG   | 2001-01-01 00:01:00 | 2 |
| GOOG   | 2001-01-01 00:02:00 | 3 |

We have another table containing portfolio [`holdings`](https://duckdb.org/data/holdings.csv) at various points in time:

| ticker | when | shares |
| :----- | :--- | -----: |
| APPL   | 2000-12-31 23:59:30 | 5.16   |
| APPL   | 2001-01-01 00:00:30 | 2.94   |
| APPL   | 2001-01-01 00:01:30 | 24.13  |
| GOOG   | 2000-12-31 23:59:30 | 9.33   |
| GOOG   | 2001-01-01 00:00:30 | 23.45  |
| GOOG   | 2001-01-01 00:01:30 | 10.58  |
| DATA   | 2000-12-31 23:59:30 | 6.65   |
| DATA   | 2001-01-01 00:00:30 | 17.95  |
| DATA   | 2001-01-01 00:01:30 | 18.37  |

To load these tables to DuckDB, run:

```sql
CREATE TABLE prices AS FROM 'https://duckdb.org/data/prices.csv';
CREATE TABLE holdings AS FROM 'https://duckdb.org/data/holdings.csv';
```

#### Inner AsOf Joins {#docs:current:guides:sql_features:asof_join::inner-asof-joins}

We can compute the value of each holding at that point in time by using an AsOf join
to find the most recent price at or before the holding's timestamp:

```sql
SELECT h.ticker, h.when, price * shares AS value
FROM holdings h
ASOF JOIN prices p
       ON h.ticker = p.ticker
      AND h.when >= p.when;
```

This attaches the value of the holding at that time to each row:

| ticker | when | value |
| :----- | :--- | ----: |
| APPL   | 2001-01-01 00:00:30 | 2.94  |
| APPL   | 2001-01-01 00:01:30 | 48.26 |
| GOOG   | 2001-01-01 00:00:30 | 23.45 |
| GOOG   | 2001-01-01 00:01:30 | 21.16 |

It essentially executes a function defined by looking up nearby values in the `prices` table.
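Note also that holdings without a match do not appear in the output: this applies to tickers that are missing from `prices` (such as `DATA`) and to holdings whose timestamp precedes the first price.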

#### Outer AsOf Joins {#docs:current:guides:sql_features:asof_join::outer-asof-joins}

Because AsOf produces at most one match from the right hand side,
the left side table will not grow as a result of the join,
but it could shrink if there are missing times on the right.
To handle this situation, you can use an *outer* AsOf Join:

```sql
SELECT h.ticker, h.when, price * shares AS value
FROM holdings h
ASOF LEFT JOIN prices p
            ON h.ticker = p.ticker
           AND h.when >= p.when
ORDER BY ALL;
```

As you might expect, this will produce `NULL` prices and values instead of dropping left side rows
when there is no ticker or the time is before the prices begin.

| ticker | when | value |
| :----- | :--- | ----: |
| APPL   | 2000-12-31 23:59:30 |       |
| APPL   | 2001-01-01 00:00:30 | 2.94  |
| APPL   | 2001-01-01 00:01:30 | 48.26 |
| DATA   | 2000-12-31 23:59:30 |       |
| DATA   | 2001-01-01 00:00:30 |       |
| DATA   | 2001-01-01 00:01:30 |       |
| GOOG   | 2000-12-31 23:59:30 |       |
| GOOG   | 2001-01-01 00:00:30 | 23.45 |
| GOOG   | 2001-01-01 00:01:30 | 21.16 |
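
The matching rule behind both variants can be sketched in plain Python. This is only an illustration of the semantics, not how DuckDB implements the join: the `asof_join` helper and its `keep_unmatched` flag are hypothetical names, and timestamps are kept as strings because the `YYYY-MM-DD HH:MM:SS` format sorts correctly as text.

```python
from bisect import bisect_right

# The example tables from above, as (ticker, when, price) and (ticker, when, shares).
prices = [
    (ticker, f"2001-01-01 00:0{minute}:00", minute + 1)
    for ticker in ("APPL", "MSFT", "GOOG")
    for minute in range(3)
]
holdings = [
    ("APPL", "2000-12-31 23:59:30", 5.16),
    ("APPL", "2001-01-01 00:00:30", 2.94),
    ("APPL", "2001-01-01 00:01:30", 24.13),
    ("GOOG", "2000-12-31 23:59:30", 9.33),
    ("GOOG", "2001-01-01 00:00:30", 23.45),
    ("GOOG", "2001-01-01 00:01:30", 10.58),
    ("DATA", "2000-12-31 23:59:30", 6.65),
    ("DATA", "2001-01-01 00:00:30", 17.95),
    ("DATA", "2001-01-01 00:01:30", 18.37),
]


def asof_join(holdings, prices, keep_unmatched=False):
    """For each holding, find the latest price with p.when <= h.when."""
    # Index the right side: per ticker, parallel lists sorted by time.
    index = {}
    for ticker, when, price in sorted(prices, key=lambda r: (r[0], r[1])):
        times, values = index.setdefault(ticker, ([], []))
        times.append(when)
        values.append(price)

    result = []
    for ticker, when, shares in holdings:
        times, values = index.get(ticker, ([], []))
        # Count of prices at or before `when`; the last one is the match.
        pos = bisect_right(times, when)
        if pos > 0:
            result.append((ticker, when, shares * values[pos - 1]))
        elif keep_unmatched:
            # Outer variant: keep the left row with a missing value.
            result.append((ticker, when, None))
    return result


inner = asof_join(holdings, prices)
outer = asof_join(holdings, prices, keep_unmatched=True)
print(len(inner), len(outer))
```

Each left row yields at most one output row, which is why the inner variant can only shrink the left table and the outer variant preserves its row count.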

#### AsOf Joins with the `USING` Keyword {#docs:current:guides:sql_features:asof_join::asof-joins-with-the-using-keyword}

So far we have been explicit about specifying the conditions for AsOf,
but SQL also has a simplified join condition syntax
for the common case where the column names are the same in both tables.
This syntax uses the `USING` keyword to list the fields that should be compared for equality.
AsOf also supports this syntax, but with two restrictions:

* The last field is the inequality
* The inequality is `>=` (the most common case)

Our first query can then be written as:

```sql
SELECT ticker, h.when, price * shares AS value
FROM holdings h
ASOF JOIN prices p USING (ticker, "when");
```

##### Clarification on Column Selection with `USING` in ASOF Joins {#docs:current:guides:sql_features:asof_join::clarification-on-column-selection-with-using-in-asof-joins}

When you use the `USING` keyword in a join, the columns specified in the `USING` clause are merged in the result set. This means that if you run:

```sql
SELECT *
FROM holdings h
ASOF JOIN prices p USING (ticker, "when");
```

You will get back only the columns `h.ticker, h.when, h.shares, p.price`. The columns `ticker` and `when` will appear only once, with `ticker`
and `when` coming from the left table (holdings).

This behavior is fine for the `ticker` column because the value is the same in both tables. However, for the `when` column, the values might
differ between the two tables due to the `>=` condition used in the AsOf join. The AsOf join is designed to match each row in the left
table (`holdings`) with the nearest preceding row in the right table (`prices`) based on the `when` column.

If you want to retrieve the `when` column from both tables to see both timestamps, you need to list the columns explicitly rather than
relying on `*`, like so:

```sql
SELECT h.ticker, h.when AS holdings_when, p.when AS prices_when, h.shares, p.price
FROM holdings h
ASOF JOIN prices p USING (ticker, "when");
```

This ensures that you get the complete information from both tables, avoiding any potential confusion caused by the default behavior of
the `USING` keyword.

#### See Also {#docs:current:guides:sql_features:asof_join::see-also}

For implementation details, see the [blog post “DuckDB's AsOf joins: Fuzzy Temporal Lookups”](https://duckdb.org/2023/09/15/asof-joins-fuzzy-temporal-lookups).

### Full-Text Search {#docs:current:guides:sql_features:full_text_search}

DuckDB supports full-text search via the [`fts` extension](#docs:current:core_extensions:full_text_search).
A full-text index allows for a query to quickly search for all occurrences of individual words within longer text strings.

#### Example: Shakespeare Corpus {#docs:current:guides:sql_features:full_text_search::example-shakespeare-corpus}

Here's an example of building a full-text index of Shakespeare's plays.

```sql
CREATE TABLE corpus AS
    SELECT * FROM 'https://blobs.duckdb.org/data/shakespeare.parquet';
```

```sql
DESCRIBE corpus;
```



| column_name | column_type | null | key  | default | extra |
|-------------|-------------|------|------|---------|-------|
| line_id     | VARCHAR     | YES  | NULL | NULL    | NULL  |
| play_name   | VARCHAR     | YES  | NULL | NULL    | NULL  |
| line_number | VARCHAR     | YES  | NULL | NULL    | NULL  |
| speaker     | VARCHAR     | YES  | NULL | NULL    | NULL  |
| text_entry  | VARCHAR     | YES  | NULL | NULL    | NULL  |

The text of each line is in `text_entry`, and a unique key for each line is in `line_id`.

#### Creating a Full-Text Search Index {#docs:current:guides:sql_features:full_text_search::creating-a-full-text-search-index}

First, we create the index, specifying the table name, the unique id column, and the column(s) to index. We will just index the single column `text_entry`, which contains the text of the lines in the play.

```sql
PRAGMA create_fts_index('corpus', 'line_id', 'text_entry');
```

The table is now ready to query using the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking function. Rows with no match return a `NULL` score.

What does Shakespeare say about butter?

```sql
SELECT
    fts_main_corpus.match_bm25(line_id, 'butter') AS score,
    line_id, play_name, speaker, text_entry
FROM corpus
WHERE score IS NOT NULL
ORDER BY score DESC;
```

|       score        |   line_id   |        play_name         |   speaker    |                     text_entry                     |
|-------------------:|-------------|--------------------------|--------------|----------------------------------------------------|
| 4.427313429798464  | H4/2.4.494  | Henry IV                 | Carrier      | As fat as butter.                                  |
| 3.836270302568675  | H4/1.2.21   | Henry IV                 | FALSTAFF     | prologue to an egg and butter.                     |
| 3.836270302568675  | H4/2.1.55   | Henry IV                 | Chamberlain  | They are up already, and call for eggs and butter; |
| 3.3844488405497115 | H4/4.2.21   | Henry IV                 | FALSTAFF     | toasts-and-butter, with hearts in their bellies no |
| 3.3844488405497115 | H4/4.2.62   | Henry IV                 | PRINCE HENRY | already made thee butter. But tell me, Jack, whose |
| 3.3844488405497115 | AWW/4.1.40  | Alls well that ends well | PAROLLES     | butter-womans mouth and buy myself another of      |
| 3.3844488405497115 | AYLI/3.2.93 | As you like it           | TOUCHSTONE   | right butter-womens rank to market.                |
| 3.3844488405497115 | KL/2.4.132  | King Lear                | Fool         | kindness to his horse, buttered his hay.           |
| 3.0278411214953107 | AWW/5.2.9   | Alls well that ends well | Clown        | henceforth eat no fish of fortunes buttering.      |
| 3.0278411214953107 | MWW/2.2.260 | Merry Wives of Windsor   | FALSTAFF     | Hang him, mechanical salt-butter rogue! I will     |
| 3.0278411214953107 | MWW/2.2.284 | Merry Wives of Windsor   | FORD         | rather trust a Fleming with my butter, Parson Hugh |
| 3.0278411214953107 | MWW/3.5.7   | Merry Wives of Windsor   | FALSTAFF     | Ill have my brains taen out and buttered, and give |
| 3.0278411214953107 | MWW/3.5.102 | Merry Wives of Windsor   | FALSTAFF     | to heat as butter; a man of continual dissolution  |
| 2.739219044070792  | H4/2.4.115  | Henry IV                 | PRINCE HENRY | Didst thou never see Titan kiss a dish of butter?  |
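
To give some intuition for the ranking, below is a minimal plain-Python sketch of the textbook Okapi BM25 formula. It is not the `fts` extension's implementation and will not reproduce the scores above: it has no stemming or stop-word handling, and it runs on a three-line toy corpus. The `bm25_scores` and `tokenize` helpers, the `k=1.2` and `b=0.75` parameter values, and the third document are assumptions made for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic words (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def bm25_scores(docs, query, k=1.2, b=0.75):
    """Return {doc_id: score}, with None for documents that match no query term."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(words) for words in tokens.values()) / n_docs

    scores = {}
    for doc_id, words in tokens.items():
        score, matched = 0.0, False
        for term in tokenize(query):
            tf = words.count(term)
            if tf == 0:
                continue
            matched = True
            # Document frequency: number of documents containing the term.
            df = sum(1 for other in tokens.values() if term in other)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # Term frequency, normalized by relative document length.
            norm = tf + k * (1 - b + b * len(words) / avg_len)
            score += idf * tf * (k + 1) / norm
        scores[doc_id] = score if matched else None
    return scores


docs = {
    "H4/2.4.494": "As fat as butter.",
    "H4/1.2.21": "prologue to an egg and butter.",
    # Made-up line that does not contain the search term.
    "X/0.0.0": "A line that never mentions the dairy product.",
}
scores = bm25_scores(docs, "butter")
print(scores)
```

The length normalization explains why, for the same single occurrence of the term, the shorter line ranks above the longer one.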

Unlike standard indexes, full-text indexes don't auto-update as the underlying data changes, so you need to drop the index with `PRAGMA drop_fts_index('corpus')` and recreate it when appropriate.

#### Note on Generating the Corpus Table {#docs:current:guides:sql_features:full_text_search::note-on-generating-the-corpus-table}

For more details, see the [“Generating a Shakespeare corpus for full-text searching from JSON” blog post](https://duckdb.blogspot.com/2023/04/generating-shakespeare-corpus-for-full.html).

* The columns are `line_id`, `play_name`, `line_number`, `speaker`, and `text_entry`.
* We need a unique key for each row in order for full-text searching to work.
* The `line_id` value `KL/2.4.132` means King Lear, Act 2, Scene 4, Line 132.

### Graph Queries {#docs:current:guides:sql_features:graph_queries}

DuckDB supports graph queries via the [DuckPGQ community extension](https://duckpgq.org), which implements the SQL/PGQ syntax from the SQL:2023 standard.

Graph queries allow you to find patterns and paths in connected data, such as social networks, financial transactions, or knowledge graphs, using a visual, intuitive syntax.

> **Warning.** DuckPGQ is a community extension and is still under active development. Some features may be incomplete. See the [DuckPGQ website](https://duckpgq.org) for the latest status.

#### Installing DuckPGQ {#docs:current:guides:sql_features:graph_queries::installing-duckpgq}

```sql
INSTALL duckpgq FROM community;
LOAD duckpgq;
```

#### Creating a Property Graph {#docs:current:guides:sql_features:graph_queries::creating-a-property-graph}

A property graph consists of vertices (nodes) and edges (relationships). You create one as a layer on top of existing tables:

```sql
CREATE TABLE Person (id BIGINT, name VARCHAR);
CREATE TABLE Knows (person1_id BIGINT, person2_id BIGINT, since DATE);

INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
INSERT INTO Knows VALUES (1, 2, '2020-01-01'), (2, 3, '2021-06-15');

CREATE PROPERTY GRAPH social_network
VERTEX TABLES (
    Person
)
EDGE TABLES (
    Knows
        SOURCE KEY (person1_id) REFERENCES Person (id)
        DESTINATION KEY (person2_id) REFERENCES Person (id)
);
```

#### Pattern Matching {#docs:current:guides:sql_features:graph_queries::pattern-matching}

Use the `GRAPH_TABLE` function with `MATCH` to find patterns. The syntax uses `()` for nodes and `[]` for edges:

```sql
FROM GRAPH_TABLE (social_network
    MATCH (a:Person)-[k:Knows]->(b:Person)
    COLUMNS (a.name AS person1, b.name AS person2, k.since)
);
```

| person1 | person2 | since      |
|---------|---------|------------|
| Alice   | Bob     | 2020-01-01 |
| Bob     | Charlie | 2021-06-15 |

#### Path Finding {#docs:current:guides:sql_features:graph_queries::path-finding}

Find paths of variable length using quantifiers like `{1,5}` (1 to 5 hops) or `+` (one or more):

```sql
FROM GRAPH_TABLE (social_network
    MATCH p = ANY SHORTEST (a:Person)-[k:Knows]->{1,3}(b:Person)
    WHERE a.name = 'Alice' AND b.name = 'Charlie'
    COLUMNS (a.name AS start_person, b.name AS end_person, path_length(p) AS hops)
);
```

| start_person | end_person | hops |
|--------------|------------|------|
| Alice        | Charlie    | 2    |
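
For intuition, the hop count returned by this query can be reproduced with a breadth-first search. The sketch below is plain Python, not DuckPGQ's implementation. The `shortest_hops` helper is a hypothetical name, `max_hops` plays the role of the upper bound in `{1,3}`, and the search assumes the start and end vertices differ.

```python
from collections import deque

# The Knows edges from the example, as (person1_id, person2_id).
knows = [(1, 2), (2, 3)]
names = {1: "Alice", 2: "Bob", 3: "Charlie"}


def shortest_hops(edges, start, goal, max_hops):
    """Breadth-first search over directed edges; return the hop count or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        if hops == max_hops:
            continue  # Do not expand beyond the upper bound.
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


hops = shortest_hops(knows, start=1, goal=3, max_hops=3)
print(names[1], names[3], hops)
```

Because the edges are directed, searching from Charlie to Alice finds no path, just as the `->` pattern would return no rows.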

#### Graph Algorithms {#docs:current:guides:sql_features:graph_queries::graph-algorithms}

> **Warning.** Graph algorithm functions may currently fail due to a [known issue](https://github.com/cwida/duckpgq-extension/issues/283) and return the `csr_cte does not exist` error.

DuckPGQ includes built-in graph algorithms:

| Function | Description |
|----------|-------------|
| `pagerank(graph, vertex_label, edge_label)` | Computes PageRank centrality scores |
| `local_clustering_coefficient(graph, vertex_label, edge_label)` | Measures how connected a node's neighbors are |
| `weakly_connected_component(graph, vertex_label, edge_label)` | Identifies connected components |

Example:

```sql
FROM pagerank(social_network, Person, Knows);
```
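
As a rough idea of what a PageRank score represents, here is a plain-Python power-iteration sketch on the same three-person graph. It is not DuckPGQ's implementation and its numbers may differ from the extension's output. The `pagerank_sketch` helper, the damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from vertices without outgoing edges are all assumptions.

```python
def pagerank_sketch(vertices, edges, damping=0.85, iterations=50):
    """Iteratively pass each vertex's rank along its outgoing edges."""
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in vertices}
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


ranks = pagerank_sketch(
    ["Alice", "Bob", "Charlie"],
    [("Alice", "Bob"), ("Bob", "Charlie")],
)
for name, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.4f}")
```

In this chain, Charlie ends up with the highest score because rank flows from Alice to Bob and from Bob to Charlie.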

#### Use Case: Financial Fraud Detection {#docs:current:guides:sql_features:graph_queries::use-case-financial-fraud-detection}

Graph queries excel at finding suspicious patterns in financial data. See the [“Uncovering Financial Crime with DuckDB and Graph Queries” blog post](https://duckdb.org/2025/10/22/duckdb-graph-queries-duckpgq) for a detailed example of detecting money laundering patterns.

#### Cleanup {#docs:current:guides:sql_features:graph_queries::cleanup}

To remove a property graph:

```sql
DROP PROPERTY GRAPH social_network;
```

#### Further Reading {#docs:current:guides:sql_features:graph_queries::further-reading}

* [DuckPGQ Documentation](https://duckpgq.org)
* [DuckPGQ Community Extension](#community_extensions:extensions:duckpgq)
* ["Uncovering Financial Crime with DuckDB and Graph Queries" blog post](https://duckdb.org/2025/10/22/duckdb-graph-queries-duckpgq)

### query and query_table Functions {#docs:current:guides:sql_features:query_and_query_table_functions}

The [`query_table`](#docs:current:sql:functions:utility::query_tabletbl_name)
and [`query`](#docs:current:sql:functions:utility::queryquery_string_literal)
functions enable powerful and more dynamic SQL.

The `query_table` function returns the table whose name is specified by its string argument; the `query` function returns the table obtained by executing the query specified by its string argument.

Both functions only accept constant strings. For example, they allow passing in a table name as a prepared statement parameter:

```sql
CREATE TABLE my_table (i INTEGER);
INSERT INTO my_table VALUES (42);

PREPARE select_from_table AS SELECT * FROM query_table($1);
EXECUTE select_from_table('my_table');
```

| i  |
|---:|
| 42 |

When combined with the [`COLUMNS` expression](#docs:current:sql:expressions:star::columns), we can write very generic SQL-only macros. For example, below is a custom version of `SUMMARIZE` that computes the `min` and `max` of every column in a table:

```sql
CREATE OR REPLACE MACRO my_summarize(table_name) AS TABLE
SELECT
    unnest([*COLUMNS('alias_.*')]) AS column_name,
    unnest([*COLUMNS('min_.*')]) AS min_value,
    unnest([*COLUMNS('max_.*')]) AS max_value
FROM (
    SELECT
        any_value(alias(COLUMNS(*))) AS "alias_\0",
        min(COLUMNS(*))::VARCHAR AS "min_\0",
        max(COLUMNS(*))::VARCHAR AS "max_\0"
    FROM query_table(table_name::VARCHAR)
);

SELECT *
FROM my_summarize('https://blobs.duckdb.org/data/ontime.parquet')
LIMIT 3;
```

| column_name | min_value | max_value |
|-------------|----------:|----------:|
| year        | 2017      | 2017      |
| quarter     | 1         | 3         |
| month       | 1         | 9         |

The `query` function allows for even more flexibility. For example, users who prefer pandas' `stack` syntax over SQL's `UNPIVOT` syntax may use:

```sql
CREATE OR REPLACE MACRO stack(table_name, index, name, values) AS TABLE 
FROM query(
    'UNPIVOT ' || table_name 
    || ' ON COLUMNS(* EXCLUDE (' || array_to_string(index, ', ') 
    || ')) INTO NAME ' || name || ' VALUES ' || values
);

WITH cities AS (
    FROM (
        VALUES 
            ('NL', 'Amsterdam', '10', '12', '15'),
            ('US', 'New York', '100', '120', '150')
    ) _(country, city, '2000', '2010', '2020')
)
SELECT *
FROM stack('cities', ['country', 'city'], 'year', 'population');
```

| country |   city    | year | population |
|---------|-----------|------|------------|
| NL      | Amsterdam | 2000 | 10         |
| NL      | Amsterdam | 2010 | 12         |
| NL      | Amsterdam | 2020 | 15         |
| US      | New York  | 2000 | 100        |
| US      | New York  | 2010 | 120        |
| US      | New York  | 2020 | 150        |
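
To make the string concatenation in the `stack` macro easier to follow, here is a minimal plain-Python sketch that builds the same query text. The helper name `build_unpivot_query` is hypothetical and exists only for illustration; it does not call DuckDB.

```python
# Plain-Python mirror of the string concatenation inside the `stack` macro.
# `build_unpivot_query` is a hypothetical helper, not part of DuckDB.
def build_unpivot_query(table_name, index, name, values):
    return (
        "UNPIVOT " + table_name
        + " ON COLUMNS(* EXCLUDE (" + ", ".join(index)
        + ")) INTO NAME " + name + " VALUES " + values
    )

sql = build_unpivot_query("cities", ["country", "city"], "year", "population")
print(sql)
# UNPIVOT cities ON COLUMNS(* EXCLUDE (country, city)) INTO NAME year VALUES population
```

Because the arguments are spliced into the query text as-is, a macro built this way should only be given trusted identifiers.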

### Merge Statement for SCD Type 2 {#docs:current:guides:sql_features:merge}

This is a practical, step-by-step guide to using DuckDB’s `MERGE` statement (introduced in v1.4.0) to perform upserts and build [Slowly Changing Dimension Type 2 (SCD Type 2) tables](https://en.wikipedia.org/wiki/Slowly_changing_dimension). Type 2 SCDs keep the full history of each record while clearly identifying the current version, which makes them a good fit for audit trails, data warehousing, and analytical workloads. They are useful when you want to know the previous values of a record, when it changed, and how long it was in a particular state.

#### Why Use MERGE in DuckDB? {#docs:current:guides:sql_features:merge::why-use-merge-in-duckdb}

- Single SQL statement for `INSERT`, `UPDATE`, and soft `DELETE` (upsert and expire).
- Much cleaner and faster than equivalent Python/Pandas logic.
- Full history tracking without hard deletes.
- Works directly on Parquet, CSV, databases, thanks to DuckDB's connectivity!

#### Prerequisites {#docs:current:guides:sql_features:merge::prerequisites}

* Basic SQL knowledge

#### Key Terminology {#docs:current:guides:sql_features:merge::key-terminology}

| Term                          | Meaning                                                                                   |
|-------------------------------|-------------------------------------------------------------------------------------------|
| **Target table**              | The main/master table you are updating (e.g., `master_ducks`)                             |
| **Source table**              | The incoming/new data (e.g., `incoming_ducks`)                                            |
| **MERGE INTO**                | Specifies the target table                                                                |
| **USING**                     | Specifies the source table/query                                                          |
| **ON**                        | Join condition (usually primary/business key + current flag)                             |
| **WHEN MATCHED**              | Row exists in both → typically UPDATE (or DELETE)                                         |
| **WHEN NOT MATCHED BY TARGET**| New row (insert)                                                                          |
| **WHEN NOT MATCHED BY SOURCE**| Row disappeared → soft-delete/expire old version                                          |
| **RETURNING merge_action**    | Optional: shows what happened to each row (INSERT/UPDATE/DELETE)                          |

#### Build an SCD Type 2 Dimension Table {#docs:current:guides:sql_features:merge::build-an-scd-type-2-dimension-table}

We’ll track ducks and preserve history whenever their name, breed, or location changes.

> DuckDB has a notebook-style UI, which is useful for managing several SQL statements and segmenting your code.
> The UI ships with the DuckDB CLI, so if you have the CLI installed, you can use it right away.
> To start the UI, run `duckdb -ui` and navigate to [http://localhost:4213/](http://localhost:4213/) to write SQL in notebooks. You can copy and paste the following code blocks to follow this guide.

##### Step 1: Create the Incoming (source) Table {#docs:current:guides:sql_features:merge::step-1-create-the-incoming-source-table}

This table represents today’s transactional data.

```sql
CREATE TABLE IF NOT EXISTS incoming_ducks (
    duck_id     INTEGER,
    duck_name   VARCHAR,
    breed       VARCHAR,
    location    VARCHAR,
    begin_date  DATE,
    end_date    DATE,
    is_current  BOOLEAN
);

INSERT INTO incoming_ducks VALUES
    (101, 'Quackers',   'Mallard',       'Pond B',      CURRENT_DATE - INTERVAL '1 day', NULL, true),
    (102, 'Waddles',    'Pekin',         'Pond A',      CURRENT_DATE - INTERVAL '1 day', NULL, true),
    (104, 'Splash',     'Muscovy',       'Pond C',      CURRENT_DATE - INTERVAL '1 day', NULL, true),
    (105, 'Puddles',    'Indian Runner', 'Relocated',   CURRENT_DATE - INTERVAL '1 day', NULL, true);
```

##### Step 2: Create the Master (target) Table {#docs:current:guides:sql_features:merge::step-2-create-the-master-target-table}

This table represents the type 2 SCD data (i.e., transaction data with history).

```sql
CREATE TABLE IF NOT EXISTS master_ducks (
    record_id   INTEGER PRIMARY KEY,
    duck_id     INTEGER NOT NULL,
    duck_name   VARCHAR,
    breed       VARCHAR,
    location    VARCHAR,
    begin_date  DATE NOT NULL,
    end_date    DATE,
    is_current  BOOLEAN NOT NULL DEFAULT true
);

CREATE SEQUENCE IF NOT EXISTS duck_record_seq START 1;

INSERT INTO master_ducks VALUES
    (nextval('duck_record_seq'), 101, 'Quackers', 'Mallard',       'Pond A', CURRENT_DATE - INTERVAL '2 days', NULL, true),
    (nextval('duck_record_seq'), 102, 'Waddles',  'Pekin',         'Pond A', CURRENT_DATE - INTERVAL '2 days', NULL, true),
    (nextval('duck_record_seq'), 103, 'Feathers', 'Rouen',         'Pond B', CURRENT_DATE - INTERVAL '2 days', NULL, true),
    (nextval('duck_record_seq'), 105, 'Puddles',  'Indian Runner', 'Pond A', CURRENT_DATE - INTERVAL '2 days', NULL, true);
```

##### Step 3: Perform the Merge Statement {#docs:current:guides:sql_features:merge::step-3-perform-the-merge-statement}

This statement performs the merge: it compares the target and source rows and applies the matching `WHEN MATCHED` or `WHEN NOT MATCHED` clause.

```sql
MERGE INTO master_ducks AS target
USING incoming_ducks AS source
ON target.duck_id = source.duck_id AND target.is_current = true

WHEN MATCHED AND (
       target.duck_name <> source.duck_name OR
       target.breed     <> source.breed     OR
       target.location  <> source.location
) THEN UPDATE SET
    end_date    = CURRENT_DATE - INTERVAL '1 day',
    is_current  = false

WHEN NOT MATCHED BY SOURCE AND target.is_current = true THEN UPDATE SET
    end_date    = CURRENT_DATE - INTERVAL '1 day',
    is_current  = false

WHEN NOT MATCHED BY TARGET THEN INSERT (
    record_id, duck_id, duck_name, breed, location,
    begin_date, end_date, is_current
) VALUES (
    nextval('duck_record_seq'),
    source.duck_id, source.duck_name, source.breed, source.location,
    source.begin_date, source.end_date, source.is_current
)

RETURNING merge_action, *;
```
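
As a dry run of the clause logic, the following plain-Python sketch works out which `MERGE` action applies to each `duck_id` in the sample data from Steps 1 and 2. It is a simplified model, not DuckDB's implementation: the function name `plan_merge` is hypothetical, only current target rows are modeled, and SQL's `NULL` comparison semantics are ignored.

```python
# Current rows of master_ducks (target): duck_id -> (duck_name, breed, location)
master = {
    101: ("Quackers", "Mallard", "Pond A"),
    102: ("Waddles", "Pekin", "Pond A"),
    103: ("Feathers", "Rouen", "Pond B"),
    105: ("Puddles", "Indian Runner", "Pond A"),
}

# Rows of incoming_ducks (source)
incoming = {
    101: ("Quackers", "Mallard", "Pond B"),
    102: ("Waddles", "Pekin", "Pond A"),
    104: ("Splash", "Muscovy", "Pond C"),
    105: ("Puddles", "Indian Runner", "Relocated"),
}

def plan_merge(target_rows, source_rows):
    """Return duck_id -> action, mirroring the three WHEN clauses (hypothetical helper)."""
    actions = {}
    for duck_id, target in target_rows.items():
        source = source_rows.get(duck_id)
        if source is None:
            # WHEN NOT MATCHED BY SOURCE: expire the row
            actions[duck_id] = "UPDATE (expire, missing from source)"
        elif source != target:
            # WHEN MATCHED AND an attribute differs: expire the old version
            actions[duck_id] = "UPDATE (expire, attributes changed)"
        # Matched and unchanged rows get no action.
    for duck_id in source_rows:
        if duck_id not in target_rows:
            # WHEN NOT MATCHED BY TARGET: insert the new row
            actions[duck_id] = "INSERT"
    return actions

plan = plan_merge(master, incoming)
for duck_id in sorted(plan):
    print(duck_id, plan[duck_id])
```

In this model, duck 102 is unchanged and receives no action, while ducks 101 and 105 are only expired; under the same assumptions, their new current versions are not created by the `MERGE` itself.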

##### Step 4: Insert New Current Versions for Changed Records {#docs:current:guides:sql_features:merge::step-4-insert-new-current-versions-for-changed-records}

This statement inserts the new current records into the master table. While it's possible to achieve the same result using the `MERGE` statement's `RETURNING` clause, this two-step approach is more straightforward and easier to understand.

```sql
INSERT INTO master_ducks (
    record_id, duck_id, duck_name, breed, location,
    begin_date, end_date, is_current
)
SELECT
    nextval('duck_record_seq'),
    source.duck_id,
    source.duck_name,
    source.breed,
    source.location,
    CURRENT_DATE AS begin_date,
    NULL AS end_date,
    true AS is_current
FROM incoming_ducks AS source
INNER JOIN master_ducks AS target
    ON source.duck_id = target.duck_id
WHERE target.is_current = false
  AND target.end_date = CURRENT_DATE - INTERVAL '1 day';
```

##### Step 5: Query The Results {#docs:current:guides:sql_features:merge::step-5-query-the-results}

The following queries can be used to examine the data resulting from the `MERGE` statement.

```sql
-- All history
SELECT * FROM master_ducks ORDER BY duck_id, begin_date DESC;

-- Only current records
SELECT * FROM master_ducks WHERE is_current = true;

-- Only expired historical records
SELECT * FROM master_ducks WHERE is_current = false ORDER BY duck_id, begin_date DESC;
```

##### Step 6: Examine a Single Duck {#docs:current:guides:sql_features:merge::step-6-examine-a-single-duck}

To illustrate the value of type 2 SCDs, let's examine a single duck.
If we select from the master table after running the `MERGE` statement and the follow-up `INSERT` statement, we can see the individual rows for `Quackers`.

To view the original, now historical, row:

```sql
SELECT * FROM master_ducks WHERE duck_name = 'Quackers' AND is_current = false;
```

Returns:

| record_id | duck_id | duck_name | breed   | location | begin_date   | end_date     | is_current |
|----------:|--------:|----------:|---------|----------|-------------|--------------|------------|
| 1         | 101     | Quackers  | Mallard | Pond A   | 2025-11-24  | 2025-11-25   | false      |

**Note**:

- The `end_date` is not `NULL`: it holds the date when this duck's data was updated.
- The `is_current` flag is `false`, indicating that this is a historical record.
- The field that changed is `location`: this row holds the old value `Pond A`, which was updated to `Pond B`.

To view the current row of data:

```sql
SELECT * FROM master_ducks WHERE duck_name = 'Quackers' AND is_current = true;
```

| record_id | duck_id | duck_name | breed   | location | begin_date   | end_date | is_current |
|----------:|--------:|----------:|---------|----------|-------------|----------|------------|
| 10        | 101     | Quackers  | Mallard | Pond B   | 2025-11-26  | NULL     | true       |

**Note**:

- The `end_date` is `NULL`, which in this context indicates that this is the latest record for this `duck_id`.
- The `is_current` flag is `true`, also indicating that this is the current record.
- The `location` is now `Pond B`.

To view all of Quackers' data, including both current and historical rows:

```sql
SELECT * FROM master_ducks WHERE duck_name = 'Quackers';
```

| record_id | duck_id | duck_name | breed   | location | begin_date  | end_date   | is_current |
|----------:|--------:|----------:|---------|----------|-------------|------------|------------|
| 1         | 101     | Quackers  | Mallard | Pond A   | 2025-11-24  | 2025-11-25 | false      |
| 10        | 101     | Quackers  | Mallard | Pond B   | 2025-11-26  | NULL       | true       |

#### Common Patterns and Variations {#docs:current:guides:sql_features:merge::common-patterns-and-variations}

| Use Case                          | Clause to Use                                                      |
|-----------------------------------|--------------------------------------------------------------------|
| Simple upsert (no history)        | `WHEN MATCHED THEN UPDATE` and `WHEN NOT MATCHED BY TARGET THEN INSERT` |
| Upsert and delete missing rows      | Add `WHEN NOT MATCHED BY SOURCE THEN DELETE`                       |
| Only insert new, never update     | Omit `WHEN MATCHED`                                                |
| Return affected rows              | Add `RETURNING merge_action, *`                                    |

#### Best Practices {#docs:current:guides:sql_features:merge::best-practices}

- Remember that `TARGET` is the master table and `SOURCE` is the incoming table or query.
- Keep `end_date` as `NULL` for current rows, which keeps filters for current rows simple.
- Wrap the `MERGE` and `INSERT` statements in a transaction when both must succeed or fail together.
- Use a primary key or a surrogate key for uniqueness.
- Test with `RETURNING` first to inspect the action taken for each row.

### Timestamp Issues {#docs:current:guides:sql_features:timestamps}

#### Timestamp with Time Zone Promotion Casts {#docs:current:guides:sql_features:timestamps::timestamp-with-time-zone-promotion-casts}

Working with time zones in SQL can be quite confusing at times. 
For example, when filtering to a date range, one might try the following query:

```sql
SET timezone = 'America/Los_Angeles';

CREATE TABLE times AS
    FROM range('2025-08-30'::TIMESTAMPTZ, '2025-08-31'::TIMESTAMPTZ, INTERVAL 1 HOUR) tbl(t);

FROM times WHERE t <= '2025-08-30';
```

```text
┌──────────────────────────┐
│            t             │
│ timestamp with time zone │
├──────────────────────────┤
│ 2025-08-30 00:00:00-07   │
└──────────────────────────┘
```

But if you change to another time zone, the results of the query change:

```sql
SET timezone = 'HST';
FROM times WHERE t <= '2025-08-30';
```

```text
┌──────────────────────────┐
│            t             │
│ timestamp with time zone │
├──────────────────────────┤
│ 2025-08-29 21:00:00-10   │
│ 2025-08-29 22:00:00-10   │
│ 2025-08-29 23:00:00-10   │
│ 2025-08-30 00:00:00-10   │
└──────────────────────────┘
```

Or worse:

```sql
SET timezone = 'America/New_York';
FROM times WHERE t <= '2025-08-30';
```

```text
┌──────────────────────────┐
│            t             │
│ timestamp with time zone │
├──────────────────────────┤
│          0 rows          │
└──────────────────────────┘
```

These confusing results are due to the SQL casting rules from `DATE` to `TIMESTAMP WITH TIME ZONE`.
This cast is required to promote the date to midnight _in the current time zone_. 

In general, unless you need to use the current time zone for display (or 
[other temporal binning](https://duckdb.org/2022/01/06/time-zones) operations) 
you should use plain `TIMESTAMP`s for temporal data.
This will avoid confusing issues such as this, and the arithmetic operations are generally faster. 
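
The effect of the promotion cast can be reproduced outside DuckDB. The sketch below uses Python's `datetime` with fixed UTC offsets (an assumption made to keep the example self-contained; DuckDB resolves named time zones through ICU) and counts how many of the hourly instants fall at or before "midnight on 2025-08-30" in each session time zone.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for the named zones on 2025-08-30 (illustrative assumption).
PDT = timezone(timedelta(hours=-7), "America/Los_Angeles")
HST = timezone(timedelta(hours=-10), "HST")
EDT = timezone(timedelta(hours=-4), "America/New_York")

# Hourly instants covering 2025-08-30 in Los Angeles, like the `times` table.
start = datetime(2025, 8, 30, tzinfo=PDT)
times = [start + timedelta(hours=h) for h in range(24)]

def rows_up_to_date(session_tz):
    # The DATE literal is promoted to midnight in the session time zone.
    bound = datetime(2025, 8, 30, tzinfo=session_tz)
    return [t for t in times if t <= bound]

counts = {tz.tzname(None): len(rows_up_to_date(tz)) for tz in (PDT, HST, EDT)}
print(counts)
# {'America/Los_Angeles': 1, 'HST': 4, 'America/New_York': 0}
```

The stored instants never change; only the instant that the date literal is promoted to moves with the session time zone.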

#### Time Zone Performance {#docs:current:guides:sql_features:timestamps::time-zone-performance}

DuckDB uses the _International Components for Unicode_ time library for 
[time zone support](https://duckdb.org/2022/01/06/time-zones).
This library has a number of advantages, including support for daylight saving time past 2037.
(Note: pandas gives incorrect results past that year.)

The downside of using ICU is that it is not highly performant.
One workaround for this is to create a calendar table for the timestamps being modeled.
For example, if the application is modeling electrical supply and demand out to 2100 at hourly resolution,
one can create the calendar table like so:

```sql
SET timezone = 'Europe/Amsterdam';

CREATE OR REPLACE TABLE hourly AS
    SELECT 
        ts, 
        year::SMALLINT AS year,
        month::TINYINT AS month,
        day::TINYINT AS day,
        hour::TINYINT AS hour
    FROM (
        SELECT ts, unnest(date_part(['year', 'month', 'day', 'hour'], ts))
        FROM generate_series(
            '2020-01-01'::DATE::TIMESTAMPTZ, 
            '2100-01-01'::DATE::TIMESTAMPTZ, 
            INTERVAL 1 HOUR) tbl(ts)
    ) parts;
```

You can then join this ~700K row table against any timestamp column 
to quickly obtain the temporal bin values for the time zone in question.
The inner casts are not required, but result in a smaller table 
because `date_part` returns 64-bit integers for all parts.
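
As a quick sanity check of the "~700K rows" figure and the effect of the casts, the following plain-Python arithmetic counts the hourly steps between the two endpoints and compares the raw width of the four part columns. The byte figures are uncompressed type widths only; DuckDB's storage compression is not modeled here.

```python
from datetime import datetime, timedelta

start = datetime(2020, 1, 1)
end = datetime(2100, 1, 1)

# generate_series includes both endpoints, hence the + 1.
rows = (end - start) // timedelta(hours=1) + 1

# Raw (uncompressed) bytes per row for the four part columns.
bytes_as_bigint = 4 * 8           # four 64-bit integers
bytes_after_cast = 2 + 1 + 1 + 1  # one SMALLINT and three TINYINTs

print(rows)                               # 701281
print(bytes_as_bigint, bytes_after_cast)  # 32 5
```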

Notice that we can extract _all_ of the parts with a single call to `date_part`.
This part list version of the function is faster than extracting the parts one by one
because the underlying binning computation computes all parts,
so picking out the ones in the list avoids duplicate calls to the slow ICU function.

Also notice that we are leveraging the `DATE` cast rules from the previous section 
to bound the calendar to the model domain.

#### Half Open Intervals {#docs:current:guides:sql_features:timestamps::half-open-intervals}

Another subtle problem in using SQL for temporal analytics is the `BETWEEN` operator.
Temporal analytics almost always uses 
[half-open binning intervals](https://www.cs.arizona.edu/~rts/tdbbook.pdf) 
to avoid overlaps at the ends.
Unfortunately, the `BETWEEN` operator is a closed-closed interval:

```sql
x BETWEEN begin AND end
-- expands to
begin <= x AND x <= end
-- not
begin <= x AND x < end
```

To avoid this problem, make sure you are explicit about comparison boundaries instead of using `BETWEEN`.
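
The overlap is easy to see with two adjacent daily bins. In this plain-Python sketch (the helper names `in_closed` and `in_half_open` are illustrative), a timestamp that lies exactly on the shared boundary is counted twice with closed intervals and once with half-open intervals.

```python
from datetime import datetime

def in_closed(x, begin, end):
    # Semantics of `x BETWEEN begin AND end`
    return begin <= x <= end

def in_half_open(x, begin, end):
    # Semantics of `begin <= x AND x < end`
    return begin <= x < end

# Two adjacent daily bins that share the boundary 2025-08-31 00:00.
bins = [
    (datetime(2025, 8, 30), datetime(2025, 8, 31)),
    (datetime(2025, 8, 31), datetime(2025, 9, 1)),
]
boundary = datetime(2025, 8, 31)

closed_hits = sum(in_closed(boundary, b, e) for b, e in bins)
half_open_hits = sum(in_half_open(boundary, b, e) for b, e in bins)
print(closed_hits, half_open_hits)  # 2 1
```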

## Snippets {#guides:snippets}

### Create Synthetic Data {#docs:current:guides:snippets:create_synthetic_data}

DuckDB allows you to quickly generate synthetic datasets. To do so, you may use:

* [range functions](#docs:current:sql:functions:list::range-functions)
* hash functions, e.g.,
  [`hash`](#docs:current:sql:functions:utility::hashvalue),
  [`md5`](#docs:current:sql:functions:utility::md5string),
  [`sha256`](#docs:current:sql:functions:utility::sha256value)
* the [Faker Python package](https://faker.readthedocs.io/) via the [Python function API](#docs:current:clients:python:function)
* [cross products (Cartesian products)](#docs:current:sql:query_syntax:from::cross-product-joins-cartesian-product)

For example:

```python
import duckdb

from duckdb.sqltypes import *
from faker import Faker

fake = Faker()

def random_date():
    return fake.date_between()

def random_short_text():
    return fake.text(max_nb_chars=20)

def random_long_text():
    return fake.text(max_nb_chars=200)

con = duckdb.connect()
con.create_function("random_date",       random_date,       [], DATE,    type="native", side_effects=True)
con.create_function("random_short_text", random_short_text, [], VARCHAR, type="native", side_effects=True)
con.create_function("random_long_text",  random_long_text,  [], VARCHAR, type="native", side_effects=True)

res = con.sql("""
                 SELECT
                    hash(i * 10 + j) AS id,
                    random_date() AS creationDate,
                    random_short_text() AS short,
                    random_long_text() AS long,
                    IF (j % 2, true, false) AS flag
                 FROM generate_series(1, 5) s(i)
                 CROSS JOIN generate_series(1, 2) t(j)
                 """)
res.show()
```

This generates output similar to the following (the `short` and `long` text columns are not shown):

```text
┌──────────────────────┬──────────────┬─────────┐
│          id          │ creationDate │  flag   │
│        uint64        │     date     │ boolean │
├──────────────────────┼──────────────┼─────────┤
│  6770051751173734325 │ 2019-11-05   │ true    │
│ 16510940941872865459 │ 2002-08-03   │ true    │
│ 13285076694688170502 │ 1998-11-27   │ true    │
│ 11757770452869451863 │ 1998-07-03   │ true    │
│  2064835973596856015 │ 2010-09-06   │ true    │
│ 17776805813723356275 │ 2020-12-26   │ false   │
│ 13540103502347468651 │ 1998-03-21   │ false   │
│  4800297459639118879 │ 2015-06-12   │ false   │
│  7199933130570745587 │ 2005-04-13   │ false   │
│ 18103378254596719331 │ 2014-09-15   │ false   │
├──────────────────────┴──────────────┴─────────┤
│ 10 rows                             3 columns │
└───────────────────────────────────────────────┘
```

### Dutch Railway Datasets {#docs:current:guides:snippets:dutch_railway_datasets}

Examples in this documentation often use datasets based on the [Dutch Railway datasets](https://www.rijdendetreinen.nl/en/open-data/).
These high-quality datasets are maintained by the team behind the [Rijden de Treinen _(Are the trains running?)_ application](https://www.rijdendetreinen.nl/en/about).
This page contains download links to our mirrors of the datasets.

> In 2024, we published a [blog post on the analysis of these datasets](https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands).

#### Loading the Datasets {#docs:current:guides:snippets:dutch_railway_datasets::loading-the-datasets}

You can load the datasets directly as follows:

```sql
CREATE TABLE services AS
    FROM 'https://blobs.duckdb.org/nl-railway/services-2025-03.csv.gz';
```

```sql
DESCRIBE services;
```

|         column_name          |       column_type        | null | key  | default | extra |
|------------------------------|--------------------------|------|------|---------|-------|
| Service:RDT-ID               | BIGINT                   | YES  | NULL | NULL    | NULL  |
| Service:Date                 | DATE                     | YES  | NULL | NULL    | NULL  |
| Service:Type                 | VARCHAR                  | YES  | NULL | NULL    | NULL  |
| Service:Company              | VARCHAR                  | YES  | NULL | NULL    | NULL  |
| Service:Train number         | BIGINT                   | YES  | NULL | NULL    | NULL  |
| Service:Completely cancelled | BOOLEAN                  | YES  | NULL | NULL    | NULL  |
| Service:Partly cancelled     | BOOLEAN                  | YES  | NULL | NULL    | NULL  |
| Service:Maximum delay        | BIGINT                   | YES  | NULL | NULL    | NULL  |
| Stop:RDT-ID                  | BIGINT                   | YES  | NULL | NULL    | NULL  |
| Stop:Station code            | VARCHAR                  | YES  | NULL | NULL    | NULL  |
| Stop:Station name            | VARCHAR                  | YES  | NULL | NULL    | NULL  |
| Stop:Arrival time            | TIMESTAMP WITH TIME ZONE | YES  | NULL | NULL    | NULL  |
| Stop:Arrival delay           | BIGINT                   | YES  | NULL | NULL    | NULL  |
| Stop:Arrival cancelled       | BOOLEAN                  | YES  | NULL | NULL    | NULL  |
| Stop:Departure time          | TIMESTAMP WITH TIME ZONE | YES  | NULL | NULL    | NULL  |
| Stop:Departure delay         | BIGINT                   | YES  | NULL | NULL    | NULL  |
| Stop:Departure cancelled     | BOOLEAN                  | YES  | NULL | NULL    | NULL  |

#### Datasets {#docs:current:guides:snippets:dutch_railway_datasets::datasets}

##### 80-Month Datasets {#docs:current:guides:snippets:dutch_railway_datasets::80-month-datasets}

* [2019-01 to 2025-08](https://blobs.duckdb.org/nl-railway/railway-services-80-months.zip): 80 months as uncompressed CSVs in a single zip

##### Yearly Datasets {#docs:current:guides:snippets:dutch_railway_datasets::yearly-datasets}

The yearly datasets are about 350 MB each.

* [2019](https://blobs.duckdb.org/nl-railway/services-2019.csv.gz)
* [2020](https://blobs.duckdb.org/nl-railway/services-2020.csv.gz)
* [2021](https://blobs.duckdb.org/nl-railway/services-2021.csv.gz)
* [2022](https://blobs.duckdb.org/nl-railway/services-2022.csv.gz)
* [2023](https://blobs.duckdb.org/nl-railway/services-2023.csv.gz)
* [2024](https://blobs.duckdb.org/nl-railway/services-2024.csv.gz)
* [2025](https://blobs.duckdb.org/nl-railway/services-2025.csv.gz)

##### Monthly Datasets {#docs:current:guides:snippets:dutch_railway_datasets::monthly-datasets}

The monthly datasets are about 30 MB each.

* [2024-01](https://blobs.duckdb.org/nl-railway/services-2024-01.csv.gz)
* [2024-02](https://blobs.duckdb.org/nl-railway/services-2024-02.csv.gz)
* [2024-03](https://blobs.duckdb.org/nl-railway/services-2024-03.csv.gz)
* [2024-04](https://blobs.duckdb.org/nl-railway/services-2024-04.csv.gz)
* [2024-05](https://blobs.duckdb.org/nl-railway/services-2024-05.csv.gz)
* [2024-06](https://blobs.duckdb.org/nl-railway/services-2024-06.csv.gz)
* [2024-07](https://blobs.duckdb.org/nl-railway/services-2024-07.csv.gz)
* [2024-08](https://blobs.duckdb.org/nl-railway/services-2024-08.csv.gz)
* [2024-09](https://blobs.duckdb.org/nl-railway/services-2024-09.csv.gz)
* [2024-10](https://blobs.duckdb.org/nl-railway/services-2024-10.csv.gz)
* [2024-11](https://blobs.duckdb.org/nl-railway/services-2024-11.csv.gz)
* [2024-12](https://blobs.duckdb.org/nl-railway/services-2024-12.csv.gz)
* [2025-01](https://blobs.duckdb.org/nl-railway/services-2025-01.csv.gz)
* [2025-02](https://blobs.duckdb.org/nl-railway/services-2025-02.csv.gz)
* [2025-03](https://blobs.duckdb.org/nl-railway/services-2025-03.csv.gz)
* [2025-04](https://blobs.duckdb.org/nl-railway/services-2025-04.csv.gz)
* [2025-05](https://blobs.duckdb.org/nl-railway/services-2025-05.csv.gz)
* [2025-06](https://blobs.duckdb.org/nl-railway/services-2025-06.csv.gz)
* [2025-07](https://blobs.duckdb.org/nl-railway/services-2025-07.csv.gz)
* [2025-08](https://blobs.duckdb.org/nl-railway/services-2025-08.csv.gz)
* [2025-09](https://blobs.duckdb.org/nl-railway/services-2025-09.csv.gz)
* [2025-10](https://blobs.duckdb.org/nl-railway/services-2025-10.csv.gz)
* [2025-11](https://blobs.duckdb.org/nl-railway/services-2025-11.csv.gz)
* [2025-12](https://blobs.duckdb.org/nl-railway/services-2025-12.csv.gz)
* [2026-01](https://blobs.duckdb.org/nl-railway/services-2026-01.csv.gz)
* [2026-02](https://blobs.duckdb.org/nl-railway/services-2026-02.csv.gz)
* [2026-03](https://blobs.duckdb.org/nl-railway/services-2026-03.csv.gz)

### Sharing Macros {#docs:current:guides:snippets:sharing_macros}

DuckDB has a powerful [macro mechanism](#docs:current:sql:statements:create_macro) that allows creating shorthands for common tasks.

#### Sharing a Scalar Macro {#docs:current:guides:snippets:sharing_macros::sharing-a-scalar-macro}

First, we define a macro that pretty-prints a non-negative integer as a short string with thousands, millions, and billions (without rounding) as follows:

```batch
duckdb pretty_print_integer_macro.duckdb
```

```sql
CREATE MACRO pretty_print_integer(n) AS
    CASE
        WHEN n >= 1_000_000_000 THEN printf('%dB', n // 1_000_000_000)
        WHEN n >= 1_000_000     THEN printf('%dM', n // 1_000_000)
        WHEN n >= 1_000         THEN printf('%dk', n // 1_000)
        ELSE printf('%d', n)
    END;

SELECT pretty_print_integer(25_500_000) AS x;
```

```text
┌─────────┐
│    x    │
│ varchar │
├─────────┤
│ 25M     │
└─────────┘
```

As one would expect, the macro gets persisted in the database.
But this also means that we can host it on an HTTPS endpoint and share it with anyone!
We have published this macro on `blobs.duckdb.org`.

You can try it from DuckDB:

```batch
duckdb
```

Make sure that the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) is installed:

```sql
INSTALL httpfs;
```

You can now attach to the remote endpoint and use the macro:

```sql
ATTACH 'https://blobs.duckdb.org/data/pretty_print_integer_macro.duckdb'
    AS pretty_print_macro_db;

SELECT pretty_print_macro_db.pretty_print_integer(42_123) AS x;
```

```text
┌─────────┐
│    x    │
│ varchar │
├─────────┤
│ 42k     │
└─────────┘
```

#### Sharing a Table Macro {#docs:current:guides:snippets:sharing_macros::sharing-a-table-macro}

It's also possible to share table macros. For example, we created the [`checksum` macro](https://duckdb.org/2024/10/11/duckdb-tricks-part-2#computing-checksums-for-columns) as follows:

```batch
duckdb compute_table_checksum.duckdb
```

```sql
CREATE MACRO checksum(table_name) AS TABLE
    SELECT bit_xor(md5_number(COLUMNS(*)::VARCHAR))
    FROM query_table(table_name);
```

To use it, make sure that the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) is installed:

```sql
INSTALL httpfs;
```

You can attach to the remote endpoint and use the macro:

```sql
ATTACH 'https://blobs.duckdb.org/data/compute_table_checksum.duckdb'
    AS compute_table_checksum_db;

CREATE TABLE stations AS
    FROM 'https://blobs.duckdb.org/stations.parquet';

.mode line
FROM compute_table_checksum_db.checksum('stations');
```

```text
         id = -132780776949939723506211681506129908318
       code = 126327004005066229305810236187733612209
        uic = -145623335062491121476006068124745817380
 name_short = -114540917565721687000878144381189869683
name_medium = -568264780518431562127359918655305384
  name_long = 126079956280724674884063510870679874110
       slug = -53458800462031706622213217090663245511
    country = 143068442936912051858689770843609587944
       type = 5665662315470785456147400604088879751
    geo_lat = 160608116135251821259126521573759502306
    geo_lng = -138297281072655463682926723171691547732
```
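
A useful property of combining per-row hashes with XOR is that the result does not depend on row order. The plain-Python sketch below illustrates this with `hashlib`; it is not a reimplementation of `md5_number` (byte order and signedness are not matched to DuckDB), and the sample values and helper names are made up for illustration.

```python
import hashlib
from functools import reduce

def md5_as_int(value):
    # Interpret the 16-byte MD5 digest as an unsigned integer (illustrative choice).
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def column_checksum(values):
    # XOR all per-value hashes together, starting from 0.
    return reduce(lambda acc, v: acc ^ md5_as_int(v), values, 0)

codes = ["ASD", "UT", "RTD"]  # hypothetical column values

same_rows_reordered = column_checksum(list(reversed(codes)))
changed_value = column_checksum(["ASD", "UT", "RTB"])

print(column_checksum(codes) == same_rows_reordered)  # True
print(column_checksum(codes) == changed_value)        # False
```

One caveat of any XOR-based checksum is that pairs of identical values cancel out, so `column_checksum(["a", "a"])` is `0` in this sketch.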

### Analyzing a Git Repository {#docs:current:guides:snippets:analyze_git_repository}

You can use DuckDB to analyze Git logs using the output of the [`git log` command](https://git-scm.com/docs/git-log).

#### Exporting the Git Log {#docs:current:guides:snippets:analyze_git_repository::exporting-the-git-log}

We start by picking a character that doesn't occur in any part of the commit log (author names, messages, etc.).
Since v1.2.0, DuckDB's CSV reader supports [4-byte delimiters](https://duckdb.org/2025/02/05/announcing-duckdb-120#csv-features), making it possible to use emojis! 🎉

Despite being featured in the [Emoji Movie](https://www.imdb.com/title/tt4877122/) (IMDb rating: 3.4),
we can assume that the [Fish Cake with Swirl emoji (🍥)](https://emojipedia.org/fish-cake-with-swirl) is not a common occurrence in most Git logs.
So, let's clone the [`duckdb/duckdb` repository](https://github.com/duckdb/duckdb) and export its log as follows:

```batch
git log --date=iso-strict --pretty=format:%ad🍥%h🍥%an🍥%s > git-log.csv
```

The resulting file looks like this:

```text
2025-02-25T18:12:54+01:00🍥d608a31e13🍥Mark🍥MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation (#16400)
2025-02-25T15:05:56+01:00🍥920b39ad96🍥Mark🍥Read support for Parquet Float16 (#16395)
2025-02-25T13:43:52+01:00🍥61f55734b9🍥Carlo Piovesan🍥MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation
2025-02-25T12:35:28+01:00🍥87eff7ebd3🍥Mark🍥Fix issue #16377 (#16391)
2025-02-25T10:33:49+01:00🍥35af26476e🍥Hannes Mühleisen🍥Read support for Parquet Float16
```
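
The export relies on the assumption that 🍥 never occurs in the exported fields. A minimal sketch for checking this in a shell is shown below; the helper name `delimiter_is_unused` is hypothetical and not part of Git or DuckDB.

```bash
# Hypothetical helper: reads text on stdin and succeeds (exit code 0)
# only if the candidate delimiter passed as $1 does not occur in it.
delimiter_is_unused() {
    ! grep -q -F -- "$1"
}

# Example usage inside a cloned repository (shown as a comment, not executed here):
#   git log --pretty=format:'%an %s' | delimiter_is_unused '🍥' && echo "delimiter is safe"
```

If the check fails, pick another character and repeat the export with that delimiter.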

#### Loading the Git Log into DuckDB {#docs:current:guides:snippets:analyze_git_repository::loading-the-git-log-into-duckdb}

Start DuckDB and read the log as a <s>CSV</s> 🍥SV:

```sql
CREATE TABLE commits AS 
    FROM read_csv(
            'git-log.csv',
            delim = '🍥',
            header = false,
            column_names = ['timestamp', 'hash', 'author', 'message']
        );
```

This will result in a nice DuckDB table:

```sql
FROM commits
LIMIT 5;
```

```text
┌─────────────────────┬────────────┬──────────────────┬───────────────────────────────────────────────────────────────────────────────┐
│      timestamp      │    hash    │      author      │                                    message                                    │
│      timestamp      │  varchar   │     varchar      │                                    varchar                                    │
├─────────────────────┼────────────┼──────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ 2025-02-25 17:12:54 │ d608a31e13 │ Mark             │ MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation (#16400) │
│ 2025-02-25 14:05:56 │ 920b39ad96 │ Mark             │ Read support for Parquet Float16 (#16395)                                     │
│ 2025-02-25 12:43:52 │ 61f55734b9 │ Carlo Piovesan   │ MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation          │
│ 2025-02-25 11:35:28 │ 87eff7ebd3 │ Mark             │ Fix issue #16377 (#16391)                                                     │
│ 2025-02-25 09:33:49 │ 35af26476e │ Hannes Mühleisen │ Read support for Parquet Float16                                              │
└─────────────────────┴────────────┴──────────────────┴───────────────────────────────────────────────────────────────────────────────┘
```

#### Analyzing the Log {#docs:current:guides:snippets:analyze_git_repository::analyzing-the-log}

We can analyze this table like any other table in DuckDB.

##### Common Topics {#docs:current:guides:snippets:analyze_git_repository::common-topics}

Let's start with a simple question: which topic was the most commonly mentioned in the commit messages: CI, CLI, or Python?

```sql
SELECT
    message.lower().regexp_extract('\b(ci|cli|python)\b') AS topic,
    count(*) AS num_commits
FROM commits
WHERE topic <> ''
GROUP BY ALL
ORDER BY num_commits DESC;
```

```text
┌─────────┬─────────────┐
│  topic  │ num_commits │
│ varchar │    int64    │
├─────────┼─────────────┤
│ ci      │         828 │
│ python  │         666 │
│ cli     │          49 │
└─────────┴─────────────┘
```

Out of these three topics, commits related to continuous integration dominate the log!

We can also do a more exploratory analysis by looking at all words in the commit messages.
To do so, we first tokenize the messages:

```sql
CREATE TABLE words AS
    SELECT unnest(
        message
            .lower()
            .regexp_replace('\W', ' ')
            .trim(' ')
            .string_split_regex('\W')
        ) AS word    
FROM commits;
```

Then, we remove stopwords using a pre-defined list:

```sql
CREATE TABLE stopwords AS
    SELECT unnest(['a', 'about', 'above', 'after', 'again', 'against', 'all', 'am', 'an', 'and', 'any', 'are', 'as', 'at', 'be', 'because', 'been', 'before', 'being', 'below', 'between', 'both', 'but', 'by', 'can', 'did', 'do', 'does', 'doing', 'don', 'down', 'during', 'each', 'few', 'for', 'from', 'further', 'had', 'has', 'have', 'having', 'he', 'her', 'here', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'i', 'if', 'in', 'into', 'is', 'it', 'its', 'itself', 'just', 'me', 'more', 'most', 'my', 'myself', 'no', 'nor', 'not', 'now', 'of', 'off', 'on', 'once', 'only', 'or', 'other', 'our', 'ours', 'ourselves', 'out', 'over', 'own', 's', 'same', 'she', 'should', 'so', 'some', 'such', 't', 'than', 'that', 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'there', 'these', 'they', 'this', 'those', 'through', 'to', 'too', 'under', 'until', 'up', 'very', 'was', 'we', 'were', 'what', 'when', 'where', 'which', 'while', 'who', 'whom', 'why', 'will', 'with', 'you', 'your', 'yours', 'yourself', 'yourselves']) AS word;

CREATE OR REPLACE TABLE words AS
    FROM words
    NATURAL ANTI JOIN stopwords
    WHERE word != '';
```

> We use the `NATURAL ANTI JOIN` clause here, which allows us to elegantly filter out values that occur in the `stopwords` table.

Finally, we select the 20 most common words:

```sql
SELECT word, count(*) AS count FROM words
GROUP BY ALL
ORDER BY count DESC
LIMIT 20;
```

```text
┌──────────┬───────┐
│   word   │ count │
│ varchar  │ int64 │
├──────────┼───────┤
│ merge    │ 12550 │
│ fix      │  6402 │
│ branch   │  6005 │
│ pull     │  5950 │
│ request  │  5945 │
│ add      │  5687 │
│ test     │  3801 │
│ master   │  3289 │
│ tests    │  2339 │
│ issue    │  1971 │
│ main     │  1935 │
│ remove   │  1884 │
│ format   │  1819 │
│ duckdb   │  1710 │
│ use      │  1442 │
│ mytherin │  1410 │
│ fixes    │  1333 │
│ hawkfish │  1147 │
│ feature  │  1139 │
│ function │  1088 │
├──────────┴───────┤
│     20 rows      │
└──────────────────┘
```

As expected, there are many Git terms (`merge`, `branch`, `pull`, etc.), followed by terminology related to development (`fix`, `test`/`tests`, `issue`, `format`).
We also see the account names of some developers ([`mytherin`](https://github.com/Mytherin), [`hawkfish`](https://github.com/hawkfish)), which are likely there due to commit messages for merging pull requests (e.g., [“Merge pull request #13776 from Mytherin/expressiondepth”](https://github.com/duckdb/duckdb/commit/4d18b9d05caf88f0420dbdbe03d35a0faabf4aa7)).
Finally, we also see some DuckDB-related terms such as `duckdb` (shocking!) and `function`.

##### Visualizing the Number of Commits {#docs:current:guides:snippets:analyze_git_repository::visualizing-the-number-of-commits}

Let's visualize the number of commits each year:

```sql
SELECT
    year(timestamp) AS year,
    count(*) AS num_commits,
    num_commits.bar(0, 20_000) AS num_commits_viz
FROM commits
GROUP BY ALL
ORDER BY ALL;
```

```text
┌───────┬─────────────┬──────────────────────────────────────────────────────────────────────────────────┐
│ year  │ num_commits │                                 num_commits_viz                                  │
│ int64 │    int64    │                                     varchar                                      │
├───────┼─────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│  2018 │         870 │ ███▍                                                                             │
│  2019 │        1621 │ ██████▍                                                                          │
│  2020 │        3484 │ █████████████▉                                                                   │
│  2021 │        6488 │ █████████████████████████▉                                                       │
│  2022 │        9817 │ ███████████████████████████████████████▎                                         │
│  2023 │       14585 │ ██████████████████████████████████████████████████████████▎                      │
│  2024 │       15949 │ ███████████████████████████████████████████████████████████████▊                 │
│  2025 │        1788 │ ███████▏                                                                         │
└───────┴─────────────┴──────────────────────────────────────────────────────────────────────────────────┘
```

We see a steady growth over the years –
especially considering that many of DuckDB's functionalities and clients, which were originally part of the main repository, are now maintained in separate repositories
(e.g., [Java](https://github.com/duckdb/duckdb-java), [R](https://github.com/duckdb/duckdb-r)).

Happy hacking!

### Importing Duckbox Tables {#docs:current:guides:snippets:importing_duckbox_tables}

> The scripts provided on this page work on Linux, macOS, and WSL.

By default, the DuckDB [CLI client](#docs:current:clients:cli:overview) renders query results in the [duckbox format](#docs:current:clients:cli:output_formats),
which uses rich, ASCII-art inspired tables to show data.
These tables are often shared verbatim in other documents.
For example, take the table used to demonstrate [new CSV features in the DuckDB v1.2.0 release blog post](https://duckdb.org/2025/02/05/announcing-duckdb-120#csv-features):

```text
┌─────────┬───────┐
│    a    │   b   │
│ varchar │ int64 │
├─────────┼───────┤
│ hello   │    42 │
│ world   │    84 │
└─────────┴───────┘
```

What if we would like to load this data back into DuckDB?
This is not supported by default, but it can be achieved with some scripting:
we can turn the table into a `│`-separated file and read it with DuckDB's [CSV reader](#docs:current:data:csv:overview).
Note that the separator is not the pipe character `|`; instead, it is the [“Box Drawings Light Vertical” character](https://www.compart.com/en/unicode/U+2502) `│`.

#### Loading Duckbox Tables to DuckDB {#docs:current:guides:snippets:importing_duckbox_tables::loading-duckbox-tables-to-duckdb}

First, we save the table above as `duckbox.csv`.
Then, we clean it using `sed`:

```bash
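# Header: take the column names from line 2 and stop reading.
sed -n "2s/^│ *//;s/ *│$//;s/ *│ */│/gp;2q" duckbox.csv > duckbox-cleaned.csv
# Rows: drop lines 1-4 (border, names, types, separator) and the bottom border.
sed "1,4d;\$d;s/^│ *//;s/ *│$//;s/ *│ */│/g" duckbox.csv >> duckbox-cleaned.csv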
```

The `duckbox-cleaned.csv` file looks as follows:

```text
a│b
hello│42
world│84
```

We can then simply load this into DuckDB via:

```sql
FROM read_csv('duckbox-cleaned.csv', delim = '│');
```

And export it to a CSV:

```sql
COPY (FROM read_csv('duckbox-cleaned.csv', delim = '│')) TO 'out.csv';
```

```text
a,b
hello,42
world,84
```

#### Using `shellfs` {#docs:current:guides:snippets:importing_duckbox_tables::using-shellfs}

To parse duckbox tables with a single `read_csv` call and without creating any temporary files, we can use the [`shellfs` community extension](#community_extensions:extensions:shellfs):

```sql
INSTALL shellfs FROM community;
LOAD shellfs;
FROM read_csv(
        '(sed -n "2s/^│ *//;s/ *│$//;s/ *│ */│/gp;2q" duckbox.csv; ' ||
        'sed "1,4d;\$d;s/^│ *//;s/ *│$//;s/ *│ */│/g" duckbox.csv) |',
        delim = '│'
    );
```

We can also create a [table macro](#docs:current:sql:statements:create_macro::table-macros):

```sql
CREATE MACRO read_duckbox(path) AS TABLE
    FROM read_csv(
            printf(
                '(sed -n "2s/^│ *//;s/ *│$//;s/ *│ */│/gp;2q" %s; ' ||
                'sed "1,4d;\$d;s/^│ *//;s/ *│$//;s/ *│ */│/g" %s) |',
                path, path
            ),
            delim = '│'
        );
```

Then, reading a duckbox table is as simple as:

```sql
FROM read_duckbox('duckbox.csv');
```

> `shellfs` is a community extension and it comes without any support or guarantees.
> Only use it if you can ensure that its inputs are appropriately sanitized.
> Please consult the [Securing DuckDB page](#docs:current:operations_manual:securing_duckdb:overview) for more details.

#### Limitations {#docs:current:guides:snippets:importing_duckbox_tables::limitations}

Please consider the following limitations when running this script:

* This approach only works if the cell values do not contain the `│` (“Box Drawings Light Vertical”) character.
  It also trims spaces from the table cell values.
  Make sure to factor in these assumptions when running the script.

* The script is compatible with both BSD `sed` (which is the default on macOS) and GNU `sed` (which is the default on Linux and available on macOS as `gsed`).

* Only the data types [supported by the CSV sniffer](#docs:current:data:csv:auto_detection::type-detection) are parsed correctly. Values containing nested data will be parsed as a `VARCHAR`.

### Copying an In-Memory Database to a File {#docs:current:guides:snippets:copy_in-memory_database_to_file}

Imagine the following situation – you started DuckDB in in-memory mode but would like to persist the state of your database to disk.
To achieve this, **attach to a new disk-based database** and use the [`COPY FROM DATABASE ... TO` command](#docs:current:sql:statements:copy::copy-from-database--to):

```sql
ATTACH 'my_database.db';
COPY FROM DATABASE memory TO my_database;
DETACH my_database;
```

> Ensure that the disk-based database file does not exist before attaching to it.

## Troubleshooting {#guides:troubleshooting}

### Command Line {#docs:current:guides:troubleshooting:command_line}

On Linux and macOS, DuckDB v1.5.0 has a known issue where the [command line client](#docs:current:clients:cli:overview) does not interpret piped scripts ([#21243](https://github.com/duckdb/duckdb/issues/21243)).

To demonstrate the problem, create a `test.sql` file:

```bash
echo "SELECT 42 AS x;" > test.sql
```

Piping the file to the DuckDB 1.5.0 CLI client does not run the script:

```bash
duckdb < test.sql
# does not run the script
```

To work around this, add `| cat` to the end of the call:

```bash
duckdb < test.sql | cat
```

```text
┌───────┐
│   x   │
│ int32 │
├───────┤
│    42 │
└───────┘
```

If you are piping from a file, you can also use the [`-f` argument](#docs:current:clients:cli:arguments):

```bash
duckdb -f test.sql
```

```text
┌───────┐
│   x   │
│ int32 │
├───────┤
│    42 │
└───────┘
```

### Crashes {#docs:current:guides:troubleshooting:crashes}

DuckDB is [thoroughly tested](#why_duckdb::thoroughly-tested) via an extensive test suite.
However, bugs can still occur and these can sometimes lead to crashes.
This page contains practical information on how to troubleshoot DuckDB crashes.

#### Types of Crashes {#docs:current:guides:troubleshooting:crashes::types-of-crashes}

There are a few major types of crashes:

* **Termination signals:** The process stops with a `SIGSEGV` (segmentation fault), `SIGABRT`, etc. These should never occur. Please [submit an issue](#docs:current:guides:troubleshooting:crashes::submitting-an-issue).

* **Internal errors:** An operation may result in an [`Internal Error`](#docs:current:dev:internal_errors), e.g.:

  ```console
  INTERNAL Error:
  Attempted to access index 3 within vector of size 3
  ```

  After encountering an internal error, DuckDB enters a restricted mode where any further operations will result in the following error message:

  ```console
  FATAL Error:
  Failed: database has been invalidated because of a previous fatal error.
  The database must be restarted prior to being used again.
  ```

* **Out of memory errors:** A DuckDB crash can also be a symptom of the operating system killing the process.
  For example, many Linux distributions run an [OOM reaper or OOM killer process](https://learn.redhat.com/t5/Platform-Linux/Out-of-Memory-Killer/td-p/48828), which kills processes to free up their memory and thus prevents the operating system from running out of memory.
  If your DuckDB session is killed by the OOM reaper, consult the [“OOM errors” page](#docs:current:guides:troubleshooting:oom_errors).

#### Recovering Data {#docs:current:guides:troubleshooting:crashes::recovering-data}

If your DuckDB session was writing to a persistent database file prior to crashing,
there might be a WAL ([write-ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging)) file next to your database named `⟨database_filename⟩.wal`{:.language-sql .highlight}.
To recover data from the WAL file, simply start a new DuckDB session on the persistent database.
DuckDB will then replay the write-ahead log and perform a [checkpoint operation](#docs:current:sql:statements:checkpoint), restoring the database to the state before the crash.
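
As a rough sketch of the recovery steps above, the following shell helpers check for a leftover WAL file and then open the database with the CLI. The function names and the `my_database.duckdb` path are hypothetical. The explicit `CHECKPOINT` is only there so that the session has a statement to run; per the description above, opening the database is what triggers the replay.

```bash
# Hypothetical helper: succeeds if a WAL file exists next to the database file ($1).
has_wal() {
    [ -f "$1.wal" ]
}

# Hypothetical helper: reports a leftover WAL, then opens the database so DuckDB can replay it.
recover_database() {
    if has_wal "$1"; then
        echo "Found $1.wal, opening $1 to replay it"
    fi
    # Requires the duckdb CLI on the PATH.
    duckdb "$1" -c "CHECKPOINT;"
}

# Example usage (shown as a comment, not executed here):
#   recover_database my_database.duckdb
```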

#### Troubleshooting the Crash {#docs:current:guides:troubleshooting:crashes::troubleshooting-the-crash}

##### Using the Latest Stable and Preview Builds {#docs:current:guides:troubleshooting:crashes::using-the-latest-stable-and-preview-builds}

DuckDB is constantly improving, so there is a chance that the bug you have encountered has already been fixed in the codebase.
First, try updating to the [**latest stable build**](https://duckdb.org/install/index.html?version=stable).
If this doesn't resolve the problem, try using the [**preview build**](https://duckdb.org/install/index.html?version=main) (also known as the “nightly build”).

If you would like to use DuckDB with an [open pull request](https://github.com/duckdb/duckdb/pulls) applied to the codebase,
you can try [building it from source](#docs:current:dev:building:overview).

##### Search for Existing Issues {#docs:current:guides:troubleshooting:crashes::search-for-existing-issues}

There is a chance that someone else has already reported the bug that causes the crash.
Please search in the [GitHub issue tracker](https://github.com/duckdb/duckdb/issues) for the error message to see potentially related issues.
DuckDB has a large community and there may be some suggestions for a workaround.

##### Disabling the Query Optimizer {#docs:current:guides:troubleshooting:crashes::disabling-the-query-optimizer}

Some crashes are caused by DuckDB's query optimizer component.
To identify whether the optimizer is causing the crash, try to turn it off and re-run the query:

```sql
PRAGMA disable_optimizer;
```

If the query finishes successfully, then the crash was caused by one or more optimizer rules.
To pinpoint the specific rules that caused the crash, you can try to [selectively disable optimizer rules](#docs:current:configuration:pragmas::selectively-disabling-optimizers). This way, your query can still benefit from the rest of the optimizer rules.

##### Try to Isolate the Issue {#docs:current:guides:troubleshooting:crashes::try-to-isolate-the-issue}

Some issues are caused by the interplay of different components and extensions, or are specific to certain platforms or client languages.
You can often isolate the issue to a smaller problem.

###### Reproducing in Plain SQL {#docs:current:guides:troubleshooting:crashes::reproducing-in-plain-sql}

Issues can also occur due to differences in client libraries.
To understand whether this is the case, try reproducing the issue using plain SQL queries with the [DuckDB CLI client](#docs:current:clients:cli:overview).
If you cannot reproduce the issue in the command line client, it is likely related to the client library.

###### Different Hardware Setup {#docs:current:guides:troubleshooting:crashes::different-hardware-setup}

In our experience, several crashes occur due to faulty hardware (overheating hard drives, overclocked CPUs, etc.).
Therefore, it's worth trying another computer to run the same workload.

###### Decomposing the Query {#docs:current:guides:troubleshooting:crashes::decomposing-the-query}

It's a good idea to break the query down into multiple smaller queries, each using a separate DuckDB extension or SQL feature.

For example, if you have a query that targets a dataset in an AWS S3 bucket and performs two joins on it, try to rewrite it as a series of smaller steps as follows.
Download the dataset's files manually and load them into DuckDB.
Then perform the first join and the second join separately.
If the multi-step approach still crashes at some step, then the query that triggers the crash is a good basis for a minimal reproducible example. If the multi-step approach runs without crashing, try to reconstruct the original query step by step and observe which step reintroduces the error.
In both cases, you will have a better understanding of what is causing the issue and potentially also a workaround that you can use right away.
Either way, please consider [submitting an issue](#docs:current:guides:troubleshooting:crashes::submitting-an-issue) with your findings.

#### Submitting an Issue {#docs:current:guides:troubleshooting:crashes::submitting-an-issue}

If you found a crash in DuckDB, please consider submitting an issue in our [GitHub issue tracker](https://github.com/duckdb/duckdb/issues) with a [minimal reproducible example](https://en.wikipedia.org/wiki/Minimal_reproducible_example).

### Out of Memory Errors {#docs:current:guides:troubleshooting:oom_errors}

DuckDB has a state-of-the-art out-of-core query engine that can spill to disk for larger-than-memory processing.
We continuously strive to improve DuckDB's scalability and prevent out-of-memory errors whenever possible.
That said, you may still experience out-of-memory errors if you run queries with multiple [blocking operators](#docs:current:guides:performance:how_to_tune_workloads::blocking-operators), certain aggregation functions, `PIVOT` operations, etc., or if you have very little available memory compared to the dataset size.

#### Types of “Out of Memory” Errors {#docs:current:guides:troubleshooting:oom_errors::types-of-out-of-memory-errors}

Out of memory errors mainly occur in two forms:

##### `OutOfMemoryException` {#docs:current:guides:troubleshooting:oom_errors::outofmemoryexception}

Most of the time DuckDB runs out of memory with an `OutOfMemoryException`.
For example:

```console
duckdb.duckdb.OutOfMemoryException: Out of Memory Error: failed to pin block of size 256.0 KiB (476.7 MiB/476.8 MiB used)
```

##### OOM Reaper (Linux) {#docs:current:guides:troubleshooting:oom_errors::oom-reaper-linux}

Many Linux distributions have an [OOM killer or OOM reaper process](https://learn.redhat.com/t5/Platform-Linux/Out-of-Memory-Killer/td-p/48828),
which kills processes to free up memory when the system is about to run out of it.
If the OOM reaper killed your process, you often see the following message in the terminal where DuckDB was running:

```console
Killed
```

To get more detailed information, check the diagnostic messages using the [`dmesg` command](https://en.wikipedia.org/wiki/Dmesg) (you may need `sudo`):

```batch
sudo dmesg
```

If the process was killed by the OOM killer/reaper, you will find an entry like this:

```console
[Fri Apr 18 02:04:10 2025] Out of memory: Killed process 54400 (duckdb) total-vm:1037911068kB, anon-rss:770031964kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:1814612kB oom_score_adj:0
```
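
Since the kernel log can be long, it may help to filter it. The sketch below keeps only the OOM kill entries for a process named `duckdb`. The helper name is hypothetical, and the process name is an assumption: when DuckDB runs embedded in another program (e.g., a Python script), the host process name appears in the log instead.

```bash
# Hypothetical helper: keeps only kernel log lines that report an OOM kill
# of a process named "duckdb".
filter_duckdb_oom_kills() {
    grep -i 'out of memory: killed process' | grep -F '(duckdb)'
}

# Example usage on Linux (shown as a comment, not executed here):
#   sudo dmesg | filter_duckdb_oom_kills
```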

#### Troubleshooting Out of Memory Errors {#docs:current:guides:troubleshooting:oom_errors::troubleshooting-out-of-memory-errors}

To prevent out of memory errors, try to reduce memory usage.
To this end, please consult the [“How to Tune Workloads” page](#docs:current:guides:performance:how_to_tune_workloads).
In short:

* Reduce the number of threads using the `SET threads = ...` command.
* If your query reads a large amount of data from a file or writes a large amount of data, try setting the `preserve_insertion_order` option to `false`: `SET preserve_insertion_order = false`.
* Counter-intuitively, reducing the memory limit below the [default 80%](#docs:current:operations_manual:limits) can help prevent out of memory errors. This is because some DuckDB operations circumvent the database's buffer manager and thus they can reserve more memory than allowed by the memory limit. If this happens (e.g., DuckDB is killed by the operating system or an OOM reaper process), set the memory limit to just 50-60% of the total system memory by using the `SET memory_limit = '...'` statement.
* Break up the query into subqueries. This allows you to see where the intermediate results “blow up”, causing the query to run out of memory.
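
The first three suggestions can be combined in a single script. The sketch below writes them to a hypothetical `low-memory.sql` file. The values `4` and `8GB` are placeholders to adapt to your machine, and the final query is only a stand-in for the real workload.

```bash
# Write the settings and the query into one script so that they run in the same session.
cat > low-memory.sql <<'SQL'
SET threads = 4;
SET preserve_insertion_order = false;
SET memory_limit = '8GB';
-- Placeholder: replace with the memory-intensive query.
SELECT 42 AS x;
SQL

# Run the script with the CLI's -f argument (shown as a comment, not executed here):
#   duckdb -f low-memory.sql
```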

#### See Also {#docs:current:guides:troubleshooting:oom_errors::see-also}

For more information on DuckDB's memory management, see the [“Memory Management in DuckDB” blog post](https://duckdb.org/2024/07/09/memory-management).

## Glossary of Terms {#docs:current:guides:glossary}

This page contains a glossary of a few common terms used in DuckDB.

#### Terms {#docs:current:guides:glossary::terms}

##### In-Process Database Management System {#docs:current:guides:glossary::in-process-database-management-system}

The DBMS runs in the client application's process instead of running as a separate process, which is common in the traditional client–server setup. An alternative term is **embeddable** database management system. In general, the term _“embedded database management system”_ should be avoided, as it can be confused with DBMSs targeting _embedded systems_ (which run on, e.g., microcontrollers).

##### Replacement Scan {#docs:current:guides:glossary::replacement-scan}

In DuckDB, replacement scans are used when a table name used by a query does not exist in the catalog. These scans can substitute another data source for the table. Using replacement scans allows DuckDB to, e.g., seamlessly read [Pandas DataFrames](#docs:current:guides:python:sql_on_pandas) or read input data from remote sources without explicitly invoking the functions that perform this (e.g., [reading Parquet files over HTTPS](#docs:current:guides:network_cloud_storage:http_import)). For details, see the [C API – Replacement Scans page](#docs:current:clients:c:replacement_scans).

##### Extension {#docs:current:guides:glossary::extension}

DuckDB has a flexible extension mechanism that allows for dynamically loading extensions. These may extend DuckDB's functionality by providing support for additional file formats, introducing new types, and adding domain-specific functionality. For details, see the [Extensions page](#docs:current:extensions:overview).

##### Platform {#docs:current:guides:glossary::platform}

The platform is a combination of the operating system (e.g., Linux, macOS, Windows), system architecture (e.g., AMD64, ARM64), and, optionally, the compiler used (e.g., GCC4). Platforms are used to distribute DuckDB binaries and [extension packages](#docs:current:extensions:extension_distribution::platforms).

## Browsing Offline {#docs:current:guides:offline-copy}

You can browse the DuckDB documentation offline in the following formats:

* [Single Markdown file](https://blobs.duckdb.org/docs/duckdb-docs.md) (approx. 5 MB)

* [PDF file](https://blobs.duckdb.org/docs/duckdb-docs.pdf) (approx. 35 MB)

# Operations Manual {#operations_manual}

## Overview {#docs:current:operations_manual:overview}

We designed DuckDB to be easy to deploy and operate. We believe that most users do not need to consult the pages of the operations manual.
However, there are certain setups – e.g., when DuckDB is running in mission-critical infrastructure – where we would like to offer advice on how to configure DuckDB.
The operations manual contains advice for these cases and also offers convenient configuration snippets such as Gitignore files.

For advice on getting the best performance from DuckDB, see also the [Performance Guide](#docs:current:guides:performance:overview).

## DuckDB's Footprint {#operations_manual:footprint_of_duckdb}

### Files Created by DuckDB {#docs:current:operations_manual:footprint_of_duckdb:files_created_by_duckdb}

DuckDB creates several files and directories on disk. This page lists both the global and the local ones.

#### Global Files and Directories {#docs:current:operations_manual:footprint_of_duckdb:files_created_by_duckdb::global-files-and-directories}

DuckDB creates the following global files and directories in the user's home directory (denoted with `~`):

| Location | Description | Shared between versions | Shared between clients |
|-------|-------------------|--|--|
| `~/.duckdbrc` | The content of this file is executed when starting the [DuckDB CLI client](#docs:current:clients:cli:overview). The commands can be both [dot commands](#docs:current:clients:cli:dot_commands) and SQL statements. The naming of this file follows the `~/.bashrc` and `~/.zshrc` “run commands” files. | Yes | Only used by CLI |
| `~/.duckdb_history` | History file, similar to `~/.bash_history` and `~/.zsh_history`. Used by the [DuckDB CLI client](#docs:current:clients:cli:overview). | Yes | Only used by CLI |
| `~/.duckdb/extensions` | Binaries of installed [extensions](#docs:current:extensions:overview). | No | Yes |
| `~/.duckdb/stored_secrets` | [Persistent secrets](#docs:current:configuration:secrets_manager::persistent-secrets) created by the [Secrets manager](#docs:current:configuration:secrets_manager). | Yes | Yes |

#### Local Files and Directories {#docs:current:operations_manual:footprint_of_duckdb:files_created_by_duckdb::local-files-and-directories}

DuckDB creates the following files and directories in the working directory (for in-memory connections) or relative to the database file (for persistent connections):

| Name | Description | Example |
|-------|-------------------|---|
| `⟨database_filename⟩`{:.language-sql .highlight} | Database file. Only created in on-disk mode. The file can have any extension with typical extensions being `.duckdb`, `.db` and `.ddb`. | `weather.duckdb` |
| `.tmp/` | Temporary directory. Only created in in-memory mode. | `.tmp/` |
| `⟨database_filename⟩.tmp/`{:.language-sql .highlight} | Temporary directory. Only created in on-disk mode. | `weather.duckdb.tmp/` |
| `⟨database_filename⟩.wal`{:.language-sql .highlight} | [Write-ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging) file. If DuckDB exits normally, the WAL file is deleted upon exit. If DuckDB crashes, the WAL file is required to recover data. | `weather.duckdb.wal` |

If you are working in a Git repository and would like to disable tracking these files by Git,
see the instructions on using [`.gitignore` for DuckDB](#docs:current:operations_manual:footprint_of_duckdb:gitignore_for_duckdb).

### Gitignore for DuckDB {#docs:current:operations_manual:footprint_of_duckdb:gitignore_for_duckdb}

If you work in a Git repository, you may want to configure your [Gitignore](https://git-scm.com/docs/gitignore) to disable tracking [files created by DuckDB](#docs:current:operations_manual:footprint_of_duckdb:files_created_by_duckdb).
These potentially include the DuckDB database, write-ahead log, and temporary files.

#### Sample Gitignore Files {#docs:current:operations_manual:footprint_of_duckdb:gitignore_for_duckdb::sample-gitignore-files}

In the following, we present sample Gitignore configuration snippets for DuckDB.

##### Ignore Temporary Files but Keep Database {#docs:current:operations_manual:footprint_of_duckdb:gitignore_for_duckdb::ignore-temporary-files-but-keep-database}

This configuration is useful if you would like to keep the database file in the version control system:

```text
*.wal
*.tmp/
```

##### Ignore Database and Temporary Files {#docs:current:operations_manual:footprint_of_duckdb:gitignore_for_duckdb::ignore-database-and-temporary-files}

If you would like to ignore both the database and the temporary files, extend the Gitignore file to include the database file.
The exact Gitignore configuration to achieve this depends on the extension you use for your DuckDB databases (`.duckdb`, `.db`, `.ddb`, etc.).
For example, if your DuckDB files use the `.duckdb` extension, add the following lines to your `.gitignore` file:

```text
*.duckdb*
*.wal
*.tmp/
```
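
To add these patterns from the command line without creating duplicate entries, a small shell helper can be used. The sketch below assumes the `.duckdb` extension from the example above; the function name `add_duckdb_ignores` is hypothetical.

```bash
# Hypothetical helper: appends the DuckDB-related patterns to a Gitignore file
# ($1, defaulting to .gitignore) unless an identical line is already present.
add_duckdb_ignores() {
    gitignore_file="${1:-.gitignore}"
    for pattern in '*.duckdb*' '*.wal' '*.tmp/'; do
        grep -q -x -F -- "$pattern" "$gitignore_file" 2>/dev/null ||
            printf '%s\n' "$pattern" >> "$gitignore_file"
    done
}

# Example usage in the root of a repository (shown as a comment, not executed here):
#   add_duckdb_ignores
#   git check-ignore -v weather.duckdb   # prints the matching rule if the file is ignored
```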

### Reclaiming Space {#docs:current:operations_manual:footprint_of_duckdb:reclaiming_space}

DuckDB uses a single-file format, which has some inherent limitations with respect to reclaiming disk space.

#### `CHECKPOINT` {#docs:current:operations_manual:footprint_of_duckdb:reclaiming_space::checkpoint}

To reclaim space after deleting rows, use the [`CHECKPOINT` statement](#docs:current:sql:statements:checkpoint).

#### `VACUUM` {#docs:current:operations_manual:footprint_of_duckdb:reclaiming_space::vacuum}

The [`VACUUM` statement](#docs:current:sql:statements:vacuum) does _not_ remove deleted rows and hence does not reclaim space.

#### Compacting a Database by Copying {#docs:current:operations_manual:footprint_of_duckdb:reclaiming_space::compacting-a-database-by-copying}

To compact the database, you can create a fresh copy of the database using the [`COPY FROM DATABASE` statement](#docs:current:sql:statements:copy::copy-from-database--to). In the following example, we first attach the original database `db1`, then the new (empty) database `db2`, and finally copy the content of `db1` to `db2`.

```sql
ATTACH 'db1.db' AS db1;
ATTACH 'db2.db' AS db2;
COPY FROM DATABASE db1 TO db2;
```

## Installing DuckDB {#operations_manual:installing_duckdb}

### Install Script {#docs:current:operations_manual:installing_duckdb:install_script}

You can install the [DuckDB CLI client](#docs:current:clients:cli:overview) using an install script.

#### Linux and macOS {#docs:current:operations_manual:installing_duckdb:install_script::linux-and-macos}

To use the [DuckDB install script](https://install.duckdb.org) on Linux and macOS, run:

```bash
curl https://install.duckdb.org | sh
```



<details markdown='1'>
<summary markdown='span'>
Click to see the output of the install script.
</summary>
```text
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3507  100  3507    0     0  34367      0 --:--:-- --:--:-- --:--:-- 34382
https://install.duckdb.org/v1.4.1/duckdb_cli-osx-universal.gz

*** DuckDB Linux/MacOS installation script, version 1.4.1 ***


         .;odxdl,
       .xXXXXXXXXKc
       0XXXXXXXXXXXd  cooo:
      ,XXXXXXXXXXXXK  OXXXXd
       0XXXXXXXXXXXo  cooo:
       .xXXXXXXXXKc
         .;odxdl,


########################################################################## 100.0%

Successfully installed DuckDB binary to /Users/your_user/.duckdb/cli/1.4.1/duckdb
  with a link from                      /Users/your_user/.duckdb/cli/latest/duckdb

Hint: Append the following line to your shell profile:
export PATH='/Users/your_user/.duckdb/cli/latest':$PATH


To launch DuckDB now, type
/Users/your_user/.duckdb/cli/latest/duckdb
```
</details>



By default, this installs the latest stable version of DuckDB to `~/.duckdb/cli/latest/duckdb`.
To add the DuckDB binary to your path, append the following line to your shell profile or RC file (e.g., `~/.bashrc`, `~/.zshrc`):

```bash
export PATH="$HOME/.duckdb/cli/latest:$PATH"
```

You can install [past DuckDB releases](#release_calendar::past-releases) (all the way back to v1.0.0) using the `DUCKDB_VERSION` variable. For example, to install v1.2.2, run:

```bash
curl https://install.duckdb.org | DUCKDB_VERSION=1.2.2 sh
```

#### Windows {#docs:current:operations_manual:installing_duckdb:install_script::windows}

On Windows, we provide a beta install script. To use it, run the following command:

```bash
powershell -NoExit iex (iwr "https://install.duckdb.org/install.ps1").Content
```

## Logging {#operations_manual:logging}

### Logging {#docs:current:operations_manual:logging:overview}

DuckDB implements a logging mechanism that provides users with detailed information about events such as query execution,
performance metrics and system events.

#### Basics {#docs:current:operations_manual:logging:overview::basics}

Logging is enabled with the `enable_logging` function and disabled with the `disable_logging` function. Log entries are exposed through a special view
named `duckdb_logs`, which can be queried like any standard table.

Example:

```sql
CALL enable_logging();
-- Run some queries...
SELECT * FROM duckdb_logs;
```

To disable logging, run:

```sql
CALL disable_logging();
```

To clear the current log, run:

```sql
CALL truncate_duckdb_logs();
```

#### Log Level {#docs:current:operations_manual:logging:overview::log-level}

DuckDB supports different logging levels that control the verbosity of the logs:

* `ERROR`: Only logs error messages
* `WARN`: Logs warnings and errors
* `INFO`: Logs general information, warnings and errors (default)
* `DEBUG`: Logs detailed debugging information
* `TRACE`: Logs very detailed tracing information

The log level can be set using:

```sql
CALL enable_logging(level = 'debug');
```

#### Log Types {#docs:current:operations_manual:logging:overview::log-types}

In DuckDB, log messages can have an associated log type. Log types allow two main things:

* Fine-grained control over log message generation
* Support for structured logging

##### Logging-Specific Types {#docs:current:operations_manual:logging:overview::logging-specific-types}

To log only messages of a specific type:

```sql
CALL enable_logging('HTTP');
```

The above call automatically sets the correct log level and adds the `HTTP` type to the `enabled_log_types` setting. This ensures that
only log messages of the `HTTP` type are written to the log.

To enable multiple log types, pass a list:

```sql
CALL enable_logging(['HTTP', 'QueryLog']);
```

##### Structured Logging {#docs:current:operations_manual:logging:overview::structured-logging}

Some log types like `HTTP` will have an associated message schema. To make DuckDB automatically parse the message, use the `duckdb_logs_parsed()` macro. For example:

```sql
SELECT request.headers FROM duckdb_logs_parsed('HTTP');
```

To view the schema of a structured log type, run:

```sql
DESCRIBE FROM duckdb_logs_parsed('HTTP');
```

##### List of Available Log Types {#docs:current:operations_manual:logging:overview::list-of-available-log-types}

This is a (non-exhaustive) list of the available log types in DuckDB.

| Log Type     | Description                                              | Structured |
|--------------|----------------------------------------------------------|------------|
| `QueryLog`   | Logs which queries are executed in DuckDB                | No         |
| `FileSystem` | Logs all interactions with DuckDB's file system          | Yes        |
| `HTTP`       | Logs all HTTP traffic from DuckDB's internal HTTP client | Yes        |

#### Log Storages {#docs:current:operations_manual:logging:overview::log-storages}

By default, DuckDB logs to an in-memory log storage (`memory`). DuckDB supports different types of log storage. Currently,
the following log storage types are implemented in core DuckDB:

| Log Storage | Description                                               |
|-------------|-----------------------------------------------------------|
| `memory`    | (default) Log to an in-memory buffer                      |
| `stdout`    | Log to the stdout of the current process (in CSV format)  |
| `file`      | Log to one or more CSV files                              |


Note that the `duckdb_logs` view is automatically updated to target the currently active log storage. This means that switching
the log storage may influence what is returned by the `duckdb_logs` view.

##### Logging to stdout {#docs:current:operations_manual:logging:overview::logging-to-stdout}

```sql
CALL enable_logging(storage = 'stdout');
```

##### Logging to File {#docs:current:operations_manual:logging:overview::logging-to-file-}

```sql
CALL enable_logging(storage = 'file', storage_config = {'path': 'path/to/store/logs'});
```

or using the equivalent shorthand:

```sql
CALL enable_logging(storage_path = 'path/to/store/logs');
```

#### Advanced Usage {#docs:current:operations_manual:logging:overview::advanced-usage}

##### Normalized vs. Denormalized Logging {#docs:current:operations_manual:logging:overview::normalized-vs-denormalized-logging}

DuckDB's log storages can log in two ways: normalized or denormalized.

In denormalized logging, the log context information is appended directly to each log entry, while in normalized logging
the log entries are stored separately, with a `context_id` referencing the context information.

| Log Storage | Normalized   |
|-------------|--------------|
| `memory`    | yes          |
| `file`      | configurable |
| `stdout`    | no           |

For file storage, you can switch between normalized and denormalized logging through the path: a directory path (not ending in `.csv`) selects normalized logging,
while a path ending in `.csv` selects denormalized logging. For file logging, normalized is generally recommended since this increases performance
and reduces the total size of the logs. To configure normalization of `file` log storage:

```sql
-- normalized: creates `/tmp/duckdb_log_contexts.csv` and `/tmp/duckdb_log_entries.csv`
CALL enable_logging(storage_path = '/tmp');
-- denormalized: creates `/tmp/logs.csv`
CALL enable_logging(storage_path = '/tmp/logs.csv');
```

Note that the difference between normalized and denormalized is typically hidden from users through the `duckdb_logs` view,
which automatically joins normalized tables into a single unified result. To illustrate, both configurations above will be
queryable using `FROM duckdb_logs;` and will produce identical results.

##### Buffer Size {#docs:current:operations_manual:logging:overview::buffer-size}

The log storage in DuckDB implements a buffering mechanism to optimize logging performance. This implementation
introduces a potential delay between message logging and storage writing. This delay can obscure the actual message writing time,
which is particularly problematic when debugging crashes, as messages generated immediately before a crash might not be
written. To address this, the buffer size can be configured as follows:

```sql
CALL enable_logging(storage_config = {'buffer_size': 0});
```

or using the equivalent shorthand:

```sql
CALL enable_logging(storage_buffer_size = 0);
```

Note that the default buffer size is different for different log storages:

| Log Storage | Default buffer size           |
|-------------|-------------------------------|
| `memory`    | `STANDARD_VECTOR_SIZE` (2048) |
| `file`      | `STANDARD_VECTOR_SIZE` (2048) |
| `stdout`    | Disabled (0)                  |

So for example, if you want to increase your `stdout` logging performance, simply enable buffering to greatly (>10x) speed up 
your logging:

```sql
CALL enable_logging(storage = 'stdout', storage_buffer_size = 2048);
```

Or imagine you are debugging a crash in DuckDB and you want to use the `file` logger to understand what's going on. In that case, disable
buffering so that every message is written immediately:

```sql
CALL enable_logging(storage_path = '/tmp/mylogs', storage_buffer_size = 0);
```

##### Syntactic Sugar {#docs:current:operations_manual:logging:overview::syntactic-sugar}

DuckDB contains some syntactic sugar to make common paths easier. For example, the following statements are all equivalent:

```sql
-- regular invocation 
CALL enable_logging(storage = 'file', storage_config = {'path': 'path/to/store/logs'});
-- using shorthand for common path storage config param 
CALL enable_logging(storage = 'file', storage_path = 'path/to/store/logs');
-- omitting `storage = 'file'` -> is implied from presence of `storage_config`
CALL enable_logging(storage_config = {'path': 'path/to/store/logs'});
```

## HTTP User-Agent {#docs:current:operations_manual:user_agents}

#### HTTP User-Agent {#docs:current:operations_manual:user_agents::http-user-agent}

Core DuckDB sets the default user-agent as follows:

```text
duckdb/v1.4.4(osx_arm64) cli 6ddac802ff
```

which indicates version, architecture, client, buildref in the agent string. The user-agent string can also be modified via the `custom_user_agent` setting, see [Configuration](#docs:current:configuration:overview). The currently generated user-agent string can be seen via `PRAGMA user_agent;`, see [Configuration/Pragmas](#docs:current:configuration:pragmas::user-agent).
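If you need to pick such a string apart, for example when filtering server-side access logs, a regular expression along the following lines may work. This is a sketch that assumes the layout of the single example above (`duckdb/<version>(<platform>) <client> <buildref>`). The helper name `parse_duckdb_user_agent` and the field names are made up for illustration, and strings that deviate from this layout (e.g., because `custom_user_agent` is set) simply yield `None`.

```python
import re

# Layout assumed from the example above: duckdb/<version>(<platform>) <client> <buildref>
USER_AGENT_RE = re.compile(
    r"^duckdb/(?P<version>[^\s(]+)\((?P<platform>[^)]+)\)\s+(?P<client>\S+)\s+(?P<buildref>\S+)$"
)


def parse_duckdb_user_agent(user_agent: str):
    """Return the user-agent components as a dict, or None if the layout differs."""
    match = USER_AGENT_RE.match(user_agent)
    return match.groupdict() if match else None


print(parse_duckdb_user_agent("duckdb/v1.4.4(osx_arm64) cli 6ddac802ff"))
```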

In addition, some extensions set their own user agents; notable examples here include the following.

#### Extensions {#docs:current:operations_manual:user_agents::extensions}

##### Azure {#docs:current:operations_manual:user_agents::azure}

Azure uses the Azure SDK, which sets its own user agents. For identity and storage calls, you may see strings like the following, respectively:

- via Azure Identity: `azsdk-cpp-identity/1.11.0 (Darwin 25.2.0 arm64 Darwin Kernel Version 25.2.0: Tue Nov 18 21:07:05 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6020 Cpp/201402)`
- via Azure Blob/ADLSv2: `azsdk-cpp-storage-blobs/12.15.0 (Darwin 25.2.0 arm64 Darwin Kernel Version 25.2.0: Tue Nov 18 21:07:05 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6020 Cpp/201402)`

##### Delta (and Unity Catalog) {#docs:current:operations_manual:user_agents::delta-and-unity-catalog}

The Delta extension employs calls from DuckDB core, tagged as the DuckDB default above, and also has calls originating from the Delta Kernel, which may look like:

- `object_store/0.12.5`

Unity Catalog calls also use a mix of DuckDB default user-agents, and the Delta style agent above.

##### HTTPFS - HTTPS/S3 {#docs:current:operations_manual:user_agents::httpfs---httpss3}

Calls via the HTTPFS extension use the DuckDB default strings noted above.

## Securing DuckDB {#operations_manual:securing_duckdb}

### Securing DuckDB {#docs:current:operations_manual:securing_duckdb:overview}

DuckDB is a powerful analytical database engine. It can read and write files, access the network, load extensions, 
and use system resources. Like any powerful tool, these capabilities require appropriate configuration when 
working with sensitive data or in shared environments.

This page documents DuckDB's security model and security-related settings. The right configuration depends on your use case, environment, and threat model.
If you plan to embed DuckDB in your application, also consult the ["Embedding DuckDB"](#docs:current:operations_manual:securing_duckdb:embedding_duckdb) page.

#### Untrusted Input {#docs:current:operations_manual:securing_duckdb:overview::untrusted-input}

##### Untrusted SQL Input {#docs:current:operations_manual:securing_duckdb:overview::untrusted-sql-input}

> **Warning.** Treat SQL in DuckDB like code in Bash or Python. Do not execute SQL from untrusted sources without proper sandboxing.

DuckDB executes SQL with the full privileges of the user running it, much like a shell or scripting interpreter (such as bash or Python).
Just as you wouldn't run an untrusted shell script or Python program without sandboxing, apply the same caution to SQL in DuckDB.

If your application must execute SQL from untrusted sources, use additional safeguards such as:

* Use [duckdb-wasm](https://github.com/duckdb/duckdb-wasm) for sandboxing
* Run DuckDB in an isolated container (e.g., Docker with restricted capabilities)
* Use a virtual machine or separate process with minimal privileges
* Apply operating system-level sandboxing
* Use network isolation to prevent data exfiltration
* Implement strict query timeouts at the application level

The settings described on this page provide **defense-in-depth** and can limit certain capabilities, but they are not a substitute for proper sandboxing. Also keep in mind that
sandboxing should not just be considered for security purposes, but also for preventing denial of service (DoS) attacks: malicious inputs can easily cause DuckDB to consume excessive resources such 
as memory, disk, CPU, or network.
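As an illustration of two of the safeguards listed above (a separate process and an application-level timeout), the following Python sketch runs a worker script in a child process and gives up after a deadline. The helper name `run_isolated` and the worker scripts are hypothetical; in a real application, the worker would open the DuckDB connection and execute the query. A child process with a timeout only bounds the run time and is not a sandbox by itself.

```python
import subprocess
import sys


def run_isolated(script: str, timeout_s: float):
    """Run `script` in a separate Python process; return its stdout, or None on timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return None
    return result.stdout


# Hypothetical workers: one finishes in time, the other exceeds its deadline.
print(run_isolated("print('done')", timeout_s=10))
print(run_isolated("import time; time.sleep(30)", timeout_s=0.5))
```

Combining this with operating system-level limits on the child process (user, memory, network) is left out here because the details depend on the platform.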

##### Untrusted Non-SQL Input {#docs:current:operations_manual:securing_duckdb:overview::untrusted-non-sql-input}

> **Warning.** Even non-SQL input into DuckDB can easily have unintended consequences. When building security-sensitive applications with DuckDB, always make sure you properly understand the impact of feeding untrusted input into DuckDB.

Besides SQL, DuckDB also has several non-SQL APIs that can be used to interact with the database. For example, there is a [relational API](#docs:current:clients:python:relational_api) in Python that allows building queries programmatically.

These APIs accept user input such as file paths, table names, column names, and filter expressions. While they don't execute raw SQL strings, they still trigger DuckDB operations that can read files, access the network, and use system resources.

**Example considerations for non-SQL APIs:**

* **File paths:** Functions like `duckdb.read_csv(path)` or `duckdb.read_parquet(path)` accept file paths. An attacker-controlled path could read sensitive files (e.g., `/etc/passwd`) or access remote URLs.
* **Table and column names:** While these are typically identifiers rather than executable code, unsanitized input could lead to unexpected behavior or information disclosure.
* **Filter expressions:** Some APIs accept filter expressions that are compiled into DuckDB expressions, which often support subqueries containing arbitrary SQL. Treat these with the same caution as SQL.

**Recommendations:**

* Validate and sanitize all user-provided inputs before passing them to DuckDB APIs.
* Apply the same sandboxing principles as for untrusted SQL when accepting input from untrusted sources.
* Properly read the documentation of all used functions to ensure you understand whether a function is safe to use with untrusted input under your specific use case.
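
As a sketch of the first recommendation, the following Python helper confines user-provided file names to a base directory before they are handed to any DuckDB API. The name `resolve_within` and the directory layout are assumptions made for illustration, and the helper relies only on the standard library. It complements sandboxing and does not replace it.

```python
import os


def resolve_within(base_dir: str, user_path: str) -> str:
    """Return the absolute path for `user_path` if it stays inside `base_dir`, else raise."""
    base = os.path.realpath(base_dir)
    # Join relative to the base directory, then resolve `..` segments and symlinks.
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes {base_dir!r}: {user_path!r}")
    return candidate
```

Pass the returned absolute path, rather than the original input, to the reading function, so that the value that was checked is the value that is used.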

#### Extensions {#docs:current:operations_manual:securing_duckdb:overview::extensions}

DuckDB has a flexible [extension mechanism](#docs:current:extensions:overview) that adds functionality such as new file formats, functions, and remote file system access. Extensions run with the same privileges as the DuckDB process itself, so they warrant careful consideration in security-sensitive environments.

##### Autoloading {#docs:current:operations_manual:securing_duckdb:overview::autoloading}

DuckDB can automatically load [core extensions](#docs:current:core_extensions:overview) when certain SQL statements require them. To maintain full control over which extensions are loaded, you can disable autoloading:

```sql
SET autoload_known_extensions = false;
SET autoinstall_known_extensions = false;
```

##### Core vs. Community Extensions {#docs:current:operations_manual:securing_duckdb:overview::core-vs-community-extensions}

DuckDB extensions fall into two categories:

* **Core extensions:** Maintained by the DuckDB team with full support. These include extensions like `parquet`, `json`, and `httpfs`.
* **[Community extensions](#community_extensions:index):** Contributed by third parties and installed via `INSTALL extension_name FROM community`. These are not maintained by the DuckDB team, so only install community extensions from sources you trust.

To disable community extensions entirely:

```sql
SET allow_community_extensions = false;
```

#### Reporting Vulnerabilities {#docs:current:operations_manual:securing_duckdb:overview::reporting-vulnerabilities}

If you discover a potential vulnerability, please [report it confidentially via GitHub](https://github.com/duckdb/duckdb/security/advisories/new).

#### Settings to Limit DuckDB's Capabilities {#docs:current:operations_manual:securing_duckdb:overview::settings-to-limit-duckdbs-capabilities}

The settings documented in this section provide additional hardening for DuckDB deployments. However, they should not be
relied upon as comprehensive security mechanisms in all configurations. These settings are designed as defense-in-depth
measures to limit the impact of potential security issues, but they cannot provide complete protection against all
attack vectors, especially when executing untrusted SQL. For robust security when dealing with untrusted input, combine
these settings with proper sandboxing at the operating system or container level, as described in
the ["Untrusted SQL Input"](#::untrusted-sql-input) section.

##### Safe Mode (CLI) {#docs:current:operations_manual:securing_duckdb:overview::safe-mode-cli}

DuckDB's CLI client supports [“safe mode”](#docs:current:clients:cli:safe_mode), which prevents DuckDB from accessing external files other than the database file.
This can be activated via a command line argument or a [dot command](#docs:current:clients:cli:dot_commands):

```batch
duckdb -safe ...
```

```plsql
.safe_mode
```


##### Restricting File Access {#docs:current:operations_manual:securing_duckdb:overview::restricting-file-access}

DuckDB can list directories and read arbitrary files via its CSV parser’s [`read_csv` function](#docs:current:data:csv:overview) or read text via the [`read_text` function](#docs:current:sql:functions:text::read_textsource).
This makes it possible to read from the local file system, for example:

```sql
SELECT *
FROM read_csv('/etc/passwd', sep = ':');
```

###### Disabling File Access {#docs:current:operations_manual:securing_duckdb:overview::disabling-file-access}

File access can be disabled in two ways. First, you can disable individual file systems. For example:

```sql
SET disabled_filesystems = 'LocalFileSystem';
```

Second, you can completely disable external access by setting the [`enable_external_access` option](#docs:current:configuration:overview::configuration-reference) to `false`:

```sql
SET enable_external_access = false;
```

This setting implies that:

* `ATTACH` cannot attach to a database in a file.
* `COPY` cannot read from or write to files.
* Functions such as `read_csv`, `read_parquet`, `read_json`, etc. cannot read from an external source.

###### The `allowed_directories` and `allowed_paths` Options {#docs:current:operations_manual:securing_duckdb:overview::the-allowed_directories-and-allowed_paths-options}

You can restrict DuckDB's access to certain directories or files using the `allowed_directories` and `allowed_paths` options (respectively).
These options allow fine-grained access control for the file system.
For example, you can set DuckDB to only use the `/tmp` directory.

```sql
SET allowed_directories = ['/tmp'];
SET enable_external_access = false;
FROM read_csv('test.csv');
```

With the setting applied, DuckDB will refuse to read files in the current working directory:

```console
Permission Error:
Cannot access file "test.csv" - file system operations are disabled by configuration
```

##### Locking Configurations {#docs:current:operations_manual:securing_duckdb:overview::locking-configurations}

Security-related configuration settings generally lock themselves for safety reasons. For example, while we can disable [community extensions](#community_extensions:index) using `SET allow_community_extensions = false`, we cannot re-enable them afterwards without restarting the database. Trying to do so will result in an error:

```console
Invalid Input Error: Cannot upgrade allow_community_extensions setting while database is running
```

This prevents re-enabling settings that were explicitly disabled.

Nevertheless, many configuration settings, such as the resource constraints, do not lock themselves. If you allow users to run SQL statements unrestricted on your own hardware, you might want to consider locking the configuration once your own setup has finished, using the following command:

```sql
SET lock_configuration = true;
```

This prevents any configuration settings from being modified from that point onwards.

To allow specific settings to remain configurable even when `lock_configuration` is enabled, use the `allowed_configs` option:

```sql
SET allowed_configs = ['memory_limit', 'threads'];
SET lock_configuration = true;
```

With this configuration, `memory_limit` and `threads` can still be changed, while all other settings are locked.

#### Secrets {#docs:current:operations_manual:securing_duckdb:overview::secrets}

[Secrets](#docs:current:configuration:secrets_manager) are used to manage credentials to log into third party services like AWS or Azure. DuckDB can show a list of secrets using the `duckdb_secrets()` table function. This will redact any sensitive information such as security keys by default. The `allow_unredacted_secrets` option can be set to show all information contained within a security key. It is recommended not to turn on this option if you are running untrusted SQL input.

Queries can access the secrets defined in the Secrets Manager. For example, if there is a secret defined to authenticate with a user, who has write privileges to a given AWS S3 bucket, queries may write to that bucket. This is applicable for both persistent and temporary secrets.

[Persistent secrets](#docs:current:configuration:secrets_manager::persistent-secrets) are stored in unencrypted binary format on the disk. These have the same permissions as SSH keys, `600`, i.e., only the user who is running the DuckDB (parent) process can read and write them.
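
To audit these permissions on a POSIX system, a check like the following can be used. This is a minimal sketch: `files_with_loose_permissions` is a hypothetical helper, and the directory to inspect is passed in by the caller because the location of the secret files depends on your configuration.

```python
import os
import stat


def files_with_loose_permissions(directory: str) -> list[str]:
    """Return the names of files in `directory` that grant any access to group or others."""
    loose = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            mode = stat.S_IMODE(entry.stat(follow_symlinks=False).st_mode)
            # 0o077 masks the group and other permission bits; for mode 600 they are all zero.
            if mode & 0o077:
                loose.append(entry.name)
    return sorted(loose)
```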

#### Prepared Statements to Prevent SQL Injection {#docs:current:operations_manual:securing_duckdb:overview::prepared-statements-to-prevent-sql-injection}

Similarly to other SQL databases, it's recommended to use [prepared statements](#docs:current:sql:query_syntax:prepared_statements) in DuckDB to prevent [SQL injection](https://en.wikipedia.org/wiki/SQL_injection).

> **Important.** Prepared statements protect against SQL injection when **you control the query structure** but accept **untrusted data values** (e.g., user-provided search terms or IDs). If users can supply the SQL query itself, this is equivalent to allowing them to run arbitrary code – see ["Untrusted SQL Input"](#::untrusted-sql-input).

**Therefore, avoid concatenating strings for queries:**

```python
import duckdb
duckdb.execute("SELECT * FROM (VALUES (32, 'a'), (42, 'b')) t(x) WHERE x = " + str(42)).fetchall()
```
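
The example above happens to be harmless because the concatenated value is a fixed integer. To illustrate what can go wrong with attacker-controlled text, the following sketch only builds the query string and prints it, without executing anything. The helper `build_query_unsafe` and the input values are hypothetical.

```python
def build_query_unsafe(user_value: str) -> str:
    # Anti-pattern: the input becomes part of the SQL text itself.
    return "SELECT * FROM (VALUES (32, 'a'), (42, 'b')) t(x) WHERE x = " + user_value


# A benign value yields the intended filter ...
print(build_query_unsafe("42"))
# ... but crafted input rewrites the predicate so that it matches every row.
print(build_query_unsafe("42 OR true"))
```

When the value is bound as a parameter, the same input is treated as a single value and cannot alter the structure of the query.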

**Instead, use prepared statements:**

```python
import duckdb
duckdb.execute("SELECT * FROM (VALUES (32, 'a'), (42, 'b')) t(x) WHERE x = ?", [42]).fetchall()
```

#### Constrain Resource Usage {#docs:current:operations_manual:securing_duckdb:overview::constrain-resource-usage}

DuckDB can use quite a lot of CPU, RAM, and disk space. These resources can be limited to control the usage of the DuckDB instance.

The number of CPU threads that DuckDB can use can be set using, for example:

```sql
SET threads = 4;
```

Here, `4` is the maximum number of threads that DuckDB may use.

The maximum amount of memory (RAM) can also be limited, for example:

```sql
SET memory_limit = '4GB';
```

The size of the temporary file directory can be limited with:

```sql
SET max_temp_directory_size = '4GB';
```

#### Privileges {#docs:current:operations_manual:securing_duckdb:overview::privileges}

Avoid running DuckDB as a root user (e.g., using `sudo`).
There is no good reason to run DuckDB as root.

### Embedding DuckDB {#docs:current:operations_manual:securing_duckdb:embedding_duckdb}

#### CLI Client {#docs:current:operations_manual:securing_duckdb:embedding_duckdb::cli-client}

The [Command Line Interface (CLI) client](#docs:current:clients:cli:overview) is intended for interactive use cases and not for embedding.
As a result, it has more features that could be abused by a malicious actor.
For example, the CLI client has the `.sh` feature that allows executing arbitrary shell commands.
This feature is only present in the CLI client and not in any other DuckDB clients.

```sql
.sh ls
```

> **Tip.** Calling DuckDB's CLI client via shell commands is **not recommended** for embedding DuckDB. It is recommended to use one of the client libraries, e.g., [Python](#docs:current:clients:python:overview), [R](#docs:current:clients:r), [Java](#docs:current:clients:java), etc.

### Securing Extensions {#docs:current:operations_manual:securing_duckdb:securing_extensions}

DuckDB has a powerful extension mechanism, which has the same privileges as the user running DuckDB's (parent) process.
This introduces security considerations. Therefore, we recommend reviewing the configuration options listed on this page and setting them according to your threat model.

#### DuckDB Signature Checks {#docs:current:operations_manual:securing_duckdb:securing_extensions::duckdb-signature-checks}

DuckDB extensions are checked on every load using the signature of the binaries.
There are currently three categories of extensions:

* Signed with a `core` key. Only extensions vetted by the core DuckDB team are signed with these keys.
* Signed with a `community` key. These are open-source extensions distributed via the [DuckDB Community Extensions repository](#community_extensions:index).
* Unsigned.

#### Overview of Security Levels for Extensions {#docs:current:operations_manual:securing_duckdb:securing_extensions::overview-of-security-levels-for-extensions}

DuckDB offers the following security levels for extensions.

| Usable extensions | Description | Configuration |
|-----|---|---|
| `core` | Extensions can only be loaded if signed with a `core` key. | `SET allow_community_extensions = false` |
| `core` and `community` | Extensions can only be loaded if signed with a `core` or `community` key. | This is the default security level. |
| Any extension including unsigned | Any extension can be loaded. | `SET allow_unsigned_extensions = true` |

Security-related configuration settings [lock themselves](#docs:current:operations_manual:securing_duckdb:overview::locking-configurations), i.e., it is only possible to restrict capabilities in the current process.

For example, attempting the following configuration changes will result in an error:

```sql
SET allow_community_extensions = false;
SET allow_community_extensions = true;
```

```console
Invalid Input Error: Cannot upgrade allow_community_extensions setting while database is running
```

#### Community Extensions {#docs:current:operations_manual:securing_duckdb:securing_extensions::community-extensions}

DuckDB has a [Community Extensions repository](#community_extensions:index), which allows convenient installation of third-party extensions.
Community extension repositories like pip or npm are essentially enabling remote code execution by design. This is less dramatic than it sounds. For better or worse, we are quite used to piping random scripts from the web into our shells, and routinely install a staggering amount of transitive dependencies without thinking twice. Some repositories like CRAN enforce a human inspection at some point, but that’s no guarantee for anything either.

We’ve studied several different approaches to community extension repositories and have picked what we think is a sensible approach: we do not attempt to review the submissions, but require that the *source code of extensions is available*. We do take over the complete build, sign and distribution process. Note that this is a step up from pip and npm that allow uploading arbitrary binaries but a step down from reviewing everything manually. We allow users to [report malicious extensions](https://github.com/duckdb/community-extensions/security/advisories/new) and show adoption statistics like GitHub stars and download count. Because we manage the repository, we can remove problematic extensions from distribution quickly.

Despite this, installing and loading DuckDB extensions from the community extension repository will execute code written by third party developers, and therefore *can* be dangerous. A malicious developer could create and register a harmless-looking DuckDB extension that steals your crypto coins.
If you’re running a web service that executes untrusted SQL from users with DuckDB, we recommend disabling community extensions. To do so, run:

```sql
SET allow_community_extensions = false;
```

#### Disabling Autoinstalling and Autoloading Known Extensions {#docs:current:operations_manual:securing_duckdb:securing_extensions::disabling-autoinstalling-and-autoloading-known-extensions}

By default, DuckDB automatically installs and loads known extensions. To disable autoinstalling known extensions, run:

```sql
SET autoinstall_known_extensions = false;
```

To disable autoloading known extensions, run:

```sql
SET autoload_known_extensions = false;
```

To lock this configuration, use the [`lock_configuration` option](#docs:current:operations_manual:securing_duckdb:overview::locking-configurations):

```sql
SET lock_configuration = true;
```

#### Always Require Signed Extensions {#docs:current:operations_manual:securing_duckdb:securing_extensions::always-require-signed-extensions}

By default, DuckDB requires extensions to be signed either as core extensions (created by the DuckDB developers) or as community extensions (created by third-party developers but distributed by the DuckDB developers).
The [`allow_unsigned_extensions` setting](#docs:current:extensions:overview::unsigned-extensions) can be enabled on start-up to allow loading unsigned extensions.
While this setting is useful for extension development, enabling it will allow DuckDB to load _any extension_, which means more care must be taken to ensure malicious extensions are not loaded.

## Non-Deterministic Behavior {#docs:current:operations_manual:non-deterministic_behavior}

Several operators in DuckDB exhibit non-deterministic behavior.
Most notably, SQL uses set semantics, which allows results to be returned in a different order.
DuckDB exploits this to improve performance, particularly when performing multi-threaded query execution.
Other factors, such as using different compilers, operating systems and hardware architectures, can also cause changes in ordering.
This page documents the cases where non-determinism is an _expected behavior_.
If you would like to make your queries deterministic, see the [“Working Around Non-Determinism” section](#::working-around-non-determinism).

#### Set Semantics {#docs:current:operations_manual:non-deterministic_behavior::set-semantics}

One of the most common sources of non-determinism is the set semantics used by SQL.
For example, if you run the following query repeatedly, you may get either of two results:

```sql
SELECT *
FROM (
    SELECT 'A' AS x
    UNION
    SELECT 'B' AS x
);
```

Both results `A`, `B` and `B`, `A` are correct.

#### Different Results on Different Platforms: `array_distinct` {#docs:current:operations_manual:non-deterministic_behavior::different-results-on-different-platforms-array_distinct}

The `array_distinct` function may return results [in a different order on different platforms](https://github.com/duckdb/duckdb/issues/13746):

```sql
SELECT array_distinct(['A', 'A', 'B', NULL, NULL]) AS arr;
```

For this query, both `[A, B]` and `[B, A]` are valid results.

#### Floating-Point Aggregate Operations with Multi-Threading {#docs:current:operations_manual:non-deterministic_behavior::floating-point-aggregate-operations-with-multi-threading}

Because floating-point arithmetic is not associative, aggregates may produce slightly different results when run in multi-threaded configurations.
For example, [`stddev` and `corr` may produce non-deterministic results](https://github.com/duckdb/duckdb/issues/13763):

```sql
CREATE TABLE tbl AS
    SELECT 'ABCDEFG'[floor(random() * 7 + 1)::INT] AS s, 3.7 AS x, i AS y
    FROM range(1, 1_000_000) r(i);

SELECT s, stddev(x) AS standard_deviation, corr(x, y) AS correlation
FROM tbl
GROUP BY s
ORDER BY s;
```

The expected standard deviations and correlations from this query are 0 for all values of `s`.
However, when executed on multiple threads, the query may return small numbers (`0 <= z < 10e-16`) due to floating-point inaccuracies.
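
The underlying effect is that floating-point addition is not associative, so merging the same partial results in a different order can round differently. The following is a minimal Python sketch of that effect only. It does not reproduce DuckDB's aggregate implementation, and the function name `combine` and the sample values are chosen purely for illustration.

```python
# Illustrative only: models partial results being merged in two different
# orders. This is not how DuckDB implements its aggregates.

def combine(partials):
    """Add partial results left to right, as one possible merge order."""
    total = 0.0
    for p in partials:
        total += p
    return total

partials = [0.1, 0.2, 0.3]                   # hypothetical per-thread partial sums
order_a = combine(partials)                  # (0.1 + 0.2) + 0.3
order_b = combine(list(reversed(partials)))  # (0.3 + 0.2) + 0.1

print(order_a == order_b)               # False on IEEE 754 doubles
print(abs(order_a - order_b) < 1e-15)   # True: the difference is tiny
```

When results only need to agree up to such rounding differences, comparing with a small tolerance is usually more robust than testing for exact equality.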

#### Working Around Non-Determinism {#docs:current:operations_manual:non-deterministic_behavior::working-around-non-determinism}

For the majority of use cases, non-determinism does not cause any issues.
However, there are some cases where deterministic results are desirable.
In these cases, try the following workarounds:

1. Limit the number of threads to prevent non-determinism introduced by multi-threading.

   ```sql
   SET threads = 1;
   ```

2. Enforce ordering. For example, you can use the [`ORDER BY ALL` clause](#docs:current:sql:query_syntax:orderby::order-by-all):

   ```sql
   SELECT *
   FROM (
       SELECT 'A' AS x
       UNION
       SELECT 'B' AS x
   )
   ORDER BY ALL;
   ```

   You can also sort lists using [`list_sort`](#docs:current:sql:functions:list::list_sortlist):

   ```sql
   SELECT list_sort(array_distinct(['A', 'A', 'B', NULL, NULL])) AS i
   ORDER BY i;
   ```

   It's also possible to introduce a [deterministic shuffling](https://duckdb.org/2024/08/19/duckdb-tricks-part-1#shuffling-data).
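
If the results are consumed by a client program, for example in a test suite, another option is to compare them without regard to row order. The sketch below assumes that rows arrive as hashable tuples, as is typical for Python database clients; the helper name `same_rows` is hypothetical and not part of any DuckDB API.

```python
from collections import Counter

def same_rows(result_a, result_b):
    """Return True if both results contain the same rows, ignoring order.

    Counter keeps duplicate rows, so this is a multiset comparison.
    """
    return Counter(result_a) == Counter(result_b)

# Two valid results of the UNION query above, written out by hand
# as lists of row tuples.
run_1 = [("A",), ("B",)]
run_2 = [("B",), ("A",)]

print(run_1 == run_2)           # False: list comparison is order-sensitive
print(same_rows(run_1, run_2))  # True: same rows, different order
```

This keeps the query itself unchanged, which can matter when adding an `ORDER BY` clause would alter the plan that is being tested.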

## Limits {#docs:current:operations_manual:limits}

This page contains DuckDB's built-in limit values.
To check the value of a setting on your system, use the `current_setting` function.

#### Limit Values {#docs:current:operations_manual:limits::limit-values}

| Limit | Default value | Configuration option | Comment |
|---|---|---|---|
| Array size | 100000 | - | |
| BLOB size | 4 GB | - | |
| Expression depth | 1000 | [`max_expression_depth`](#docs:current:configuration:overview) | |
| Memory allocation for a vector | 128 GB | - | |
| Memory use | 80% of RAM | [`memory_limit`](#docs:current:configuration:pragmas::memory-limit) | Note: This limit only applies to the buffer manager. |
| String size | 4 GB | - | |
| Temporary directory size | unlimited | [`max_temp_directory_size`](#docs:current:configuration:overview) | |

#### Size of Database Files {#docs:current:operations_manual:limits::size-of-database-files}

DuckDB doesn't have a practical limit for the size of a single DuckDB database file.
We have database files using 15 TB+ of disk space and they work fine.
However, connecting to such a huge database may take a few seconds and [checkpointing](#docs:current:sql:statements:checkpoint) can be slower.

## DuckDB Docker Container {#docs:current:operations_manual:duckdb_docker}

DuckDB has an [official Docker image](https://github.com/duckdb/duckdb-docker), which supports both the ARM64 (AArch64) and x86_64 (AMD64) architectures.

#### Usage {#docs:current:operations_manual:duckdb_docker::usage}

To use the DuckDB Docker image, run:

```batch
docker run --rm -it -v "$(pwd):/workspace" -w /workspace duckdb/duckdb
```

#### Using the DuckDB UI with Docker {#docs:current:operations_manual:duckdb_docker::using-the-duckdb-ui-with-docker}

To use the [DuckDB UI](#docs:current:core_extensions:ui) with Docker, enable host networking.

> This setting forwards all ports from the container, so exercise caution and avoid it in secure environments.

```batch
docker run --rm -it -v "$(pwd):/workspace" -w /workspace --net host duckdb/duckdb
```

Then, launch the UI as follows:

```sql
CALL start_ui();
```

To enable host networking in Docker Desktop, follow the instructions on the [Host network driver](https://docs.docker.com/engine/network/drivers/host/#docker-desktop) page.

# Development {#dev}

## DuckDB Repositories {#docs:current:dev:repositories}

Several components of DuckDB are maintained in separate repositories.

#### Main Repositories {#docs:current:dev:repositories::main-repositories}

* [`duckdb`](https://github.com/duckdb/duckdb): core DuckDB project
* [`duckdb-web`](https://github.com/duckdb/duckdb-web): documentation and blog

#### Clients {#docs:current:dev:repositories::clients}

* [`duckdb-go`](https://github.com/duckdb/duckdb-go): Go client
* [`duckdb-java`](https://github.com/duckdb/duckdb-java): Java (JDBC) client
* [`duckdb-node`](https://github.com/duckdb/duckdb-node): Node.js client (deprecated)
* [`duckdb-node-neo`](https://github.com/duckdb/duckdb-node-neo): Node.js client
* [`duckdb-odbc`](https://github.com/duckdb/duckdb-odbc): ODBC client
* [`duckdb-pyodide`](https://github.com/duckdb/duckdb-pyodide): Pyodide client
* [`duckdb-python`](https://github.com/duckdb/duckdb-python): Python client
* [`duckdb-r`](https://github.com/duckdb/duckdb-r): R client
* [`duckdb-rs`](https://github.com/duckdb/duckdb-rs): Rust client
* [`duckdb-swift`](https://github.com/duckdb/duckdb-swift): Swift client
* [`duckdb-wasm`](https://github.com/duckdb/duckdb-wasm): WebAssembly client
* [`duckplyr`](https://github.com/tidyverse/duckplyr): a drop-in replacement for dplyr in R

#### Connectors {#docs:current:dev:repositories::connectors}

* [`dbt-duckdb`](https://github.com/duckdb/dbt-duckdb): dbt
* [`duckdb-mysql`](https://github.com/duckdb/duckdb-mysql): MySQL connector
* [`duckdb-postgres`](https://github.com/duckdb/duckdb-postgres): PostgreSQL connector (connect to PostgreSQL from DuckDB)
* [`duckdb-sqlite`](https://github.com/duckdb/duckdb-sqlite): SQLite connector
* [`pg_duckdb`](https://github.com/duckdb/pg_duckdb): official PostgreSQL extension for DuckDB (run DuckDB in PostgreSQL)

#### Extensions {#docs:current:dev:repositories::extensions}

* [`duckdb-ui`](https://github.com/duckdb/duckdb-ui): web UI for DuckDB
* Core extension repositories are linked in the [Official Extensions page](#docs:current:core_extensions:overview)
* Community extensions are served from the [Community Extensions repository](#community_extensions:index)

#### Specifications {#docs:current:dev:repositories::specifications}

* [DuckLake specification](https://ducklake.select/docs/stable/specification/introduction)

## Release Cycle {#docs:current:dev:release_cycle}

This document outlines the DuckDB and core extension release cycle framework. It is intended for developers working on
DuckDB extensions to better understand the underlying processes.

#### Overview {#docs:current:dev:release_cycle::overview}

- DuckDB follows [Semantic Versioning](https://semver.org/) (`v<MAJOR>.<MINOR>.<PATCH>`)
- Minor versions are released approximately every 4 months
- Patch releases are issued as needed for:
    - The latest stable version
    - The current Long Term Support (LTS) version
- All releases are listed in the [Release Calendar](https://duckdb.org/release_calendar.html)

##### Terminology {#docs:current:dev:release_cycle::terminology}

In the release docs we use some basic terminology to describe versions and branches. We briefly go over them here.

- **`vx.y.z`**: The latest stable release
- **`vx.y-codename`**: The name of the branch that will produce `vx.y.<n>` releases
- **`vx.<y+1>-codename`**: The branch name that is used for the branch that will produce the next minor release
- **`Main release cycle`**: The branches, commits, and PRs related to producing `vx.<y+1>.0` and `vx.y.<z+1>` releases
- **`Active branch`**: A branch that is part of the main release cycle. Either `main` or `vx.<y+n>-codename` where `n >= 0`
- **`Single branch extension`**: Extension with 1 active branch. Since main is always an active branch this is always
  main. This means all other branches of format `vx.y-codename` must be `vx.<y-n>-codename` where `n >= 1`
- **`Multi branch extension`**: Extension with more than 1 active branch
- **`Two branch extension`**: Extension with two active branches: main and `vx.y-codename`
- **`Three branch extension`**: Extension with three active branches: main, `vx.y-codename`, and `vx.<y+1>-codename`
- **`LTS release`**: Long term support release. These releases will receive support (patch releases) beyond their
  lifetime in the active release cycle. Currently LTS releases will receive 1 year of support
- **`Unstable API extension`**: An extension targeting the *unstable* extension API. This can be both the C++ API or the
  unstable C API. These extensions are not binary-compatible across multiple DuckDB versions
- **`Stable API extension`**: An extension targeting the *stable* C API of DuckDB. These extensions are
  binary-compatible across multiple DuckDB versions
- **`In-tree extensions`**: Extensions that live inside the `duckdb/duckdb` source tree
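
To make the version placeholders concrete, the following Python sketch derives `vx.y.<z+1>` and `vx.<y+1>.0` from a given `vx.y.z` tag. It is an illustration of the notation only: the helper names are hypothetical, the tag `v1.2.3` is an arbitrary example, and codenames are left out because they cannot be computed from a version number.

```python
import re

# Matches release tags of the form vX.Y.Z, e.g. "v1.2.3".
VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")

def parse_version(tag):
    """Split a vX.Y.Z tag into (major, minor, patch) integers."""
    m = VERSION_RE.match(tag)
    if m is None:
        raise ValueError(f"not a vX.Y.Z tag: {tag!r}")
    return tuple(int(part) for part in m.groups())

def next_patch(tag):
    """vx.y.z -> vx.y.<z+1>"""
    x, y, z = parse_version(tag)
    return f"v{x}.{y}.{z + 1}"

def next_minor(tag):
    """vx.y.z -> vx.<y+1>.0"""
    x, y, _ = parse_version(tag)
    return f"v{x}.{y + 1}.0"

print(next_patch("v1.2.3"))  # v1.2.4
print(next_minor("v1.2.3"))  # v1.3.0
```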

##### Main Branches and Tags {#docs:current:dev:release_cycle::main-branches-and-tags}

In git-based version control, branches are used to allow multiple versions of the same codebase to co-exist. At DuckDB,
there are two core branches that play the main role in the DuckDB (and extensions) release cycle. We will start off by
listing the format these core branches come in.

- **`main`** branch: the main branch can mean various things, but can generally be considered the catch-all branch
- **`vx.y-codename`** branch: the branch used to produce all `vx.y.z` releases
- **`vx.y.z`** tag: a stable release of DuckDB. These tags are immutable: once created, a tag always points to the same commit

#### The Main DuckDB Release Cycle {#docs:current:dev:release_cycle::the-main-duckdb-release-cycle}

> LTS (Long-Term Support) releases follow a separate maintenance cycle to provide extended support and stability.

The main DuckDB release cycle consists of 3 main phases: *Mid-cycle*, *Pre-release* and *Feature freeze*. These phases are clearly defined and communicated to ensure the
whole team is synchronized and working together towards the next release.

##### Phase 1: Mid-Cycle {#docs:current:dev:release_cycle::phase-1-mid-cycle}

###### Active DuckDB Branches {#docs:current:dev:release_cycle::active-duckdb-branches}

- `main`
- `vx.y-codename`

###### Description {#docs:current:dev:release_cycle::description}

The *mid-cycle* phase is the most common phase of the release cycle, with about 75% of the time being spent in this
phase. It can be seen as *business-as-usual*, where the upcoming release is still far away and the team is working hard
on merging a variety of features and bug-fixes. During this phase, patch releases (`vx.y.<z+n>`) may be created from the
`vx.y-codename` branch. The patches are merged into the `vx.y-codename` branch, and the `vx.y-codename` branch is
frequently merged into main to keep the two in sync.

###### PRs into DuckDB {#docs:current:dev:release_cycle::prs-into-duckdb}

- Bug-fixes for `vx.y.<z+n>` patch releases are merged into `vx.y-codename`
- Features and bug-fixes for `vx.<y+1>.0` are merged into `main`

##### Phase 2: Pre-Release {#docs:current:dev:release_cycle::phase-2-pre-release}

###### Active Branches {#docs:current:dev:release_cycle::active-branches}

- `main`
- `vx.y-codename`
- `vx.<y+1>-codename`

###### Description {#docs:current:dev:release_cycle::description}

The pre-release phase is intended to prepare for the upcoming `vx.<y+1>.0` minor release. At the start of this phase,
the `vx.<y+1>-codename` branch is created. This branch will be used to produce the upcoming minor release and is the
branch from which all subsequent `vx.<y+1>.<n>` patch releases are released.

###### PRs into DuckDB {#docs:current:dev:release_cycle::prs-into-duckdb}

- Bug-fixes for `vx.y.<z+1>` patch releases are merged into `vx.y-codename`
- Features and bug-fixes for `vx.<y+1>.0` are merged into `vx.<y+1>-codename`
- Features for `vx.<y+2>.0` are merged into `vx.<y+2>-codename`

##### Phase 3: Feature Freeze {#docs:current:dev:release_cycle::phase-3-feature-freeze}

###### Active Branches {#docs:current:dev:release_cycle::active-branches}

- `main`
- `vx.y-codename`
- `vx.<y+1>-codename`

###### Description {#docs:current:dev:release_cycle::description}

The feature freeze phase is the phase closest to release. During this phase features are no longer allowed to be merged
into `vx.<y+1>-codename` and only bug fixes are merged. This phase is intended to ensure the quality of the upcoming
release. During this phase additional testing and benchmarking is performed while reducing the risk of introducing
last-minute bugs by disallowing feature merges.

###### PRs into DuckDB {#docs:current:dev:release_cycle::prs-into-duckdb}

- Bug-fixes for `vx.y.<z+1>` are no longer allowed and should target `vx.<y+1>.0` instead
- Bug-fixes for `vx.<y+1>.0` are merged into `vx.<y+1>-codename`
- Features for `vx.<y+1>.0` are no longer allowed and should target `vx.<y+2>.0` instead
- Features for `vx.<y+2>.0` are merged into `vx.<y+2>-codename`

#### Main Extension Release Cycle {#docs:current:dev:release_cycle::main-extension-release-cycle}

Most DuckDB extensions are completely separate from the main `duckdb/duckdb` repository and are free to follow their own
release cycle. In this section we categorize different types of DuckDB extensions and go over their release cycles.

To describe the release cycle of extensions, we need to first categorize extensions in three different groups, as
extensions share the same release cycle based on which of these three categories they belong to.

- In-tree extensions
- Unstable API extensions
- Stable API extensions

We will now go over the release cycles of the three different categories, in order of increasing complexity.

##### In-Tree Extensions {#docs:current:dev:release_cycle::in-tree-extensions}

For *in-tree extensions*, the release cycle is very simple. Since their code lives in the `duckdb/duckdb` repository,
they move in complete lock-step with DuckDB. This means they share the same versioning and branching. In this sense they
are not really extensions, but more lazy-loadable parts of the `duckdb/duckdb` codebase.

##### Stable API Extensions {#docs:current:dev:release_cycle::stable-api-extensions}

Stable API extensions in DuckDB are a relatively new concept, but are planned to form the majority of extensions in the
future. Stable API extensions are built on the stable C extension API, making them binary compatible with multiple
versions of DuckDB. This means that their release cycle can/should also be completely separate from the DuckDB release
cycle.

While the release cycle for stable API extensions is still a work in progress, the basic idea is that the release cycle of
stable API extensions consists of a similar but separate cycle to that of `duckdb/duckdb`, where every version will
target 1 or more versions of DuckDB.

##### Unstable API Extensions {#docs:current:dev:release_cycle::unstable-api-extensions}

Unstable API extensions currently make up the majority of DuckDB extensions. These extensions either target the C++
extension API, or the unstable C extension API. They are, from a release cycle point of view, the most complex. Every
version of an unstable API extension only targets a single DuckDB version. This 1:1 tie means that the release cycle of
these extensions tends to form a sometimes intricate dance around the main DuckDB release cycle. While the goal is to
move as many extensions as possible over to stable APIs, we expect unstable API extensions to be around for quite some time so there
remains a need to clearly define their lifecycle. Therefore we will use the remainder of this section to describe it.

###### Categorizing by Branching {#docs:current:dev:release_cycle::categorizing-by-branching}

To start, we will divide the unstable API extensions into different subcategories. Just like DuckDB itself, these
extensions follow the same branching scheme as DuckDB where a combination of `main` and `vx.y-codename` play the main
role. We will now define the three types of unstable extensions by looking at their **number of active branches**.

- **Single branch extensions** have only the `main` *active* branch
- **Two branch extensions** have two *active* branches: `main` and `vx.y-codename`
- **Three branch extensions** have three *active* branches: `main`, `vx.y-codename`, and `vx.<y+1>-codename`

###### DuckDB Targets {#docs:current:dev:release_cycle::duckdb-targets}

Every unstable API extension should target a single version of DuckDB. This target version is defined by a combination
of **the `duckdb` submodule** and the target version in the [`MainDistributionPipeline`](https://github.com/duckdb/extension-template/blob/main/.github/workflows/MainDistributionPipeline.yml) workflow.
Which version an extension targets depends on the release cycle phase and the branch. We will now go over all
combinations:

- Phase: **Mid-cycle**
    - Type: **Single branch**
        - Extension **`main`** `->` DuckDB **`vx.y.z`** or **`main`**
    - Type: **Two branch**
        - Extension **`main`** `->` DuckDB **`vx.y.z`** or **`main`**
        - Extension **`vx.y-codename`** `->` DuckDB **`vx.y.z`** or **`vx.y-codename`**
    - Type: **Three branch**: should not exist
- Phase: **Pre-release** / **Patch**
    - Type: **Single branch**
        - Extension **`main`** `->` DuckDB **`vx.y.z`** or **`vx.<y+1>-codename`**
    - Type: **Two branch**
        - Extension **`main`** `->` DuckDB **`vx.y.z`** or **`vx.<y+1>-codename`**
        - Extension **`vx.y-codename`** `->` DuckDB **`vx.y.z`** or **`vx.y-codename`**
    - Type: **Three branch**
        - Extension **`main`** `->` DuckDB **`main`**
        - Extension **`vx.y-codename`** `->` DuckDB **`vx.y.z`** or **`vx.y-codename`**
        - Extension **`vx.<y+1>-codename`** `->` DuckDB **`vx.<y+1>-codename`**

###### Where to Merge PRs {#docs:current:dev:release_cycle::where-to-merge-prs}

Where to merge a PR into an unstable API extension depends on two things: the
current release phase and the type of extension. We will now go over all combinations.

- Phase: **Mid-cycle**
    - Type: **Single branch**
        - if DuckDB target: `vx.y.z`:
            - PR for **`vx.y.<z+1>`** merges into **`main`**[^1]
            - PR for **`vx.<y+1>.0`** merges into **`main`**
        - if DuckDB target: `main`:
            - PR for **`vx.y.<z+1>`** is **impossible**
            - PR for **`vx.<y+1>.0`** merges into **`main`**
    - Type: **Two branch**
        - PR for **`vx.y.<z+1>`** merges into **`vx.y-codename`**
        - PR for **`vx.<y+1>.0`** merges into **`main`**
    - Type: **Three branch**
        - PR for **`vx.y.<z+1>`** merges into **`vx.y-codename`**
        - PR for **`vx.<y+1>.0`** merges into **`vx.<y+1>-codename`**
        - PR for **`vx.<y+2>.0`** merges into **`main`**
- Phase: **Pre-release** / **Patch**
    - Type: **Single branch**
        - if DuckDB target: `vx.y.z`:
            - PR for **`vx.y.<z+1>`** merges into **`main`**[^1] [^2]
            - PR for **`vx.<y+1>.0`** merges into **`main`**
        - if DuckDB target: `main`:
            - PR for **`vx.y.<z+1>`** is **impossible**
            - PR for **`vx.<y+1>.0`** merges into **`main`**
    - Type: **Two branch**
        - PR for **`vx.y.<z+1>`** merges into **`vx.y-codename`** [^2]
        - PR for **`vx.<y+1>.0`** merges into **`main`**
    - Type: **Three branch**
        - PR for **`vx.y.<z+1>`** merges into **`vx.y-codename`**[^2]
        - PR for **`vx.<y+1>.0`** merges into **`vx.<y+1>-codename`**
        - PR for **`vx.<y+2>.0`** merges into **`main`**

[^1]: Single branch extensions require manual version updates to ensure changes are included in the targeted release.
[^2]: Patch releases during pre-release or feature-freeze phases are uncommon. Consider targeting changes for the next
minor release instead.
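
The list above can also be read as a lookup table. The following Python sketch transcribes it under a few assumptions: the keys `single`, `two`, `three`, `patch`, `minor` and `next_minor`, as well as the function name `merge_branch`, are hypothetical labels introduced here, and the release phase is omitted because the branch choices listed for both phases are the same.

```python
# Hypothetical lookup table transcribed from the list above.
# Keys: (extension type, release the PR is meant for), where
# "patch" = vx.y.<z+1>, "minor" = vx.<y+1>.0, "next_minor" = vx.<y+2>.0.
MERGE_TARGET = {
    ("single", "patch"): "main",
    ("single", "minor"): "main",
    ("two", "patch"): "vx.y-codename",
    ("two", "minor"): "main",
    ("three", "patch"): "vx.y-codename",
    ("three", "minor"): "vx.<y+1>-codename",
    ("three", "next_minor"): "main",
}

def merge_branch(ext_type, release, duckdb_target="vx.y.z"):
    """Return the extension branch for a PR, or None if the list names none."""
    # A single branch extension that already targets DuckDB main
    # cannot produce a patch release for vx.y.
    if ext_type == "single" and release == "patch" and duckdb_target == "main":
        return None
    return MERGE_TARGET.get((ext_type, release))

print(merge_branch("two", "patch"))             # vx.y-codename
print(merge_branch("three", "next_minor"))      # main
print(merge_branch("single", "patch", "main"))  # None
```

The footnotes still apply: a single branch extension needs a manual version update for patch releases, and patch releases outside the mid-cycle phase are uncommon.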

###### What Extension Version Will Be Released? {#docs:current:dev:release_cycle::what-extension-version-will-be-released}

For every DuckDB release, a complete set of all core extensions should be available. For unstable API extensions, this means
a rebuild of the binaries. For the core extensions, this build generally happens through the `duckdb/duckdb` CI. This
means that the list of extensions that will be available on release is documented in
the [extension config files](https://github.com/duckdb/duckdb/tree/main/.github/config/extensions). However, these config
files may not always be up to date. To decide which version of an extension should be part of the upcoming release, we
define the following sources of truth for the latest extension version based on the release type (patch/minor) and extension
type (single/multi branch):

- Release type: **Patch**
    - Extension type: **Single branch**
        - Latest version: **commit in [config files](https://github.com/duckdb/duckdb/tree/main/.github/config/extensions)**
    - Extension type: **Multi branch**
        - Latest version: Extension **`vx.y-codename`** branch
- Release type: **Minor**
    - Extension type: **Single branch**
        - Latest version: Extension **`main`** branch
    - Extension type: **Two branch**
        - Latest version: Extension **`main`** branch
    - Extension type: **Three branch**
        - Latest version: Extension **`vx.<y+1>-codename`** branch

###### Switching between Single Branch, Two Branch and Three Branch {#docs:current:dev:release_cycle::switching-between-single-branch-two-branch-and-three-branch}

Switching between the different branch types for extensions is a fairly straightforward process and should be done as follows:

- Switch: **Single branch** `->` **Two branch**
    - When: during **any** phase
    - Reasons:
        - When the desire arises to merge features not eligible for `vx.y.<z+1>` while also maintaining the ability to do releases for `vx.y.<z+n>`
        - To be able to test with latest DuckDB main while maintaining ability to do releases for `vx.y.<z+n>` (including `vx.y.z` itself)
    - Actions:
        - Create branch `vx.y-codename` from a commit on main between HEAD of `main` and the commit in the DuckDB `vx.y.z` config file.
- Switch: **Two branch** `->` **Three branch**
    - When: during **Pre-release** or **Feature-freeze** phase
    - Reasons:
        - Whenever a feature needs to be merged that is not eligible for merging into `vx.<y+1>.0`.
    - Actions:
        - Create `vx.<y+1>-codename` branch from main
- Switch: **Three branch** `->` **Two branch** or **Two branch** `->` **Single branch**
    - When: part of transition from **Feature Freeze** `->` **Mid-cycle**
    - Action: happens automatically (`vx.y-codename` becomes *inactive* by definition)

## Metrics {#docs:current:dev:metrics}

DuckDB provides a set of metrics that can be used to monitor the performance and health of the database.

The query tree has two types of nodes: the `QUERY_ROOT` and `OPERATOR` nodes.
The `QUERY_ROOT` refers exclusively to the top-level node, and the metrics it contains are measured over the entire query.
The `OPERATOR` nodes refer to the individual operators in the query plan.
Some metrics are only available for `QUERY_ROOT` nodes, while others are only for `OPERATOR` nodes.
The table below describes each metric and which nodes they are available for.

Other than `OPERATOR_TYPE`, all metrics can be turned on or off.

#### All Metrics {#docs:current:dev:metrics::all-metrics}

| Name                                                                  | Group                                 | Description                                                                |
|-----------------------------------------------------------------------|---------------------------------------|----------------------------------------------------------------------------|
| [`CPU_TIME`](#::cpu_time)                                               | [core](#::core-metrics)                 | CPU time spent on the query                                                |
| [`CUMULATIVE_CARDINALITY`](#::cumulative_cardinality)                   | [core](#::core-metrics)                 | Cumulative cardinality of the query                                        |
| [`CUMULATIVE_ROWS_SCANNED`](#::cumulative_rows_scanned)                 | [core](#::core-metrics)                 | Cumulative number of rows scanned by the query                             |
| [`EXTRA_INFO`](#::extra_info)                                           | [core](#::core-metrics)                 | Unique operator metrics                                                    |
| [`LATENCY`](#::latency)                                                 | [core](#::core-metrics)                 | Time spent executing the entire query                                      |
| [`QUERY_NAME`](#::query_name)                                           | [core](#::core-metrics)                 | The SQL string of the query                                                |
| [`RESULT_SET_SIZE`](#::result_set_size)                                 | [core](#::core-metrics)                 | The size of the result                                                     |
| [`ROWS_RETURNED`](#::rows_returned)                                     | [core](#::core-metrics)                 | The number of rows returned by the query                                   |
| [`BLOCKED_THREAD_TIME`](#::blocked_thread_time)                         | [execution](#::execution-metrics)       | Time spent waiting for a thread to become available                        |
| [`SYSTEM_PEAK_BUFFER_MEMORY`](#::system_peak_buffer_memory)             | [execution](#::execution-metrics)       | Peak memory usage of the system                                            |
| [`SYSTEM_PEAK_TEMP_DIR_SIZE`](#::system_peak_temp_dir_size)             | [execution](#::execution-metrics)       | Peak size of the temporary directory                                       |
| [`TOTAL_MEMORY_ALLOCATED`](#::total_memory_allocated)                   | [execution](#::execution-metrics)       | The total memory allocated by the buffer manager.                          |
| [`ATTACH_LOAD_STORAGE_LATENCY`](#::attach_load_storage_latency)         | [file](#::file-metrics)                 | Time spent loading from storage.                                           |
| [`ATTACH_REPLAY_WAL_LATENCY`](#::attach_replay_wal_latency)             | [file](#::file-metrics)                 | Time spent replaying the WAL file.                                         |
| [`CHECKPOINT_LATENCY`](#::checkpoint_latency)                           | [file](#::file-metrics)                 | Time spent running checkpoints                                             |
| [`COMMIT_LOCAL_STORAGE_LATENCY`](#::commit_local_storage_latency)       | [file](#::file-metrics)                 | Time spent committing the transaction-local storage.                       |
| [`TOTAL_BYTES_READ`](#::total_bytes_read)                               | [file](#::file-metrics)                 | The total bytes read by the file system.                                   |
| [`TOTAL_BYTES_WRITTEN`](#::total_bytes_written)                         | [file](#::file-metrics)                 | The total bytes written by the file system.                                |
| [`WAITING_TO_ATTACH_LATENCY`](#::waiting_to_attach_latency)             | [file](#::file-metrics)                 | Time spent waiting to ATTACH a file.                                       |
| [`WAL_REPLAY_ENTRY_COUNT`](#::wal_replay_entry_count)                   | [file](#::file-metrics)                 | The total number of entries to replay in the WAL.                          |
| [`WRITE_TO_WAL_LATENCY`](#::write_to_wal_latency)                       | [file](#::file-metrics)                 | Time spent writing to the WAL.                                             |
| [`ALL_OPTIMIZERS`](#::all_optimizers)                                   | [phase_timing](#::phase_timing-metrics) | Enables all optimizers                                                     |
| [`CUMULATIVE_OPTIMIZER_TIMING`](#::cumulative_optimizer_timing)         | [phase_timing](#::phase_timing-metrics) | Time spent in all optimizers                                               |
| [`PHYSICAL_PLANNER`](#::physical_planner)                               | [phase_timing](#::phase_timing-metrics) | The time spent generating the physical plan                                |
| [`PHYSICAL_PLANNER_COLUMN_BINDING`](#::physical_planner_column_binding) | [phase_timing](#::phase_timing-metrics) | The time spent binding the columns in the logical plan to physical columns |
| [`PHYSICAL_PLANNER_CREATE_PLAN`](#::physical_planner_create_plan)       | [phase_timing](#::phase_timing-metrics) | The time spent creating the physical plan                                  |
| [`PHYSICAL_PLANNER_RESOLVE_TYPES`](#::physical_planner_resolve_types)   | [phase_timing](#::phase_timing-metrics) | The time spent resolving the types in the logical plan to physical types   |
| [`PLANNER`](#::planner)                                                 | [phase_timing](#::phase_timing-metrics) | The time to generate the logical plan from the parsed SQL nodes.           |
| [`PLANNER_BINDING`](#::planner_binding)                                 | [phase_timing](#::phase_timing-metrics) | The time taken to bind the logical plan.                                   |
| [`OPERATOR_CARDINALITY`](#::operator_cardinality)                       | [operator](#::operator-metrics)         | Cardinality of the operator                                                |
| [`OPERATOR_NAME`](#::operator_name)                                     | [operator](#::operator-metrics)         | Name of the operator                                                       |
| [`OPERATOR_ROWS_SCANNED`](#::operator_rows_scanned)                     | [operator](#::operator-metrics)         | Number of rows scanned by the operator                                     |
| [`OPERATOR_TIMING`](#::operator_timing)                                 | [operator](#::operator-metrics)         | Time spent in the operator                                                 |
| [`OPERATOR_TYPE`](#::operator_type)                                     | [operator](#::operator-metrics)         | Type of the operator                                                       |



#### Metric Groups {#docs:current:dev:metrics::metric-groups}

The metrics are organized into groups, which can be used to enable or disable related metrics together.
The following is a list of the available metric groups:
- `ALL`: All metrics
- `DEFAULT`: The default set of metrics
- [`CORE`](#::core-metrics)
- [`EXECUTION`](#::execution-metrics)
- [`FILE`](#::file-metrics)
- [`OPERATOR`](#::operator-metrics)
- [`OPTIMIZER`](#::optimizer-metrics)
- [`PHASE_TIMING`](#::phase_timing-metrics)


##### Core Metrics {#docs:current:dev:metrics::core-metrics}

Core metrics.


###### `CPU_TIME` {#docs:current:dev:metrics::cpu_time}



|   |   |
|:--|:--------|
| **Description** |CPU time spent on the query |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |
| **Operator Node** | ✅ |
| **[Cumulative](#::cumulative-metrics)** | ✅ |
| **Child** | OPERATOR_TIMING |

**Note:**

`CPU_TIME` measures the cumulative operator timings.
It does not include time spent in other stages, like parsing, query planning, etc.
Thus, for some queries, the `LATENCY` in the `QUERY_ROOT` can be greater than the `CPU_TIME`.



###### `CUMULATIVE_CARDINALITY` {#docs:current:dev:metrics::cumulative_cardinality}



|   |   |
|:--|:--------|
| **Description** |Cumulative cardinality of the query |
| **Type** | uint64 |
| **Unit** | absolute |
| **Default** | ✅ |
| **Query Node** | ✅ |
| **Operator Node** | ✅ |
| **[Cumulative](#::cumulative-metrics)** | ✅ |
| **Child** | OPERATOR_CARDINALITY |


###### `CUMULATIVE_ROWS_SCANNED` {#docs:current:dev:metrics::cumulative_rows_scanned}



|   |   |
|:--|:--------|
| **Description** |Cumulative number of rows scanned by the query |
| **Type** | uint64 |
| **Unit** | absolute |
| **Default** | ✅ |
| **Query Node** | ✅ |
| **Operator Node** | ✅ |
| **[Cumulative](#::cumulative-metrics)** | ✅ |
| **Child** | OPERATOR_ROWS_SCANNED |


###### `EXTRA_INFO` {#docs:current:dev:metrics::extra_info}



|   |   |
|:--|:--------|
| **Description** |Unique operator metrics |
| **Type** | Value::MAP |
| **Default** | ✅ |
| **Query Node** | ✅ |
| **Operator Node** | ✅ |


###### `LATENCY` {#docs:current:dev:metrics::latency}



|   |   |
|:--|:--------|
| **Description** |Time spent executing the entire query |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `QUERY_NAME` {#docs:current:dev:metrics::query_name}



|   |   |
|:--|:--------|
| **Description** |The SQL string of the query |
| **Type** | string |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `RESULT_SET_SIZE` {#docs:current:dev:metrics::result_set_size}



|   |   |
|:--|:--------|
| **Description** |The size of the result |
| **Type** | uint64 |
| **Unit** | bytes |
| **Default** | ✅ |
| **Query Node** | ✅ |
| **Operator Node** | ✅ |
| **Child** | RESULT_SET_SIZE |


###### `ROWS_RETURNED` {#docs:current:dev:metrics::rows_returned}



|   |   |
|:--|:--------|
| **Description** |The number of rows returned by the query |
| **Type** | uint64 |
| **Unit** | absolute |
| **Default** | ✅ |
| **Query Node** | ✅ |
| **Child** | OPERATOR_CARDINALITY |


##### Execution Metrics {#docs:current:dev:metrics::execution-metrics}

Metrics that are collected during query execution.


###### `BLOCKED_THREAD_TIME` {#docs:current:dev:metrics::blocked_thread_time}



|   |   |
|:--|:--------|
| **Description** |Time spent waiting for a thread to become available |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `SYSTEM_PEAK_BUFFER_MEMORY` {#docs:current:dev:metrics::system_peak_buffer_memory}



|   |   |
|:--|:--------|
| **Description** |Peak memory usage of the system |
| **Type** | uint64 |
| **Unit** | bytes |
| **Default** | ✅ |
| **Query Node** | ✅ |
| **Operator Node** | ✅ |


###### `SYSTEM_PEAK_TEMP_DIR_SIZE` {#docs:current:dev:metrics::system_peak_temp_dir_size}



|   |   |
|:--|:--------|
| **Description** |Peak size of the temporary directory |
| **Type** | uint64 |
| **Unit** | bytes |
| **Default** | ✅ |
| **Query Node** | ✅ |
| **Operator Node** | ✅ |


###### `TOTAL_MEMORY_ALLOCATED` {#docs:current:dev:metrics::total_memory_allocated}



|   |   |
|:--|:--------|
| **Description** |The total memory allocated by the buffer manager. |
| **Type** | uint64 |
| **Unit** | bytes |
| **Default** | ✅ |
| **Query Node** | ✅ |


##### File Metrics {#docs:current:dev:metrics::file-metrics}

Metrics that are collected during file operations.


###### `ATTACH_LOAD_STORAGE_LATENCY` {#docs:current:dev:metrics::attach_load_storage_latency}



|   |   |
|:--|:--------|
| **Description** |Time spent loading from storage. |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `ATTACH_REPLAY_WAL_LATENCY` {#docs:current:dev:metrics::attach_replay_wal_latency}



|   |   |
|:--|:--------|
| **Description** |Time spent replaying the WAL file. |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `CHECKPOINT_LATENCY` {#docs:current:dev:metrics::checkpoint_latency}



|   |   |
|:--|:--------|
| **Description** |Time spent running checkpoints |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `COMMIT_LOCAL_STORAGE_LATENCY` {#docs:current:dev:metrics::commit_local_storage_latency}



|   |   |
|:--|:--------|
| **Description** |Time spent committing the transaction-local storage. |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `TOTAL_BYTES_READ` {#docs:current:dev:metrics::total_bytes_read}



|   |   |
|:--|:--------|
| **Description** |The total bytes read by the file system. |
| **Type** | uint64 |
| **Unit** | bytes |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `TOTAL_BYTES_WRITTEN` {#docs:current:dev:metrics::total_bytes_written}



|   |   |
|:--|:--------|
| **Description** |The total bytes written by the file system. |
| **Type** | uint64 |
| **Unit** | bytes |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `WAITING_TO_ATTACH_LATENCY` {#docs:current:dev:metrics::waiting_to_attach_latency}



|   |   |
|:--|:--------|
| **Description** |Time spent waiting to ATTACH a file. |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `WAL_REPLAY_ENTRY_COUNT` {#docs:current:dev:metrics::wal_replay_entry_count}



|   |   |
|:--|:--------|
| **Description** |The total number of entries to replay in the WAL. |
| **Type** | uint64 |
| **Unit** | absolute |
| **Default** | ✅ |
| **Query Node** | ✅ |


###### `WRITE_TO_WAL_LATENCY` {#docs:current:dev:metrics::write_to_wal_latency}



|   |   |
|:--|:--------|
| **Description** |Time spent writing to the WAL. |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Query Node** | ✅ |


##### Operator Metrics {#docs:current:dev:metrics::operator-metrics}

Metrics that are collected for each operator.


###### `OPERATOR_CARDINALITY` {#docs:current:dev:metrics::operator_cardinality}



|   |   |
|:--|:--------|
| **Description** |Cardinality of the operator |
| **Type** | uint64 |
| **Unit** | absolute |
| **Default** | ✅ |
| **Operator Node** | ✅ |


###### `OPERATOR_NAME` {#docs:current:dev:metrics::operator_name}



|   |   |
|:--|:--------|
| **Description** |Name of the operator |
| **Type** | string |
| **Default** | ✅ |
| **Operator Node** | ✅ |


###### `OPERATOR_ROWS_SCANNED` {#docs:current:dev:metrics::operator_rows_scanned}



|   |   |
|:--|:--------|
| **Description** |Number of rows scanned by the operator |
| **Type** | uint64 |
| **Unit** | absolute |
| **Default** | ✅ |
| **Operator Node** | ✅ |


###### `OPERATOR_TIMING` {#docs:current:dev:metrics::operator_timing}



|   |   |
|:--|:--------|
| **Description** |Time spent in the operator |
| **Type** | double |
| **Unit** | seconds |
| **Default** | ✅ |
| **Operator Node** | ✅ |


###### `OPERATOR_TYPE` {#docs:current:dev:metrics::operator_type}



|   |   |
|:--|:--------|
| **Description** |Type of the operator |
| **Type** | uint8 |
| **Default** | ✅ |
| **Operator Node** | ✅ |


##### Phase Timing Metrics {#docs:current:dev:metrics::phase_timing-metrics}

This group contains metrics related to the planner and the physical planner. The planner is responsible for generating the logical plan, whereas the physical planner is responsible for generating the physical plan from the logical plan.


###### `ALL_OPTIMIZERS` {#docs:current:dev:metrics::all_optimizers}



|   |   |
|:--|:--------|
| **Description** |Enables all optimizers |
| **Type** | double |
| **Query Node** | ✅ |


###### `CUMULATIVE_OPTIMIZER_TIMING` {#docs:current:dev:metrics::cumulative_optimizer_timing}



|   |   |
|:--|:--------|
| **Description** |Time spent in all optimizers |
| **Type** | double |
| **Unit** | milliseconds |
| **Query Node** | ✅ |
| **[Cumulative](#::cumulative-metrics)** | ✅ |


###### `PHYSICAL_PLANNER` {#docs:current:dev:metrics::physical_planner}



|   |   |
|:--|:--------|
| **Description** |The time spent generating the physical plan |
| **Type** | double |
| **Unit** | milliseconds |
| **Query Node** | ✅ |


###### `PHYSICAL_PLANNER_COLUMN_BINDING` {#docs:current:dev:metrics::physical_planner_column_binding}



|   |   |
|:--|:--------|
| **Description** |The time spent binding the columns in the logical plan to physical columns |
| **Type** | double |
| **Unit** | milliseconds |
| **Query Node** | ✅ |


###### `PHYSICAL_PLANNER_CREATE_PLAN` {#docs:current:dev:metrics::physical_planner_create_plan}



|   |   |
|:--|:--------|
| **Description** |The time spent creating the physical plan |
| **Type** | double |
| **Unit** | milliseconds |
| **Query Node** | ✅ |


###### `PHYSICAL_PLANNER_RESOLVE_TYPES` {#docs:current:dev:metrics::physical_planner_resolve_types}



|   |   |
|:--|:--------|
| **Description** |The time spent resolving the types in the logical plan to physical types |
| **Type** | double |
| **Unit** | milliseconds |
| **Query Node** | ✅ |


###### `PLANNER` {#docs:current:dev:metrics::planner}



|   |   |
|:--|:--------|
| **Description** |The time to generate the logical plan from the parsed SQL nodes. |
| **Type** | double |
| **Unit** | milliseconds |
| **Query Node** | ✅ |


###### `PLANNER_BINDING` {#docs:current:dev:metrics::planner_binding}



|   |   |
|:--|:--------|
| **Description** |The time taken to bind the logical plan. |
| **Type** | double |
| **Unit** | milliseconds |
| **Query Node** | ✅ |



##### Optimizer Metrics {#docs:current:dev:metrics::optimizer-metrics}

Optimizer metrics sit at the `QUERY_ROOT` level, and measure the time taken by each [optimizer](#docs:current:internals:overview::optimizer).
These metrics are only available when the specific optimizer is enabled.
The available optimizations can be queried using the [`duckdb_optimizers()` table function](#docs:current:sql:meta:duckdb_table_functions::duckdb_optimizers).

Each optimizer has a corresponding metric that follows the template: `OPTIMIZER_⟨OPTIMIZER_NAME⟩`.
For example, the `OPTIMIZER_JOIN_ORDER` metric corresponds to the `JOIN_ORDER` optimizer.

Additionally, the following metrics are available to support the optimizer metrics:
- [`ALL_OPTIMIZERS`](#::all_optimizers)
- [`CUMULATIVE_OPTIMIZER_TIMING`](#::cumulative_optimizer_timing)


#### Cumulative Metrics {#docs:current:dev:metrics::cumulative-metrics}

DuckDB also supports several cumulative metrics that are available in all nodes.
In the `QUERY_ROOT` node, these metrics represent the sum of the corresponding metrics across all operators in the query.
In an `OPERATOR` node, they represent the sum of the operator's own metric and those of all its children, recursively.

These cumulative metrics can be enabled independently, even if the underlying specific metrics are disabled.

The following is a list of the available cumulative metrics:
- [`CPU_TIME`](#::cpu_time)
- [`CUMULATIVE_CARDINALITY`](#::cumulative_cardinality)
- [`CUMULATIVE_ROWS_SCANNED`](#::cumulative_rows_scanned)
- [`CUMULATIVE_OPTIMIZER_TIMING`](#::cumulative_optimizer_timing)
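
The recursive definition above can be sketched in a few lines of Python. This is only an illustration of the summation rule, not DuckDB's implementation: the `cumulative` helper and the operator tree (including its operator types and cardinalities) are made up for the example.

```python
def cumulative(node, metric):
    """Sum `metric` over `node` and all of its descendants."""
    own = node.get(metric, 0)
    return own + sum(cumulative(child, metric) for child in node.get("children", []))


# Hypothetical operator tree; the values are invented for illustration.
plan = {
    "operator_type": "PROJECTION",
    "operator_cardinality": 2,
    "children": [
        {
            "operator_type": "HASH_JOIN",
            "operator_cardinality": 2,
            "children": [
                {"operator_type": "TABLE_SCAN", "operator_cardinality": 3, "children": []},
                {"operator_type": "TABLE_SCAN", "operator_cardinality": 2, "children": []},
            ],
        }
    ],
}

# Cumulative value at the projection: its own cardinality plus everything below it.
projection_total = cumulative(plan, "operator_cardinality")

# Cumulative value at the join: the join itself plus its two scans.
join_total = cumulative(plan["children"][0], "operator_cardinality")

print(projection_total, join_total)
```

A node's cumulative value is therefore never smaller than that of any of its children, provided the metric is non-negative.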


#### Examples {#docs:current:dev:metrics::examples}

The following examples demonstrate how to enable custom profiling and set the output format to `json`.
In the first example, we enable profiling and set the output to a file.
We only enable `EXTRA_INFO`, `OPERATOR_CARDINALITY`, and `OPERATOR_TIMING`.

```sql
CREATE TABLE students (name VARCHAR, sid INTEGER);
CREATE TABLE exams (eid INTEGER, subject VARCHAR, sid INTEGER);
INSERT INTO students VALUES ('Mark', 1), ('Joe', 2), ('Matthew', 3);
INSERT INTO exams VALUES (10, 'Physics', 1), (20, 'Chemistry', 2), (30, 'Literature', 3);

PRAGMA enable_profiling = 'json';
PRAGMA profiling_output = '/path/to/file.json';

PRAGMA configure_profiling = '{"CPU_TIME": "false", "EXTRA_INFO": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TIMING": "true"}';

SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';
```

The file's content after executing the query:

```json
{
    "extra_info": {},
    "query_name": "SELECT name\nFROM students\nJOIN exams USING (sid)\nWHERE name LIKE 'Ma%';",
    "children": [
        {
            "operator_timing": 0.000001,
            "operator_cardinality": 2,
            "operator_type": "PROJECTION",
            "extra_info": {
                "Projections": "name",
                "Estimated Cardinality": "1"
            },
            "children": [
                {
                    "extra_info": {
                        "Join Type": "INNER",
                        "Conditions": "sid = sid",
                        "Build Min": "1",
                        "Build Max": "3",
                        "Estimated Cardinality": "1"
                    },
                    "operator_cardinality": 2,
                    "operator_type": "HASH_JOIN",
                    "operator_timing": 0.00023899999999999998,
                    "children": [
...
```
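
As a rough sketch of how such a file could be inspected, the following Python snippet walks the `children` arrays and lists each operator with its timing and cardinality. The embedded sample is a trimmed, hand-written stand-in that keeps only the two operators visible above and assumes nothing about the elided part; in practice, the JSON would be read from the file set via `profiling_output`. The `walk` helper is hypothetical.

```python
import json

# Trimmed stand-in for a profiling output file. Only the two operators shown
# above are included; the children of the hash join are omitted.
sample = """
{
    "extra_info": {},
    "children": [
        {
            "operator_type": "PROJECTION",
            "operator_timing": 0.000001,
            "operator_cardinality": 2,
            "children": [
                {
                    "operator_type": "HASH_JOIN",
                    "operator_timing": 0.000239,
                    "operator_cardinality": 2,
                    "children": []
                }
            ]
        }
    ]
}
"""


def walk(node, depth=0):
    """Yield (depth, type, timing, cardinality) for every operator below `node`."""
    for child in node.get("children", []):
        yield (
            depth,
            child.get("operator_type"),
            child.get("operator_timing"),
            child.get("operator_cardinality"),
        )
        yield from walk(child, depth + 1)


rows = list(walk(json.loads(sample)))
for depth, op_type, timing, cardinality in rows:
    # Indent by depth so the output mirrors the shape of the plan.
    print(f"{'  ' * depth}{op_type}: {timing:.6f} s, {cardinality} rows")
```

Using `.get()` keeps the traversal tolerant of metrics that were disabled and are therefore absent from a node.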

The second example adds detailed metrics to the output.

```sql
PRAGMA profiling_mode = 'detailed';

SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';
```

The contents of the output file:

```json
{
  "all_optimizers": 0.001413,
  "cumulative_optimizer_timing": 0.0014120000000000003,
  "planner": 0.000873,
  "planner_binding": 0.000869,
  "physical_planner": 0.000236,
  "physical_planner_column_binding": 0.000005,
  "physical_planner_resolve_types": 0.000001,
  "physical_planner_create_plan": 0.000226,
  "optimizer_expression_rewriter": 0.000029,
  "optimizer_filter_pullup": 0.000002,
  "optimizer_filter_pushdown": 0.000102,
...
  "optimizer_column_lifetime": 0.000009999999999999999,
  "rows_returned": 2,
  "latency": 0.003708,
  "cumulative_rows_scanned": 6,
  "cumulative_cardinality": 11,
  "extra_info": {},
  "cpu_time": 0.000095,
  "optimizer_build_side_probe_side": 0.000017,
  "result_set_size": 32,
  "blocked_thread_time": 0.0,
  "query_name": "SELECT name\nFROM students\nJOIN exams USING (sid)\nWHERE name LIKE 'Ma%';",
  "children": [
    {
      "operator_timing": 0.000001,
      "operator_rows_scanned": 0,
      "cumulative_rows_scanned": 6,
      "operator_cardinality": 2,
      "operator_type": "PROJECTION",
      "cumulative_cardinality": 11,
      "extra_info": {
        "Projections": "name",
        "Estimated Cardinality": "1"
      },
      "result_set_size": 32,
      "cpu_time": 0.000095,
      "children": [
...
```
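
To relate these numbers to each other, the sketch below uses only the root-level values visible in the excerpt above. It computes how much of the `latency` is not covered by `cpu_time` (which only sums operator timings) and picks the slowest of the listed per-optimizer metrics. The elided entries are deliberately left out, so the result only ranks the optimizers shown, and the difference is a rough indication rather than an exact breakdown.

```python
# Root-level values copied from the detailed example output above.
# Entries hidden behind "..." in the excerpt are not included.
root = {
    "latency": 0.003708,
    "cpu_time": 0.000095,
    "optimizer_expression_rewriter": 0.000029,
    "optimizer_filter_pullup": 0.000002,
    "optimizer_filter_pushdown": 0.000102,
    "optimizer_column_lifetime": 0.000009999999999999999,
    "optimizer_build_side_probe_side": 0.000017,
}

# Part of the latency that is not accounted for by operator timings.
outside_operators = root["latency"] - root["cpu_time"]

# Per-optimizer metrics follow the OPTIMIZER_<NAME> template (lowercase in the JSON).
optimizer_timings = {key: value for key, value in root.items() if key.startswith("optimizer_")}
slowest = max(optimizer_timings, key=optimizer_timings.get)

print(f"outside operators: {outside_operators:.6f}")
print(f"slowest listed optimizer: {slowest}")
```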

## Profiling {#docs:current:dev:profiling}

Profiling is essential to help understand why certain queries exhibit specific performance characteristics.
DuckDB contains several built-in features to enable query profiling, which this page covers.
For a high-level example of using `EXPLAIN`, see the [“Inspect Query Plans” page](#docs:current:guides:meta:explain).

#### Statements {#docs:current:dev:profiling::statements}

##### The `EXPLAIN` Statement {#docs:current:dev:profiling::the-explain-statement}

A good first step in profiling a query is to examine its query plan.
The [`EXPLAIN`](#docs:current:guides:meta:explain) statement shows the query plan and describes what is going on under the hood.

##### The `EXPLAIN ANALYZE` Statement {#docs:current:dev:profiling::the-explain-analyze-statement}

The query plan helps developers understand the performance characteristics of the query.
However, it is often also necessary to examine the performance numbers of individual operators and the cardinalities that pass through them.
The [`EXPLAIN ANALYZE`](#docs:current:guides:meta:explain_analyze) statement provides these: it executes the query and pretty-prints the query plan.
Thus, it reports the actual run-time performance numbers.

##### The `FORMAT` Option {#docs:current:dev:profiling::the-format-option}

The `EXPLAIN [ANALYZE]` statement allows exporting to several formats:

* `text` – default ASCII-art style output
* `graphviz` – produces a DOT output, which can be rendered with [Graphviz](https://graphviz.org/)
* `html` – produces an HTML output, which can be rendered with [treeflex](https://dumptyd.github.io/treeflex/)
* `json` – produces a JSON output
* `mermaid` – produces a [Mermaid](https://mermaid.js.org/) flowchart

To specify a format, use the `FORMAT` option:

```sql
EXPLAIN (FORMAT html) SELECT 42 AS x;
```

#### Pragmas {#docs:current:dev:profiling::pragmas}

DuckDB supports several pragmas for turning profiling on and off and controlling the level of detail in the profiling output.

The following pragmas are available and can be set using either `PRAGMA` or `SET`.
They can also be reset using `RESET`, followed by the setting name.
For more information, see the [“Profiling”](#docs:current:configuration:pragmas::profiling) section of the pragmas page.

| Setting                                                                                                                                                                            | Description                                     | Default                                                  | Options                                                                                                                                                            |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|----------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [`enable_profiling`](#docs:current:configuration:pragmas::enable_profiling), [`enable_profile`](#docs:current:configuration:pragmas::enable_profiling)     | Turn on profiling                               | `query_tree`                                             | `query_tree`, `json`, `query_tree_optimizer`, `no_output`                                                                                                          |
| [`profiling_coverage`](#docs:current:configuration:pragmas::profiling_coverage)                                                                                        | Set the operators to profile                    | `SELECT`                                                 | `SELECT`, `ALL`                                                                                                                                                    |
| [`profiling_output`](#docs:current:configuration:pragmas::profiling_output)                                                                                            | Set a profiling output file                     | Console                                                  | A filepath                                                                                                                                                         |
| [`profiling_mode`](#docs:current:configuration:pragmas::profiling_mode)                                                                                                | Toggle additional optimizer and planner metrics | `standard`                                               | `standard`, `detailed`                                                                                                                                             |
| [`configure_profiling`](#docs:current:configuration:pragmas::custom_profiling_metrics)                                                                                 | Enable or disable specific metrics              | All metrics except those activated by detailed profiling | A JSON object that matches the following: `{"METRIC_NAME": "boolean", ...}`. ([List of all available metrics](#docs:current:dev:metrics::all_metrics)) |
| [`disable_profiling`](#docs:current:configuration:pragmas::disable_profiling), [`disable_profile`](#docs:current:configuration:pragmas::disable_profiling) | Turn off profiling                              |                                                          |                                                                                                                                                                    |
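
When the set of metrics is assembled programmatically, the JSON object for `configure_profiling` can be generated rather than written by hand. The following Python sketch builds the statement text from a dictionary of metric names and booleans, using the string values `"true"` and `"false"` as in the format described in the table. The helper name `configure_profiling_statement` is hypothetical and not part of any DuckDB client; the snippet only produces a string and does not connect to a database.

```python
import json


def configure_profiling_statement(metrics):
    """Build a `PRAGMA configure_profiling` statement from {metric_name: bool}."""
    payload = {
        name.upper(): ("true" if enabled else "false")
        for name, enabled in metrics.items()
    }
    # Metric names contain no quote characters, so the JSON text can be
    # embedded in a single-quoted SQL string literal as-is.
    return "PRAGMA configure_profiling = '" + json.dumps(payload) + "';"


statement = configure_profiling_statement(
    {
        "cpu_time": False,
        "extra_info": True,
        "operator_cardinality": True,
        "operator_timing": True,
    }
)
print(statement)
```

The resulting string can then be passed to whichever client API is used to execute SQL statements.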

#### Table Functions {#docs:current:dev:profiling::table-functions}

> These table functions were introduced in DuckDB v1.5.0.

DuckDB provides table functions to enable and disable profiling, consolidating multiple settings into a single call.

##### `enable_profiling()` {#docs:current:dev:profiling::enable_profiling}

The `enable_profiling()` function configures profiling with the specified options.

```sql
CALL enable_profiling(
    format := 'json',
    save_location := '/path/to/output.json',
    coverage := 'select',
    mode := 'standard',
    metrics := ['QUERY_NAME', 'LATENCY', 'OPERATOR_TIMING']
);
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `metrics` | `LIST`, `STRUCT`, or JSON | Specifies which metrics to enable |
| `mode` | `VARCHAR` | Profiling level: `'standard'` or `'detailed'` |
| `save_location` | `VARCHAR` | File path for profiling output |
| `coverage` | `VARCHAR` | Query coverage: `'select'` or `'all'` |
| `format` | `VARCHAR` | Output format: `'query_tree'`, `'json'`, `'query_tree_optimizer'`, `'no_output'` |

All parameters are optional and named. You can also pass metrics as an unnamed parameter:

```sql
CALL enable_profiling(['LATENCY', 'RESULT_SET_SIZE']);
```

##### `disable_profiling()` {#docs:current:dev:profiling::disable_profiling}

The `disable_profiling()` function turns off profiling.

```sql
CALL disable_profiling();
```

#### Metrics {#docs:current:dev:profiling::metrics}

DuckDB supports a wide range of metrics that can be enabled or disabled independently. To learn more and to see the full list of available metrics, refer to the [metrics documentation](#docs:current:dev:metrics::all_metrics).

#### Detailed Profiling {#docs:current:dev:profiling::detailed-profiling}

When the `profiling_mode` is set to `detailed`, an extra set of metrics is enabled, which is only available in the `QUERY_ROOT` node.
These include all the metrics in the [Phase timing](#docs:current:dev:metrics::phase_timing-metrics) metric group.
It is possible to toggle each of these additional metrics individually.

#### Query Graphs {#docs:current:dev:profiling::query-graphs}

It is also possible to render the profiling output as a query graph.
The query graph visually represents the query plan, showing the operators and their relationships.
The query plan must be output in the `json` format and stored in a file.
After the profiling output has been written to its designated file, the `duckdb.query_graph` Python script can render it as a query graph.
The script requires the `duckdb` Python module to be installed.
It generates an HTML file and opens it in your web browser.

```batch
python -m duckdb.query_graph /path/to/file.json
```

#### Notation in Query Plans {#docs:current:dev:profiling::notation-in-query-plans}

In query plans, the [hash join](https://en.wikipedia.org/wiki/Hash_join) operators adhere to the following convention:
the _probe side_ of the join is the left operand, while the _build side_ is the right operand.

Join operators in the query plan show the join type used:

* Inner joins are denoted as `INNER`.
* Left outer joins and right outer joins are denoted as `LEFT` and `RIGHT`, respectively.
* Full outer joins are denoted as `FULL`.

> **Tip.** To visualize query plans, consider using the [DuckDB execution plan visualizer](https://db.cs.uni-tuebingen.de/explain/) developed by the [Database Systems Research Group at the University of Tübingen](https://github.com/DBatUTuebingen).

## Building DuckDB {#dev:building}

### Building DuckDB from Source {#docs:current:dev:building:overview}

#### When Should You Build DuckDB? {#docs:current:dev:building:overview::when-should-you-build-duckdb}

DuckDB binaries are available for _stable_ and _preview_ builds on the [installation page](https://duckdb.org/install).
In most cases, it's recommended to use these binaries.
When you are running on an experimental platform (e.g., [Raspberry Pi](#docs:current:dev:building:raspberry_pi)) or you would like to build the project for an unmerged pull request,
you can build DuckDB from source based on the [`duckdb/duckdb` repository hosted on GitHub](https://github.com/duckdb/duckdb/).
This page explains the steps for building DuckDB.

#### Prerequisites {#docs:current:dev:building:overview::prerequisites}

DuckDB needs CMake and a C++11-compliant compiler (e.g., GCC, Apple-Clang, MSVC).
Additionally, we recommend using the [Ninja build system](https://ninja-build.org/), which automatically parallelizes the build process.

#### Getting Started {#docs:current:dev:building:overview::getting-started}

A `Makefile` wraps the build process.
See [Build Configuration](#docs:current:dev:building:build_configuration) for targets and configuration flags.

```bash
make
make release # same as plain make
make relassert
make debug
GEN=ninja make # for use with Ninja
BUILD_BENCHMARK=1 make # build with benchmarks
```

> `debug` builds use a lot of disk space – make sure you have at least 25 GB available.

#### Platforms {#docs:current:dev:building:overview::platforms}

##### Platforms with Full Support {#docs:current:dev:building:overview::platforms-with-full-support}

DuckDB fully supports Linux, macOS and Windows. Both x86_64 (amd64) and AArch64 (ARM64) builds are available for these platforms, and almost all extensions are distributed for these platforms.

| Platform name      | Description                                                            |
|--------------------|------------------------------------------------------------------------|
| `linux_amd64`      | Linux x86_64 (AMD64) with [glibc](https://www.gnu.org/software/libc/)  |
| `linux_arm64`      | Linux AArch64 (ARM64) with [glibc](https://www.gnu.org/software/libc/) |
| `osx_amd64`        | macOS 12+ AMD64 (Intel CPUs)                                           |
| `osx_arm64`        | macOS 12+ ARM64 (Apple Silicon CPUs)                                   |
| `windows_amd64`    | Windows 10+ x86_64 (AMD64)                                             |
| `windows_arm64`    | Windows 10+ AArch64 (ARM64)                                            |

For these platforms, builds are available for both the latest stable version and the preview version (nightly build).
In some circumstances, you may still want to build DuckDB from source, e.g., to test an unmerged [pull request](https://github.com/duckdb/duckdb/pulls).
For build instructions on these platforms, see:

* [Linux](#docs:current:dev:building:linux)
* [macOS](#docs:current:dev:building:macos)
* [Windows](#docs:current:dev:building:windows)

##### Platforms with Partial Support {#docs:current:dev:building:overview::platforms-with-partial-support}

There are several partially supported platforms.
For some platforms, DuckDB binaries and extensions (or a [subset of extensions](#docs:current:extensions:extension_distribution::platforms)) are distributed.
For others, building from source is possible.

| Platform name          | Description                                                                                          |
|------------------------|------------------------------------------------------------------------------------------------------|
| `linux_amd64_musl`     | Linux x86_64 (AMD64) with [musl libc](https://musl.libc.org/), e.g., Alpine Linux                    |
| `linux_arm64_musl`     | Linux AArch64 (ARM64) with [musl libc](https://musl.libc.org/), e.g., Alpine Linux                   |
| `linux_arm64_android`  | Android AArch64 (ARM64)                                                                              |
| `wasm_eh`              | WebAssembly Exception Handling                                                                       |

Below, we provide detailed build instructions for some platforms:

* [Android](#docs:current:dev:building:android)
* [Raspberry Pi](#docs:current:dev:building:raspberry_pi)

##### Platforms with Best Effort Support {#docs:current:dev:building:overview::platforms-with-best-effort-support}

| Platform name          | Description                                                                                          |
|------------------------|------------------------------------------------------------------------------------------------------|
| `freebsd_amd64`        | FreeBSD x86_64 (AMD64)                                                                               |
| `freebsd_arm64`        | FreeBSD AArch64 (ARM64)                                                                              |
| `wasm_mvp`             | WebAssembly Minimum Viable Product                                                                   |
| `windows_amd64_mingw`  | Windows 10+ x86_64 (AMD64) with MinGW                                                                |
| `windows_arm64_mingw`  | Windows 10+ AArch64 (ARM64) with MinGW                                                               |

> These platforms are not covered by DuckDB's community support. For details on commercial support, see the [support policy page](https://duckdblabs.com/community_support_policy#platforms).

See also the [“Unofficial and Unsupported Platforms” page](#docs:current:dev:building:unofficial_and_unsupported_platforms) for details.

##### Outdated Platforms {#docs:current:dev:building:overview::outdated-platforms}

Some platforms were supported in older DuckDB versions but are no longer supported.

| Platform name          | Description                                                                                          |
|------------------------|------------------------------------------------------------------------------------------------------|
| `linux_amd64_gcc4`     | Linux x86_64 (AMD64) with GCC 4, e.g., CentOS 7                                                      |
| `linux_arm64_gcc4`     | Linux AArch64 (ARM64) with GCC 4, e.g., CentOS 7                                                     |
| `windows_amd64_rtools` | Windows 10+ x86_64 (AMD64) for [RTools](https://cran.r-project.org/bin/windows/Rtools/)              |

DuckDB can also be built for end-of-life platforms such as [macOS 11](https://endoflife.date/macos) and [CentOS 7/8](https://endoflife.date/centos) using the instructions provided for macOS and Linux.

#### Amalgamation Build {#docs:current:dev:building:overview::amalgamation-build}

DuckDB can be built as a single pair of C++ header and source code files (`duckdb.hpp` and `duckdb.cpp`) with approximately 0.5M lines of code.
To generate these files, run:

```bash
python scripts/amalgamation.py
```

Note that the amalgamation build is provided on a best-effort basis and is not officially supported.

#### Limitations {#docs:current:dev:building:overview::limitations}

Currently, DuckDB has the following known compile-time limitations:

* The `-march=native` build flag, i.e., compiling DuckDB with the local machine's native instruction set, is not supported.

#### Troubleshooting Guides {#docs:current:dev:building:overview::troubleshooting-guides}

We provide troubleshooting guides for building DuckDB:

* [Generic issues](#docs:current:dev:building:troubleshooting)
* [Python](#docs:current:dev:building:python)
* [R](#docs:current:dev:building:r)

### Building Configuration {#docs:current:dev:building:build_configuration}

#### Build Types {#docs:current:dev:building:build_configuration::build-types}

DuckDB can be built in several configurations. Most of them correspond directly to CMake build types, but not all.

##### `release` {#docs:current:dev:building:build_configuration::release}

This build is optimized for performance: all assertions, debug symbols and debug code are stripped.

##### `debug` {#docs:current:dev:building:build_configuration::debug}

This build runs with all the debug information, including symbols, assertions and `#ifdef DEBUG` blocks.
Due to these, binaries of this build are expected to be slow.
Note: the special debug defines are not automatically set for this build.

##### `relassert` {#docs:current:dev:building:build_configuration::relassert}

This build does not trigger the `#ifdef DEBUG` code blocks, but it still has debug symbols, which make it possible to step through the execution with line number information. `D_ASSERT` lines are also still checked in this build.
Binaries of this build mode are significantly faster than those of the `debug` mode.

##### `reldebug` {#docs:current:dev:building:build_configuration::reldebug}

This build is similar to `relassert`, except that assertions are also stripped.

##### `benchmark` {#docs:current:dev:building:build_configuration::benchmark}

This build is a shorthand for `release` with `BUILD_BENCHMARK=1` set.

##### `tidy-check` {#docs:current:dev:building:build_configuration::tidy-check}

This creates a build and then runs [Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/) to check for issues or style violations through static analysis.
The CI will also run this check, causing it to fail if this check fails.

##### `format-fix` | `format-changes` | `format-main` {#docs:current:dev:building:build_configuration::format-fix--format-changes--format-main}

This doesn't actually create a build, but uses the following format checkers to check for style issues:

* [clang-format](https://clang.llvm.org/docs/ClangFormat.html) to fix format issues in the code.
* [cmake-format](https://cmake-format.readthedocs.io/en/latest/) to fix format issues in the `CMakeLists.txt` files.

The CI will also run this check, causing it to fail if this check fails.

#### Extension Selection {#docs:current:dev:building:build_configuration::extension-selection}

[Core DuckDB extensions](#docs:current:core_extensions:overview) are the ones maintained by the DuckDB team. These are hosted in the `duckdb` GitHub organization and are served by the `core` extension repository.

Additional extensions can be built as part of DuckDB by setting the `BUILD_EXTENSIONS` flag to a semicolon-separated list of the extensions to build.

```bash
BUILD_EXTENSIONS='tpch;httpfs;fts;json;parquet' make
```
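The quotes around the list matter: an unquoted `;` would be read by the shell as a command separator. As a minimal sketch, the list can also be assembled from individual names with a small helper. Note that `join_extensions` is a hypothetical function written for this example and is not part of DuckDB's build system.

```bash
# Hypothetical helper (not part of DuckDB): join all arguments with ';'
join_extensions() {
    list=""
    for ext in "$@"; do
        # add a ';' separator only if the list is already non-empty
        list="${list:+${list};}${ext}"
    done
    printf '%s\n' "$list"
}

# Print the value that would be assigned to BUILD_EXTENSIONS
join_extensions tpch httpfs fts json parquet
```

The result could then be passed to the build as `BUILD_EXTENSIONS="$(join_extensions tpch httpfs)" make`.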

For more details, see the page on [building DuckDB extensions](#docs:current:dev:building:building_extensions).

#### Package Flags {#docs:current:dev:building:build_configuration::package-flags}

For every package that is maintained by core DuckDB, there exists a flag in the Makefile to enable building the package.
These can be enabled either by setting them in the current `env` (e.g., through setup files such as `.bashrc` or `.zshrc`) or by setting them before the call to `make`, for example:

```bash
BUILD_PYTHON=1 make debug
```
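The difference between the two approaches is the scope of the variable. The following sketch illustrates this with `sh -c` standing in for `make` (an assumption made so that the example runs without a DuckDB checkout): a variable set as a command prefix is visible to that one command only, while an exported variable is visible to every later command in the session.

```bash
# Start from a clean state for the demonstration
unset BUILD_PYTHON

# 1. Prefix form: the variable is visible to this one command only
BUILD_PYTHON=1 sh -c 'echo "prefix form:  ${BUILD_PYTHON:-unset}"'
sh -c 'echo "next command: ${BUILD_PYTHON:-unset}"'

# 2. Exported: the variable is visible to every later command in the session
export BUILD_PYTHON=1
sh -c 'echo "after export: ${BUILD_PYTHON:-unset}"'
```

The prefix form is convenient for one-off builds, while exporting the flag in a setup file applies it to every build started from that shell.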

##### `BUILD_PYTHON` {#docs:current:dev:building:build_configuration::build_python}

When this flag is set, the [Python](#docs:current:clients:python:overview) package is built.

##### `BUILD_SHELL` {#docs:current:dev:building:build_configuration::build_shell}

When this flag is set, the [CLI](#docs:current:clients:cli:overview) is built. This is usually enabled by default.

##### `BUILD_BENCHMARK` {#docs:current:dev:building:build_configuration::build_benchmark}

When this flag is set, DuckDB's in-house benchmark suite is built.
More information about this can be found [in the README](https://github.com/duckdb/duckdb/blob/main/benchmark/README.md).

##### `BUILD_JDBC` {#docs:current:dev:building:build_configuration::build_jdbc}

When this flag is set, the [Java](#docs:current:clients:java) package is built.

##### `BUILD_ODBC` {#docs:current:dev:building:build_configuration::build_odbc}

When this flag is set, the [ODBC](#docs:current:clients:odbc:overview) package is built.

#### Miscellaneous Flags {#docs:current:dev:building:build_configuration::miscellaneous-flags}

##### `DISABLE_UNITY` {#docs:current:dev:building:build_configuration::disable_unity}

To improve compilation time, we use [Unity Build](https://cmake.org/cmake/help/latest/prop_tgt/UNITY_BUILD.html) to combine translation units.
However, this can hide include bugs. This flag disables the unity build so that these errors can be detected.

##### `DISABLE_SANITIZER` {#docs:current:dev:building:build_configuration::disable_sanitizer}

In some situations, running an executable that has been built with sanitizers enabled is not supported or can cause problems. Julia is an example of this.
With this flag enabled, the sanitizers are disabled for the build.

#### Overriding Git Hash and Version {#docs:current:dev:building:build_configuration::overriding-git-hash-and-version}

It is possible to override the Git hash and version when building from source using the `OVERRIDE_GIT_DESCRIBE` environment variable.
This is useful when building from sources that are not part of a complete Git repository (e.g., an archive file with no information on commit hashes and tags).
For example:

```bash
OVERRIDE_GIT_DESCRIBE=v0.10.0-843-g09ea97d0a9 GEN=ninja make
```

This results in the following output when running `./build/release/duckdb`:

```text
v0.10.1-dev843 09ea97d0a9
...
```
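The relationship between the `git describe` string and the printed version can be sketched as follows. The rule used here (bump the patch number and append `-dev<commits>` when there are commits after the tag) is inferred from the single example above, and the handling of zero commits is an assumption. DuckDB's actual logic may differ. `describe_to_version` is a hypothetical helper, not part of the build system.

```bash
# Hypothetical helper: turn v<major>.<minor>.<patch>-<commits>-g<hash>
# into the "<version> <hash>" line shown above
describe_to_version() {
    describe="$1"
    hash="${describe##*-g}"      # part after the last "-g"
    rest="${describe%-g*}"       # tag and commit count, e.g., v0.10.0-843
    commits="${rest##*-}"        # number of commits since the tag
    tag="${rest%-*}"             # the tag itself, e.g., v0.10.0
    patch="${tag##*.}"           # last version component
    prefix="${tag%.*}"           # everything before the patch number
    if [ "$commits" -eq 0 ]; then
        # assumption: a build exactly on the tag keeps the tag's version
        echo "${tag} ${hash}"
    else
        echo "${prefix}.$((patch + 1))-dev${commits} ${hash}"
    fi
}

describe_to_version "v0.10.0-843-g09ea97d0a9"
```

For the input above, the sketch prints `v0.10.1-dev843 09ea97d0a9`, matching the output of the binary.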

### Building Extensions {#docs:current:dev:building:building_extensions}

[Extensions](#docs:current:extensions:overview) can be built from source and installed from the resulting local binary.

#### Building Extensions {#docs:current:dev:building:building_extensions::building-extensions}

To build using extension flags, set the `BUILD_EXTENSIONS` flag to the list of extensions that you want to be built. For example:

```bash
BUILD_EXTENSIONS='autocomplete;httpfs;icu;json;tpch' GEN=ninja make
```

This option also accepts out-of-tree extensions such as [`delta`](#docs:current:core_extensions:delta):

```bash
BUILD_EXTENSIONS='autocomplete;httpfs;icu;json;tpch;delta' GEN=ninja make
```

In most cases, extensions will be directly linked in the resulting DuckDB executable.

#### Special Extension Flags {#docs:current:dev:building:building_extensions::special-extension-flags}

##### `BUILD_JEMALLOC` {#docs:current:dev:building:building_extensions::build_jemalloc}

When this flag is set, the [`jemalloc` extension](#docs:current:core_extensions:jemalloc) is built.

##### `BUILD_TPCE` {#docs:current:dev:building:building_extensions::build_tpce}

When this flag is set, the [TPC-E](https://www.tpc.org/tpce/) library is built. Unlike TPC-H and TPC-DS, this is not a proper extension and it's not distributed as such. Enabling this flag allows running TPC-E queries through our test suite.

#### Debug Flags {#docs:current:dev:building:building_extensions::debug-flags}

##### `CRASH_ON_ASSERT` {#docs:current:dev:building:building_extensions::crash_on_assert}

`D_ASSERT(condition)` is used throughout the code. A failed assertion throws an `InternalException` in debug builds.
With this flag enabled, a failed assertion directly causes a crash instead.

##### `DISABLE_STRING_INLINE` {#docs:current:dev:building:building_extensions::disable_string_inline}

In our execution format, `string_t` can “inline” strings that are up to a certain length (12 bytes), which means that they don't require a separate allocation.
When this flag is set, this feature is disabled and small strings are not inlined.
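Note that the threshold is expressed in bytes rather than characters. As a rough illustration only, the following sketch classifies a few strings by their byte length. `would_inline` is a hypothetical helper and it assumes that the 12-byte limit is inclusive. It does not call into DuckDB.

```bash
# Hypothetical helper: succeeds if the byte length of the string is at most 12
would_inline() {
    # wc -c counts bytes, not characters; $(( )) strips any padding wc adds
    bytes=$(( $(printf '%s' "$1" | wc -c) ))
    [ "$bytes" -le 12 ]
}

for s in "duckdb" "hello world!" "a longer string"; do
    if would_inline "$s"; then
        echo "inlined:   $s"
    else
        echo "allocated: $s"
    fi
done
```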

##### `DISABLE_MEMORY_SAFETY` {#docs:current:dev:building:building_extensions::disable_memory_safety}

Our data structures that are used extensively throughout the non-performance-critical code have extra checks to ensure memory safety. These checks include:

* Making sure `nullptr` is never dereferenced.
* Making sure index out of bounds accesses don't trigger a crash.

With this flag enabled, these checks are removed. This is mostly used to verify that the performance hit of the checks is negligible.

##### `DESTROY_UNPINNED_BLOCKS` {#docs:current:dev:building:building_extensions::destroy_unpinned_blocks}

With this flag enabled, previously pinned blocks in the BufferManager are destroyed as soon as they are unpinned. This makes sure that there are no situations where this memory is still being used despite not being pinned.

##### `DEBUG_STACKTRACE` {#docs:current:dev:building:building_extensions::debug_stacktrace}

When a crash or assertion hit occurs in a test, print a stack trace.
This is useful when debugging a crash that is hard to pinpoint with a debugger attached.

#### Using a CMake Configuration File {#docs:current:dev:building:building_extensions::using-a-cmake-configuration-file}

To build using a CMake configuration file, create an extension configuration file named `extension_config.cmake` with, e.g., the following content:

```cmake
duckdb_extension_load(autocomplete)
duckdb_extension_load(fts)
duckdb_extension_load(inet)
duckdb_extension_load(icu)
duckdb_extension_load(json)
duckdb_extension_load(parquet)
```

Build DuckDB as follows:

```bash
GEN=ninja EXTENSION_CONFIGS="extension_config.cmake" make
```

Then, to install the extensions in one go, run:

```bash
# set this to the build type that was built: release or debug
BUILD_TYPE=release
cd "build/${BUILD_TYPE}/extension/"
# install every extension found in the directory
for EXTENSION in *; do
    ../duckdb -c "INSTALL '${EXTENSION}/${EXTENSION}.duckdb_extension';"
done
```

### Android {#docs:current:dev:building:android}

DuckDB has experimental support for Android. Please use the latest `main` branch of DuckDB instead of the stable versions.

#### Building the DuckDB Library Using the Android NDK {#docs:current:dev:building:android::building-the-duckdb-library-using-the-android-ndk}

We provide build instructions for setups using macOS and Android Studio. For other setups, please adjust the steps accordingly.

1. Open [Android Studio](https://developer.android.com/studio).
   Select the **Tools** menu and pick **SDK Manager**.
   Select the SDK Tools tab and tick the **NDK (Side by side)** option.
   Click **OK** to install.

1. Set the Android NDK's location. For example:

   ```bash
   ANDROID_NDK=~/Library/Android/sdk/ndk/28.0.12433566/
   ```

1. Set the [Android ABI](https://developer.android.com/ndk/guides/abis). For example:

   ```bash
   ANDROID_ABI=arm64-v8a
   ```

   Or:

   ```bash
   ANDROID_ABI=x86_64
   ```

1. If you would like to use the [Ninja build system](#docs:current:dev:building:overview::prerequisites), make sure it is installed and available on the `PATH`.

1. Set the list of DuckDB extensions to build. These will be statically linked in the binary. For example:

   ```bash
   DUCKDB_EXTENSIONS="icu;json;parquet"
   ```

1. Navigate to DuckDB's directory and run the build as follows:

   ```bash
   PLATFORM_NAME="android_${ANDROID_ABI}"
   BUILDDIR=./build/${PLATFORM_NAME}
   mkdir -p ${BUILDDIR}
   cd ${BUILDDIR}
   cmake \
       -G "Ninja" \
       -DEXTENSION_STATIC_BUILD=1 \
       -DDUCKDB_EXTRA_LINK_FLAGS="-llog" \
       -DBUILD_EXTENSIONS=${DUCKDB_EXTENSIONS} \
       -DENABLE_EXTENSION_AUTOLOADING=1 \
       -DENABLE_EXTENSION_AUTOINSTALL=1 \
       -DCMAKE_VERBOSE_MAKEFILE=on \
       -DANDROID_PLATFORM=${ANDROID_PLATFORM} \
       -DLOCAL_EXTENSION_REPO="" \
       -DOVERRIDE_GIT_DESCRIBE="" \
       -DDUCKDB_EXPLICIT_PLATFORM=${PLATFORM_NAME} \
       -DBUILD_UNITTESTS=0 \
       -DBUILD_SHELL=1 \
       -DANDROID_ABI=${ANDROID_ABI} \
       -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
       -DCMAKE_BUILD_TYPE=Release ../..
   cmake \
       --build . \
       --config Release
   ```

1. For the `arm64-v8a` ABI, the build will produce the `build/android_arm64-v8a/duckdb` and `build/android_arm64-v8a/src/libduckdb.so` binaries.
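The CMake invocation above reads several shell variables, including `ANDROID_PLATFORM`, which the earlier steps do not set. A minimal sketch of a pre-flight check that reports unset variables before the build is started could look as follows. `check_android_env` is a hypothetical helper written for this example.

```bash
# Hypothetical helper: report which of the variables used by the build are unset
check_android_env() {
    missing=""
    for name in ANDROID_NDK ANDROID_ABI ANDROID_PLATFORM DUCKDB_EXTENSIONS; do
        # look up the variable whose name is stored in $name
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="${missing:+${missing} }${name}"
    done
    if [ -n "$missing" ]; then
        echo "unset variables: ${missing}" >&2
        return 1
    fi
}

check_android_env || echo "set the variables listed above before running cmake"
```

The valid values for `ANDROID_PLATFORM` are defined by the Android NDK, so consult its documentation when choosing one.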

#### Building the CLI in Termux {#docs:current:dev:building:android::building-the-cli-in-termux}

1. To build the [command line client](#docs:current:clients:cli:overview) in the [Termux application](https://termux.dev/), install the following packages:

   ```bash
   pkg install -y git ninja clang cmake python3
   ```

1. Set the list of DuckDB extensions to build. These will be statically linked in the binary. For example:

   ```bash
   DUCKDB_EXTENSIONS="icu;json"
   ```

1. Build DuckDB as follows:

   ```bash
   mkdir build
   cd build
   export LDFLAGS="-llog"
   cmake \
      -G "Ninja" \
      -DBUILD_EXTENSIONS="${DUCKDB_EXTENSIONS}" \
      -DDUCKDB_EXPLICIT_PLATFORM=linux_arm64_android \
      -DCMAKE_BUILD_TYPE=Release \
      ..
   cmake --build . --config Release
   ```

Note that you can also use the Python client on Termux:

```bash
pip install --pre --upgrade duckdb
```

#### Troubleshooting {#docs:current:dev:building:android::troubleshooting}

##### Log Library Is Missing {#docs:current:dev:building:android::log-library-is-missing}

**Problem:**
The build throws the following error:

```console
ld.lld: error: undefined symbol: __android_log_write
```

**Solution:**
Make sure the log library is linked:

```bash
export LDFLAGS="-llog"
```

### Linux {#docs:current:dev:building:linux}

#### Prerequisites {#docs:current:dev:building:linux::prerequisites}

On Linux, install the required packages with the package manager of your distribution.

##### Ubuntu and Debian {#docs:current:dev:building:linux::ubuntu-and-debian}

###### CLI Client {#docs:current:dev:building:linux::cli-client}

On Ubuntu and Debian (and also MX Linux, Linux Mint, etc.), the requirements for building the DuckDB CLI client are the following:

```bash
sudo apt-get update
sudo apt-get install -y git g++ cmake ninja-build libssl-dev libcurl4-openssl-dev
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```

##### Fedora, CentOS and Red Hat {#docs:current:dev:building:linux::fedora-centos-and-red-hat}

###### CLI Client {#docs:current:dev:building:linux::cli-client}

The requirements for building the DuckDB CLI client on Fedora, CentOS, Red Hat, AlmaLinux, Rocky Linux, etc. are the following:

```bash
sudo yum install -y git g++ cmake ninja-build openssl-devel
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```

Note that on older Red Hat-based distributions, you may have to change the package name for `g++` to `gcc-c++`,
skip Ninja and manually configure the number of Make jobs:

```bash
sudo yum install -y git gcc-c++ cmake openssl-devel
git clone https://github.com/duckdb/duckdb
cd duckdb
mkdir build
cd build
cmake ..
make -j`nproc`
```

##### Arch Linux {#docs:current:dev:building:linux::arch-linux}

The following instructions are intended for Arch Linux and Arch-based distributions (e.g., Manjaro, Omarchy).

###### CLI Client {#docs:current:dev:building:linux::cli-client}

DuckDB is [available in Arch's Extra package repository](https://archlinux.org/packages/extra/x86_64/duckdb/).
To install it, run:

```bash
sudo pacman -S duckdb
```

The requirements for building the DuckDB CLI client on Arch, Manjaro, etc. are the following:

```bash
sudo pacman -S git gcc cmake ninja openssl
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```

##### Alpine Linux {#docs:current:dev:building:linux::alpine-linux}

###### CLI Client {#docs:current:dev:building:linux::cli-client}

The requirements for building the DuckDB CLI client on Alpine Linux are the following:

```bash
apk add g++ git make cmake ninja
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```

###### Performance with musl libc {#docs:current:dev:building:linux::performance-with-musl-libc}

Note that Alpine Linux uses [musl libc](https://musl.libc.org/) as its C standard library.
DuckDB binaries built with musl libc have lower performance compared to the glibc variants: for some workloads, the slowdown can be more than 5×.
Therefore, it's recommended to use glibc for performance-oriented workloads.
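To find out which C standard library a Linux system uses, one common approach is to inspect the banner printed by `ldd --version`. The following is a sketch under the assumption that the banner mentions either musl or glibc. `detect_libc` is a hypothetical helper and reports `unknown` in all other cases (e.g., if `ldd` is not installed).

```bash
# Hypothetical helper: print "musl", "glibc" or "unknown"
detect_libc() {
    if ! command -v ldd >/dev/null 2>&1; then
        echo "unknown"
        return 0
    fi
    # musl's ldd prints its banner to stderr and exits non-zero,
    # hence the 2>&1 redirection and the || true
    out=$(ldd --version 2>&1 || true)
    case "$out" in
        *musl*) echo "musl" ;;
        *GLIBC*|*"GNU libc"*|*"GNU C Library"*) echo "glibc" ;;
        *) echo "unknown" ;;
    esac
}

detect_libc
```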

###### Distribution for the `linux_*_musl` Platforms {#docs:current:dev:building:linux::distribution-for-the-linux__musl-platforms}

Starting with DuckDB v1.2.0, [_DuckDB extensions_ are distributed for the `linux_amd64_musl` platform](https://duckdb.org/2025/02/05/announcing-duckdb-120#musl-extensions) (but not yet for the `linux_arm64_musl` platform).
However, there are no official _DuckDB binaries_ distributed for musl libc. DuckDB can be built with musl libc manually by following the instructions on this page.

###### Python Client on Alpine Linux {#docs:current:dev:building:linux::python-client-on-alpine-linux}

Currently, installing the DuckDB Python client on Alpine Linux requires compilation from source.
To do so, install the required packages before running `pip`:

```bash
apk add g++ py3-pip python3-dev
pip install duckdb
```

#### Using the DuckDB CLI Client on Linux {#docs:current:dev:building:linux::using-the-duckdb-cli-client-on-linux}

Once the build finishes successfully, you can find the `duckdb` binary in the `build` directory:

```bash
build/release/duckdb
```

For different build configurations (`debug`, `relassert`, etc.), please consult the [“Build Configurations” page](#docs:current:dev:building:build_configuration).

#### Building Extensions {#docs:current:dev:building:linux::building-extensions}

To build extensions, set the `BUILD_EXTENSIONS` flag to the list of extensions that you want to be built. For example:

```bash
BUILD_EXTENSIONS='autocomplete;httpfs;icu;json;tpch' GEN=ninja make
```

#### Troubleshooting {#docs:current:dev:building:linux::troubleshooting}

##### R Package on Linux AArch64: `too many GOT entries` Build Error {#docs:current:dev:building:linux::r-package-on-linux-aarch64-too-many-got-entries-build-error}

**Problem:**
Building the R package on Linux running on an ARM64 architecture (AArch64) may result in the following error message:

```console
/usr/bin/ld: /usr/include/c++/10/bits/basic_string.tcc:206:
warning: too many GOT entries for -fpic, please recompile with -fPIC
```

**Solution:**
Create or edit the `~/.R/Makevars` file. This example also contains the [`MAKEFLAGS` setting to parallelize the build](#docs:current:dev:building:r::the-build-only-uses-a-single-thread):

```ini
ALL_CXXFLAGS = $(PKG_CXXFLAGS) -fPIC $(SHLIB_CXXFLAGS) $(CXXFLAGS)
MAKEFLAGS = -j$(nproc)
```

##### Building the httpfs Extension Fails {#docs:current:dev:building:linux::building-the-httpfs-extension-fails}

**Problem:**
When building the [`httpfs` extension](#docs:current:core_extensions:httpfs:overview) on Linux, the build may fail with the following error:

```console
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
  OPENSSL_INCLUDE_DIR)
```

**Solution:**
Install the `libssl-dev` library.

```bash
sudo apt-get install -y libssl-dev
```

Then, build with:

```bash
GEN=ninja BUILD_EXTENSIONS="httpfs" make
```

### macOS {#docs:current:dev:building:macos}

#### Prerequisites {#docs:current:dev:building:macos::prerequisites}

Install Xcode and [Homebrew](https://brew.sh/). Then, install the required packages with:

```bash
brew install git cmake ninja
```

#### Building DuckDB {#docs:current:dev:building:macos::building-duckdb}

Clone and build DuckDB as follows.

```bash
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```

Once the build finishes successfully, you can find the `duckdb` binary in the `build` directory:

```bash
build/release/duckdb
```

For different build configurations (`debug`, `relassert`, etc.), please consult the [“Build Configurations” page](#docs:current:dev:building:build_configuration).

#### Troubleshooting {#docs:current:dev:building:macos::troubleshooting}

##### Build Failure: `'string' file not found` {#docs:current:dev:building:macos::build-failure-string-file-not-found}

**Problem:**
The build fails on macOS with the following error:

```console
FAILED: third_party/libpg_query/CMakeFiles/duckdb_pg_query.dir/src_backend_nodes_list.cpp.o
/Library/Developer/CommandLineTools/usr/bin/c++ -DDUCKDB_BUILD_LIBRARY -DEXT_VERSION_PARQUET=\"9cba6a2a03\" -I/Users/builder/external/duckdb/src/include -I/Users/builder/external/duckdb/third_party/fsst -I/Users/builder/external/duckdb/third_party/fmt/include -I/Users/builder/external/duckdb/third_party/hyperloglog -I/Users/builder/external/duckdb/third_party/fastpforlib -I/Users/builder/external/duckdb/third_party/skiplist -I/Users/builder/external/duckdb/third_party/fast_float -I/Users/builder/external/duckdb/third_party/re2 -I/Users/builder/external/duckdb/third_party/miniz -I/Users/builder/external/duckdb/third_party/utf8proc/include -I/Users/builder/external/duckdb/third_party/concurrentqueue -I/Users/builder/external/duckdb/third_party/pcg -I/Users/builder/external/duckdb/third_party/tdigest -I/Users/builder/external/duckdb/third_party/mbedtls/include -I/Users/builder/external/duckdb/third_party/jaro_winkler -I/Users/builder/external/duckdb/third_party/yyjson/include -I/Users/builder/external/duckdb/third_party/libpg_query/include -O3 -DNDEBUG -O3 -DNDEBUG   -std=c++11 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX15.1.sdk -fPIC -fvisibility=hidden -fcolor-diagnostics -w -MD -MT third_party/libpg_query/CMakeFiles/duckdb_pg_query.dir/src_backend_nodes_list.cpp.o -MF third_party/libpg_query/CMakeFiles/duckdb_pg_query.dir/src_backend_nodes_list.cpp.o.d -o third_party/libpg_query/CMakeFiles/duckdb_pg_query.dir/src_backend_nodes_list.cpp.o -c /Users/builder/external/duckdb/third_party/libpg_query/src_backend_nodes_list.cpp
In file included from /Users/builder/external/duckdb/third_party/libpg_query/src_backend_nodes_list.cpp:35:
/Users/builder/external/duckdb/third_party/libpg_query/include/pg_functions.hpp:4:10: fatal error: 'string' file not found
    4 | #include <string>
```

**Solution:**
Users report that reinstalling Xcode fixed their problem.
See related discussions on the [DuckDB GitHub issues](https://github.com/duckdb/duckdb/issues/14665#issuecomment-2452679953) and on [Stack Overflow](https://stackoverflow.com/questions/78999694/cant-compile-c-hello-world-with-clang-on-mac-sequoia-15-0-and-vs-code).

> **Warning.** Attempting to reinstall your Xcode suite may impact other applications on your system. Proceed with caution.

```bash
sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install
```

##### Debug Build Prints malloc Warning {#docs:current:dev:building:macos::debug-build-prints-malloc-warning}

**Problem:**
The `debug` build on macOS prints a `malloc` warning, e.g.:

```text
duckdb(83082,0x205b30240) malloc: nano zone abandoned due to inability to reserve vm space.
```

**Solution:**
To prevent this, set the `MallocNanoZone` flag to 0:

```bash
MallocNanoZone=0 make debug
```

To apply this change for your future terminal sessions, you can add the following to your `~/.zshrc` file:

```bash
export MallocNanoZone=0
```

### Raspberry Pi {#docs:current:dev:building:raspberry_pi}

DuckDB is not officially distributed for the Raspberry Pi OS (previously called Raspbian).
You can build it following the instructions on this page.

#### Raspberry Pi (64-bit) {#docs:current:dev:building:raspberry_pi::raspberry-pi-64-bit}

First, install the required build packages:

```bash
sudo apt-get update
sudo apt-get install -y git g++ cmake ninja-build
```

Then, clone and build it as follows:

```bash
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja BUILD_EXTENSIONS="icu;json" make
```

Finally, run it:

```bash
build/release/duckdb
```

#### Raspberry Pi (32-bit) {#docs:current:dev:building:raspberry_pi::raspberry-pi-32-bit}

On 32-bit Raspberry Pi boards, you need to add the [`-latomic` link flag](https://github.com/duckdb/duckdb/issues/13855#issuecomment-2341539339).
As extensions are not distributed for this platform, it's recommended to also include them in the build.
For example:

```bash
mkdir build
cd build
cmake .. \
    -DBUILD_EXTENSIONS="httpfs;json;parquet" \
    -DDUCKDB_EXTRA_LINK_FLAGS="-latomic"
make -j4
```
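The choice between the two build variants can be sketched as a small helper that maps the word size of the system to the extra link flags. `extra_link_flags` is a hypothetical function written for this example. It assumes that no extra flags are needed on 64-bit systems, as in the 64-bit instructions above.

```bash
# Hypothetical helper: map the word size (32 or 64) to the extra link flags
extra_link_flags() {
    case "$1" in
        32) echo "-latomic" ;;
        64) echo "" ;;
        *)  echo "unexpected word size: $1" >&2; return 1 ;;
    esac
}

# Show the flags for both cases
echo "32-bit: '$(extra_link_flags 32)'"
echo "64-bit: '$(extra_link_flags 64)'"
```

On the device itself, `getconf LONG_BIT` reports the word size of the userland, so `extra_link_flags "$(getconf LONG_BIT)"` would select the flags for the running system.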

### Windows {#docs:current:dev:building:windows}

On Windows, DuckDB requires the [Microsoft Visual C++ Redistributable package](https://learn.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist) both as a build-time and runtime dependency. Note that unlike the build process on UNIX-like systems, the Windows builds directly call CMake.

#### Visual Studio {#docs:current:dev:building:windows::visual-studio}

To build DuckDB on Windows, we recommend using the Visual Studio compiler.
To use it, follow the instructions in the [CI workflow](https://github.com/duckdb/duckdb/blob/52b43b166091c82b3f04bf8af15f0ace18207a64/.github/workflows/Windows.yml#L73):

```bash
python scripts/windows_ci.py
cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_GENERATOR_PLATFORM=x64 \
    -DENABLE_EXTENSION_AUTOLOADING=1 \
    -DENABLE_EXTENSION_AUTOINSTALL=1 \
    -DDUCKDB_EXTENSION_CONFIGS="${GITHUB_WORKSPACE}/.github/config/bundled_extensions.cmake" \
    -DDISABLE_UNITY=1 \
    -DOVERRIDE_GIT_DESCRIBE="$OVERRIDE_GIT_DESCRIBE"
cmake --build . --config Release --parallel
```

#### MSYS2 and MinGW64 {#docs:current:dev:building:windows::msys2-and-mingw64}

DuckDB on Windows can also be built with [MSYS2](https://www.msys2.org/) and [MinGW64](https://www.mingw-w64.org/).
Note that this build is only supported for compatibility reasons and should only be used if the Visual Studio build is not feasible on a given platform.
To build DuckDB with MinGW64, install the required dependencies using Pacman.
When prompted with `Enter a selection (default=all)`, select the default option by pressing `Enter`.

```bash
pacman -Syu git mingw-w64-x86_64-toolchain mingw-w64-x86_64-cmake mingw-w64-x86_64-ninja
git clone https://github.com/duckdb/duckdb
cd duckdb
cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DBUILD_EXTENSIONS="icu;parquet;json"
cmake --build . --config Release
```

Once the build finishes successfully, you can find the `duckdb.exe` binary in the repository's directory:

```bash
./duckdb.exe
```

#### Building the Go Client {#docs:current:dev:building:windows::building-the-go-client}

Building on Windows may result in the following error:

```batch
go build
```

```console
collect2.exe: error: ld returned 5 exit status
```

GitHub user [vdmitriyev](https://github.com/vdmitriyev) shared instructions for [building the DuckDB Go client on Windows](https://github.com/marcboeker/go-duckdb/issues/4#issuecomment-2176409066):

1. Get four files (`.dll`, `.lib`, `.hpp`, `.h`) from the `libduckdb-windows-amd64.zip` archive.

2. Place them in a directory, e.g., `C:\duckdb-go\libs\`.

3. Install the dependencies following the [`duckdb-go` project](https://github.com/duckdb/duckdb-go).

4. Build your project using the following instructions:

   ```batch
   set PATH=C:\duckdb-go\libs\;%PATH%
   set CGO_CFLAGS=-IC:\duckdb-go\libs\
   set CGO_LDFLAGS=-LC:\duckdb-go\libs\ -lduckdb
   go build
   ```

### Python {#docs:current:dev:building:python}

The DuckDB Python package has its own repository at [`duckdb/duckdb-python`](https://github.com/duckdb/duckdb-python) and uses [pybind11](https://pybind11.readthedocs.io/en/stable/) to create Python bindings for DuckDB.

#### Prerequisites {#docs:current:dev:building:python::prerequisites}

This guide assumes:

1. You have a working copy of the DuckDB Python package source (including git submodules and tags)
2. You have [Astral UV](https://docs.astral.sh/uv/) version >= 0.8.0 installed
3. You run commands from the root of the `duckdb-python` source

We are opinionated about using **Astral UV** for Python environment and dependency management. While using pip for a development environment with an editable install without build isolation is possible, we don't provide guidance for that approach in this guide.

We use **CLion** as our IDE. This guide doesn't include specific instructions for other IDEs, but the setup should be similar.

##### 1. DuckDB Python Repository {#docs:current:dev:building:python::1-duckdb-python-repository}

Start by [forking `duckdb-python`](https://github.com/duckdb/duckdb-python/fork) into a personal repository, then clone your fork:

```bash
git clone --recurse-submodules YOUR_FORK_URL
cd duckdb-python
git remote add upstream https://github.com/duckdb/duckdb-python.git
git fetch --all
```

If you've already cloned without submodules:

```bash
git submodule update --init --recursive
git remote add upstream https://github.com/duckdb/duckdb-python.git
git fetch --all
```

**Important notes:**
- DuckDB is vendored as a git submodule and must be initialized
- DuckDB version determination depends on local availability of git tags
- If switching between branches with different submodule refs, add the git hooks:

```bash
git config --local core.hooksPath .githooks/
```

##### 2. Install Astral uv {#docs:current:dev:building:python::2-install-astral-uv}

[Install uv](https://docs.astral.sh/uv/getting-started/installation/) version >= 0.8.0.

#### Development Environment Setup {#docs:current:dev:building:python::development-environment-setup}

##### 1. Platform-Specific Setup {#docs:current:dev:building:python::1-platform-specific-setup}

**All Platforms:**
- Python 3.9+ supported
- uv >= 0.8.0 required
- CMake and Ninja (installed via UV)
- C++ compiler toolchain

**Linux (Ubuntu 24.04):**

```bash
sudo apt-get update
sudo apt-get install ccache
```

**macOS:**

```bash
# Xcode command line tools
xcode-select --install
```

**Windows:**
- Visual Studio 2019+ with C++ support
- Git for Windows

##### 2. Install Dependencies and Build {#docs:current:dev:building:python::2-install-dependencies-and-build}

Set up the development environment in two steps:

```bash
# Install all development dependencies without building the project
uv sync --no-install-project

# Build and install the project without build isolation
uv sync --no-build-isolation
```

**Why two steps?**
- `uv sync` performs editable installs by default with scikit-build-core using a persistent build-dir
- The build happens in an isolated, ephemeral environment where cmake's paths point to non-existing directories
- Installing dependencies first, then building without isolation ensures proper cmake integration

##### 3. Enable Pre-Commit Hooks {#docs:current:dev:building:python::3-enable-pre-commit-hooks}

We run a number of linting, formatting and type-checking steps in CI. You can run all of these manually, but to make your life easier, you can install the exact same checks we run in CI as git hooks with pre-commit, which is already installed as part of the dev dependencies:

```bash
uvx pre-commit install
```

This will run all required checks before letting your commit pass.

You can also install a post-checkout hook that always runs `git submodule update --init --recursive`. When you change branches between main and a bugfix branch, this makes sure the `duckdb` submodule is always correctly initialized:

```bash
uvx pre-commit install --hook-type post-checkout
```

##### 4. Verify Installation {#docs:current:dev:building:python::4-verify-installation}

```bash
uv run python -c "import duckdb; print(duckdb.sql('SELECT 42').fetchall())"
```

#### Development Workflow {#docs:current:dev:building:python::development-workflow}

##### Running Tests {#docs:current:dev:building:python::running-tests}

Run all tests:

```bash
uv run --no-build-isolation pytest ./tests --verbose
```

Run fast tests only (excludes slow directory):

```bash
uv run --no-build-isolation pytest ./tests --verbose --ignore=./tests/slow
```

##### Test Coverage {#docs:current:dev:building:python::test-coverage}

Run with coverage (compiles extension with `--coverage` for C++ coverage):

```bash
COVERAGE=1 uv run --no-build-isolation coverage run -m pytest ./tests --verbose
```

Check Python coverage:

```bash
uv run coverage html -d htmlcov-python
uv run coverage report --format=markdown
```

Check C++ coverage:

```bash
uv run gcovr \
  --gcov-ignore-errors all \
  --root "$PWD" \
  --filter "${PWD}/src/duckdb_py" \
  --exclude '.*/\.cache/.*' \
  --gcov-exclude '.*/\.cache/.*' \
  --gcov-exclude '.*/external/.*' \
  --gcov-exclude '.*/site-packages/.*' \
  --exclude-unreachable-branches \
  --exclude-throw-branches \
  --html --html-details -o coverage-cpp.html \
  build/coverage/src/duckdb_py \
  --print-summary
```

##### Building Wheels {#docs:current:dev:building:python::building-wheels}

Build wheel for your system:

```bash
uv build
```

Build for specific Python version:

```bash
uv build -p 3.9
```

##### Cleaning Build Artifacts {#docs:current:dev:building:python::cleaning-build-artifacts}

```bash
uv cache clean
rm -rf build .venv uv.lock
```

#### IDE Setup (CLion) {#docs:current:dev:building:python::ide-setup-clion}

For CLion users, the project can be configured for C++ debugging of the Python extension:

##### CMake Profile Configuration {#docs:current:dev:building:python::cmake-profile-configuration}

In **Settings** → **Build, Execution, Deployment** → **CMake**, create a Debug profile:

- **Name:** Debug
- **Build type:** Debug  
- **Generator:** Ninja
- **CMake Options:**
  ```text
  -DCMAKE_PREFIX_PATH=$CMakeProjectDir$/.venv;$CMAKE_PREFIX_PATH
  ```

##### Python Debug Configuration {#docs:current:dev:building:python::python-debug-configuration--}

Create a **CMake Application** run configuration:

- **Name:** Python Debug
- **Target:** `All targets`
- **Executable:** `⟨PROJECT_DIR⟩/.venv/bin/python3`
- **Program arguments:** `$FilePath$`
- **Working directory:** `$ProjectFileDir$`

This allows setting C++ breakpoints and debugging Python scripts that use the DuckDB extension.

#### Debugging {#docs:current:dev:building:python::debugging}

##### Command Line Debugging {#docs:current:dev:building:python::command-line-debugging}

Set breakpoints and debug with lldb:

```bash
# Example Python script (test.py)
# import duckdb
# print(duckdb.sql("select * from range(1000)").df())

lldb -- .venv/bin/python3 test.py
```

In lldb:

```bash
# Set breakpoint (library loads when imported)
(lldb) br s -n duckdb::DuckDBPyRelation::FetchDF
(lldb) r
```

#### Cross-Platform Testing {#docs:current:dev:building:python::cross-platform-testing}

You can run the packaging workflow manually on your fork for any branch, choosing platforms and test suites via the GitHub Actions web interface.

#### Troubleshooting {#docs:current:dev:building:python::troubleshooting}

##### Build Issues {#docs:current:dev:building:python::build-issues}

**Missing git tags:** If you forked DuckDB Python, ensure you have the upstream tags:

```bash
git remote add upstream https://github.com/duckdb/duckdb-python.git
git fetch --tags upstream
git push --tags
```

##### Platform-Specific Issues {#docs:current:dev:building:python::platform-specific-issues}

**Windows compilation:** Ensure you have Visual Studio 2019+ with C++ support installed.

### R {#docs:current:dev:building:r}

This page contains instructions for building the R client library.

#### Parallelizing the Build {#docs:current:dev:building:r::parallelizing-the-build}

**Problem:**
By default, R compiles packages using a single thread, which causes the build to be slow.

**Solution:**
To parallelize the compilation, create or edit the `~/.R/Makevars` file, and add a line like the following:

```ini
MAKEFLAGS = -j8
```

The above will parallelize the compilation using 8 threads. On Linux/macOS, you can add the following to use all of the machine's threads:

```ini
MAKEFLAGS = -j$(nproc)
```

Note, however, that the more threads are used, the higher the RAM consumption. If the system runs out of RAM while compiling, the R session will crash.

### Troubleshooting {#docs:current:dev:building:troubleshooting}

This page contains solutions to common problems reported by users. If you have platform-specific issues, make sure you also consult the troubleshooting guide for your platform such as the one for [Linux builds](#docs:current:dev:building:linux::troubleshooting).

#### The Build Runs Out of Memory {#docs:current:dev:building:troubleshooting::the-build-runs-out-of-memory}

**Problem:**
Ninja parallelizes the build, which can cause out-of-memory issues on systems with limited resources.
These issues have also been reported to occur on Alpine Linux, especially on machines with limited resources.

**Solution:**
Avoid using Ninja by setting the Makefile generator to empty via `GEN=`:

```bash
GEN= make
```

### Unofficial and Unsupported Platforms {#docs:current:dev:building:unofficial_and_unsupported_platforms}

> **Warning.** The platforms listed on this page are not officially supported.
> The build instructions are provided on a best-effort basis.
> Community contributions are very welcome.

DuckDB is built and distributed for several platforms with [different levels of support](#docs:current:dev:building:overview).
DuckDB _can be built_ for other platforms with varying levels of success.
This page provides an overview of these with the intent to clarify which platforms can be expected to work.

#### 32-bit Architectures {#docs:current:dev:building:unofficial_and_unsupported_platforms::32-bit-architectures}

[32-bit architectures](https://en.wikipedia.org/wiki/32-bit_computing) are officially not supported but it is possible to build DuckDB manually for some of these platforms.
For example, see the build instructions for [32-bit Raspberry Pi boards](#docs:current:dev:building:raspberry_pi::raspberry-pi-32-bit).

Note that 32-bit platforms are limited to using 4 GiB RAM due to the amount of addressable memory.

#### Big-Endian Architectures {#docs:current:dev:building:unofficial_and_unsupported_platforms::big-endian-architectures}

[Big-endian architectures](https://en.wikipedia.org/wiki/Endianness) (such as PowerPC) are [not supported](https://duckdblabs.com/community_support_policy#architectures) by DuckDB.
While DuckDB can likely be built on such architectures,
the resulting binary may exhibit [correctness](https://github.com/duckdb/duckdb/issues/5548) [errors](https://github.com/duckdb/duckdb/issues/9714) on certain operations.
Therefore, its use is not recommended.

#### RISC-V Architectures {#docs:current:dev:building:unofficial_and_unsupported_platforms::risc-v-architectures}

##### Native Build (Recommended) {#docs:current:dev:building:unofficial_and_unsupported_platforms::native-build-recommended}

DuckDB builds natively on RISC-V 64-bit boards without any special flags. Tested on a [BananaPi F3](https://wiki.banana-pi.org/Banana_Pi_BPI-F3) ([SpacemiT K1](https://www.spacemit.com/key-stone-k1), rv64gc, 8 cores @ 1.6 GHz, 16 GB RAM) running Debian Trixie:

```bash
sudo apt-get update
sudo apt-get install -y git g++ cmake ninja-build
git clone --depth=1 https://github.com/duckdb/duckdb
cd duckdb
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```

Build time is approximately 2 hours on an 8-core SpacemiT K1. The resulting binary works out of the box:

```bash
./duckdb -c "SELECT 'Hello from RISC-V' AS message;"
```

```text
┌───────────────────┐
│      message      │
│      varchar      │
├───────────────────┤
│ Hello from RISC-V │
└───────────────────┘
```

Aggregation queries work as expected:

```bash
./duckdb -c "CREATE TABLE test AS SELECT range AS id, range * 3.14 AS value FROM range(1000);
SELECT count(*) AS cnt, round(avg(value), 2) AS avg_val FROM test;"
```

```text
┌───────┬─────────┐
│  cnt  │ avg_val │
│ int64 │ double  │
├───────┼─────────┤
│  1000 │ 1568.43 │
└───────┴─────────┘
```
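
The expected average can be cross-checked by hand: the mean of the integers 0 to 999 is 499.5, and 499.5 × 3.14 = 1568.43. The following Python sketch repeats that arithmetic independently of DuckDB. It uses `Decimal` to avoid floating-point noise and is only a sanity check, not part of the build:

```python
from decimal import Decimal

# Recreate the values of the `test` table: id in 0..999, value = id * 3.14.
values = [Decimal(i) * Decimal("3.14") for i in range(1000)]

cnt = len(values)
avg_val = sum(values) / cnt

# Compare these two numbers with the `cnt` and `avg_val` columns above.
print(cnt, round(avg_val, 2))
```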

The DuckDB Python package also builds successfully from source on riscv64:

```bash
pip install duckdb --no-binary duckdb
```

##### Build with RVV (RISC-V Vector Extension) {#docs:current:dev:building:unofficial_and_unsupported_platforms::build-with-rvv-risc-v-vector-extension}

On boards with [RVV 1.0](https://github.com/riscvarchive/riscv-v-spec) support (e.g., [SpacemiT K3](https://www.spacemit.com/key-stone-k3) with vlen 256), you can enable vector instructions for better performance. The user [“LivingLinux” on Bluesky](https://bsky.app/profile/livinglinux.bsky.social) [built DuckDB](https://bsky.app/profile/livinglinux.bsky.social/post/3lak5q7mmg42j) with RVV and [published a video about it](https://www.youtube.com/watch?v=G6uVDH3kvNQ):

```bash
GEN=ninja \
    CC='gcc-14 -march=rv64gcv_zicsr_zifencei_zihintpause_zvl256b' \
    CXX='g++-14 -march=rv64gcv_zicsr_zifencei_zihintpause_zvl256b' \
    BUILD_EXTENSIONS='fts' \
    make
```

##### Cross-Compilation {#docs:current:dev:building:unofficial_and_unsupported_platforms::cross-compilation}

For those who do not have RISC-V hardware, you can cross-compile DuckDB using the [riscv-gnu-toolchain](https://github.com/riscv-collab/riscv-gnu-toolchain):

```bash
GEN=ninja \
    CC='riscv64-linux-gnu-gcc -march=rv64gcv_zicsr_zifencei_zihintpause_zvl256b' \
    CXX='riscv64-linux-gnu-g++ -march=rv64gcv_zicsr_zifencei_zihintpause_zvl256b' \
    make
```

For more reference information on DuckDB RISC-V cross-compiling, see the [mocusez/duckdb-riscv-ci](https://github.com/mocusez/duckdb-riscv-ci) repository and [DuckDB Pull Request #16549](https://github.com/duckdb/duckdb/pull/16549).

#### Known Issues {#docs:current:dev:building:unofficial_and_unsupported_platforms::known-issues}

##### Platform Reported Incorrectly {#docs:current:dev:building:unofficial_and_unsupported_platforms::platform-reported-incorrectly}

As of version 1.5.2, DuckDB [incorrectly reports the platform](https://github.com/duckdb/duckdb/pull/21802) as `amd64` for unsupported platforms such as RISC-V. This causes it to download an incorrect extension binary. Currently, extensions are only available for officially supported platforms, so to use an extension, you need to [build it, e.g., using the `BUILD_EXTENSIONS` flag](#docs:current:dev:building:building_extensions).

## Benchmark Suite {#docs:current:dev:benchmark}

DuckDB has an extensive benchmark suite.
When making changes that have potential performance implications, it is important to run these benchmarks to detect potential performance regressions.

#### Getting Started {#docs:current:dev:benchmark::getting-started}

To build the benchmark suite, run the following command in the [DuckDB repository](https://github.com/duckdb/duckdb):

```batch
BUILD_BENCHMARK=1 BUILD_EXTENSIONS='tpch' make
```

#### Listing Benchmarks {#docs:current:dev:benchmark::listing-benchmarks}

To list all available benchmarks, run:

```batch
build/release/benchmark/benchmark_runner --list
```

#### Running Benchmarks {#docs:current:dev:benchmark::running-benchmarks}

##### Running a Single Benchmark {#docs:current:dev:benchmark::running-a-single-benchmark}

To run a single benchmark, issue the following command:

```batch
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark
```

The output will be printed to `stdout` as tab-separated values with a header row:

```text
name	run	timing
benchmark/micro/nulls/no_nulls_addition.benchmark	1	0.121234
benchmark/micro/nulls/no_nulls_addition.benchmark	2	0.121702
benchmark/micro/nulls/no_nulls_addition.benchmark	3	0.122948
benchmark/micro/nulls/no_nulls_addition.benchmark	4	0.122534
benchmark/micro/nulls/no_nulls_addition.benchmark	5	0.124102
```
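
The captured output can be summarized with a few lines of Python. The sketch below assumes the columns are separated by tabs, as in the sample above. The `summarize_timings` helper is a hypothetical name and not part of the benchmark runner:

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_timings(text):
    """Return {benchmark name: (runs, mean, fastest)} for benchmark_runner output."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    timings = defaultdict(list)
    for row in reader:
        timings[row["name"]].append(float(row["timing"]))
    return {
        name: (len(vals), statistics.mean(vals), min(vals))
        for name, vals in timings.items()
    }


# The sample output shown above, with explicit tab characters.
sample = (
    "name\trun\ttiming\n"
    "benchmark/micro/nulls/no_nulls_addition.benchmark\t1\t0.121234\n"
    "benchmark/micro/nulls/no_nulls_addition.benchmark\t2\t0.121702\n"
    "benchmark/micro/nulls/no_nulls_addition.benchmark\t3\t0.122948\n"
    "benchmark/micro/nulls/no_nulls_addition.benchmark\t4\t0.122534\n"
    "benchmark/micro/nulls/no_nulls_addition.benchmark\t5\t0.124102\n"
)

summary = summarize_timings(sample)
for name, (runs, mean, fastest) in summary.items():
    print(f"{name}: {runs} runs, mean {mean:.6f}s, fastest {fastest:.6f}s")
```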

You can also specify an output file using the `--out` flag. This will write only the timings (delimited by newlines) to that file.

```batch
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark --out=timings.out
```

The output will contain the following:

```text
0.182472
0.185027
0.184163
0.185281
0.182948
```
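
Because the `--out` file contains one timing per line, two such files can be compared to spot a possible regression. The following Python sketch assumes a baseline and a candidate run. The candidate timings and the 5% threshold are made up for illustration, and the helper names are hypothetical:

```python
import statistics


def parse_timings(text):
    """Parse newline-delimited timings as written by the --out flag."""
    return [float(line) for line in text.splitlines() if line.strip()]


def relative_change(baseline, candidate):
    """Relative change of the median timing; a positive value means slower."""
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return (cand - base) / base


# Baseline: the timings shown above. Candidate: invented numbers for illustration.
baseline = parse_timings("0.182472\n0.185027\n0.184163\n0.185281\n0.182948\n")
candidate = parse_timings("0.201\n0.199\n0.203\n0.200\n0.202\n")

change = relative_change(baseline, candidate)
print(f"median timing changed by {change:+.1%}")
if change > 0.05:  # arbitrary threshold, adjust to the noise level of the machine
    print("possible performance regression")
```

The median is used instead of the mean so that a single noisy run has less influence on the comparison.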

##### Running Multiple Benchmarks Using a Regular Expression {#docs:current:dev:benchmark::running-multiple-benchmarks-using-a-regular-expression}

You can also use a regular expression to specify which benchmarks to run.
Be careful of shell expansion of certain regex characters (e.g., `*` will likely be expanded by your shell, hence this requires proper quoting or escaping).

```batch
build/release/benchmark/benchmark_runner "benchmark/micro/nulls/.*"
```

###### Running All Benchmarks {#docs:current:dev:benchmark::running-all-benchmarks}

Not specifying any argument will run all benchmarks.

```batch
build/release/benchmark/benchmark_runner
```

###### Other Options {#docs:current:dev:benchmark::other-options}

The `--info` flag prints metadata about the benchmark, such as its display name, group and subgroup.

```batch
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark --info
```

```text
display_name:NULL Addition (no nulls)
group:micro
subgroup:nulls
```

The `--query` flag will print the query that is run by the benchmark.

```sql
SELECT min(i + 1) FROM integers;
```

The `--profile` flag will output a query tree.

#### Creating Benchmarks {#docs:current:dev:benchmark::creating-benchmarks}

Some development work focuses on performance.
Including a benchmark along with the other tests not only validates any improvements,
but also prevents future performance regressions in the feature.

##### Benchmark Example {#docs:current:dev:benchmark::benchmark-example}

To illustrate how to create a benchmark file, we can look at the benchmark for the `FILL` window function.
(The `FILL` function linearly interpolates missing values in an ordered partition.)

Benchmarks are similar to unit test files, and have the same type of header.

```python
# name: benchmark/micro/window/window_fill.benchmark
# description: Measure the performance of FILL
# group: [window]
```

The `make format-head` command can ensure that the header has the expected structure and prevent tidy check errors.
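
To get a feel for the header structure before running the real check, a rough Python sketch is shown below. The patterns are assumptions derived from the example header above. The rules actually enforced by `make format-head` may differ, and `header_problems` is a hypothetical helper:

```python
import re

# Assumed header layout: name, description and group, in this order.
HEADER_PATTERNS = [
    re.compile(r"^# name: \S+$"),
    re.compile(r"^# description: .+$"),
    re.compile(r"^# group: \[\w+\]$"),
]


def header_problems(text):
    """Return a list of messages for header lines that do not match the assumed layout."""
    lines = text.splitlines()
    problems = []
    for i, pattern in enumerate(HEADER_PATTERNS):
        line = lines[i] if i < len(lines) else ""
        if not pattern.match(line):
            problems.append(f"line {i + 1}: unexpected header line {line!r}")
    return problems


header = (
    "# name: benchmark/micro/window/window_fill.benchmark\n"
    "# description: Measure the performance of FILL\n"
    "# group: [window]\n"
)
print(header_problems(header))
```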

Below this header, there are a set of keywords summarizing the benchmark.

```text
name FillPerformance
group micro
subgroup window
```

While some benchmarks run a single query,
it can often be useful to _parameterize_ a benchmark using the `argument` keyword.
This allows the benchmark to be run with different settings, such as data volume.
For the `FILL` benchmark, there are three arguments:

```text
argument sf 10
argument errors 0.1
argument keys 4
```

For `FILL`, these are:

* The scale factor (millions of rows per partition)
* The error rate (fraction of the values that are missing)
* The number of partitions

Benchmarks generally require some data preparation before running the query.
Data preparation is done in the `load` section of the benchmark file.
For the `FILL` benchmark, we create a table using the parameters and a random number generator.

```sql
load
select setseed(0.8675309);
create or replace table data as (
	select
		k::TINYINT as k,
		(case when random() > ${errors} then m - 1704067200000 else null end) as v,
		m,
	from range(1704067200000, 1704067200000 + ${sf} * 1_000_000 * 10, 10) times(m)
	cross join range(${keys}) keys(k)
);
```

The `argument` parameters are expanded in the query,
similar to the way that `foreach` values are expanded in unit tests.
Note that we can issue multiple SQL statements in the `load` section.
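
The expansion can be imitated in Python with `string.Template`, which happens to use the same `${name}` placeholder syntax. This is only an illustration of the substitution and of the resulting data volume, not the benchmark runner's implementation:

```python
from string import Template

# The argument values declared in the benchmark file.
arguments = {"sf": 10, "errors": 0.1, "keys": 4}

# A fragment of the `load` query with its ${...} placeholders.
load_fragment = Template(
    "from range(1704067200000, 1704067200000 + ${sf} * 1_000_000 * 10, 10) times(m)\n"
    "cross join range(${keys}) keys(k)"
)
expanded = load_fragment.substitute(arguments)
print(expanded)

# Row counts implied by the expanded query: one row per step of 10 in the
# range, repeated once for every key.
start = 1704067200000
rows_per_partition = len(range(start, start + arguments["sf"] * 1_000_000 * 10, 10))
total_rows = rows_per_partition * arguments["keys"]
print(rows_per_partition, total_rows)
```

With `sf` set to 10, each partition holds 10 million rows, which matches the description of the scale factor as millions of rows per partition.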

Once the data is prepared, we are finally ready to specify the query we will benchmark!
This is done in the `run` section, and the restrictions are the same as for a unit test
(e.g., no blank lines).
For the `FILL` benchmark, we want to find all places where the interpolation fails:

```sql
run
SELECT
	m,
	k,
	fill(v) OVER (PARTITION BY k ORDER BY m) as v
FROM
	data
qualify v <> m - 1704067200000;
```

If the interpolation is correct, then we will have no output, no matter the scale.
We can check this with the final `result` clause,
which has the same syntax as a unit test:

```text
result III
```

By providing no output rows, we can check the correctness of the query as well as its performance.

There are many other examples in the top-level `benchmark/` directory,
and you may want to have a look to discover some other techniques.

## Testing {#dev:sqllogictest}

### Overview {#docs:current:dev:sqllogictest:overview}

#### How is DuckDB Tested? {#docs:current:dev:sqllogictest:overview::how-is-duckdb-tested}

Testing is vital to make sure that DuckDB works properly and keeps working properly. For that reason, we put a large emphasis on thorough and frequent testing:
* We run a batch of small tests on every commit using [GitHub Actions](https://github.com/duckdb/duckdb/actions), and run a more exhaustive batch of tests on pull requests and commits in the `main` branch.
* We use a [fuzzer](https://github.com/duckdb/duckdb-fuzzer), which automatically reports issues found through fuzzing DuckDB.
* We use [SQLsmith](#docs:current:core_extensions:sqlsmith) for generating random queries.

### sqllogictest Introduction {#docs:current:dev:sqllogictest:intro}

For testing plain SQL, we use an extended version of the SQL logic test suite, adopted from [SQLite](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki). Every test is a single self-contained file located in the `test/sql` directory.
To run tests located outside of the default `test` directory, specify `--test-dir <root_directory>` and make sure provided test file paths are relative to that root directory.

The test describes a series of SQL statements, together with either the expected result, a `statement ok` indicator, or a `statement error` indicator. An example of a test file is shown below:

```sql
# name: test/sql/projection/test_simple_projection.test
# group: [projection]

# enable query verification
statement ok
PRAGMA enable_verification

# create table
statement ok
CREATE TABLE a (i INTEGER, j INTEGER);

# insertion: 1 affected row
statement ok
INSERT INTO a VALUES (42, 84);

query II
SELECT * FROM a;
----
42	84
```

In this example, four statements are executed. The first three statements are expected to succeed (prefixed by `statement ok`). The fourth statement is expected to return a single row with two columns (indicated by `query II`). The values of the row are expected to be `42` and `84` (separated by a tab character). For more information on query result verification, see the [result verification section](#docs:current:dev:sqllogictest:result_verification).

The top of every file should contain a comment describing the name and group of the test. The name of the test is always the relative file path of the file. The group is the folder that the file is in. The name and group of the test are relevant because they can be used to execute *only* that test in the unittest group. For example, if we wanted to execute *only* the above test, we would run the command `unittest test/sql/projection/test_simple_projection.test`. If we wanted to run all tests in a specific directory, we would run the command `unittest "[projection]"`.

Any tests that are placed in the `test` directory are automatically added to the test suite. Note that the extension of the test is significant. The sqllogictests should either use the `.test` extension, or the `.test_slow` extension. The `.test_slow` extension indicates that the test takes a while to run, and will only be run when all tests are explicitly run using `unittest *`. Tests with the extension `.test` will be included in the fast set of tests.
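
The extension rule can be summarized in a few lines of Python. `classify_test_file` is a hypothetical helper that only mirrors the description above, and the second path in the example is invented:

```python
from pathlib import PurePosixPath


def classify_test_file(path):
    """Return "fast", "slow", or None based on the file extension."""
    suffix = PurePosixPath(path).suffix
    if suffix == ".test":
        return "fast"
    if suffix == ".test_slow":
        return "slow"
    return None  # not a sqllogictest file


for path in [
    "test/sql/projection/test_simple_projection.test",
    "test/sql/example/large_join.test_slow",  # hypothetical path
    "test/README.md",
]:
    print(path, classify_test_file(path))
```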

#### Query Verification {#docs:current:dev:sqllogictest:intro::query-verification}

Many simple tests start by enabling query verification. This can be done through the following `PRAGMA` statement:

```sql
statement ok
PRAGMA enable_verification
```

Query verification performs extra validation to ensure that the underlying code runs correctly. The most important part of that is that it verifies that optimizers do not cause bugs in the query. It does this by running both an unoptimized and optimized version of the query, and verifying that the results of these queries are identical.

Query verification is very useful because it not only discovers bugs in optimizers, but also finds bugs in e.g., join implementations. This is because the unoptimized version will typically run using cross products instead. Because of this, query verification can be very slow to do when working with larger datasets. It is therefore recommended to turn on query verification for all unit tests, except those involving larger datasets (more than ~10-100 rows).
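
The idea can be illustrated outside of DuckDB with a conceptual Python sketch. It evaluates the same equi-join twice, once as a cross product followed by a filter and once with a hash table, and then checks that both plans agree. This is not DuckDB's verification code, and all names are made up for the example:

```python
def join_cross_product(left, right):
    """Unoptimized plan: enumerate the full cross product, then filter on the key."""
    return sorted(
        (lrow, rrow) for lrow in left for rrow in right if lrow[0] == rrow[0]
    )


def join_hash(left, right):
    """Optimized plan: build a hash table on the right side, probe it with the left."""
    table = {}
    for rrow in right:
        table.setdefault(rrow[0], []).append(rrow)
    return sorted((lrow, rrow) for lrow in left for rrow in table.get(lrow[0], []))


def verify(left, right):
    """Run both plans and fail loudly if their results differ."""
    expected = join_cross_product(left, right)
    actual = join_hash(left, right)
    if expected != actual:
        raise AssertionError("optimized and unoptimized plans disagree")
    return actual


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(verify(left, right))
```

The cross product compares every left row with every right row, so its cost grows with the product of the input sizes. This is why verification becomes slow on larger datasets.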

#### Editors & Syntax Highlighting {#docs:current:dev:sqllogictest:intro::editors--syntax-highlighting}

The sqllogictests are not exactly an industry standard, but several other systems have adopted them as well. Parsing sqllogictests is intentionally simple. All statements have to be separated by empty lines. For that reason, writing a syntax highlighter is not extremely difficult.
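
As a rough illustration of how little parsing is needed, the Python sketch below splits a test file into records at empty lines and drops comment lines. It is a simplification and not the actual test runner: it ignores many directives and assumes that no SQL or result line starts with `#`:

```python
def split_records(text):
    """Split a test file into records separated by empty lines, skipping comments."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "":
            if current:
                records.append(current)
                current = []
        elif not line.startswith("#"):
            current.append(line)
    if current:
        records.append(current)
    return records


sample = (
    "# name: test/sql/projection/test_simple_projection.test\n"
    "# group: [projection]\n"
    "\n"
    "# enable query verification\n"
    "statement ok\n"
    "PRAGMA enable_verification\n"
    "\n"
    "statement ok\n"
    "CREATE TABLE a (i INTEGER, j INTEGER);\n"
    "\n"
    "statement ok\n"
    "INSERT INTO a VALUES (42, 84);\n"
    "\n"
    "query II\n"
    "SELECT * FROM a;\n"
    "----\n"
    "42\t84\n"
)

records = split_records(sample)
for record in records:
    print(record[0], "->", len(record) - 1, "further line(s)")
```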

A syntax highlighter exists for [Visual Studio Code](https://marketplace.visualstudio.com/items?itemName=benesch.sqllogictest). We have also [made a fork that supports the DuckDB dialect of the sqllogictests](https://github.com/Mytherin/vscode-sqllogictest). You can use the fork by installing the original, then copying the `syntaxes/sqllogictest.tmLanguage.json` into the installed extension (on macOS this is located in `~/.vscode/extensions/benesch.sqllogictest-0.1.1`).

A syntax highlighter is also available for [CLion](https://plugins.jetbrains.com/plugin/15295-sqltest). It can be installed directly in the IDE by searching for SQLTest on the marketplace. A [GitHub repository](https://github.com/pdet/SQLTest) is also available, with extensions and bug reports being welcome.

##### Temporary Files {#docs:current:dev:sqllogictest:intro::temporary-files}

For some tests (e.g., CSV/Parquet file format tests) it is necessary to create temporary files. Any temporary files should be created in the temporary testing directory. This directory can be used by placing the string `__TEST_DIR__` in a query. This string will be replaced by the path of the temporary testing directory.

```sql
statement ok
COPY csv_data TO '__TEST_DIR__/output_file.csv.gz' (COMPRESSION gzip);
```
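
The test runner performs this substitution itself. The Python sketch below only illustrates the effect, using a throwaway directory as a stand-in for the temporary testing directory. `expand_test_dir` is a hypothetical helper:

```python
import tempfile


def expand_test_dir(sql, test_dir):
    """Replace the __TEST_DIR__ placeholder with a concrete directory path."""
    return sql.replace("__TEST_DIR__", test_dir)


sql = "COPY csv_data TO '__TEST_DIR__/output_file.csv.gz' (COMPRESSION gzip);"

# The directory below stands in for the runner's temporary testing directory.
with tempfile.TemporaryDirectory() as test_dir:
    print(expand_test_dir(sql, test_dir))
```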

##### Require & Extensions {#docs:current:dev:sqllogictest:intro::require--extensions}

To avoid bloating the core system, certain functionality of DuckDB is available only as an extension. Tests can be built for those extensions by adding a `require` field in the test. If the extension is not loaded, any statements that occur after the require field will be skipped. Examples of this are `require parquet` or `require icu`.

Another usage is to limit a test to a specific vector size. For example, adding `require vector_size 512` to a test will prevent the test from being run unless the vector size is greater than or equal to 512. This is useful because certain functionality is not supported for low vector sizes, but we run tests using a vector size of 2 in our CI.

### Writing Tests {#docs:current:dev:sqllogictest:writing_tests}

#### Development and Testing {#docs:current:dev:sqllogictest:writing_tests::development-and-testing}

It is crucial that any new features that get added have correct tests that not only test the “happy path”, but also test edge cases and incorrect usage of the feature. In this section, we describe how DuckDB tests are structured and how to make new tests for DuckDB.

The tests can be run by running the `unittest` program located in the `test` folder. For the default compilations this is located in either `build/release/test/unittest` (release) or `build/debug/test/unittest` (debug).

#### Philosophy {#docs:current:dev:sqllogictest:writing_tests::philosophy}

When testing DuckDB, we aim to route all the tests through SQL. We try to avoid testing components individually because that makes those components more difficult to change later on. As such, almost all of our tests can (and should) be expressed in pure SQL. There are certain exceptions to this, which we will discuss in [Catch Tests](#docs:current:dev:sqllogictest:catch). However, in most cases you should write your tests in plain SQL.

#### Frameworks {#docs:current:dev:sqllogictest:writing_tests::frameworks}

SQL tests should be written using the [sqllogictest framework](#docs:current:dev:sqllogictest:intro).

C++ tests can be written using the [Catch framework](#docs:current:dev:sqllogictest:catch).

#### Client Connector Tests {#docs:current:dev:sqllogictest:writing_tests::client-connector-tests}

DuckDB also has tests for various client connectors. These are generally written in the relevant client language, and can be found in `tools/*/tests`.
They also double as documentation of what should be doable from a given client.

#### Functions for Generating Test Data {#docs:current:dev:sqllogictest:writing_tests::functions-for-generating-test-data}

DuckDB has built-in functions for generating test data.

##### `test_all_types` Function {#docs:current:dev:sqllogictest:writing_tests::test_all_types-function}

The `test_all_types` table function generates a table whose columns correspond to types (`BOOL`, `TINYINT`, etc.).
The table has three rows encoding the minimum value, the maximum value, and the `NULL` value for each type.

```sql
FROM test_all_types();
```

```text
┌─────────┬─────────┬──────────┬─────────────┬──────────────────────┬──────────────────────┬───┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│  bool   │ tinyint │ smallint │     int     │        bigint        │       hugeint        │ … │        struct        │   struct_of_arrays   │   array_of_structs   │         map          │        union         │
│ boolean │  int8   │  int16   │    int32    │        int64         │        int128        │   │ struct(a integer, …  │ struct(a integer[]…  │ struct(a integer, …  │ map(varchar, varch…  │ union("name" varch…  │
├─────────┼─────────┼──────────┼─────────────┼──────────────────────┼──────────────────────┼───┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ false   │    -128 │   -32768 │ -2147483648 │ -9223372036854775808 │  -17014118346046923… │ … │ {'a': NULL, 'b': N…  │ {'a': NULL, 'b': N…  │ []                   │ {}                   │ Frank                │
│ true    │     127 │    32767 │  2147483647 │  9223372036854775807 │  170141183460469231… │ … │ {'a': 42, 'b': 🦆…   │ {'a': [42, 999, NU…  │ [{'a': NULL, 'b': …  │ {key1=🦆🦆🦆🦆🦆🦆…  │ 5                    │
│ NULL    │    NULL │     NULL │        NULL │                 NULL │                 NULL │ … │ NULL                 │ NULL                 │ NULL                 │ NULL                 │ NULL                 │
├─────────┴─────────┴──────────┴─────────────┴──────────────────────┴──────────────────────┴───┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┤
│ 3 rows                                                                                                                                                                                    44 columns (11 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

##### `test_vector_types` Function {#docs:current:dev:sqllogictest:writing_tests::test_vector_types-function}

The `test_vector_types` table function takes _n_ arguments `col1`, ..., `coln` and an optional `BOOLEAN` argument `all_flat`.
The function generates a table with _n_ columns `test_vector`, `test_vector2`, ..., `test_vectorn`.
In each row, each field contains values conforming to the type of their respective column.

```sql
FROM test_vector_types(NULL::BIGINT);
```

```text
┌──────────────────────┐
│     test_vector      │
│        int64         │
├──────────────────────┤
│ -9223372036854775808 │
│  9223372036854775807 │
│                 NULL │
│         ...          │
└──────────────────────┘
```

```sql
FROM test_vector_types(NULL::ROW(i INTEGER, j VARCHAR, k DOUBLE), NULL::TIMESTAMP);
```

```text
┌──────────────────────────────────────────────────────────────────────┬──────────────────────────────┐
│                             test_vector                              │         test_vector2         │
│                struct(i integer, j varchar, k double)                │          timestamp           │
├──────────────────────────────────────────────────────────────────────┼──────────────────────────────┤
│ {'i': -2147483648, 'j': 🦆🦆🦆🦆🦆🦆, 'k': -1.7976931348623157e+308} │ 290309-12-22 (BC) 00:00:00   │
│ {'i': 2147483647, 'j': goo\0se, 'k': 1.7976931348623157e+308}        │ 294247-01-10 04:00:54.775806 │
│ {'i': NULL, 'j': NULL, 'k': NULL}                                    │ NULL                         │
│                                                  ...                                                │
└─────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

`test_vector_types` has an optional argument called `all_flat` of type `BOOL`. This only affects the internal representation of the vector.

```sql
FROM test_vector_types(NULL::ROW(i INTEGER, j VARCHAR, k DOUBLE), NULL::TIMESTAMP, all_flat = true);
-- the output is the same as above but with a different internal representation
```

### Debugging {#docs:current:dev:sqllogictest:debugging}

The purpose of the tests is to figure out when things break. Inevitably, changes made to the system will cause one of the tests to fail, and when that happens, the test needs to be debugged.

First, it is always recommended to run in debug mode. This can be done by compiling the system using the command `make debug`. Second, it is recommended to only run the test that breaks. This can be done by passing the filename of the breaking test to the test suite as a command line parameter (e.g., `build/debug/test/unittest test/sql/projection/test_simple_projection.test`). For more options on running a subset of the tests, see the [Triggering Which Tests to Run](#docs:current:dev:sqllogictest:debugging::triggering-which-tests-to-run) section.

After that, a debugger can be attached to the program and the test can be debugged. In the sqllogictests, it is normally difficult to break on a specific query. However, we have extended the test suite so that a function called `query_break` is called with the line number `line` as a parameter for every query that is run. This allows you to put a conditional breakpoint on a specific query. For example, if we want to break on line number 43 of the test file, we can create the following breakpoint:

```text
gdb: break query_break if line==43
lldb: break s -n query_break -c line==43
```

You can also skip certain queries from executing by placing `mode skip` in the file, followed by an optional `mode unskip`. Any queries between the two statements will not be executed.

#### Triggering Which Tests to Run {#docs:current:dev:sqllogictest:debugging::triggering-which-tests-to-run}

When running the unittest program, by default all the fast tests are run. A specific test can be run by adding the name of the test as an argument. For the sqllogictests, this is the relative path to the test file.
To run only a single test:

```batch
build/debug/test/unittest test/sql/projection/test_simple_projection.test
```

All tests in a given directory can be executed by providing the directory as a parameter with square brackets.
To run all tests in the “projection” directory:

```batch
build/debug/test/unittest "[projection]"
```

All tests, including the slow tests, can be run by passing an asterisk as the argument:

```batch
build/debug/test/unittest "*"
```

We can run a subset of the tests using the `--start-offset` and `--end-offset` parameters.
To run the tests 200..250:

```batch
build/debug/test/unittest --start-offset=200 --end-offset=250
```

These are also available in percentages. To run tests 10% - 20%:

```batch
build/debug/test/unittest --start-offset-percentage=10 --end-offset-percentage=20
```

The set of tests to run can also be read from a file containing one test name per line, passed using the `-f` option.

```batch
cat test.list
```

```text
test/sql/join/full_outer/test_full_outer_join_issue_4252.test
test/sql/join/full_outer/full_outer_join_cache.test
test/sql/join/full_outer/test_full_outer_join.test
```

To run only the tests listed in the file:

```batch
build/debug/test/unittest -f test.list
```

### Result Verification {#docs:current:dev:sqllogictest:result_verification}

The standard way of verifying query results is the `query` statement, followed by the letter `I` repeated once for each column expected in the result. After the query, four dashes (`----`) are expected, followed by the result values separated by tabs. For example:

```sql
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42	84
10	20
```

For legacy reasons the letters `R` and `T` are also accepted to denote columns.

> **Deprecated.** DuckDB deprecated the usage of types in the sqllogictest. The DuckDB test runner does not use or need them internally – therefore, only `I` should be used to denote columns.

#### NULL Values and Empty Strings {#docs:current:dev:sqllogictest:result_verification::null-values-and-empty-strings}

Empty lines have special significance for the SQLLogic test runner: they signify an end of the current statement or query. For that reason, empty strings and NULL values have special syntax that must be used in result verification. NULL values should use the string `NULL`, and empty strings should use the string `(empty)`, e.g.:

```sql
query II
SELECT NULL, ''
----
NULL
(empty)
```
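
As an illustration of this convention, the following Python sketch renders values the way an expected-result block spells them. The helper name `format_value` is hypothetical and not part of the test runner.

```python
def format_value(value):
    """Render one result value as it is spelled in an expected-result block."""
    if value is None:
        return "NULL"      # SQL NULL is written as the string NULL
    text = str(value)
    if text == "":
        return "(empty)"   # an empty line would end the query block
    return text


# The row (NULL, '') from the example above, written value-wise
for value in (None, ""):
    print(format_value(value))
```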

#### Error Verification {#docs:current:dev:sqllogictest:result_verification::error-verification}

To signify that an error is expected, use the `statement error` indicator. `statement error` also takes an optional expected result, which is interpreted as the *expected error message*. Similar to `query`, the expected error should be placed after the four dashes (`----`) following the query. The test passes if the error message *contains* the expected text; the entire error message does not need to be provided. It is recommended that you only use a subset of the error message, so that the test does not break unnecessarily if the formatting of error messages is changed.

```sql
statement error
SELECT * FROM non_existent_table;
----
Table with name non_existent_table does not exist!
```
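
The "contains" rule can be sketched in Python as a plain substring check. The function name and the full error message below are illustrative assumptions, not actual runner code or actual DuckDB output.

```python
def error_matches(actual_message, expected_snippet):
    """Return True if the expected text occurs anywhere in the error message."""
    return expected_snippet in actual_message


# Hypothetical full error message, for illustration only
actual = "Catalog Error: Table with name non_existent_table does not exist!"

print(error_matches(actual, "Table with name non_existent_table does not exist!"))
print(error_matches(actual, "does not exist"))  # a shorter snippet also matches
```

Matching on a short, stable snippet keeps the test valid even if the surrounding wording or prefix of the message changes.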

#### Regex {#docs:current:dev:sqllogictest:result_verification::regex}

In certain cases, result values might be very large or complex, and we might only be interested in whether or not the result *contains* a snippet of text. In that case, we can use the `<REGEX>:` modifier followed by a regular expression. If the result value matches the regex, the test passes. This is primarily used for query plan analysis.

```sql
query II
EXPLAIN SELECT tbl.a FROM 'data/parquet-testing/arrow/alltypes_plain.parquet' tbl(a) WHERE a = 1 OR a = 2
----
physical_plan	<REGEX>:.*PARQUET_SCAN.*Filters: a=1 OR a=2.*
```

If we instead want the result *not* to contain a snippet of text, we can use the `<!REGEX>:` modifier.
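
The two modifiers can be sketched in Python as follows. This is a rough model under stated assumptions: Python's `re` module stands in for the runner's regex engine, the pattern is assumed to match the whole value, and `.` is assumed to match newlines. The function name `check_expected` is hypothetical.

```python
import re

REGEX_PREFIX = "<REGEX>:"
NOT_REGEX_PREFIX = "<!REGEX>:"


def check_expected(expected, actual):
    """Compare one expected value against one actual result value."""
    if expected.startswith(REGEX_PREFIX):
        pattern = expected[len(REGEX_PREFIX):]
        # Assumption: full match, with '.' also matching newlines
        return re.fullmatch(pattern, actual, re.DOTALL) is not None
    if expected.startswith(NOT_REGEX_PREFIX):
        pattern = expected[len(NOT_REGEX_PREFIX):]
        return re.fullmatch(pattern, actual, re.DOTALL) is None
    return expected == actual  # plain values are compared literally


# Made-up plan text, for illustration only
plan = "PARQUET_SCAN\nFilters: a=1 OR a=2\n"
print(check_expected("<REGEX>:.*PARQUET_SCAN.*Filters: a=1 OR a=2.*", plan))
print(check_expected("<!REGEX>:.*SEQ_SCAN.*", plan))
```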

#### File {#docs:current:dev:sqllogictest:result_verification::file}

As results can grow quite large, and we might want to reuse results across multiple files, it is also possible to read expected results from files using the `<FILE>` command. The expected result is read from the given file. By convention, the file path should be given relative to the root of the GitHub repository.

```sql
query I
PRAGMA tpch(1)
----
<FILE>:extension/tpch/dbgen/answers/sf1/q01.csv
```

#### Row-Wise vs. Value-Wise Result Ordering {#docs:current:dev:sqllogictest:result_verification::row-wise-vs-value-wise-result-ordering}

The result values of a query can be supplied either in row-wise order, with the individual values of each row separated by tabs, or in value-wise order. In value-wise order, each individual *value* of the result appears on its own line, in row-major order (all values of the first row, then all values of the second row, and so on). Consider the following example in both row-wise and value-wise order:

```sql
# row-wise
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42	84
10	20

# value-wise
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42
84
10
20
```
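
The relationship between the two layouts can be sketched in a few lines of Python. The helper `to_value_wise` is a hypothetical name used only for this illustration.

```python
def to_value_wise(row_wise_lines):
    """Flatten tab-separated result rows into one value per line (row-major)."""
    values = []
    for line in row_wise_lines:
        values.extend(line.split("\t"))
    return values


rows = ["42\t84", "10\t20"]  # the row-wise block from the example above
print("\n".join(to_value_wise(rows)))
```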

#### Hashes and Outputting Values {#docs:current:dev:sqllogictest:result_verification::hashes-and-outputting-values}

Besides direct result verification, the sqllogic test suite also has the option of using MD5 hashes for value comparisons. A test using hashes for result verification looks like this:

```sql
query II
SELECT g, string_agg(x,',') FROM strings GROUP BY g
----
200 values hashing to b8126ea73f21372cdb3f2dc483106a12
```

This approach is useful for reducing the size of tests when results have many output rows. However, it should be used sparingly, as hash values make the tests more difficult to debug if they do break.

After it is ensured that the system outputs the correct result, hashes of the queries in a test file can be computed by adding `mode output_hash` to the test file. For example:

```sql
mode output_hash

query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42	84
10	20
```

The expected output hashes for every query in the test file will then be printed to the terminal, as follows:

```text
================================================================================
SQL Query
SELECT 42, 84 UNION ALL SELECT 10, 20;
================================================================================
4 values hashing to 498c69da8f30c24da3bd5b322a2fd455
================================================================================
```

In a similar manner, `mode output_result` can be used to force the program to print the result to the terminal for every query run in the test file.
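
To give an intuition for what such a hash line represents, here is a minimal Python sketch. It assumes that every result value is hashed in row-major order and followed by a newline, as in the original SQLite sqllogictest. The exact byte sequence hashed by DuckDB's runner is an assumption here, so always take the real hash from `mode output_hash`.

```python
import hashlib


def result_hash(values):
    """Build a '<n> values hashing to <md5>' line from flattened result values."""
    md5 = hashlib.md5()
    for value in values:
        md5.update(value.encode("utf-8"))
        md5.update(b"\n")  # assumption: each value is terminated by a newline
    return f"{len(values)} values hashing to {md5.hexdigest()}"


print(result_hash(["42", "84", "10", "20"]))
```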

#### Result Sorting {#docs:current:dev:sqllogictest:result_verification::result-sorting}

Queries can have an optional field that indicates that the result should be sorted in a specific manner. This field goes in the same location as the connection label. Because of that, connection labels and result sorting cannot be mixed.

The possible values of this field are `nosort`, `rowsort` and `valuesort`. An example of how this might be used is given below:

```sql
query I rowsort
SELECT 'world' UNION ALL SELECT 'hello'
----
hello
world
```

In general, we prefer not to use this field and rely on `ORDER BY` in the query to generate deterministic query answers. However, existing sqllogictests use this field extensively, hence it is important to know of its existence.
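
The difference between the three modes can be sketched in Python. This is a simplified model that assumes plain string comparison. `sorted_values` is a hypothetical helper, not runner code.

```python
def sorted_values(rows, mode="nosort"):
    """Return the result values in row-major order after applying a sort mode."""
    if mode not in ("nosort", "rowsort", "valuesort"):
        raise ValueError(f"unknown sort mode: {mode}")
    if mode == "rowsort":
        rows = sorted(rows)      # whole rows are compared and reordered
    values = [value for row in rows for value in row]
    if mode == "valuesort":
        values = sorted(values)  # individual values are sorted, ignoring rows
    return values


rows = [["world", "1"], ["hello", "2"]]
print(sorted_values(rows, "nosort"))
print(sorted_values(rows, "rowsort"))
print(sorted_values(rows, "valuesort"))
```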

#### Query Labels {#docs:current:dev:sqllogictest:result_verification::query-labels}

Another feature that can be used for result verification is *query labels*. These can be used to verify that different queries produce the same result. This is useful for comparing queries that are logically equivalent but formulated differently. Query labels are provided after the connection label or sorting specifier.

Queries that have a query label do not need to have a result provided. Instead, the results of each of the queries with the same label are compared to each other. For example, the following script verifies that the queries `SELECT 42+1` and `SELECT 44-1` provide the same result:

```sql
query I nosort r43
SELECT 42+1;
----

query I nosort r43
SELECT 44-1;
----
```
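
Conceptually, label checking compares each labeled result against the first result seen under the same label. A minimal Python sketch of that idea follows. The function name `check_labels` and the data layout are assumptions made for illustration.

```python
def check_labels(labeled_results):
    """Return the labels whose queries produced differing results.

    labeled_results is an iterable of (label, result) pairs in execution order.
    """
    first_seen = {}
    mismatches = []
    for label, result in labeled_results:
        if label not in first_seen:
            first_seen[label] = result  # the first result becomes the reference
        elif first_seen[label] != result:
            mismatches.append(label)
    return mismatches


# SELECT 42+1 and SELECT 44-1 both yield a single row containing 43
print(check_labels([("r43", [["43"]]), ("r43", [["43"]])]))
```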

### Persistent Testing {#docs:current:dev:sqllogictest:persistent_testing}

By default, all tests are run in in-memory mode (unless `--force-storage` is enabled). In certain cases, we want to force the usage of a persistent database. We can initiate a persistent database using the `load` command, and trigger a reload of the database using the `restart` command.

```sql
# load the DB from disk
load __TEST_DIR__/storage_scan.db

statement ok
CREATE TABLE test (a INTEGER);

statement ok
INSERT INTO test VALUES (11), (12), (13), (14), (15), (NULL)

# ...

restart

query I
SELECT * FROM test ORDER BY a
----
NULL
11
12
13
14
15
```

Note that by default the tests run with `SET wal_autocheckpoint = '0KB'` – meaning a checkpoint is triggered after every statement. WAL tests typically run with the following settings to disable this behavior:

```sql
statement ok
PRAGMA disable_checkpoint_on_shutdown

statement ok
PRAGMA wal_autocheckpoint = '1TB'
```

### Loops {#docs:current:dev:sqllogictest:loops}

Loops can be used in sqllogictests when the same query must be executed many times with slight modifications to constant values. For example, suppose we want to fire off 100 queries that check for the presence of the values `0..99` in a table:

```sql
# create the table 'integers' with values 0..99
statement ok
CREATE TABLE integers AS SELECT * FROM range(0, 100, 1) t1(i);

# verify individually that all 100 values are there
loop i 0 100

# execute the query, replacing the value
query I
SELECT count(*) FROM integers WHERE i = ${i};
----
1

# end the loop (note that multiple statements can be part of a loop)
endloop
```

Similarly, `foreach` can be used to iterate over a set of values.

```sql
foreach partcode millennium century decade year quarter month day hour minute second millisecond microsecond epoch

query III
SELECT i, date_part('${partcode}', i) AS p, date_part(['${partcode}'], i) AS st
FROM intervals
WHERE p <> st['${partcode}'];
----

endloop
```
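
Both constructs amount to textual substitution of `${name}` in the loop body. The following Python sketch models this, assuming (as the 100-query example above suggests) that the end bound of `loop` is exclusive. The function names are hypothetical.

```python
def expand_loop(var, start, end, body):
    """Expand 'loop var start end': one copy of body per integer in [start, end)."""
    return [body.replace("${" + var + "}", str(i)) for i in range(start, end)]


def expand_foreach(var, values, body):
    """Expand 'foreach var v1 v2 ...': one copy of body per listed value."""
    return [body.replace("${" + var + "}", value) for value in values]


queries = expand_loop("i", 0, 100, "SELECT count(*) FROM integers WHERE i = ${i};")
print(len(queries), queries[0], queries[-1])

parts = expand_foreach("partcode", ["year", "month"], "SELECT date_part('${partcode}', i)")
print(parts)
```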

`foreach` also has a number of preset combinations that should be used when required. In this manner, when new combinations are added to the preset, old tests will automatically pick up these new combinations.



|     Preset     |                          Expansion                           |
|----------------|--------------------------------------------------------------|
| ⟨compression⟩  | none uncompressed rle bitpacking dictionary fsst chimp patas |
| ⟨signed⟩       | tinyint smallint integer bigint hugeint                      |
| ⟨unsigned⟩     | utinyint usmallint uinteger ubigint uhugeint                 |
| ⟨integral⟩     | ⟨signed⟩ ⟨unsigned⟩                                          |
| ⟨numeric⟩      | ⟨integral⟩ float double                                      |
| ⟨alltypes⟩     | ⟨numeric⟩ bool interval varchar json                         |
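
Since presets can refer to other presets, the expansion is recursive. The following Python sketch expands the table above. Preset names are spelled as they appear in the table, and the `expand` helper is a hypothetical illustration rather than the runner's implementation.

```python
# Preset definitions copied from the table above
PRESETS = {
    "⟨compression⟩": "none uncompressed rle bitpacking dictionary fsst chimp patas",
    "⟨signed⟩": "tinyint smallint integer bigint hugeint",
    "⟨unsigned⟩": "utinyint usmallint uinteger ubigint uhugeint",
    "⟨integral⟩": "⟨signed⟩ ⟨unsigned⟩",
    "⟨numeric⟩": "⟨integral⟩ float double",
    "⟨alltypes⟩": "⟨numeric⟩ bool interval varchar json",
}


def expand(tokens):
    """Recursively replace preset names in a space-separated token list."""
    result = []
    for token in tokens.split():
        if token in PRESETS:
            result.extend(expand(PRESETS[token]))
        else:
            result.append(token)
    return result


print(expand("⟨alltypes⟩"))
```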

> Use large loops sparingly. Executing hundreds of thousands of SQL statements will slow down tests unnecessarily. Do not use loops for inserting data.

#### Data Generation without Loops {#docs:current:dev:sqllogictest:loops::data-generation-without-loops}

Loops should be used sparingly. While it might be tempting to use loops for inserting data using insert statements, this will considerably slow down the test cases. Instead, it is better to generate data using the built-in `range` and `repeat` functions.

To create the table `integers` with the values `[0, 1, ..., 98, 99]`, run:

```sql
CREATE TABLE integers AS SELECT * FROM range(0, 100, 1) t1(i);
```

To create the table `strings` with 100 times the value `hello`, run:

```sql
CREATE TABLE strings AS SELECT * FROM repeat('hello', 100) t1(s);
```

Using these two functions, together with clever use of cross products and other expressions, many different types of datasets can be efficiently generated. The `random()` function can also be used to generate random data.

An alternative option is to read data from an existing CSV or Parquet file. There are several large CSV files that can be loaded from the directory `test/sql/copy/csv/data/real` using a `COPY ... FROM` statement or the `read_csv_auto` function.

The TPC-H and TPC-DS extensions can also be used to generate synthetic data, using e.g. `CALL dbgen(sf = 1)` or `CALL dsdgen(sf = 1)`.

### Multiple Connections {#docs:current:dev:sqllogictest:multiple_connections}

For tests whose purpose is to verify that the transactional management or versioning of data works correctly, it is generally necessary to use multiple connections. For example, if we want to verify that the creation of tables is correctly transactional, we might want to start a transaction and create a table in `con1`, then fire a query in `con2` that checks that the table is not accessible until the transaction is committed.

We can use multiple connections in the sqllogictests using `connection labels`. The connection label can be optionally appended to any `statement` or `query`. All queries with the same connection label will be executed in the same connection. A test that would verify the above property would look as follows:

```sql
statement ok con1
BEGIN TRANSACTION

statement ok con1
CREATE TABLE integers (i INTEGER);

statement error con2
SELECT * FROM integers;
```

#### Concurrent Connections {#docs:current:dev:sqllogictest:multiple_connections::concurrent-connections}

Using connection modifiers on statements and queries will result in testing of multiple connections, but all the queries will still be run *sequentially* on a single thread. If we want to run code from multiple connections *concurrently* over multiple threads, we can use the `concurrentloop` construct. The queries in `concurrentloop` are run concurrently on separate threads.

```sql
concurrentloop i 0 10

statement ok
CREATE TEMP TABLE t2 AS (SELECT 1);

statement ok
INSERT INTO t2 VALUES (42);

statement ok
DELETE FROM t2

endloop
```

One caveat with `concurrentloop` is that results are often unpredictable – as multiple clients can hammer the database at the same time we might end up with (expected) transaction conflicts. `statement maybe` can be used to deal with these situations. `statement maybe` essentially accepts both a success, and a failure with a specific error message.

```sql
concurrentloop i 1 10

statement maybe
CREATE OR REPLACE TABLE t2 AS (SELECT -54124033386577348004002656426531535114 FROM t2 LIMIT 70%);
----
write-write conflict

endloop
```
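
The acceptance rule of `statement maybe` can be summarized in a short Python sketch. The function name and the sample error text are hypothetical and only illustrate the logic described above.

```python
def statement_maybe_passes(error_message, expected_snippet):
    """Accept success, or a failure whose message contains the expected text.

    error_message is None when the statement succeeded.
    """
    if error_message is None:
        return True
    return expected_snippet in error_message


print(statement_maybe_passes(None, "write-write conflict"))
# Made-up error text, for illustration only
print(statement_maybe_passes("Transaction error: write-write conflict on table t2", "write-write conflict"))
print(statement_maybe_passes("Out of memory", "write-write conflict"))
```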

### Catch C/C++ Tests {#docs:current:dev:sqllogictest:catch}

While we prefer the sqllogic tests for testing most functionality, for certain tests SQL alone is not sufficient. This typically happens when you want to test the C++ API. When using pure SQL is really not an option, it might be necessary to write a C++ test using Catch.

Catch tests reside in the test directory as well. Here is an example of a Catch test that tests the storage of the system:

```cpp
#include "catch.hpp"
#include "test_helpers.hpp"

TEST_CASE("Test simple storage", "[storage]") {
    auto config = GetTestConfig();
    unique_ptr<QueryResult> result;
    auto storage_database = TestCreatePath("storage_test");

    // make sure the database does not exist
    DeleteDatabase(storage_database);
    {
        // create a database and insert values
        DuckDB db(storage_database, config.get());
        Connection con(db);
        REQUIRE_NO_FAIL(con.Query("CREATE TABLE test (a INTEGER, b INTEGER);"));
        REQUIRE_NO_FAIL(con.Query("INSERT INTO test VALUES (11, 22), (13, 22), (12, 21), (NULL, NULL)"));
        REQUIRE_NO_FAIL(con.Query("CREATE TABLE test2 (a INTEGER);"));
        REQUIRE_NO_FAIL(con.Query("INSERT INTO test2 VALUES (13), (12), (11)"));
    }
    // reload the database from disk a few times
    for (idx_t i = 0; i < 2; i++) {
        DuckDB db(storage_database, config.get());
        Connection con(db);
        result = con.Query("SELECT * FROM test ORDER BY a");
        REQUIRE(CHECK_COLUMN(result, 0, {Value(), 11, 12, 13}));
        REQUIRE(CHECK_COLUMN(result, 1, {Value(), 22, 21, 22}));
        result = con.Query("SELECT * FROM test2 ORDER BY a");
        REQUIRE(CHECK_COLUMN(result, 0, {11, 12, 13}));
    }
    DeleteDatabase(storage_database);
}
```

The test uses the `TEST_CASE` wrapper to create each test. The database is created and queried using the C++ API. Results are checked using either `REQUIRE_NO_FAIL` / `REQUIRE_FAIL` (corresponding to `statement ok` and `statement error`, respectively) or `REQUIRE(CHECK_COLUMN(...))` (corresponding to `query` with a result check). Every test that is created in this way needs to be added to the corresponding `CMakeLists.txt`.

# Internals {#internals}

## Overview of DuckDB Internals {#docs:current:internals:overview}

This page gives a brief description of the internals of the DuckDB engine.

#### Parser {#docs:current:internals:overview::parser}

The parser converts a query string into the following tokens:

* [`SQLStatement`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/sql_statement.hpp)
* [`QueryNode`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/query_node.hpp)
* [`TableRef`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/tableref.hpp)
* [`ParsedExpression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/parsed_expression.hpp)

The parser is not aware of the catalog or any other aspect of the database. It will not throw errors if tables do not exist, and will not resolve **any** types of columns yet. It only transforms a query string into a set of tokens as specified.

##### ParsedExpression {#docs:current:internals:overview::parsedexpression}

The ParsedExpression represents an expression within a SQL statement. This can be e.g., a reference to a column, an addition operator or a constant value. The type of the ParsedExpression indicates what it represents, e.g., a comparison is represented as a [`ComparisonExpression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/expression/comparison_expression.hpp).

ParsedExpressions do **not** have types, except for nodes with explicit types such as `CAST` statements. The types for expressions are resolved in the Binder, not in the Parser.

##### TableRef {#docs:current:internals:overview::tableref}

The TableRef represents any table source. This can be a reference to a base table, but it can also be a join, a table-producing function or a subquery.

##### QueryNode {#docs:current:internals:overview::querynode}

The QueryNode represents either (1) a `SELECT` statement, or (2) a set operation (i.e., `UNION`, `INTERSECT` or `EXCEPT`).

##### SQL Statement {#docs:current:internals:overview::sql-statement}

The SQLStatement represents a complete SQL statement. The type of the SQL Statement represents what kind of statement it is (e.g., `StatementType::SELECT` represents a `SELECT` statement). A single SQL string can be transformed into multiple SQL statements in case the original query string contains multiple queries.

#### Binder {#docs:current:internals:overview::binder}

The binder converts all nodes into their **bound** equivalents. In the binder phase:

* The tables and columns are resolved using the catalog
* Types are resolved
* Aggregate/window functions are extracted

The following conversions happen:

* SQLStatement → [`BoundStatement`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_statement.hpp)
* QueryNode → [`BoundQueryNode`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_query_node.hpp)
* TableRef → [`BoundTableRef`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_tableref.hpp)
* ParsedExpression → [`Expression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/expression.hpp)

#### Logical Planner {#docs:current:internals:overview::logical-planner}

The logical planner creates [`LogicalOperator`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/logical_operator.hpp) nodes from the bound statements. In this phase, the actual logical query tree is created.

#### Optimizer {#docs:current:internals:overview::optimizer}

After the logical planner has created the logical query tree, the optimizers are run over that query tree to create an optimized query plan. The following query optimizers are run:

* **Expression Rewriter**: Simplifies expressions, performs constant folding
* **Filter Pushdown**: Pushes filters down into the query plan and duplicates filters over equivalency sets. Also prunes subtrees that are guaranteed to be empty (because of filters that statically evaluate to false).
* **Join Order Optimizer**: Reorders joins using dynamic programming. Specifically, the `DPccp` algorithm from the paper [Dynamic Programming Strikes Back](https://15721.courses.cs.cmu.edu/spring2017/papers/14-optimizer1/p539-moerkotte.pdf) is used.
* **Common Sub Expressions**: Extracts common subexpressions from projection and filter nodes to prevent unnecessary duplicate execution.
* **In Clause Rewriter**: Rewrites large static IN clauses to a MARK join or INNER join.

#### Column Binding Resolver {#docs:current:internals:overview::column-binding-resolver}

The column binding resolver converts logical [`BoundColumnRefExpression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/expression/bound_columnref_expression.hpp) nodes that refer to a column of a specific table into [`BoundReferenceExpression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/expression/bound_reference_expression.hpp) nodes that refer to a specific index into the DataChunks that are passed around in the execution engine.

#### Physical Plan Generator {#docs:current:internals:overview::physical-plan-generator}

The physical plan generator converts the resulting logical operator tree into a [`PhysicalOperator`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/execution/physical_operator.hpp) tree.

#### Execution {#docs:current:internals:overview::execution}

In the execution phase, the physical operators are executed to produce the query result.
DuckDB uses a push-based vectorized model, where [`DataChunks`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/common/types/data_chunk.hpp) are pushed through the operator tree.
For more information, see the talk [Push-Based Execution in DuckDB](https://www.youtube.com/watch?v=1kDrPgRUuEI).

## Storage Versions and Format {#docs:current:internals:storage}

#### Compatibility {#docs:current:internals:storage::compatibility}

##### Backward Compatibility {#docs:current:internals:storage::backward-compatibility}

_Backward compatibility_ refers to the ability of a newer DuckDB version to read storage files created by an older DuckDB version. Version 0.10 is the first release of DuckDB that supports backward compatibility in the storage format. DuckDB v0.10 can read and operate on files created by the previous DuckDB version – DuckDB v0.9.

For future DuckDB versions, our goal is to ensure that any DuckDB version released **after** v0.10 can read files created by previous versions, starting from v0.10. We want to ensure that the file format is fully backward compatible. This allows you to keep data stored in DuckDB files around and guarantees that you will be able to read the files without having to worry about which version the file was written with or having to convert files between versions.

##### Forward Compatibility {#docs:current:internals:storage::forward-compatibility}

_Forward compatibility_ refers to the ability of an older DuckDB version to read storage files produced by a newer DuckDB version. DuckDB v0.9 is [**partially** forward compatible with DuckDB v0.10](https://duckdb.org/2024/02/13/announcing-duckdb-0100#forward-compatibility). Certain files created by DuckDB v0.10 can be read by DuckDB v0.9.

Forward compatibility is provided on a **best effort** basis. While stability of the storage format is important – there are still many improvements and innovations that we want to make to the storage format in the future. As such, forward compatibility may be (partially) broken on occasion.

#### How to Move between Storage Formats {#docs:current:internals:storage::how-to-move-between-storage-formats}

When you update DuckDB and open an old database file, you might encounter an error message about incompatible storage formats, pointing to this page.
To move your database(s) to the newer format, you only need the older and the newer DuckDB executables.

Open your database file with the older DuckDB and run the SQL statement `EXPORT DATABASE 'tmp'`. This saves the whole state of the current database in the folder `tmp`.
The content of the `tmp` folder will be overwritten, so choose an empty or not yet existing location. Then, start the newer DuckDB and execute `IMPORT DATABASE 'tmp'` (pointing to the previously populated folder) to load the database, which is then saved to the file you pointed DuckDB to.

A Bash script to achieve this (to be adapted with the file names and executable locations) is the following:

```bash
/older/duckdb mydata.old.db -c "EXPORT DATABASE 'tmp'"
/newer/duckdb mydata.new.db -c "IMPORT DATABASE 'tmp'"
```

After this, `mydata.old.db` will remain in the old format, `mydata.new.db` will contain the same data but in a format accessible by the more recent DuckDB version, and the folder `tmp` will hold the same data in a universal format as different files.

Check [`EXPORT` documentation](#docs:current:sql:statements:export) for more details on the syntax.

##### Explicit Storage Versions {#docs:current:internals:storage::explicit-storage-versions}

[DuckDB v1.2.0 introduced the `STORAGE_VERSION` option](https://duckdb.org/2025/02/05/announcing-duckdb-120#explicit-storage-versions), which allows explicitly specifying the storage version.
Using this, you can opt in to newer forward-incompatible features:

```sql
ATTACH 'file.db' (STORAGE_VERSION 'v1.2.0');
```

With the [command line client](#docs:current:clients:cli:overview), you can use the `-storage-version` argument:

```batch
duckdb -storage-version v1.2.0 my_database.duckdb
```

The storage version setting specifies the minimum DuckDB version that should be able to read the database file. Database files written with this option cannot be opened by DuckDB releases older than the specified version. They can be read by the specified version and all newer versions of DuckDB.
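
The "minimum version" rule can be modeled as a simple version comparison. The Python sketch below is a simplification under stated assumptions: it only handles plain `vX.Y.Z` strings, and it models neither special values such as `latest` nor partial forward compatibility. The helper names are hypothetical.

```python
def parse_version(version):
    """Turn a string such as 'v1.2.0' into a comparable tuple (1, 2, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def can_open(reader_version, storage_version):
    """A reader can open the file if it is at least as new as the storage version."""
    return parse_version(reader_version) >= parse_version(storage_version)


print(can_open("v1.2.0", "v1.2.0"))  # the specified version itself
print(can_open("v1.3.0", "v1.2.0"))  # a newer release
print(can_open("v1.1.3", "v1.2.0"))  # an older release
```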

If you attach to DuckDB databases, you can query the storage versions using the following command:

```sql
SELECT database_name, tags
FROM duckdb_databases();
```

This shows the storage versions:

```text
┌───────────────┬───────────────────────────────────┐
│ database_name │               tags                │
│    varchar    │       map(varchar, varchar)       │
├───────────────┼───────────────────────────────────┤
│ file1         │ {storage_version=v1.2.0}          │
│ file2         │ {storage_version=v1.0.0 - v1.1.3} │
│ ...           │ ...                               │
└───────────────┴───────────────────────────────────┘
```

This means that `file2` can be opened by past DuckDB versions while `file1` is compatible only with `v1.2.0` (or future versions).

The `storage_compatibility_version` [configuration option](#docs:current:configuration:overview::configuration-reference) can also be used to specify the storage version to use. It can be specified in various ways. For example, at connect time using the Python bindings:

```python
duckdb.connect("file.db", config={'storage_compatibility_version': 'latest'})
```

When using the [command line client](#docs:current:clients:cli:overview), the storage version can be specified using the `-storage-version` option.

##### Converting between Storage Versions {#docs:current:internals:storage::converting-between-storage-versions}

To convert from the new format to the old format for compatibility, use the following sequence in DuckDB v1.2.0+:

```sql
ATTACH 'file1.db';
ATTACH 'converted_file.db' (STORAGE_VERSION 'v1.0.0');
COPY FROM DATABASE file1 TO converted_file;
```

#### Storage Header {#docs:current:internals:storage::storage-header}

DuckDB files start with a `uint64_t` which contains a checksum for the main header, followed by four magic bytes (`DUCK`), followed by the storage version number in a `uint64_t`.

```batch
hexdump -n 20 -C mydata.db
```

```text
00000000  01 d0 e2 63 9c 13 39 3e  44 55 43 4b 2b 00 00 00  |...c..9>DUCK+...|
00000010  00 00 00 00                                       |....|
00000014
```

The following Python snippet reads the magic bytes and the storage version from a database file:

```python
import struct

pattern = struct.Struct('<8x4sQ')

with open('test/sql/storage_version/storage_version.db', 'rb') as fh:
    print(pattern.unpack(fh.read(pattern.size)))
```

#### Storage Version Table {#docs:current:internals:storage::storage-version-table}

For changes in each given release, check out the [change log](https://github.com/duckdb/duckdb/releases) on GitHub.
To see the commits that changed each storage version, see the [commit log](https://github.com/duckdb/duckdb/commits/main/src/storage/storage_info.cpp).

| Storage version | DuckDB version(s)               |
|----------------:|---------------------------------|
| 68              | v1.5.x                          |
| 67              | v1.4.x                          |
| 66              | v1.3.x                          |
| 65              | v1.2.x                          |
| 64              | v0.9.x, v0.10.x, v1.0.0, v1.1.x |
| 51              | v0.8.x                          |
| 43              | v0.7.x                          |
| 39              | v0.6.x                          |
| 38              | v0.5.x                          |
| 33              | v0.3.3, v0.3.4, v0.4.0          |
| 31              | v0.3.2                          |
| 27              | v0.3.1                          |
| 25              | v0.3.0                          |
| 21              | v0.2.9                          |
| 18              | v0.2.8                          |
| 17              | v0.2.7                          |
| 15              | v0.2.6                          |
| 13              | v0.2.5                          |
| 11              | v0.2.4                          |
| 6               | v0.2.3                          |
| 4               | v0.2.2                          |
| 1               | v0.2.1 and prior                |
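
Combining the header layout from the Storage Header section with the table above, the following Python sketch parses a header and looks up the matching DuckDB release. The function name `parse_header` is hypothetical, and the sample bytes are taken from the `hexdump` output shown earlier.

```python
import struct

# Storage version number -> DuckDB version(s), copied from the table above
STORAGE_VERSIONS = {
    68: "v1.5.x", 67: "v1.4.x", 66: "v1.3.x", 65: "v1.2.x",
    64: "v0.9.x, v0.10.x, v1.0.0, v1.1.x",
    51: "v0.8.x", 43: "v0.7.x", 39: "v0.6.x", 38: "v0.5.x",
    33: "v0.3.3, v0.3.4, v0.4.0", 31: "v0.3.2", 27: "v0.3.1", 25: "v0.3.0",
    21: "v0.2.9", 18: "v0.2.8", 17: "v0.2.7", 15: "v0.2.6", 13: "v0.2.5",
    11: "v0.2.4", 6: "v0.2.3", 4: "v0.2.2", 1: "v0.2.1 and prior",
}

# 8 checksum bytes (skipped), 4 magic bytes, little-endian uint64 version
HEADER = struct.Struct("<8x4sQ")


def parse_header(raw):
    """Return (storage_version, duckdb_versions) for the first bytes of a file."""
    magic, version = HEADER.unpack(raw[:HEADER.size])
    if magic != b"DUCK":
        raise ValueError("not a DuckDB database file")
    return version, STORAGE_VERSIONS.get(version, "unknown")


# The 20 bytes from the hexdump example
sample = bytes.fromhex("01d0e2639c13393e4455434b2b00000000000000")
print(parse_header(sample))
```

To inspect a real file, pass the first `HEADER.size` bytes read from it in binary mode.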

#### Compression {#docs:current:internals:storage::compression}

DuckDB uses [lightweight compression](https://duckdb.org/2022/10/28/lightweight-compression).
By default, compression is only applied to persistent databases and is **not applied to in-memory instances**.
To turn on compression for in-memory databases, use `ATTACH` with the [`COMPRESS` option](#docs:current:sql:statements:attach::options).

Note that available compression algorithms depend on the storage version used, so you might need to set an explicit storage version to get access to all compression algorithms.

##### Compression Algorithms {#docs:current:internals:storage::compression-algorithms}

The compression algorithms supported by DuckDB include the following:

* [Constant Encoding](https://duckdb.org/2022/10/28/lightweight-compression#constant-encoding)
* [Run-Length Encoding (RLE)](https://duckdb.org/2022/10/28/lightweight-compression#run-length-encoding-rle)
* [Bit Packing](https://duckdb.org/2022/10/28/lightweight-compression#bit-packing)
* [Frame of Reference (FOR)](https://duckdb.org/2022/10/28/lightweight-compression#frame-of-reference)
* [Dictionary Encoding](https://duckdb.org/2022/10/28/lightweight-compression#dictionary-encoding)
* [Fast Static Symbol Table (FSST)](https://duckdb.org/2022/10/28/lightweight-compression#fsst) – [VLDB 2020 paper](https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf)
* [Adaptive Lossless Floating-Point Compression (ALP)](https://duckdb.org/2024/02/13/announcing-duckdb-0100#adaptive-lossless-floating-point-compression-alp) – [SIGMOD 2024 paper](https://ir.cwi.nl/pub/33334/33334.pdf)
* [Chimp](https://duckdb.org/2022/10/28/lightweight-compression#chimp--patas) – [VLDB 2022 paper](https://www.vldb.org/pvldb/vol15/p3058-liakos.pdf)
* [Patas](https://duckdb.org/2022/11/14/announcing-duckdb-060#compression-improvements)
* [Zstd](https://duckdb.org/2025/02/05/announcing-duckdb-120#zstd-compression)
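
To give a feel for what some of these lightweight schemes do, here is a simplified Python sketch of run-length encoding and of frame of reference, whose offsets can then be bit-packed. It illustrates the general techniques only and is not DuckDB's implementation; all function names are made up for this example.

```python
def rle_encode(values):
    """Run-length encoding: collapse runs of equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    return [v for v, count in runs for _ in range(count)]


def for_encode(values):
    """Frame of reference: store the minimum once, plus small non-negative offsets.

    The offsets can then be bit-packed with only as many bits as the largest one needs.
    """
    base = min(values)
    offsets = [v - base for v in values]
    bit_width = max(offsets).bit_length()
    return base, bit_width, offsets


def for_decode(base, offsets):
    """Add the base back to every offset."""
    return [base + o for o in offsets]


data = [7, 7, 7, 7, 3, 3, 9]
runs = rle_encode(data)

ids = [1000, 1003, 1001, 1007]
base, bit_width, offsets = for_encode(ids)
```

In this example, the four `ids` need only 3 bits each once the base of 1000 has been subtracted.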

#### Disk Usage {#docs:current:internals:storage::disk-usage}

The disk usage of DuckDB's format depends on a number of factors, including the data type and the data distribution, the compression methods used, etc.
As a rough approximation, loading 100 GB of uncompressed CSV files into a DuckDB database file will require 25 GB of disk space, while loading 100 GB of Parquet files will require 120 GB of disk space.

#### Row Groups {#docs:current:internals:storage::row-groups}

DuckDB's storage format stores the data in _row groups,_ i.e., horizontal partitions of the data.
This concept is equivalent to [Parquet's row groups](https://parquet.apache.org/docs/concepts/).
Several features in DuckDB, including [parallelism](#docs:current:guides:performance:how_to_tune_workloads) and [compression](https://duckdb.org/2022/10/28/lightweight-compression) are based on row groups.

The row group size can be specified as an option of the `ATTACH` statement: 

```sql
ATTACH '/tmp/somefile.db' AS db (ROW_GROUP_SIZE 16384);
```

#### Troubleshooting {#docs:current:internals:storage::troubleshooting}

##### Error Message When Opening an Incompatible Database File {#docs:current:internals:storage::error-message-when-opening-an-incompatible-database-file}

When opening a database file that has been written by a different DuckDB version from the one you are using, the following error message may occur:

```console
Error: unable to open database "...": Serialization Error: Failed to deserialize: ...
```

The message implies that the database file was created with a newer DuckDB version and uses features that are backward incompatible with the DuckDB version used to read the file.

There are two potential workarounds:

1. Update your DuckDB version to the latest stable version.
2. Open the database with the latest version of DuckDB, export it to a standard format (e.g., Parquet), then import it to any version of DuckDB. See the [`EXPORT/IMPORT DATABASE` statements](#docs:current:sql:statements:export) for details.

## Execution Format {#docs:current:internals:vector}

`Vector` is the container format used to store in-memory data during execution.
`DataChunk` is a collection of Vectors, used for instance to represent a column list in a `PhysicalProjection` operator.

#### Data Flow {#docs:current:internals:vector::data-flow}

DuckDB uses a vectorized query execution model.
All operators in DuckDB are optimized to work on Vectors of a fixed size.

This fixed size is commonly referred to in the code as `STANDARD_VECTOR_SIZE`.
The default `STANDARD_VECTOR_SIZE` is 2048 tuples.
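
The chunk-at-a-time flow can be pictured with a small Python sketch. It is a toy model rather than DuckDB code: the operators `scan` and `add_one` are hypothetical, and only the value 2048 is taken from the text above.

```python
STANDARD_VECTOR_SIZE = 2048  # default vector size mentioned above


def scan(column):
    """Source operator: hand out the column in vectors of at most STANDARD_VECTOR_SIZE values."""
    for start in range(0, len(column), STANDARD_VECTOR_SIZE):
        yield column[start:start + STANDARD_VECTOR_SIZE]


def add_one(vectors):
    """Hypothetical operator: process a whole vector per call instead of one row per call."""
    for vector in vectors:
        yield [value + 1 for value in vector]


column = list(range(5000))
vector_sizes = []
total = 0
for vector in add_one(scan(column)):
    vector_sizes.append(len(vector))
    total += sum(vector)
```

With 5,000 input rows, the pipeline sees two full vectors and one partially filled vector.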

#### Vector Format {#docs:current:internals:vector::vector-format}

Vectors logically represent arrays that contain data of a single type. DuckDB supports different *vector formats*, which allow the system to store the same logical data with a different *physical representation*. This allows for a more compressed representation, and potentially allows for compressed execution throughout the system. The supported vector formats are listed below.

##### Flat Vectors {#docs:current:internals:vector::flat-vectors}

Flat vectors are physically stored as a contiguous array. This is the standard uncompressed vector format.
For flat vectors the logical and physical representations are identical.

![](../images/internals/flat.png)


##### Constant Vectors {#docs:current:internals:vector::constant-vectors}

Constant vectors are physically stored as a single constant value.

![](../images/internals/constant.png)


Constant vectors are useful when data elements are repeated – for example, when representing the result of a constant expression in a function call, the constant vector allows us to only store the value once.

```sql
SELECT lst || 'duckdb'
FROM range(1000) tbl(lst);
```

Since `duckdb` is a string literal, the value of the literal is the same for every row. In a flat vector, we would have to duplicate the literal 'duckdb' once for every row. The constant vector allows us to only store the literal once.

Constant vectors are also emitted by the storage when decompressing from constant compression.

##### Dictionary Vectors {#docs:current:internals:vector::dictionary-vectors}

Dictionary vectors are physically stored as a child vector, and a selection vector that contains indexes into the child vector.

![](../images/internals/dictionary.png)


Just like constant vectors, dictionary vectors are emitted by the storage, in this case when decompressing from dictionary compression.
When deserializing a dictionary-compressed column segment, we store it in a dictionary vector so we can keep the data compressed during query execution.

##### Sequence Vectors {#docs:current:internals:vector::sequence-vectors}

Sequence vectors are physically stored as an offset and an increment value.

![](../images/internals/sequence.png)


Sequence vectors are useful for efficiently storing incremental sequences. They are generally emitted for row identifiers.

##### Unified Vector Format {#docs:current:internals:vector::unified-vector-format}

These properties of the different vector formats are great for optimization purposes. For example, when all the parameters to a function are constant, we can compute the result once and emit a constant vector.
But writing specialized code for every combination of vector types for every function is infeasible due to the combinatorial explosion of possibilities.

Instead of doing this, whenever you want to generically use a vector regardless of the type, the UnifiedVectorFormat can be used.
This format essentially acts as a generic view over the contents of the Vector. Every type of Vector can convert to this format.
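
As a rough illustration of the idea, the Python sketch below models the four vector formats and a unified view that maps every logical row to a position in a data buffer through a selection. The class and function names are invented for this example and do not mirror DuckDB's C++ API. In particular, the sequence vector is simply materialized here, which is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlatVector:
    data: List[object]  # one physical value per logical row


@dataclass
class ConstantVector:
    value: object  # a single value shared by all rows
    count: int


@dataclass
class DictionaryVector:
    child: List[object]   # the distinct values
    selection: List[int]  # per-row index into child


@dataclass
class SequenceVector:
    offset: int
    increment: int
    count: int


@dataclass
class UnifiedView:
    data: List[object]
    selection: List[int]  # logical row -> index into data

    def __getitem__(self, row):
        return self.data[self.selection[row]]

    def __len__(self):
        return len(self.selection)


def to_unified(vector):
    """Give a uniform (data, selection) view over any of the formats above."""
    if isinstance(vector, FlatVector):
        return UnifiedView(vector.data, list(range(len(vector.data))))
    if isinstance(vector, ConstantVector):
        return UnifiedView([vector.value], [0] * vector.count)
    if isinstance(vector, DictionaryVector):
        return UnifiedView(vector.child, vector.selection)
    if isinstance(vector, SequenceVector):
        data = [vector.offset + i * vector.increment for i in range(vector.count)]
        return UnifiedView(data, list(range(vector.count)))
    raise TypeError(f"unsupported vector format: {type(vector).__name__}")


def logical_values(vector):
    """Generic code path: written once, works for every format."""
    view = to_unified(vector)
    return [view[row] for row in range(len(view))]
```

A function such as `logical_values` only needs to be written against the unified view, while specialized fast paths (for example, for all-constant inputs) remain possible where they pay off.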

#### Complex Types {#docs:current:internals:vector::complex-types}

##### String Vectors {#docs:current:internals:vector::string-vectors}

To efficiently store strings, we make use of our `string_t` class.

```cpp
struct string_t {
    union {
        struct {
            uint32_t length;
            char prefix[4];
            char *ptr;
        } pointer;
        struct {
            uint32_t length;
            char inlined[12];
        } inlined;
    } value;
};
```

Short strings (`<= 12 bytes`) are inlined into the structure, while larger strings are stored with a pointer to the data in the auxiliary string buffer. The length is used throughout the functions to avoid having to call `strlen` and having to continuously check for null pointers. The prefix is used for comparisons as an early out (when the prefix does not match, we know the strings are not equal and don't need to chase any pointers).
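
The layout can be mimicked in Python with the `struct` module. This is a sketch under assumptions: it packs a 16-byte value the way the C++ struct above suggests for a 64-bit little-endian platform, and it replaces the real pointer with an integer index into a hypothetical `heap` list.

```python
import struct

INLINE_LIMIT = 12  # strings up to 12 bytes are stored inline
heap = []          # stand-in for the auxiliary string buffer


def make_string_t(text):
    data = text.encode("utf-8")
    if len(data) <= INLINE_LIMIT:
        # 4-byte length followed by 12 bytes of inlined, zero-padded data
        return struct.pack("<I12s", len(data), data)
    heap.append(data)
    # 4-byte length, 4-byte prefix, 8-byte "pointer" (here: an index into heap)
    return struct.pack("<I4sQ", len(data), data[:4], len(heap) - 1)


def read_string_t(value):
    (length,) = struct.unpack_from("<I", value)
    if length <= INLINE_LIMIT:
        return value[4:4 + length]
    (index,) = struct.unpack_from("<Q", value, 8)
    return heap[index]


def strings_equal(a, b):
    # The first 8 bytes hold the length and the first 4 bytes of the string in both
    # layouts, so a mismatch there proves inequality without following any pointer.
    if a[:8] != b[:8]:
        return False
    return read_string_t(a) == read_string_t(b)


short = make_string_t("duck")
long_a = make_string_t("a considerably longer string")
long_b = make_string_t("b considerably longer string")
```

Comparing `long_a` with `long_b` returns early because their prefixes differ, so the heap is never consulted.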

##### List Vectors {#docs:current:internals:vector::list-vectors}

List vectors are stored as a series of *list entries* together with a child Vector. The child vector contains the *values* that are present in the list, and the list entries specify how each individual list is constructed.

```cpp
struct list_entry_t {
    idx_t offset;
    idx_t length;
};
```

The offset refers to the start row in the child Vector, while the length keeps track of the size of the list of this row.

List vectors can be stored recursively. For nested list vectors, the child of a list vector is again a list vector.

For example, consider this mock representation of a Vector of type `BIGINT[][]`:

```json
{
   "type": "list",
   "data": "list_entry_t",
   "child": {
      "type": "list",
      "data": "list_entry_t",
      "child": {
         "type": "bigint",
         "data": "int64_t"
      }
   }
}
```
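
To make the offset/length bookkeeping concrete, here is a small Python sketch that rebuilds the logical lists of a `BIGINT[][]` vector from its entries and child vectors. The sample values and the helper `read_lists` are made up for illustration.

```python
def read_lists(entries, child):
    """Materialize each (offset, length) entry as a slice of the child values."""
    return [child[offset:offset + length] for offset, length in entries]


# Logical content of the BIGINT[][] vector: [[[1, 2], [3]], [], [[4, 5, 6]]]
outer_entries = [(0, 2), (2, 0), (2, 1)]  # entries of the outer list vector
inner_entries = [(0, 2), (2, 1), (3, 3)]  # entries of the child list vector
inner_values = [1, 2, 3, 4, 5, 6]         # BIGINT child vector

inner_lists = read_lists(inner_entries, inner_values)
rows = read_lists(outer_entries, inner_lists)
```

Note that the empty list in the second row is represented by an entry of length 0 and occupies no space in the child vector.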

##### Struct Vectors {#docs:current:internals:vector::struct-vectors}

Struct vectors store a list of child vectors. The number and types of the child vectors are defined by the schema of the struct.

##### Map Vectors {#docs:current:internals:vector::map-vectors}

Internally map vectors are stored as a `LIST[STRUCT(key KEY_TYPE, value VALUE_TYPE)]`.

##### Union Vectors {#docs:current:internals:vector::union-vectors}

Internally `UNION` utilizes the same structure as a `STRUCT`.
The first “child” is always occupied by the Tag Vector of the `UNION`, which records for each row which of the `UNION`'s types apply to that row.
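
A toy Python model of this layout, assuming a hypothetical `UNION(num INTEGER, str VARCHAR)` column: the first child holds the tags and the remaining children hold one vector per member. How unused slots are filled is an assumption of the sketch (they are `None` here).

```python
# Children of a struct-like vector for UNION(num INTEGER, str VARCHAR)
tags = [0, 1, 0]                 # first child: which member is active in each row
num_child = [1, None, 3]         # member 0
str_child = [None, "two", None]  # member 1
members = [num_child, str_child]


def union_value(row):
    """Pick the value from the member that the tag selects for this row."""
    return members[tags[row]][row]


values = [union_value(row) for row in range(len(tags))]
```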

## Pivot Internals {#docs:current:internals:pivot}

#### `PIVOT` {#docs:current:internals:pivot::pivot}

[Pivoting](#docs:current:sql:statements:pivot) is implemented as a combination of SQL query re-writing and a dedicated `PhysicalPivot` operator for higher performance.
Each `PIVOT` is implemented as a set of aggregations into lists and then the dedicated `PhysicalPivot` operator converts those lists into column names and values.
Additional pre-processing steps are required if the columns to be created when pivoting are detected dynamically (which occurs when the `IN` clause is not in use).

DuckDB, like most SQL engines, requires that all column names and types be known at the start of a query.
To automatically detect the columns that should be created as a result of a `PIVOT` statement, it must be translated into multiple queries.
[`ENUM` types](#docs:current:sql:data_types:enum) are used to find the distinct values that should become columns.
Each `ENUM` is then injected into one of the `PIVOT` statement's `IN` clauses.

After the `IN` clauses have been populated with `ENUM`s, the query is re-written again into a set of aggregations into lists.

For example:

```sql
PIVOT cities
ON year
USING sum(population);
```

is initially translated into:

```sql
CREATE TEMPORARY TYPE __pivot_enum_0_0 AS ENUM (
    SELECT DISTINCT
        year::VARCHAR
    FROM cities
    ORDER BY
        year
    );
PIVOT cities
ON year IN __pivot_enum_0_0
USING sum(population);
```

and finally translated into:

```sql
SELECT country, name, list(year), list(population_sum)
FROM (
    SELECT country, name, year, sum(population) AS population_sum
    FROM cities
    GROUP BY ALL
)
GROUP BY ALL;
```

This produces the result:

| country |     name      |    list("year")    | list(population_sum) |
|---------|---------------|--------------------|----------------------|
| NL      | Amsterdam     | [2000, 2010, 2020] | [1005, 1065, 1158]   |
| US      | Seattle       | [2000, 2010, 2020] | [564, 608, 738]      |
| US      | New York City | [2000, 2010, 2020] | [8015, 8175, 8772]   |

The `PhysicalPivot` operator converts those lists into column names and values to return this result:

| country |     name      | 2000 | 2010 | 2020 |
|---------|---------------|-----:|-----:|-----:|
| NL      | Amsterdam     | 1005 | 1065 | 1158 |
| US      | Seattle       | 564  | 608  | 738  |
| US      | New York City | 8015 | 8175 | 8772 |
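
The last step can be mimicked in a few lines of Python. This is only a sketch of the idea, with a hypothetical `pivot_lists` helper working on the intermediate rows shown above; it is not how the `PhysicalPivot` operator is written.

```python
def pivot_lists(rows, pivot_values):
    """Turn (group columns..., names list, values list) rows into one column per pivot value."""
    result = []
    for *group, names, values in rows:
        by_name = dict(zip(names, values))
        # a pivot value that is absent for this group becomes None (NULL)
        result.append((*group, *(by_name.get(name) for name in pivot_values)))
    return result


intermediate = [
    ("NL", "Amsterdam", [2000, 2010, 2020], [1005, 1065, 1158]),
    ("US", "Seattle", [2000, 2010, 2020], [564, 608, 738]),
    ("US", "New York City", [2000, 2010, 2020], [8015, 8175, 8772]),
]
pivoted = pivot_lists(intermediate, [2000, 2010, 2020])
```

The list of pivot values passed in plays the role of the `ENUM` that was injected into the `IN` clause: it fixes the output columns before any data is read.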

#### `UNPIVOT` {#docs:current:internals:pivot::unpivot}

##### Internals {#docs:current:internals:pivot::internals}

Unpivoting is implemented entirely as rewrites into SQL queries.
Each `UNPIVOT` is implemented as a set of `unnest` functions, operating on a list of the column names and a list of the column values.
If dynamically unpivoting, the `COLUMNS` expression is evaluated first to calculate the column list.

For example:

```sql
UNPIVOT monthly_sales
ON jan, feb, mar, apr, may, jun
INTO
    NAME month
    VALUE sales;
```

is translated into:

```sql
SELECT
    empid,
    dept,
    unnest(['jan', 'feb', 'mar', 'apr', 'may', 'jun']) AS month,
    unnest(["jan", "feb", "mar", "apr", "may", "jun"]) AS sales
FROM monthly_sales;
```

Note the single quotes to build a list of text strings to populate `month`, and the double quotes to pull the column values for use in `sales`.
This produces the same result as the initial example:

| empid |    dept     | month | sales |
|------:|-------------|-------|------:|
| 1     | electronics | jan   | 1     |
| 1     | electronics | feb   | 2     |
| 1     | electronics | mar   | 3     |
| 1     | electronics | apr   | 4     |
| 1     | electronics | may   | 5     |
| 1     | electronics | jun   | 6     |
| 2     | clothes     | jan   | 10    |
| 2     | clothes     | feb   | 20    |
| 2     | clothes     | mar   | 30    |
| 2     | clothes     | apr   | 40    |
| 2     | clothes     | may   | 50    |
| 2     | clothes     | jun   | 60    |
| 3     | cars        | jan   | 100   |
| 3     | cars        | feb   | 200   |
| 3     | cars        | mar   | 300   |
| 3     | cars        | apr   | 400   |
| 3     | cars        | may   | 500   |
| 3     | cars        | jun   | 600   |
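
The effect of the two parallel `unnest` calls can be sketched in Python as follows. The helper name `unpivot` is hypothetical, and only the first employee from the table above is used as input.

```python
def unpivot(rows, on_columns, name_column, value_column):
    """For every input row, emit one output row per unpivoted column."""
    output = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in on_columns}
        # zip pairs the list of column names with the list of column values,
        # like the two unnest() calls that expand in lockstep
        for name, value in zip(on_columns, (row[c] for c in on_columns)):
            output.append({**kept, name_column: name, value_column: value})
    return output


monthly_sales = [
    {"empid": 1, "dept": "electronics",
     "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6},
]
months = ["jan", "feb", "mar", "apr", "may", "jun"]
unpivoted = unpivot(monthly_sales, months, "month", "sales")
```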

# DuckDB Blog

## Testing Out DuckDB's Full Text Search Extension

**Publication date:** 2021-01-25

**Author:** Laurens Kuiper

**TL;DR:** DuckDB now has full-text search functionality, similar to the FTS5 extension in SQLite. The main difference is that our FTS extension is fully formulated in SQL. We tested it out on TREC disks 4 and 5.

Searching through textual data stored in a database can be cumbersome, as SQL does not provide a good way of formulating questions such as "Give me all the documents about __Mallard Ducks__": string patterns with `LIKE` will only get you so far. Despite SQL's shortcomings here, storing textual data in a database is commonplace. Consider the table `products (id INTEGER, name VARCHAR, description VARCHAR)` – it would be useful to search through the `name` and `description` columns for a website that sells these products.

We expect a search engine to return us results within milliseconds. For a long time databases were unsuitable for this task, because they could not search large inverted indexes at this speed: transactional database systems are not made for this use case. However, analytical database systems can keep up with state-of-the-art information retrieval systems. The company [Spinque](https://www.spinque.com/) is a good example of this. At Spinque, MonetDB is used as a computation engine for customized search engines.

DuckDB's FTS implementation follows the paper "[Old Dogs Are Great at New Tricks](https://www.duckdb.org/pdf/SIGIR2014-column-stores-ir-prototyping.pdf)". A keen observation there is that advances made to the database system, such as parallelization, will speed up your search engine "for free"!

Alright, enough about the "why", let's get to the "how".

#### Preparing the Data

The TREC 2004 Robust Retrieval Track has 250 "topics" (search queries) over TREC disks 4 and 5. The data consist of many text files stored in SGML format, along with a corresponding DTD (document type definition) file. This format is rarely used anymore, but it is similar to XML. We will use OpenSP's command line tool `osx` to convert it to XML. Because there are many files, I wrote a Bash script:

```bash
mkdir -p latimes/xml
for i in $(seq -w 1 9); do
    cat dtds/la.dtd latimes-$i | osx > latimes/xml/latimes-$i.xml
done
```

This converts the `latimes` files to XML. Repeat for the `fbis`, `cr`, `fr94`, and `ft` files.

To parse the XML I used BeautifulSoup. Each document has a `docno` identifier, and a `text` field. Because the documents do not come from the same source, they differ in what other fields they have. I chose to take all of the fields.

```python
import duckdb
import multiprocessing
import pandas as pd
import re
from bs4 import BeautifulSoup as bs
from tqdm import tqdm

# fill variable 'files' with the path to each .xml file that we created here

def process_file(fpath):
    dict_list = []
    with open(fpath, 'r') as f:
        content = f.read()
        bs_content = bs(content, "html.parser")
        # find all 'doc' nodes
        for doc in bs_content.findChildren('doc', recursive=True):
            row_dict = {}
            for c in doc.findChildren(recursive=True):
                row_dict[c.name] = ''.join(c.findAll(text=True, recursive=False)).strip()
            dict_list.append(row_dict)
    return dict_list

# process documents (in parallel to speed things up)
pool = multiprocessing.Pool(multiprocessing.cpu_count())
list_of_dict_lists = []
for x in tqdm(pool.imap_unordered(process_file, files), total=len(files)):
    list_of_dict_lists.append(x)
pool.close()
pool.join()

# create pandas dataframe from the parsed data
documents_df = pd.DataFrame([x for sublist in list_of_dict_lists for x in sublist])
```

Now that we have a dataframe, we can register it in DuckDB.

```python
# create database connection and register the dataframe
con = duckdb.connect(database='db/trec04_05.db', read_only=False)
con.register('documents_df', documents_df)

# create a table from the dataframe so that it persists
con.execute("CREATE TABLE documents AS (SELECT * FROM documents_df)")
con.close()
```

This is the end of my preparation script, so I closed the database connection.

#### Building the Search Engine

We can now build the inverted index and the retrieval model using a `PRAGMA` statement.
The extension is [documented here](#docs:lts:core_extensions:full_text_search).
We create an index table on table `documents` or `main.documents` that we created with our script.
The column that identifies our documents is called `docno`, and we wish to create an inverted index on the fields supplied.
I supplied all fields by using the '\*' shortcut.

```python
con = duckdb.connect(database='db/trec04_05.db', read_only=False)
con.execute("PRAGMA create_fts_index('documents', 'docno', '*', stopwords='english')")
```

Under the hood, a parameterized SQL script is called. The schema `fts_main_documents` is created, along with tables `docs`, `terms`, `dict`, and `stats`, that make up the inverted index. If you're curious what this looks like, take a look at the `extension` folder in DuckDB's source code!

#### Running the Benchmark

The data is now fully prepared. Now we want to run the queries in the benchmark, one by one. We load the topics file as follows:

```python
# the 'topics' file is not structured nicely, therefore we need to parse some of it using regex
def after_tag(s, tag):
    m = re.findall(r'<' + tag + r'>([\s\S]*?)<.*>', s)
    return m[0].replace('\n', '').strip()

topic_dict = {}
with open('../../trec/topics', 'r') as f:
    bs_content = bs(f.read(), "lxml")
    for top in bs_content.findChildren('top'):
        top_content = top.getText()
        # we need the number and title of each topic
        num = after_tag(str(top), 'num').split(' ')[1]
        title = after_tag(str(top), 'title')
        topic_dict[num] = title
```

This gives us a dictionary that has query numbers as keys, and query strings as values, e.g., `301 -> 'International Organized Crime'`.

We want to store the results in a specific format, so that they can be evaluated by [`trec_eval`](https://github.com/usnistgov/trec_eval.git):

```python
# create a prepared statement to make querying our document collection easier
con.execute("""
    PREPARE fts_query AS (
        WITH scored_docs AS (
            SELECT *, fts_main_documents.match_bm25(docno, ?) AS score FROM documents)
        SELECT docno, score
        FROM scored_docs
        WHERE score IS NOT NULL
        ORDER BY score DESC
        LIMIT 1000)
    """)

# enable parallelism
con.execute('PRAGMA threads=32')
results = []
for query in topic_dict:
    q_str = topic_dict[query].replace('\'', ' ')
    con.execute("EXECUTE fts_query('" + q_str + "')")
    for i, row in enumerate(con.fetchall()):
        results.append(query + " Q0 " + row[0].strip() + " " + str(i) + " " + str(row[1]) + " STANDARD")
con.close()

with open('results', 'w+') as f:
    for r in results:
        f.write(r + '\n')
```

#### Results

Now that we have created our 'results' file, we can compare it to the relevance assessments `qrels` using `trec_eval`.

```bash
./trec_eval -m P.30 -m map qrels results
```

```text
map                     all 0.2324
P_30                    all 0.2948
```

Not bad! While these results are not as high as those reproducible with [Anserini](https://github.com/castorini/anserini), they are definitely acceptable. The difference in performance can be explained by differences in:

1. Which stemmer was used (we used 'porter')
2. Which stopwords were used (we used the list of 571 English stopwords used in the SMART system)
3. Pre-processing (removal of accents, punctuation, numbers)
4. BM25 parameters (we used the default k=1.2 and b=0.75, non-conjunctive)
5. Which fields were indexed (we used all columns by supplying '\*')
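
For readers unfamiliar with BM25, here is a compact, self-contained Python sketch of a common formulation of the scoring function with k=1.2 and b=0.75. It rests on several assumptions: the tokenizer is a plain `split`, there is no stemming or stopword removal, the IDF variant is one of several in use, and it is not the SQL that the extension runs.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # the default parameters mentioned above


def bm25_scores(documents, query):
    """Score every document for the query; documents without any query term get None."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = None
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # non-conjunctive: a missing term simply contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + K1 * (1 - B + B * len(tokens) / avg_len)
            score = (score or 0.0) + idf * tf[term] * (K1 + 1) / norm
        scores[doc_id] = score
    return scores


docs = {
    "d1": "mallard ducks swim in the pond",
    "d2": "the pond is frozen",
    "d3": "ducks ducks ducks",
}
scores = bm25_scores(docs, "mallard ducks")
ranking = sorted((d for d, s in scores.items() if s is not None),
                 key=lambda d: scores[d], reverse=True)
```

Changing any of the ingredients listed above (tokenization, stopwords, parameters, indexed fields) shifts these scores, which is why two BM25 implementations rarely produce identical rankings.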

Retrieval time for each query was between 0.5 and 1.3 seconds on our machine, which should improve as DuckDB itself improves. I hope you enjoyed reading this blog post and are inspired to test out the extension as well!

## Efficient SQL on Pandas with DuckDB

**Publication date:** 2021-05-14

**Authors:** Mark Raasveldt, Hannes Mühleisen

**TL;DR:** DuckDB, a free and open source analytical data management system, can efficiently run SQL queries directly on Pandas DataFrames.

Recently, an article was published [advocating for using SQL for Data Analysis](https://hakibenita.com/sql-for-data-analysis). Here at team DuckDB, we are huge fans of [SQL](https://en.wikipedia.org/wiki/SQL). It is a versatile and flexible language that allows the user to efficiently perform a wide variety of data transformations, without having to care about how the data is physically represented or how to do these data transformations in the most optimal way.

While you can very effectively perform aggregations and data transformations in an external database system such as Postgres if your data is stored there, at some point you will need to convert that data back into [Pandas](https://pandas.pydata.org) and [NumPy](https://numpy.org). These libraries serve as the standard for data exchange between the vast ecosystem of Data Science libraries in Python<sup>1</sup> such as [scikit-learn](https://scikit-learn.org/stable/) or [TensorFlow](https://www.tensorflow.org).

<sup>1</sup>[Apache Arrow](https://arrow.apache.org) is gaining significant traction in this domain as well, and DuckDB also quacks Arrow.

If you are reading from a file (e.g., a CSV or Parquet file), often your data will never be loaded into an external database system at all, and will instead be directly loaded into a Pandas DataFrame.

#### SQL on Pandas

After your data has been converted into a Pandas DataFrame, additional data wrangling and analysis often still need to be performed. SQL is a very powerful tool for performing these types of data transformations. Using DuckDB, it is possible to run SQL efficiently right on top of Pandas DataFrames.

As a short teaser, here is a code snippet that allows you to do exactly that: run arbitrary SQL queries directly on Pandas DataFrames using DuckDB.

```python
# to install: pip install duckdb
import pandas as pd
import duckdb

mydf = pd.DataFrame({'a' : [1, 2, 3]})
print(duckdb.query("SELECT sum(a) FROM mydf").to_df())
```

In the rest of the article, we will go more in-depth into how this works and how fast it is.

#### Data Integration & SQL on Pandas

One of the core goals of DuckDB is that accessing data in common formats should be easy. DuckDB is fully capable of running queries in parallel *directly* on top of a Pandas DataFrame (or on a Parquet/CSV file, or on an Arrow table, …). A separate (time-consuming) import step is not necessary.

DuckDB can also write query results directly to any of these formats. You can use DuckDB to process a Pandas DataFrame in parallel using SQL, and convert the result back to a Pandas DataFrame again, so you can then use the result in other Data Science libraries.

When you run a query in SQL, DuckDB will look for Python variables whose name matches the table names in your query and automatically start reading your Pandas DataFrames. Looking back at the previous example we can see this in action:

```python
import pandas as pd
import duckdb

mydf = pd.DataFrame({'a' : [1, 2, 3]})
print(duckdb.query("SELECT sum(a) FROM mydf").to_df())
```

The SQL table name `mydf` is interpreted as the local Python variable `mydf` that happens to be a Pandas DataFrame, which DuckDB can read and query directly. The column names and types are also extracted automatically from the DataFrame.

Not only is this process painless, it is highly efficient. For many queries, you can use DuckDB to process data faster than Pandas, and with a much lower total memory usage, *without ever leaving the Pandas DataFrame binary format* ("Pandas-in, Pandas-out"). Unlike when using an external database system such as Postgres, the data transfer time of the input or the output is negligible (see Appendix A for details).

#### SQL on Pandas Performance

To demonstrate the performance of DuckDB when executing SQL on Pandas DataFrames, we now present a number of benchmarks. The source code for the benchmarks is available for interactive use [in Google Colab](https://colab.research.google.com/drive/1eg_TJpPQr2tyYKWjISJlX8IEAi8Qln3U?usp=sharing). In these benchmarks, we operate *purely* on Pandas DataFrames. Both the DuckDB code and the Pandas code operate fully on a `Pandas-in, Pandas-out` basis.

##### Benchmark Setup and Data Set

We run the benchmark entirely from within the Google Colab environment. For our benchmark dataset, we use the [infamous TPC-H data set](http://www.tpc.org/tpch/). Specifically, we focus on the `lineitem` and `orders` tables as these are the largest tables in the benchmark. The total dataset size is around 1 GB in uncompressed CSV format ("scale factor" 1).

As DuckDB is capable of using multiple processors (multi-threading), we include both a single-threaded variant and a variant with two threads. Note that while DuckDB can scale far beyond two threads, Google Colab only supports two.

##### Setup

First we need to install DuckDB. This is a simple one-liner.

```bash
pip install duckdb
```

To set up the dataset for processing we download two Parquet files using `wget`. After that, we load the data into a Pandas DataFrame using the built-in Parquet reader of DuckDB. The system automatically infers that we are reading a Parquet file by looking at the `.parquet` extension of the file.

```python
lineitem = duckdb.query(
    "SELECT * FROM 'lineitemsf1.snappy.parquet'"
).to_df()
orders = duckdb.query(
    "SELECT * FROM 'orders.parquet'"
).to_df()
```

##### Ungrouped Aggregates

For our first query, we will run a set of ungrouped aggregates over the Pandas DataFrame. Here is the SQL query:

```sql
SELECT
    sum(l_extendedprice),
    min(l_extendedprice),
    max(l_extendedprice),
    avg(l_extendedprice)
FROM lineitem;
```

The Pandas code looks similar:

```python
lineitem.agg(
  Sum=('l_extendedprice', 'sum'),
  Min=('l_extendedprice', 'min'),
  Max=('l_extendedprice', 'max'),
  Avg=('l_extendedprice', 'mean')
)
```

| Name               | Time (s) |
| :----------------- | -------: |
| DuckDB (1 Thread)  |    0.079 |
| DuckDB (2 Threads) |    0.048 |
| Pandas             |    0.070 |

This benchmark involves a very simple query, and Pandas performs very well here. These simple queries are where Pandas excels (ha), as it can directly call into the NumPy routines that implement these aggregates, which are highly efficient. Nevertheless, we can see that DuckDB performs similarly to Pandas in the single-threaded scenario, and benefits from its multi-threading support when enabled.

##### Grouped Aggregate

For our second query, we will run the same set of aggregates, but this time include a grouping condition. In SQL, we can do this by adding a `GROUP BY` clause to the query.

```sql
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_extendedprice),
    min(l_extendedprice),
    max(l_extendedprice),
    avg(l_extendedprice)
FROM lineitem
GROUP BY
    l_returnflag,
    l_linestatus;
```

In Pandas, we use the `groupby` function before we perform the aggregation.

```python
lineitem.groupby(
  ['l_returnflag', 'l_linestatus']
).agg(
  Sum=('l_extendedprice', 'sum'),
  Min=('l_extendedprice', 'min'),
  Max=('l_extendedprice', 'max'),
  Avg=('l_extendedprice', 'mean')
)
```

| Name                     | Time (s) |
| :----------------------- | -------: |
| DuckDB (1 Thread)        |     0.43 |
| DuckDB (2 Threads)       |     0.32 |
| Pandas                   |     0.84 |

This query is already getting more complex, and while Pandas does a decent job, it is a factor of two slower than the single-threaded version of DuckDB. DuckDB has a highly optimized aggregate hash-table implementation that will perform both the grouping and the computation of all the aggregates in a single pass over the data.
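
The single-pass idea can be sketched in plain Python with a dictionary standing in for the aggregate hash table. This is a conceptual toy, not DuckDB's implementation, and the sample rows are invented.

```python
def grouped_aggregate(rows):
    """One pass over the data: look up the group, then update all aggregate states."""
    table = {}  # group key -> [sum, min, max, count]
    for returnflag, linestatus, price in rows:
        state = table.get((returnflag, linestatus))
        if state is None:
            table[(returnflag, linestatus)] = [price, price, price, 1]
        else:
            state[0] += price
            state[1] = min(state[1], price)
            state[2] = max(state[2], price)
            state[3] += 1
    # finalize: avg is derived from sum and count
    return {key: {"sum": s, "min": lo, "max": hi, "avg": s / n}
            for key, (s, lo, hi, n) in table.items()}


rows = [("A", "F", 10.0), ("N", "O", 20.0), ("A", "F", 30.0), ("N", "O", 5.0)]
result = grouped_aggregate(rows)
```

Each input row is touched exactly once, and all four aggregates share the same hash-table lookup.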

##### Grouped Aggregate with a Filter

Now suppose that we don't want to perform an aggregate over all of the data, but instead only want to select a subset of the data to aggregate. We can do this by adding a filter clause that removes any tuples we are not interested in. In SQL, we can accomplish this through the `WHERE` clause.

```sql
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_extendedprice),
    min(l_extendedprice),
    max(l_extendedprice),
    avg(l_extendedprice)
FROM lineitem
WHERE
    l_shipdate <= DATE '1998-09-02'
GROUP BY
    l_returnflag,
    l_linestatus;
```

In Pandas, we can create a filtered variant of the DataFrame by using the selection brackets.

```python
# filter out the rows
filtered_df = lineitem[
  lineitem['l_shipdate'] <= "1998-09-02"]
# perform the aggregate
result = filtered_df.groupby(
  ['l_returnflag', 'l_linestatus']
).agg(
  Sum=('l_extendedprice', 'sum'),
  Min=('l_extendedprice', 'min'),
  Max=('l_extendedprice', 'max'),
  Avg=('l_extendedprice', 'mean')
)
```

In DuckDB, the query optimizer will combine the filter and aggregation into a single pass over the data, only reading relevant columns. In Pandas, however, we have no such luck. The filter as it is executed will actually subset the entire lineitem table, *including any columns we are not using!* As a result of this, the filter operation is much more time-consuming than it needs to be.

We can manually perform this optimization ("projection pushdown" in database literature). To do this, we first need to select only the columns that are relevant to our query and then subset the lineitem dataframe. We will end up with the following code snippet:

```python
# projection pushdown
pushed_down_df = lineitem[
  ['l_shipdate',
   'l_returnflag',
   'l_linestatus',
   'l_extendedprice']
]
# perform the filter
filtered_df = pushed_down_df[
  pushed_down_df['l_shipdate'] <= "1998-09-02"]
# perform the aggregate
result = filtered_df.groupby(
  ['l_returnflag', 'l_linestatus']
).agg(
  Sum=('l_extendedprice', 'sum'),
  Min=('l_extendedprice', 'min'),
  Max=('l_extendedprice', 'max'),
  Avg=('l_extendedprice', 'mean')
)
```

| Name                                 | Time (s) |
| :----------------------------------- | -------: |
| DuckDB (1 Thread)                    |     0.60 |
| DuckDB (2 Threads)                   |     0.42 |
| Pandas                               |     3.57 |
| Pandas (manual pushdown)             |     2.23 |

While the manual projection pushdown significantly speeds up the query in Pandas, there is still a significant time penalty for the filtered aggregate. To process a filter, Pandas will write a copy of the entire DataFrame (minus the filtered out rows) back into memory. This operation can be time consuming when the filter is not very selective.

Due to its holistic query optimizer and efficient query processor, DuckDB performs significantly better on this query.
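
To see why fusing helps, compare the two strategies in a pure-Python sketch over a tiny column-oriented table. The data and function names are invented, and real engines are far more sophisticated. The point is only that the fused version never copies the filtered rows and never touches unused columns.

```python
from datetime import date

# A tiny column-oriented table; l_comment stands in for columns the query never uses
table = {
    "l_shipdate": [date(1998, 8, 1), date(1998, 12, 1), date(1998, 9, 2)],
    "l_extendedprice": [100.0, 250.0, 50.0],
    "l_comment": ["x" * 40, "y" * 40, "z" * 40],
}
cutoff = date(1998, 9, 2)


def filter_then_aggregate(table):
    """Materialize a filtered copy of every column, then aggregate."""
    keep = [i for i, d in enumerate(table["l_shipdate"]) if d <= cutoff]
    filtered = {name: [col[i] for i in keep] for name, col in table.items()}
    return sum(filtered["l_extendedprice"])


def fused_filter_aggregate(table):
    """Read only the two needed columns and aggregate while filtering."""
    total = 0.0
    for shipdate, price in zip(table["l_shipdate"], table["l_extendedprice"]):
        if shipdate <= cutoff:
            total += price
    return total
```

Both functions return the same sum, but the first one builds an intermediate copy that includes `l_comment`, even though the query never reads it.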

##### Joins

For the final query, we will join (`merge` in Pandas) the lineitem table with the orders table, and apply a filter that only selects orders which have the status we are interested in. This leads us to the following query in SQL:

```sql
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_extendedprice),
    min(l_extendedprice),
    max(l_extendedprice),
    avg(l_extendedprice)
FROM lineitem
JOIN orders ON (l_orderkey = o_orderkey)
WHERE l_shipdate <= DATE '1998-09-02'
  AND o_orderstatus='O'
GROUP BY
    l_returnflag,
    l_linestatus;
```

For Pandas, we have to add a `merge` step. In a basic approach, we merge lineitem and orders together, then apply the filters, and finally apply the grouping and aggregation. This will give us the following code snippet:

```python
# perform the join
merged = lineitem.merge(
  orders,
  left_on='l_orderkey',
  right_on='o_orderkey')
# filter out the rows
filtered_a = merged[
  merged['l_shipdate'] < "1998-09-02"]
filtered_b = filtered_a[
  filtered_a['o_orderstatus'] == "O"]
# perform the aggregate
result = filtered_b.groupby(
  ['l_returnflag', 'l_linestatus']
).agg(
  Sum=('l_extendedprice', 'sum'),
  Min=('l_extendedprice', 'min'),
  Max=('l_extendedprice', 'max'),
  Avg=('l_extendedprice', 'mean')
)
```

Now we have missed two performance opportunities:

* First, we are merging far too many columns, because we are merging columns that are not required for the remainder of the query (projection pushdown).
* Second, we are merging far too many rows. We can apply the filters prior to the merge to reduce the amount of data that we need to merge (filter pushdown).

Applying these two optimizations manually results in the following code snippet:

```python
# projection & filter on lineitem table
lineitem_projected = lineitem[
  ['l_shipdate',
   'l_orderkey',
   'l_linestatus',
   'l_returnflag',
   'l_extendedprice']
]
lineitem_filtered = lineitem_projected[
  lineitem_projected['l_shipdate'] < "1998-09-02"]
# projection and filter on order table
orders_projected = orders[
  ['o_orderkey',
   'o_orderstatus']
]
orders_filtered = orders_projected[
  orders_projected['o_orderstatus'] == 'O']
# perform the join
merged = lineitem_filtered.merge(
  orders_filtered,
  left_on='l_orderkey',
  right_on='o_orderkey')
# perform the aggregate
result = merged.groupby(
  ['l_returnflag', 'l_linestatus']
).agg(
  Sum=('l_extendedprice', 'sum'),
  Min=('l_extendedprice', 'min'),
  Max=('l_extendedprice', 'max'),
  Avg=('l_extendedprice', 'mean')
)
```

Both of these optimizations are automatically applied by DuckDB's query optimizer.

| Name                     | Time (s) |
| :----------------------- | -------: |
| DuckDB (1 Thread)        |     1.05 |
| DuckDB (2 Threads)       |     0.53 |
| Pandas                   |     15.2 |
| Pandas (manual pushdown) |     3.78 |

We see that the basic approach is extremely time-consuming compared to the optimized version (15.2 s versus 3.78 s). This demonstrates the usefulness of the automatic query optimizer. Even after optimizing, the Pandas code is still significantly slower than DuckDB because it stores intermediate results in memory after the individual filters and joins.

##### Takeaway

Using DuckDB, you can take advantage of the powerful and expressive SQL language without having to worry about moving your data in – and out – of Pandas. DuckDB is extremely simple to install, and offers many advantages such as a query optimizer, automatic multi-threading and larger-than-memory computation. DuckDB uses the Postgres SQL parser, and offers many of the same SQL features as Postgres, including advanced features such as window functions, correlated subqueries, (recursive) common table expressions, nested types and sampling. If you are missing a feature, please [open an issue](https://github.com/duckdb/duckdb/issues).

#### Appendix A: There and Back Again: Transferring Data from Pandas to a SQL Engine and Back

Traditional SQL engines use the Client-Server paradigm, which means that a client program connects through a socket to a server. Queries are run on the server, and results are sent back down to the client afterwards. This is the same when using, for example, Postgres from Python. Unfortunately, this transfer [is a serious bottleneck](http://www.vldb.org/pvldb/vol10/p1022-muehleisen.pdf). In-process engines such as SQLite or DuckDB do not run into this problem.

To showcase how costly this data transfer over a socket is, we have run a benchmark involving Postgres, SQLite and DuckDB. The source code for the benchmark can be found on [GitHub](https://gist.github.com/hannes/a95a39a1eda63aeb0ca13fd82d1ba49c).

In this benchmark we copy a (fairly small) Pandas data frame consisting of 10M 4-byte integers (40 MB) from Python to the PostgreSQL, SQLite and DuckDB databases. Since the default Pandas `to_sql` was rather slow, we added a separate optimization in which we tell Pandas to write the data frame to a temporary CSV file, and then tell PostgreSQL to directly copy data from that file into a newly created table. This of course will only work if the database server is running on the same machine as Python.

| Name                                                    | Time (s) |
| :------------------------------------------------------ | -------: |
| Pandas to Postgres using to_sql                         |   111.25 |
| Pandas to Postgres using temporary CSV file             |     5.57 |
| Pandas to SQLite using to_sql                           |     6.80 |
| Pandas to DuckDB                                        |     0.03 |

While SQLite performs significantly better than Postgres here, it is still rather slow. That is because the `to_sql` function in Pandas runs a large number of `INSERT INTO` statements, which involves transforming all the individual values of the Pandas DataFrame into a row-wise representation of Python objects, which are then passed on to the system. DuckDB, on the other hand, directly reads the underlying array from Pandas, which makes this operation almost instant.
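
To make the row-wise conversion concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the actual `to_sql` implementation: the table name `t`, the helper `insert_row_wise`, and the column data are illustrative assumptions. It only shows the general pattern, in which columnar data is turned into one Python tuple per row and each tuple is bound to an `INSERT INTO` statement.

```python
import sqlite3

def insert_row_wise(con, columns):
    """Insert columnar data row by row (illustrative helper, not Pandas code)."""
    names = list(columns)
    con.execute(f"CREATE TABLE t ({', '.join(n + ' INTEGER' for n in names)})")
    # Transpose the columns into row tuples: one Python object per value.
    rows = list(zip(*(columns[n] for n in names)))
    placeholders = ", ".join("?" for _ in names)
    # The INSERT INTO statement is executed once per row.
    con.executemany(f"INSERT INTO t VALUES ({placeholders})", rows)
    return len(rows)

con = sqlite3.connect(":memory:")
columns = {"a": list(range(1000)), "b": list(range(1000, 2000))}
inserted = insert_row_wise(con, columns)
total = con.execute("SELECT count(*), sum(a), sum(b) FROM t").fetchone()
print(inserted, total)
```

Every value passes through a Python object and a statement parameter on its way into the database, which is the kind of overhead a direct read of the underlying array avoids.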

Transferring query results or tables back from the SQL system into Pandas is another potential bottleneck. Using the built-in `read_sql_query` is extremely slow, but even the more optimized CSV route still takes at least a second for this tiny data set. DuckDB, on the other hand, also performs this transformation almost instantaneously.

| Name                                          | Time (s) |
| :-------------------------------------------- | -------: |
| PostgreSQL to Pandas using read_sql_query     |     7.08 |
| PostgreSQL to Pandas using temporary CSV file |     1.29 |
| SQLite to Pandas using read_sql_query         |     5.20 |
| DuckDB to Pandas                              |     0.04 |

#### Appendix B: Comparison to PandaSQL

There is a package called [PandaSQL](https://pypi.org/project/pandasql/) that also provides the facilities of running SQL directly on top of Pandas. However, it is built on the `to_sql` and `read_sql_query` infrastructure that, as we have seen in Appendix A, is extremely slow.

Nevertheless, for good measure we have run the first Ungrouped Aggregate query in PandaSQL to time it. When we first tried to run the query on the original dataset, however, we ran into an out-of-memory error that crashed our Colab session. For that reason, we have decided to run the benchmark again for PandaSQL using a sample of 10% of the original data set size (600K rows). Here are the results:

| Name                     | Time (s) |
| :----------------------- | -------: |
| DuckDB (1 Thread)        |    0.023 |
| DuckDB (2 Threads)       |    0.014 |
| Pandas                   |    0.017 |
| PandaSQL                 |    24.43 |

We can see that PandaSQL (powered by SQLite) is around 1000× slower than either Pandas or DuckDB on this straightforward benchmark. The performance difference was so large we have opted not to run the other benchmarks for PandaSQL.

#### Appendix C: Query on Parquet Directly

In the benchmarks above, we fully read the Parquet files into Pandas. However, DuckDB also has the capability of directly running queries on top of Parquet files (in parallel!). In this appendix, we show the performance of this compared to loading the file into Python first.

For the benchmark, we will run two queries, the simplest query (the ungrouped aggregate) and the most complex query (the final join), and compare the cost of running each query directly on the Parquet file with the cost of first loading the file into Pandas using the `read_parquet` function.

##### Setup

In DuckDB, we can create a view over the Parquet file using the following query. This allows us to run queries over the Parquet file as if it was a regular table. Note that we do not need to worry about projection pushdown at all: we can just do a `SELECT *` and DuckDB's optimizer will take care of only projecting the required columns at query time.

```sql
CREATE VIEW lineitem_parquet AS
    SELECT * FROM 'lineitemsf1.snappy.parquet';
CREATE VIEW orders_parquet AS
    SELECT * FROM 'orders.parquet';
```

##### Ungrouped Aggregate

After we have set up these views, we can run the same queries we ran before, but this time against the `lineitem_parquet` view.

```sql
SELECT sum(l_extendedprice), min(l_extendedprice), max(l_extendedprice), avg(l_extendedprice) FROM lineitem_parquet;
```

For Pandas, we will first need to run `read_parquet` to load the data into Pandas. To do this, we use the Parquet reader powered by Apache Arrow. After that, we can run the query as we did before.

```python
lineitem_pandas_parquet = pd.read_parquet('lineitemsf1.snappy.parquet')
result = lineitem_pandas_parquet.agg(Sum=('l_extendedprice', 'sum'), Min=('l_extendedprice', 'min'), Max=('l_extendedprice', 'max'), Avg=('l_extendedprice', 'mean'))
```

However, we now again run into the problem where Pandas will read the Parquet file in its entirety. In order to circumvent this, we will need to perform projection pushdown manually again by providing the `read_parquet` method with the set of columns that we want to read.

The optimizer in DuckDB will figure this out by itself by looking at the query you are executing.

```python
lineitem_pandas_parquet = pd.read_parquet('lineitemsf1.snappy.parquet', columns=['l_extendedprice'])
result = lineitem_pandas_parquet.agg(Sum=('l_extendedprice', 'sum'), Min=('l_extendedprice', 'min'), Max=('l_extendedprice', 'max'), Avg=('l_extendedprice', 'mean'))
```

| Name                     | Time (s) |
| :----------------------- | -------: |
| DuckDB (1 Thread)        |     0.16 |
| DuckDB (2 Threads)       |     0.14 |
| Pandas                   |     7.87 |
| Pandas (manual pushdown) |     0.17 |

We can see that the performance difference between doing the pushdown and not doing the pushdown is dramatic. When we perform the pushdown, Pandas has performance in the same ballpark as DuckDB. Without the pushdown, however, it is loading the entire file from disk, including the other 15 columns that are not required to answer the query.

##### Joins

Now for the final query that we saw in the join section previously. To recap:

```sql
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_extendedprice),
    min(l_extendedprice),
    max(l_extendedprice),
    avg(l_extendedprice)
FROM lineitem
JOIN orders ON (l_orderkey = o_orderkey)
WHERE l_shipdate <= DATE '1998-09-02'
  AND o_orderstatus='O'
GROUP BY
    l_returnflag,
    l_linestatus;
```

For Pandas, we again create two versions: a naive version and a manually optimized version. The exact code used can be found [in Google Colab](https://colab.research.google.com/drive/1eg_TJpPQr2tyYKWjISJlX8IEAi8Qln3U?usp=sharing).

| Name                     | Time (s) |
| :----------------------- | -------: |
| DuckDB (1 Thread)        |     1.04 |
| DuckDB (2 Threads)       |     0.89 |
| Pandas                   |     20.4 |
| Pandas (manual pushdown) |     3.95 |

We see that for this more complex query the slight difference in performance between running over a Pandas DataFrame and a Parquet file vanishes, and the DuckDB timings become extremely similar to the timings we saw before. The added Parquet read again increases the necessity of manually performing optimizations on the Pandas code, which is not required at all when running SQL in DuckDB.

## Querying Parquet with Precision Using DuckDB

**Publication date:** 2021-06-25

**Authors:** Hannes Mühleisen, Mark Raasveldt

**TL;DR:** DuckDB, a free and open source analytical data management system, can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format.

Apache Parquet is the most common "Big Data" storage format for analytics. In Parquet files, data is stored in a columnar-compressed binary format. Each Parquet file stores a single table. The table is partitioned into row groups, which each contain a subset of the rows of the table. Within a row group, the table data is stored in a columnar fashion.

![](../images/blog/parquet.svg)


The Parquet format has a number of properties that make it suitable for analytical use cases:

1. The columnar representation means that individual columns can be (efficiently) read. No need to always read the entire file!
2. The file contains per-column statistics in every row group (min/max value, and the number of `NULL` values). These statistics allow the reader to skip row groups if they are not required.
3. The columnar compression significantly reduces the file size of the format, which in turn reduces the storage requirement of data sets. This can often turn Big Data into Medium Data.

#### DuckDB and Parquet

DuckDB's zero-dependency Parquet reader is able to directly execute SQL queries on Parquet files without any import or analysis step. Because of the natural columnar format of Parquet, this is very fast!

DuckDB will read the Parquet files in a streaming fashion, which means you can perform queries on large Parquet files that do not fit in your main memory.

DuckDB is able to automatically detect which columns and rows are required for any given query. This allows users to analyze much larger and more complex Parquet files without needing to perform manual optimizations or investing in more hardware.

And as an added bonus, DuckDB is able to do all of this using parallel processing and over multiple Parquet files at the same time using the glob syntax.

As a short teaser, here is a code snippet that allows you to directly run a SQL query on top of a Parquet file.

To install the DuckDB package:

```bash
pip install duckdb
```

To download the Parquet file:

```bash
wget https://blobs.duckdb.org/data/taxi_2019_04.parquet
```

Then, run the following Python script:

```python
import duckdb

print(duckdb.query('''
   SELECT count(*)
   FROM 'taxi_2019_04.parquet'
   WHERE pickup_at BETWEEN '2019-04-15' AND '2019-04-20'
   ''').fetchall())
```

#### Automatic Filter & Projection Pushdown

Let us dive into the previous query to better understand the power of the Parquet format when combined with DuckDB's query optimizer.

```sql
SELECT count(*)
FROM 'taxi_2019_04.parquet'
WHERE pickup_at BETWEEN '2019-04-15' AND '2019-04-20';
```

In this query, we read a single column from our Parquet file (`pickup_at`). Any other columns stored in the Parquet file can be entirely skipped, as we do not need them to answer our query.

![](../images/blog/parquet-filter-svg.svg)


In addition, only rows that have a `pickup_at` between the 15th and the 20th of April 2019 influence the result of the query. Any rows that do not satisfy this predicate can be skipped.

We can use the statistics inside the Parquet file to great advantage here. Any row groups that have a max value of `pickup_at` lower than `2019-04-15`, or a min value higher than `2019-04-20`, can be skipped. In some cases, that allows us to skip reading entire files.
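
The skipping rule can be written down in a few lines of Python. This is a sketch under stated assumptions, not DuckDB's reader code: the `row_groups` list holds made-up min/max statistics for `pickup_at`, and `row_group_may_match` is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical per-row-group statistics for the pickup_at column.
# Real Parquet files store such min/max values in the file metadata.
row_groups = [
    {"id": 0, "min": datetime(2019, 4, 1), "max": datetime(2019, 4, 9, 23, 59)},
    {"id": 1, "min": datetime(2019, 4, 10), "max": datetime(2019, 4, 16, 12, 0)},
    {"id": 2, "min": datetime(2019, 4, 16, 12, 0), "max": datetime(2019, 4, 22)},
    {"id": 3, "min": datetime(2019, 4, 22), "max": datetime(2019, 4, 30, 23, 59)},
]

def row_group_may_match(stats, lower, upper):
    """Return False only if no row in the group can satisfy
    lower <= pickup_at <= upper, judging by the min/max statistics."""
    return not (stats["max"] < lower or stats["min"] > upper)

lower, upper = datetime(2019, 4, 15), datetime(2019, 4, 20)
to_read = [rg["id"] for rg in row_groups if row_group_may_match(rg, lower, upper)]
print(to_read)  # row groups 0 and 3 are skipped without reading their data
```

Note that the statistics can only rule row groups out. A row group that passes the check still has to be read and filtered row by row.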

#### DuckDB versus Pandas

To illustrate how effective these automatic optimizations are, we will run a number of queries on top of Parquet files using both Pandas and DuckDB.

In these queries, we use a part of the infamous New York Taxi dataset stored as Parquet files, specifically data from April, May and June 2019. These files are ca. 360 MB in size together and contain around 21 million rows of 18 columns each. The three files are placed into the `taxi/` folder.

The examples are available as [an interactive notebook over at Google Colab](https://colab.research.google.com/drive/1e1beWqYOcFidKl2IxHtxT5s9i_6KYuNY). The timings reported here are from this environment for reproducibility.

#### Reading Multiple Parquet Files

First we look at some rows in the dataset. There are three Parquet files in the `taxi/` folder. [DuckDB supports the globbing syntax](#docs:lts:data:parquet:overview), which allows it to query all three files simultaneously.

```python
import duckdb
con = duckdb.connect()  # in-memory database

con.execute("""
   SELECT *
   FROM 'taxi/*.parquet'
   LIMIT 5""").df()
```

| pickup_at           | dropoff_at          | passenger_count | trip_distance | rate_code_id |
| ------------------- | ------------------- | --------------- | ------------- | ------------ |
| 2019-04-01 00:04:09 | 2019-04-01 00:06:35 | 1               | 0.5           | 1            |
| 2019-04-01 00:22:45 | 2019-04-01 00:25:43 | 1               | 0.7           | 1            |
| 2019-04-01 00:39:48 | 2019-04-01 01:19:39 | 1               | 10.9          | 1            |
| 2019-04-01 00:35:32 | 2019-04-01 00:37:11 | 1               | 0.2           | 1            |
| 2019-04-01 00:44:05 | 2019-04-01 00:57:58 | 1               | 4.8           | 1            |

Despite the query selecting all columns from three (rather large) Parquet files, the query completes instantly. This is because DuckDB processes the Parquet files in a streaming fashion and stops reading after the first few rows, as that is all that is required to satisfy the query.
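
The early stop can be illustrated with a small sketch in plain Python. It is not how DuckDB is implemented (DuckDB processes batches of rows rather than single rows); `scan_rows` and the `stats` counter are hypothetical stand-ins for a streaming Parquet scan.

```python
from itertools import islice

stats = {"rows_read": 0}  # counts how many rows the scan actually produced

def scan_rows(num_rows):
    """Stand-in for a streaming file scan: yields one row at a time."""
    for i in range(num_rows):
        stats["rows_read"] += 1
        yield {"row_id": i}

# LIMIT 5: pull only five rows from the scan, then stop.
first_five = list(islice(scan_rows(21_000_000), 5))
print(len(first_five), stats["rows_read"])
```

Because the consumer stops pulling after five rows, the scan never touches the remaining rows, no matter how large the input is.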

If we try to do the same in Pandas, we realize it is not so straightforward, as Pandas cannot read multiple Parquet files in one call. We will first have to use `pandas.concat` to concatenate the three Parquet files together:

```python
import pandas
import glob
df = pandas.concat(
    [pandas.read_parquet(file)
     for file
     in glob.glob('taxi/*.parquet')])
print(df.head(5))
```

Below are the timings for both of these queries.

| System | Time (s) |
| :----- | -------: |
| DuckDB |    0.015 |
| Pandas |   12.300 |

Pandas takes significantly longer to complete this query. That is because Pandas not only needs to read each of the three Parquet files in their entirety, it also has to concatenate these three separate Pandas DataFrames together.

#### Concatenate into a Single File

We can address the concatenation issue by creating a single big Parquet file from the three smaller parts. We can use the `pyarrow` library for this, which has support for reading multiple Parquet files and streaming them into a single large file. Note that the `pyarrow` Parquet reader is the very same Parquet reader that is used by Pandas internally.

```python
import pyarrow.parquet as pq

# concatenate all three Parquet files
pq.write_table(pq.ParquetDataset('taxi/').read(), 'alltaxi.parquet', row_group_size=100000)
```

Note that [DuckDB also has support for writing Parquet files](#docs:lts:data:parquet:overview::writing-to-parquet-files) using the `COPY` statement.

#### Querying the Large File

Now let us repeat the previous experiment, but using the single file instead.

```python
# DuckDB
con.execute("""
   SELECT *
   FROM 'alltaxi.parquet'
   LIMIT 5""").df()

# Pandas
pandas.read_parquet('alltaxi.parquet').head(5)
```

| System | Time (s) |
| :----- | -------: |
| DuckDB |     0.02 |
| Pandas |     7.50 |

We can see that Pandas performs better than before, as the concatenation is avoided. However, the entire file still needs to be read into memory, which takes both a significant amount of time and memory.

For DuckDB it does not really matter how many Parquet files need to be read in a query.

#### Counting Rows

Now suppose we want to figure out how many rows are in our data set. We can do that using the following code:

```python
# DuckDB
con.execute("""
   SELECT count(*)
   FROM 'alltaxi.parquet'
""").df()

# Pandas
len(pandas.read_parquet('alltaxi.parquet'))
```

| System | Time (s) |
| :----- | -------: |
| DuckDB |    0.015 |
| Pandas |    7.500 |

DuckDB completes the query very quickly, as it automatically recognizes what needs to be read from the Parquet file and minimizes the required reads. Pandas has to read the entire file again, which causes it to take the same amount of time as the previous query.

For this query, we can improve Pandas' time through manual optimization. In order to get a count, we only need a single column from the file. By manually specifying a single column to be read in the `read_parquet` command, we can get the same result but much faster.

```python
len(pandas.read_parquet('alltaxi.parquet', columns=['vendor_id']))
```

| System             | Time (s) |
| :----------------- | -------: |
| DuckDB             |    0.015 |
| Pandas             |    7.500 |
| Pandas (optimized) |    1.200 |

While this is much faster, this still takes more than a second as the entire `vendor_id` column has to be read into memory as a Pandas column only to count the number of rows.

#### Filtering Rows

It is common to use some sort of filtering predicate to only look at the interesting parts of a data set. For example, imagine we want to know how many taxi rides occur from the 30th of June 2019 onwards. We can do that using the following query in DuckDB:

```python
con.execute("""
   SELECT count(*)
   FROM 'alltaxi.parquet'
   WHERE pickup_at > '2019-06-30'
""").df()
```

The query completes in `45ms` and yields the following result:

|  count |
| -----: |
| 167022 |

In Pandas, we can perform the same operation using a naive approach.

```python
# pandas naive
len(pandas.read_parquet('alltaxi.parquet')
          .query("pickup_at > '2019-06-30'"))
```

However, this again reads the entire file into memory, causing the query to take 7.5 s. With manual projection pushdown we can bring this down to 0.9 s, which is still significantly slower than DuckDB.

```python
# pandas projection pushdown
len(pandas.read_parquet('alltaxi.parquet', columns=['pickup_at'])
          .query("pickup_at > '2019-06-30'"))
```

However, the `pyarrow` Parquet reader also allows us to push the filter down into the scan. Once we add this, the query completes in a much more competitive `70ms`.

```python
len(pandas.read_parquet('alltaxi.parquet', columns=['pickup_at'], filters=[('pickup_at', '>', '2019-06-30')]))
```

| System                                | Time (s) |
| :------------------------------------ | -------: |
| DuckDB                                |     0.05 |
| Pandas                                |     7.50 |
| Pandas (projection pushdown)          |     0.90 |
| Pandas (projection & filter pushdown) |     0.07 |

This shows that the results here are not due to DuckDB's Parquet reader being faster than the `pyarrow` Parquet reader. The reason that DuckDB performs better on these queries is that its optimizer automatically extracts all required columns and filters from the SQL query, which are then used by the Parquet reader with no manual effort required.

Interestingly, both the `pyarrow` Parquet reader and DuckDB are significantly faster than performing this operation natively in Pandas on a materialized DataFrame.

```python
# read the entire Parquet file into Pandas
df = pandas.read_parquet('alltaxi.parquet')
# run the query natively in Pandas
# note: we only time this part
print(len(df[['pickup_at']].query("pickup_at > '2019-06-30'")))
```

| System                                | Time (s) |
| :------------------------------------ | -------: |
| DuckDB                                |     0.05 |
| Pandas                                |     7.50 |
| Pandas (projection pushdown)          |     0.90 |
| Pandas (projection & filter pushdown) |     0.07 |
| Pandas (native)                       |     0.26 |

#### Aggregates

Finally, let's look at a more complex aggregation. Say we want to compute the number of rides per passenger count. With DuckDB and SQL, it looks like this:

```python
con.execute("""
    SELECT passenger_count, count(*)
    FROM 'alltaxi.parquet'
    GROUP BY passenger_count""").df()
```

The query completes in `220ms` and yields the following result:

| passenger_count |    count |
| --------------: | -------: |
|               0 |   408742 |
|               1 | 15356631 |
|               2 |  3332927 |
|               3 |   944833 |
|               4 |   439066 |
|               5 |   910516 |
|               6 |   546467 |
|               7 |      106 |
|               8 |       72 |
|               9 |       64 |

For the SQL-averse and as a teaser for a future blog post, DuckDB also has a "Relational API" that allows for a more Python-esque declaration of queries. Here's the equivalent of the above SQL query, which provides the exact same result and performance:

```python
(con.from_parquet('alltaxi.parquet')
    .aggregate('passenger_count, count(*)')
    .df())
```

Now as a comparison, let's run the same query in Pandas in the same way we did previously.

```python
# naive
(pandas.read_parquet('alltaxi.parquet')
       .groupby('passenger_count')
       .agg({'passenger_count' : 'count'}))

# projection pushdown
(pandas.read_parquet('alltaxi.parquet', columns=['passenger_count'])
       .groupby('passenger_count')
       .agg({'passenger_count' : 'count'}))

# native (Parquet file pre-loaded into memory)
(df.groupby('passenger_count')
   .agg({'passenger_count' : 'count'}))
```

| System                       | Time (s) |
| :--------------------------- | -------: |
| DuckDB                       |     0.22 |
| Pandas                       |     7.50 |
| Pandas (projection pushdown) |     0.58 |
| Pandas (native)              |     0.51 |

We can see that DuckDB is faster than Pandas in all three scenarios, without needing to perform any manual optimizations and without needing to load the Parquet file into memory in its entirety.

#### Conclusion

DuckDB can efficiently run queries directly on top of Parquet files without requiring an initial loading phase. The system will automatically take advantage of all of Parquet's advanced features to speed up query execution.

DuckDB is a free and open source database management system (MIT licensed). It aims to be the SQLite for Analytics, and provides a fast and efficient database system with zero external dependencies. It is available not just for Python, but also for C/C++, R, Java, and more.

## Fastest Table Sort in the West – Redesigning DuckDB’s Sort

**Publication date:** 2021-08-27

**Author:** Laurens Kuiper

**TL;DR:** DuckDB, a free and open-source analytical data management system, has a new highly efficient parallel sorting implementation that can sort much more data than fits in main memory.

Database systems use sorting for many purposes, the most obvious purpose being when a user adds an `ORDER BY` clause to their query.
Sorting is also used within operators, such as window functions.
DuckDB recently improved its sorting implementation, which is now able to sort data in parallel and sort more data than fits in memory.
In this post, we will take a look at how DuckDB sorts, and how this compares to other data management systems.

Not interested in the implementation? [Jump straight to the experiments!](#comparison)

#### Sorting Relational Data

Sorting is one of the most well-studied problems in computer science, and it is an important aspect of data management. There are [entire communities](https://sortbenchmark.org) dedicated to who sorts fastest.
Research into sorting algorithms tends to focus on sorting large arrays or key/value pairs.
While important, this does not cover how to implement sorting in a database system.
There is a lot more to sorting tables than just sorting a large array of integers!

Consider the following example query on a snippet of a TPC-DS table:

```sql
SELECT c_customer_sk, c_birth_country, c_birth_year
FROM customer
ORDER BY c_birth_country DESC,
         c_birth_year    ASC NULLS LAST;
```

Which yields:

| c_customer_sk | c_birth_country | c_birth_year |
| ------------: | --------------- | -----------: |
|         64760 | NETHERLANDS     |         1991 |
|         75011 | NETHERLANDS     |         1992 |
|         89949 | NETHERLANDS     |         1992 |
|         90766 | NETHERLANDS     |         NULL |
|         42927 | GERMANY         |         1924 |

In other words: `c_birth_country` is ordered descendingly, and where `c_birth_country` is equal, we sort on `c_birth_year` ascendingly.
By specifying `NULLS LAST`, null values are placed after all other values in the `c_birth_year` column, i.e., they are treated as the highest value in this ascending sort.
Whole rows are thus reordered, not just the columns in the `ORDER BY` clause. The columns that are not in the `ORDER BY` clause we call "payload columns".
Therefore, payload column `c_customer_sk` has to be reordered too.
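
The semantics of this query can be reproduced in a few lines of plain Python. This is only a sketch and is unrelated to DuckDB's actual implementation. It assumes that `None` stands in for `NULL` and relies on the stability of Python's `sorted`. Whole row tuples are moved, so the payload column travels with the sort keys.

```python
# Rows: (c_customer_sk, c_birth_country, c_birth_year); None stands for NULL.
rows = [
    (42927, "GERMANY", 1924),
    (90766, "NETHERLANDS", None),
    (75011, "NETHERLANDS", 1992),
    (64760, "NETHERLANDS", 1991),
    (89949, "NETHERLANDS", 1992),
]

# Python's sort is stable, so we can sort by the last clause first.
# Pass 1: c_birth_year ASC NULLS LAST (the boolean puts NULLs after all values).
by_year = sorted(rows, key=lambda r: (r[2] is None, r[2] if r[2] is not None else 0))
# Pass 2: c_birth_country DESC; ties keep the order established in pass 1.
result = sorted(by_year, key=lambda r: r[1], reverse=True)

for row in result:
    print(row)
```

The output order of `c_customer_sk` matches the table above, even though that column never takes part in a comparison.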

It is easy to implement something that can evaluate the example query using any sorting implementation, for instance, __C++__'s `std::sort`.
While `std::sort` is excellent algorithmically, it is still a single-threaded approach that is unable to efficiently sort by multiple columns because function call overhead would quickly dominate sorting time.
Below we will discuss why that is.

To achieve good performance when sorting tables, a custom sorting implementation is needed. We are – of course – not the first to implement relational sorting, so we dove into the literature to look for guidance.

In 2006 the famous Goetz Graefe wrote a survey on [implementing sorting in database systems](http://wwwlgis.informatik.uni-kl.de/archiv/wwwdvs.informatik.uni-kl.de/courses/DBSREAL/SS2005/Vorlesungsunterlagen/Implementing_Sorting.pdf).
In this survey, he collected many sorting techniques that are known to the community. This is a great guideline if you are about to start implementing sorting for tables.

The cost of sorting is dominated by comparing values and moving data around.
Anything that makes these two operations cheaper will have a big impact on the total runtime.

There are two obvious ways to go about implementing a comparator when we have multiple `ORDER BY` clauses:

1. Loop through the clauses: Compare columns until we find one that is not equal, or until we have compared all columns.
This is fairly complex already, as this requires a loop with an if/else inside of it for every single row of data.
If we have columnar storage, this comparator has to jump between columns, [causing random access in memory](https://ir.cwi.nl/pub/13805).
2. Entirely sort the data by the first clause, then sort by the second clause, but only where the first clause was equal, and so on.
This approach is especially inefficient when there are many duplicate values, as it requires multiple passes over the data.
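
As an illustration, the first approach could look as follows in Python. This is a sketch, not DuckDB code: `compare_rows`, `order_by`, and the `calls` counter are hypothetical names, and `NULL` handling is left out. The counter makes visible that the comparator function is invoked for every comparison the sort performs.

```python
from functools import cmp_to_key

# Rows: (c_birth_country, c_birth_year); no NULLs to keep the sketch short.
rows = [
    ("GERMANY", 1924),
    ("NETHERLANDS", 1992),
    ("NETHERLANDS", 1991),
]
# ORDER BY clauses as (column index, descending?) pairs.
order_by = [(0, True), (1, False)]
calls = {"count": 0}  # how often the comparator is invoked

def compare_rows(left, right):
    """Approach 1: loop over the clauses until a column differs."""
    calls["count"] += 1
    for col, descending in order_by:
        if left[col] == right[col]:
            continue  # tie on this clause, look at the next one
        less = left[col] < right[col]
        if descending:
            less = not less
        return -1 if less else 1
    return 0  # equal on all clauses

result = sorted(rows, key=cmp_to_key(compare_rows))
print(result, calls["count"])
```

Each call runs a loop with a branch per clause, which is the per-comparison overhead that the techniques below try to avoid.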

#### Binary String Comparison

The binary string comparison technique improves sorting performance by simplifying the comparator. It encodes *all* columns in the `ORDER BY` clause into a single binary sequence that, when compared using `memcmp`, will yield the correct overall sorting order. Encoding the data is not free, but since the comparator is used so heavily during sorting, it will pay off.
Let us take another look at 3 rows of the example:

| c_birth_country | c_birth_year |
| --------------- | -----------: |
| NETHERLANDS     |         1991 |
| NETHERLANDS     |         1992 |
| GERMANY         |         1924 |

On [little-endian](https://en.wikipedia.org/wiki/Endianness) hardware, the bytes that represent these values look like this in memory, assuming 32-bit integer representation for the year:

```sql
c_birth_country
-- NETHERLANDS
01001110 01000101 01010100 01001000 01000101 01010010 01001100 01000001 01001110 01000100 01010011 00000000
-- GERMANY
01000111 01000101 01010010 01001101 01000001 01001110 01011001 00000000

c_birth_year
-- 1991
11000111 00000111 00000000 00000000
-- 1992
11001000 00000111 00000000 00000000
-- 1924
10000100 00000111 00000000 00000000
```

The trick is to convert these to a binary string that encodes the sorting order:

```sql
-- NETHERLANDS | 1991
10110001 10111010 10101011 10110111 10111010 10101101 10110011 10111110 10110001 10111011 10101100 11111111
10000000 00000000 00000111 11000111
-- NETHERLANDS | 1992
10110001 10111010 10101011 10110111 10111010 10101101 10110011 10111110 10110001 10111011 10101100 11111111
10000000 00000000 00000111 11001000
-- GERMANY     | 1924
10111000 10111010 10101101 10110010 10111110 10110001 10100110 11111111 11111111 11111111 11111111 11111111
10000000 00000000 00000111 10000100
```

The binary string is fixed-size because this makes it much easier to move it around during sorting. 

The string "GERMANY" is shorter than "NETHERLANDS", therefore it is padded with `00000000` bytes.
All bits in column `c_birth_country` are subsequently inverted because this column is sorted in descending order.
If a string is too long, we encode its prefix and only look at the whole string if the prefixes are equal.

The bytes in `c_birth_year` are swapped because we need the big-endian representation to encode the sorting order.
The first bit is also flipped, to preserve order between positive and negative integers for [signed integers](https://en.wikipedia.org/wiki/Signed_number_representations).
If there are `NULL` values, these must be encoded using an additional byte (not shown in the example).

With this binary string, we can now compare both columns at the same time by comparing only the binary string representation.
This can be done with a single `memcmp` in __C++__! The compiler will emit efficient assembly for this single function call, even auto-generating [SIMD instructions](https://en.wikipedia.org/wiki/SIMD).

This technique solves one of the problems mentioned above, namely the function call overhead when using complex comparators.
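
The encoding of the example rows can be reproduced with a short sketch. This is only an illustration of the steps described above, not DuckDB's implementation: the function `encode_key` and the fixed width of 12 bytes are assumptions made for this example, and `NULL` handling is left out.

```python
import struct


def encode_key(country: str, year: int, width: int = 12) -> bytes:
    """Encode (c_birth_country DESC, c_birth_year ASC) as a fixed-size binary string."""
    # Truncate to the prefix and pad with zero bytes to a fixed width
    raw = country.encode("ascii")[:width].ljust(width, b"\x00")
    # Descending order: invert all bits of the string part
    country_part = bytes(b ^ 0xFF for b in raw)
    # Signed 32-bit integer: flip the sign bit, then store big-endian
    year_part = struct.pack(">I", (year & 0xFFFFFFFF) ^ 0x80000000)
    return country_part + year_part


rows = [("GERMANY", 1924), ("NETHERLANDS", 1992), ("NETHERLANDS", 1991)]
# Python compares bytes objects lexicographically, like memcmp on equal-length buffers
sorted_rows = sorted(rows, key=lambda row: encode_key(*row))
netherlands_1991_hex = encode_key("NETHERLANDS", 1991).hex()
```

Sorting by the encoded key alone yields the rows in the order `NETHERLANDS | 1991`, `NETHERLANDS | 1992`, `GERMANY | 1924`. Strings longer than the prefix would still need a full comparison when the encoded prefixes tie, which this sketch does not do.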

#### Radix Sort

Now that we have a cheap comparator, we have to choose our sorting algorithm.
Every computer science student learns about [comparison-based](https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_sorts) sorting algorithms like [Quicksort](https://en.wikipedia.org/wiki/Quicksort) and [Merge sort](https://en.wikipedia.org/wiki/Merge_sort), which have _O (n_ log _n)_ time complexity, where _n_ is the number of records being sorted.

However, there are also [distribution-based](https://en.wikipedia.org/wiki/Sorting_algorithm#Non-comparison_sorts) sorting algorithms, which typically have a time complexity of _O (n k)_, where _k_ is the width of the sorting key.
This class of sorting algorithms scales much better with a larger _n_ because _k_ is constant, whereas log _n_ is not.

One such algorithm is [Radix sort](https://en.wikipedia.org/wiki/Radix_sort).
This algorithm sorts the data by repeatedly computing the data distribution with [Counting sort](https://en.wikipedia.org/wiki/Counting_sort), once per digit, until all digits have been processed.

It may sound counter-intuitive to encode the sorting key columns such that we have a cheap comparator, and then choose a sorting algorithm that does not compare records.
However, the encoding is necessary for Radix sort: Binary strings that produce a correct order with `memcmp` will produce a correct order if we do a byte-by-byte Radix sort.
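
A minimal sketch of a byte-by-byte Radix sort over fixed-size binary keys is shown below. It uses the least-significant-digit variant because it is the shortest to write down; the text above does not say which variant DuckDB uses, so treat that choice, and the name `radix_sort`, as assumptions of this example.

```python
import struct


def radix_sort(keys):
    """Sort equal-length bytes keys with one stable counting sort per byte position."""
    if not keys:
        return []
    width = len(keys[0])
    # Process the last (least significant) byte first
    for pos in range(width - 1, -1, -1):
        # Count how often each byte value occurs at this position
        counts = [0] * 256
        for key in keys:
            counts[key[pos]] += 1
        # Turn the counts into start offsets (exclusive prefix sum)
        offsets = [0] * 256
        total = 0
        for value in range(256):
            offsets[value] = total
            total += counts[value]
        # Scatter the keys to their positions; input order is kept within a bucket
        output = [b""] * len(keys)
        for key in keys:
            output[offsets[key[pos]]] = key
            offsets[key[pos]] += 1
        keys = output
    return keys


# Big-endian unsigned integers already sort correctly byte by byte
keys = [struct.pack(">I", value) for value in [5, 300, 2, 70000, 300]]
sorted_keys = radix_sort(keys)
```

No two keys are ever compared with each other: the work is `width` passes over the data, which is where the _O (n k)_ bound comes from.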

#### Two-Phase Parallel Sorting

DuckDB uses [Morsel-Driven Parallelism](https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf), a framework for parallel query execution.
For the sorting operator, this means that multiple threads collect roughly an equal amount of data, in parallel, from the table.

We use this parallelism for sorting by first having each thread sort the data it collects using our Radix sort.
After this first sorting phase, each thread has one or more sorted blocks of data, which must be combined into the final sorted result.
[Merge sort](https://en.wikipedia.org/wiki/Merge_sort) is the algorithm of choice for this task.
There are two main ways of implementing merge sort: [K-way merge](https://en.wikipedia.org/wiki/K-way_merge_algorithm) and [Cascade merge](https://en.wikipedia.org/wiki/Cascade_merge_sort).

K-way merge merges K lists into one sorted list in one pass, and is traditionally [used for external sorting (sorting more data than fits in memory)](https://en.wikipedia.org/wiki/External_sorting#External_merge_sort) because it minimizes I/O.
Cascade merge merges two lists of sorted data at a time until only one sorted list remains, and is used for in-memory sorting because it is more efficient than K-way merge.
We aim to have an implementation that has high in-memory performance, which gracefully degrades as we go over the limit of available memory.
Therefore, we choose cascade merge.

In a cascade merge sort, we merge two blocks of sorted data at a time until only one sorted block remains.
Naturally, we want to use all available threads to compute the merge.
If we have many more sorted blocks than threads, we can assign each thread to merge two blocks.
However, as the blocks get merged, we will not have enough blocks to keep all threads busy.
This is especially slow when the final two blocks are merged: One thread has to process all the data.

To fully parallelize this phase, we have implemented [Merge Path](https://arxiv.org/pdf/1406.2628.pdf) by Oded Green et al.
Merge Path pre-computes *where* the sorted lists will intersect while merging, shown in the image below (taken from the paper).

![](../images/blog/sorting/merge_path.png)


The intersections along the merge path can be efficiently computed using [Binary Search](https://en.wikipedia.org/wiki/Binary_search_algorithm).
If we know where the intersections are, we can merge partitions of the sorted data independently in parallel.
This allows us to use all available threads effectively for the entire merge phase.
For another trick to improve merge sort, see [the appendix](#::predication).
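
The idea can be sketched as follows, under the assumption that both inputs are plain sorted Python lists. The function names are hypothetical, and the partitions are merged one after another here, whereas a real implementation would hand each partition to its own thread.

```python
def merge_path_split(a, b, diag):
    """Return how many of the first `diag` merged elements come from `a` (ties go to `a`)."""
    lo = max(0, diag - len(b))
    hi = min(diag, len(a))
    # Binary search along the diagonal for the intersection with the merge path
    while lo < hi:
        i = (lo + hi) // 2
        j = diag - i
        if a[i] <= b[j - 1]:
            lo = i + 1  # a[i] is merged before b[j - 1]: take more from a
        else:
            hi = i
    return lo


def merge(a, b):
    """Plain sequential two-way merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def partitioned_merge(a, b, parts):
    """Split the merge into `parts` independent pieces of (almost) equal output size."""
    total = len(a) + len(b)
    diags = [total * p // parts for p in range(parts + 1)]
    splits = [merge_path_split(a, b, d) for d in diags]
    out = []
    for p in range(parts):
        i0, i1 = splits[p], splits[p + 1]
        j0, j1 = diags[p] - i0, diags[p + 1] - i1
        # Each piece only reads its own slices, so pieces could run in parallel
        out.extend(merge(a[i0:i1], b[j0:j1]))
    return out


left = [1, 3, 5, 7, 9, 11]
right = [2, 4, 6, 8, 10, 12, 14]
merged = partitioned_merge(left, right, parts=4)
```

Because every partition produces a contiguous, known-size slice of the output, the pieces can be written to the result without any coordination between threads.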

#### Columns or Rows?

Besides comparisons, the other big cost of sorting is moving data around.
DuckDB has a vectorized execution engine.
Data is stored in a columnar layout, which is processed one batch (called a chunk) at a time.
This layout is great for analytical query processing because the chunks fit in the CPU cache, and it gives a lot of opportunities for the compiler to generate SIMD instructions.
However, when the table is sorted, entire rows are shuffled around, rather than columns.

We could stick to the columnar layout while sorting: Sort the key columns, then re-order the payload columns one by one.
However, re-ordering will cause a random access pattern in memory for each column.
If there are many payload columns, this will be slow.
Converting the columns to rows will make re-ordering rows much easier.
This conversion is of course not free: Columns need to be copied to rows, and back from rows to columns again after sorting.

Because we want to support external sorting, we have to store data in [buffer-managed](https://research.cs.wisc.edu/coral/minibase/bufMgr/bufMgr.html) blocks that can be offloaded to disk.
Because we have to copy the input data to these blocks anyway, converting the columns to rows is effectively free.

There are a few operators that are inherently row-based, such as joins and aggregations.
DuckDB has a unified internal row layout for these operators, and we decided to use it for the sorting operator as well.
This layout has only been used in memory so far.
In the next section, we will explain how we got it to work on disk as well. We should note that we will only write sorting data to disk if main memory is not able to hold it.

#### External Sorting

The buffer manager can unload blocks from memory to disk.
This is not something we actively do in our sorting implementation, but rather something that the buffer manager decides to do if memory would fill up otherwise.
It uses a least-recently-used queue to decide which blocks to write.
More on how to properly use this queue in [the appendix](#::zigzag).

When we need a block, we "pin" it, which reads it from disk if it is not loaded already.
Accessing disk is much slower than accessing memory, therefore it is crucial that we minimize the number of reads and writes.

Unloading data to disk is easy for fixed-size columns like integers, but more difficult for variable-sized columns like strings.
Our row layout uses fixed-size rows, which cannot fit strings with arbitrary sizes.
Therefore, strings are represented by a pointer, which points into a separate block of memory where the actual string data lives, a so-called "string heap".

We have changed our heap to also store strings row-by-row in buffer-managed blocks:

![](../images/blog/sorting/heap-light.svg)



Each row has an additional 8-byte field `pointer` which points to the start of this row in the heap.
This is useless in the in-memory representation, but we will see why it is useful for the on-disk representation in just a second.

If the data fits in memory, the heap blocks stay pinned, and only the fixed-size rows are re-ordered while sorting.
If the data does not fit in memory, the blocks need to be offloaded to disk, and the heap will also be re-ordered while sorting.
When a heap block is offloaded to disk, the pointers pointing into it are invalidated.
When we load the block back into memory, the pointers will have changed.

This is where our row-wise layout comes into play.
The 8-byte `pointer` field is overwritten with an 8-byte `offset` field, denoting where in the heap block strings of this row can be found.
This technique is called ["pointer swizzling"](https://en.wikipedia.org/wiki/Pointer_swizzling).
When we swizzle the pointers, the row layout and heap block look like this:

![](../images/blog/sorting/heap_swizzled-light.svg)



The pointers to the subsequent string values are also overwritten with an 8-byte relative offset, denoting how far this string is offset from the start of the row in the heap (hence every `stringA` has an offset of `0`: It is the first string in the row).
Using relative offsets within rows rather than absolute offsets is very useful during sorting, as these relative offsets stay constant, and do not need to be updated when a row is copied.

When the blocks need to be scanned to read the sorted result, we "unswizzle" the pointers, making them point to the string again.
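
The swizzling and unswizzling steps can be mimicked with plain integers standing in for memory addresses. The dictionary layout, the field names `heap_ptr` and `strings`, and the addresses are all made up for this sketch; DuckDB overwrites the fields in place inside its row layout rather than building new objects.

```python
def swizzle(rows, heap_base):
    """Replace absolute pointers by offsets so the heap block can be unloaded."""
    swizzled = []
    for row in rows:
        row_start = row["heap_ptr"]
        swizzled.append({
            # Offset of this row's data within the heap block
            "heap_ptr": row_start - heap_base,
            # Offsets of each string relative to the start of the row in the heap
            "strings": [ptr - row_start for ptr in row["strings"]],
        })
    return swizzled


def unswizzle(rows, heap_base):
    """Turn offsets back into pointers for wherever the heap block was loaded."""
    unswizzled = []
    for row in rows:
        row_start = heap_base + row["heap_ptr"]
        unswizzled.append({
            "heap_ptr": row_start,
            "strings": [row_start + offset for offset in row["strings"]],
        })
    return unswizzled


# Two rows whose strings live in a heap block loaded at (pretend) address 1000
rows = [
    {"heap_ptr": 1000, "strings": [1000, 1005]},
    {"heap_ptr": 1012, "strings": [1012, 1015]},
]
on_disk = swizzle(rows, heap_base=1000)
# Later, the block is loaded again at a different address
reloaded = unswizzle(on_disk, heap_base=5000)
```

In the swizzled form the first string of every row has offset `0`, and the relative offsets do not change when a row is copied to another position in the heap.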

With this dual-purpose row-wise representation, we can easily copy around both the fixed-size rows and the variable-sized rows in the heap.
Besides having the buffer manager load/unload blocks, the only difference between in-memory and external sorting is that we swizzle/unswizzle pointers to the heap blocks, and copy data from the heap blocks during merge sort. 

All this reduces overhead when blocks need to be moved in and out of memory, which will lead to graceful performance degradation as we approach the limit of available memory.

<a name="comparison"></a>

#### Comparison with Other Systems

Now that we have covered most of the techniques that are used in our sorting implementation, we want to know how we compare to other systems.
DuckDB is often used for interactive data analysis, and is therefore often compared to tools like [dplyr](https://dplyr.tidyverse.org).

In this setting, people are usually running on laptops or PCs, therefore we will run these experiments on a 2020 MacBook Pro.
This laptop has an [Apple M1 CPU](https://en.wikipedia.org/wiki/Apple_M1), which is [ARM](https://en.wikipedia.org/wiki/ARM_architecture)-based.
The M1 processor has 8 cores: 4 high-performance (Firestorm) cores, and 4 energy-efficient (Icestorm) cores.
The Firestorm cores have very, very fast single-thread performance, so this should level the playing field between single- and multi-threaded sorting implementations somewhat.
The MacBook has 16 GB of memory, and [one of the fastest SSDs found in a laptop](https://eclecticlight.co/2020/12/12/how-fast-is-the-ssd-inside-an-m1-mac/).

We will be comparing against the following systems:
1. [ClickHouse](https://clickhouse.tech), version 21.7.5
2. [HyPer](https://dbdb.io/db/hyper), version 2021.2.1.12564
3. [Pandas](https://pandas.pydata.org), version 1.3.2
4. [SQLite](https://www.sqlite.org/index.html), version 3.36.0

ClickHouse and HyPer are included in our comparison because they are analytical SQL engines with an emphasis on performance.
Pandas and SQLite are included in our comparison because they can be used to perform relational operations within Python, like DuckDB.
Pandas operates fully in memory, whereas SQLite is a more traditional disk-based system.
This list of systems should give us a good mix of single-/multi-threaded, and in-memory/external sorting.

ClickHouse was built for M1 using [this guide](https://clickhouse.tech/docs/en/development/build-osx/).
We have set the memory limit to 12 GB, and `max_bytes_before_external_sort` to 10 GB, following [this suggestion](https://clickhouse.tech/docs/en/sql-reference/statements/select/order-by/#implementation-details).

HyPer is [Tableau's data engine](https://www.tableau.com/products/new-features/hyper), created by the [database group at the University of Munich](http://db.in.tum.de).
It does not run natively (yet) on ARM-based processors like the M1.
We will use [Rosetta 2](https://en.wikipedia.org/wiki/Rosetta_(software)#Rosetta_2), macOS's x86 emulator to run it.
Emulation causes some overhead, so we have included an experiment on an x86 machine in [the appendix](#::x86).

Benchmarking sorting in database systems is not straightforward.
Ideally, we would like to measure only the time it takes to sort the data, not the time it takes to read the input data and show the output.
Not every system has a profiler to measure the time of the sorting operator exactly, so this is not an option.

To approach a fair comparison, we will measure the end-to-end time of queries that sort the data and write the result to a temporary table, i.e.:

```sql
CREATE TEMPORARY TABLE output AS
SELECT ...
FROM ...
ORDER BY ...;
```

There is no perfect solution to this problem, but this should give us a good comparison because the end-to-end time of this query should be dominated by sorting.
For Pandas we will use `sort_values` with `inplace=False` to mimic this query.

In ClickHouse, temporary tables can only exist in memory, which is problematic for our out-of-core experiments.
Therefore we will use a regular `TABLE`, but then we also need to choose a table engine.
Most of the table engines apply compression or create an index, which we do not want to measure.
Therefore we have chosen the simplest on-disk engine, which is [File](https://clickhouse.tech/docs/en/engines/table-engines/special/file/#file), with format [Native](https://clickhouse.tech/docs/en/interfaces/formats/#native).

The table engine we chose for the input tables for ClickHouse is [MergeTree](https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree) with `ORDER BY tuple()`.
We chose this because we encountered strange behavior with `File(Native)` input tables, where there was no difference in runtime between the queries `SELECT * FROM ... ORDER BY` and `SELECT col1 FROM ... ORDER BY`.
Presumably, this is because all columns in the table were sorted regardless of how many were selected.

To measure stable end-to-end query time, we run each query 5 times and report the median run time.
There are some differences in reading/writing tables between the systems.
For instance, Pandas cannot read/write from/to disk, so both the input and output data frame will be in memory.
DuckDB will not write the output table to disk unless there is not enough room to keep it in memory, and therefore also may have an advantage.
However, sorting dominates the total runtime, so these differences are not that impactful.
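
The measurement procedure amounts to something like the sketch below. The helper `median_runtime` is a hypothetical name and the workload is a stand-in; the actual benchmark scripts are linked at the end of this post.

```python
import statistics
import time


def median_runtime(run_query, repetitions=5):
    """Run a callable several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload: sort an inversely sorted list of integers
elapsed = median_runtime(lambda: sorted(range(100_000, 0, -1)))
```

Reporting the median rather than the mean keeps a single slow run, for example one with a cold cache, from skewing the result.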

#### Random Integers

We will start with a simple example.
We have generated the first 100 million integers and shuffled them, and we want to know how well the systems can sort them.
This experiment is more of a micro-benchmark than anything else and is of little real-world significance.

For our first experiment, we will look at how the systems scale with the number of rows.
From the initial table with integers, we have made 9 more tables, with 10M, 20M, ..., 90M integers each.

![](../images/blog/sorting/randints_scaling.svg)


Being a traditional disk-based database system, SQLite always opts for an external sorting strategy.
It writes intermediate sorted blocks to disk even if they fit in main-memory, therefore it is much slower.
The performance of the other systems is in the same ballpark, with DuckDB and ClickHouse going toe-to-toe with \~3 and \~4 seconds for 100M integers.
Because SQLite is so much slower, we will not include it in our next set of experiments (TPC-DS).

DuckDB and ClickHouse both make very good use of all available threads: Each thread first sorts its own data with a single-threaded sort, in parallel with the others, followed by a parallel merge sort.
We are not sure what strategy HyPer uses.
For our next experiment, we will zoom in on multi-threading, and see how well ClickHouse and DuckDB scale with the number of threads (we were not able to set the number of threads for HyPer).

![](../images/blog/sorting/randints_threads.svg)


This plot demonstrates that Radix sort is very fast.
DuckDB sorts 100M integers in just under 5 seconds using a single thread, which is much faster than ClickHouse.
Adding threads does not improve performance as much for DuckDB, because Radix Sort is so much faster than Merge Sort.
Both systems end up at about the same performance at 4 threads.

Beyond 4 threads we do not see performance improve much more, due to the CPU architecture.
For all of the other experiments, we have set both DuckDB and ClickHouse to use 4 threads.

For our last experiment with random integers, we will see how the sortedness of the input may impact performance.
This is especially important to do in systems that use Quicksort because Quicksort performs much worse on inversely sorted data than on random data.

![](../images/blog/sorting/randints_sortedness.svg)


Not surprisingly, all systems perform better on sorted data, sometimes by a large margin.
ClickHouse, Pandas, and SQLite likely have some optimization here: e.g., keeping track of sortedness in the catalog, or checking sortedness while scanning the input.
DuckDB and HyPer have only a very small difference in performance when the input data is sorted, and do not have such an optimization.
For DuckDB, the slightly improved performance can be explained by a better memory access pattern during sorting: When the data is already sorted, the access pattern is mostly sequential.

Another interesting result is that DuckDB sorts data faster than some of the other systems can read already sorted data.

#### TPC-DS

For the next comparison, we have improvised a relational sorting benchmark on two tables from the standard [TPC Decision Support benchmark (TPC-DS)](http://www.tpc.org/tpcds/).
TPC-DS is challenging for sorting implementations because it has wide tables (with many columns, unlike the tables in [TPC-H](http://www.tpc.org/tpch/)), and a mix of fixed- and variable-sized types.
The number of rows increases with the scale factor.
The tables used here are `catalog_sales` and `customer`.

`catalog_sales` has 34 columns, all fixed-size types (integer and double), and grows to have many rows as the scale factor increases.
`customer` has 18 columns (10 integers, and 8 strings), and a decent number of rows as the scale factor increases.
The row counts of both tables at each scale factor are shown in the table below.

|   SF |  customer | catalog_sales |
| ---: | --------: | ------------: |
|    1 |   100,000 |     1,441,548 |
|   10 |   500,000 |    14,401,261 |
|  100 | 2,000,000 |   143,997,065 |
|  300 | 5,000,000 |   260,014,080 |

We will use `customer` at SF100 and SF300, which fits in memory at both scale factors.
We will use the `catalog_sales` table at SF10 and SF100, which does not fit in memory anymore at SF100.

The data was generated using DuckDB's TPC-DS extension, then exported to CSV in a random order to undo any ordering patterns that could have been in the generated data.

#### Catalog Sales (Numeric Types)

Our first experiment on the `catalog_sales` table is selecting 1 column, then 2 columns, ..., up to all 34, always ordering by `cs_quantity` and `cs_item_sk`.
This experiment will tell us how well the different systems can re-order payload columns.

![](../images/blog/sorting/tpcds_catalog_sales_payload.svg)


We see similar trends at SF10 and SF100, but for SF100, at around 12 payload columns or so, the data does not fit in memory anymore, and ClickHouse and HyPer show a big drop in performance.
ClickHouse switches to an external sorting strategy, which is much slower than its in-memory strategy.
Therefore, adding a few payload columns results in a runtime that is orders of magnitude higher.
At 20 payload columns ClickHouse runs into the following error:

```console
DB::Exception: Memory limit (for query) exceeded: would use 11.18 GiB (attempt to allocate chunk of 4204712 bytes), maximum: 11.18 GiB: (while reading column cs_list_price): (while reading from part ./store/523/5230c288-7ed5-45fa-9230-c2887ed595fa/all_73_108_2/ from mark 4778 with max_rows_to_read = 8192): While executing MergeTreeThread.
```

HyPer also drops in performance before erroring out with the following message:

```console
ERROR:  Cannot allocate 333982248 bytes of memory: The `global memory limit` limit of 12884901888 bytes was exceeded.
```

As far as we are aware, HyPer uses [`mmap`](https://man7.org/linux/man-pages/man2/mmap.2.html), which creates a mapping between memory and a file.
This allows the operating system to move data between memory and disk.
While useful, it is no substitute for a proper external sort, as it creates random access to disk, which is very slow.

Pandas performs surprisingly well on SF100, despite the data not fitting in memory.
Pandas can only do this because macOS dynamically increases swap size.
Most operating systems do not do this and would fail to load the data at all.
Using swap usually slows down processing significantly, but the SSD is so fast that there is no visible performance drop!

While Pandas loads the data, swap size grows to an impressive \~40 GB: Both the file and the data frame are fully in memory/swap at the same time, rather than streamed into memory.
This goes down to \~20 GB of memory/swap when the file is done being read.
Pandas is able to get quite far into the experiment until it crashes with the following error:

```console
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
```

DuckDB performs well both in-memory and external, and there is no clearly visible point at which data no longer fits in memory: Runtime is fast and reliable.

#### Customer (Strings & Integers)

Now that we have seen how the systems handle large amounts of fixed-size types, it is time to see some variable-size types!
For our first experiment on the `customer` table, we will select all columns, and order them by either 3 integer columns (`c_birth_year`, `c_birth_month`, `c_birth_day`), or by 2 string columns (`c_first_name`, `c_last_name`).
Comparing strings is much, much more difficult than comparing integers, because strings can have variable sizes and need to be compared byte-by-byte, whereas integers have a fixed size and can be compared in a single operation.

![](../images/blog/sorting/tpcds_customer_type_sort_barplot.svg)


As expected, ordering by strings is more expensive than ordering by integers, except for HyPer, which is impressive.
Pandas has only a slightly bigger difference between ordering by integers and ordering by strings than ClickHouse and DuckDB.
This difference is explained by an expensive comparator between strings.
Pandas uses [NumPy](https://numpy.org)'s sort, which is efficiently implemented in __C__.
However, when this sorts strings, it has to use virtual function calls to compare a Python string object, which is slower than a simple "`<`" between integers in __C__.
Nevertheless, Pandas performs well on the `customer` table.

In our next experiment, we will see how the payload type affects performance.
`customer` has 10 integer columns and 8 string columns.
We will either select all integer columns or all string columns and order by (`c_birth_year`, `c_birth_month`, `c_birth_day`) every time.

![](../images/blog/sorting/tpcds_customer_type_payload_barplot.svg)


As expected, re-ordering strings takes much more time than re-ordering integers.
Pandas has an advantage here because it already has the strings in memory, and most likely only needs to re-order pointers to these strings.
The database systems need to copy strings twice: Once when reading the input table, and again when creating the output table.
Profiling in DuckDB reveals that the actual sorting takes less than a second at SF300, and most time is spent on (de)serializing strings.

#### Conclusion

DuckDB's new parallel sorting implementation can efficiently sort more data than fits in memory, making use of the speed of modern SSDs.
Where other systems crash because they run out of memory, or switch to an external sorting strategy that is much slower, DuckDB's performance gracefully degrades as it goes over the memory limit.

The code that was used to run the experiments can be found on [GitHub](https://github.com/lnkuiper/experiments/tree/master/sorting).
If we made any mistakes, please let us know!

DuckDB is a free and open-source database management system (MIT licensed). It aims to be the SQLite for Analytics, and provides a fast and efficient database system with zero external dependencies. It is available not just for Python, but also for C/C++, R, Java, and more.

[Discuss this post on Hacker News](https://news.ycombinator.com/item?id=28328657)

[Read our paper on sorting at ICDE '23](https://hannes.muehleisen.org/publications/ICDE2023-sorting.pdf)

Listen to Laurens' appearance on the Disseminate podcast:

* [Spotify](https://open.spotify.com/show/6IQIF9oRSf0FPjBUj0AkYA)
* [Google](https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5hY2FzdC5jb20vcHVibGljL3Nob3dzL2Rpc3NlbWluYXRl)
* [Apple](https://podcasts.apple.com/us/podcast/disseminate-the-computer-science-research-podcast/id1631350873)

<a name="predication"></a>

#### Appendix A: Predication

Another technique we have used to speed up merge sort is _predication_.
With this technique, we turn code with _if/else_ branches into code without branches.
Modern CPUs try to predict whether the _if_ or the _else_ branch will be executed.
If this is hard to predict, it can slow down the code.
Take a look at the example of pseudo-code with branches below.

```cpp
// continue until merged
while (l_ptr && r_ptr) {
  // check which side is smaller
  if (memcmp(l_ptr, r_ptr, entry) < 0) {
    // copy from left side and advance
    memcpy(result_ptr, l_ptr, entry);
    l_ptr += entry;
  } else {
    // copy from right side and advance
    memcpy(result_ptr, r_ptr, entry);
    r_ptr += entry;
  }
  // advance result
  result_ptr += entry;
}
```

We are merging the data from the left and right blocks into a result block, one entry at a time, by advancing pointers.
This code can be made _branchless_ by using the comparison boolean as a 0 or 1, shown in the pseudo-code below.

```cpp
// continue until merged
while (l_ptr && r_ptr) {
  // store comparison result in a bool
  bool left_less = memcmp(l_ptr, r_ptr, entry) < 0;
  bool right_less = 1 - left_less;
  // copy from either side
  memcpy(result_ptr, l_ptr, left_less * entry);
  memcpy(result_ptr, r_ptr, right_less * entry);
  // advance either one
  l_ptr += left_less * entry;
  r_ptr += right_less * entry;
  // advance result
  result_ptr += entry;
}
```

When `left_less` is true, it is equal to 1.
This means `right_less` is false, and therefore equal to 0.
We use this to copy `entry` bytes from the left side and 0 bytes from the right side, and to increment the left and right pointers accordingly.

With predicated code, the CPU does not have to predict which instructions to execute, which means there will be fewer pipeline stalls caused by branch mispredictions!
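
For readers who want to run the merge, here is a sketch of the same arithmetic in Python, with explicit end-of-input checks that the pseudo-code leaves out. It only mirrors the logic: Python is interpreted, so this version does not show the performance benefit, and the function name `predicated_merge` is an assumption of this example.

```python
def predicated_merge(left: bytes, right: bytes, entry: int) -> bytes:
    """Merge two sorted runs of fixed-size entries, using the comparison result as 0 or 1."""
    result = bytearray()
    l_pos, r_pos = 0, 0
    while l_pos < len(left) and r_pos < len(right):
        # Store the comparison result as an integer (1 or 0)
        left_less = int(left[l_pos:l_pos + entry] < right[r_pos:r_pos + entry])
        right_less = 1 - left_less
        # Copy `entry` bytes from one side and 0 bytes from the other
        result += left[l_pos:l_pos + left_less * entry]
        result += right[r_pos:r_pos + right_less * entry]
        # Advance exactly one of the two positions
        l_pos += left_less * entry
        r_pos += right_less * entry
    # One side is exhausted: append whatever remains of the other
    result += left[l_pos:]
    result += right[r_pos:]
    return bytes(result)


merged = predicated_merge(b"\x01\x03\x05", b"\x02\x04", entry=1)
```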

<a name="zigzag"></a>

#### Appendix B: Zig-Zagging

A simple trick to reduce I/O is zig-zagging through the pairs of blocks to merge in the cascaded merge sort.
This is illustrated in the image below (dashed arrows indicate in which order the blocks are merged).

![](../images/blog/sorting/zigzag-light.svg)



By zig-zagging through the blocks, we start an iteration by merging the last blocks that were merged in the previous iteration.
Those blocks are likely still in memory, saving us some precious read/write operations.
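
One way to picture the merge order is the sketch below, which alternates the direction in which pairs of blocks are merged from one iteration to the next. It is a simplified model with hypothetical names, using in-memory lists instead of buffer-managed blocks, and it records the order of merges so that the zig-zag pattern can be inspected.

```python
from heapq import merge


def zigzag_merge(blocks):
    """Cascade-merge sorted blocks, reversing the walking direction every iteration."""
    # Label every block with the original block indices it contains
    items = [((i,), list(block)) for i, block in enumerate(blocks)]
    order = []  # pairs of labels, in the order they were merged
    forward = True
    while len(items) > 1:
        sequence = items if forward else items[::-1]
        merged = []
        for k in range(0, len(sequence) - 1, 2):
            (ids_a, a), (ids_b, b) = sequence[k], sequence[k + 1]
            order.append((ids_a, ids_b))
            merged.append((ids_a + ids_b, list(merge(a, b))))
        if len(sequence) % 2 == 1:
            merged.append(sequence[-1])  # odd block out is carried to the next iteration
        # Restore positional order, then flip direction for the next iteration
        items = merged if forward else merged[::-1]
        forward = not forward
    result = items[0][1] if items else []
    return result, order


blocks = [[i, i + 8] for i in range(8)]  # eight sorted blocks of two values each
result, order = zigzag_merge(blocks)
```

With eight blocks, the first iteration ends by merging blocks 6 and 7, and the second iteration starts with the block that resulted from that merge.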

<a name="x86"></a>

#### Appendix C: x86 Experiment

We also ran the `catalog_sales` SF100 experiment on a machine with x86 CPU architecture, to get a fairer comparison with HyPer (without Rosetta 2 emulation).
The machine has an Intel(R) Xeon(R) W-2145 CPU @ 3.70 GHz, which has 8 cores (up to 16 virtual threads), and 128 GB of RAM, so this time the data fits fully in memory.
We have set the number of threads that DuckDB and ClickHouse use to 8 because we saw no visibly improved performance past 8.

![](../images/blog/sorting/jewels_payload.svg)


Pandas performs comparatively worse than on the MacBook, because it has a single-threaded implementation, and this CPU has a lower single-thread performance.
Again, Pandas crashes with an error (this machine does not dynamically increase swap):

```console
numpy.core._exceptions.MemoryError: Unable to allocate 6.32 GiB for an array with shape (6, 141430723) and data type float64
```

DuckDB, HyPer, and ClickHouse all make good use of the additional threads, being significantly faster than on the MacBook.

An interesting pattern in this plot is that DuckDB and HyPer scale very similarly with additional payload columns.
Although DuckDB is faster at sorting, re-ordering the payload seems to cost about the same for both systems.
Therefore it is likely that HyPer also uses a row layout.

ClickHouse scales worse with additional payload columns.
ClickHouse does not use a row layout, and therefore has to pay the cost of random access as each column is re-ordered after sorting.

## Windowing in DuckDB

**Publication date:** 2021-10-13

**Author:** Richard Wesley

**TL;DR:** DuckDB, a free and open-source analytical data management system, has a state-of-the-art windowing engine that can compute complex moving aggregates like inter-quartile ranges as well as simpler moving averages.

Window functions (those using the `OVER` clause) are important tools for analyzing data series,
but they can be slow if not implemented carefully.
In this post, we will take a look at how DuckDB implements windowing.
We will also see how DuckDB can leverage its aggregate function architecture
to compute useful moving aggregates such as moving inter-quartile ranges (IQRs).

#### Beyond Sets

The original relational model as developed by Codd in the 1970s treated relations as *unordered sets* of tuples.
While this was nice for theoretical computer science work,
it ignored the way humans think using physical analogies (the "embodied brain" model from neuroscience).
In particular, humans naturally order data to help them understand it and engage with it.
To help with this, SQL uses the `SELECT` clause for horizontal layout and the `ORDER BY` clause for vertical layout.

Still, the orderings that humans put on data are often more than neurological crutches.
For example, time places a natural ordering on measurements,
and wide swings in those measurements can themselves be important data,
or they may indicate that the data needs to be cleaned by smoothing.
Trends may be present or relative changes may be more important for analysis than raw values.
To help answer such questions, SQL introduced *analytic* (or *window*) functions in 2003.

##### Window Functions

Windowing works by breaking a relation up into independent *partitions*, *ordering* those partitions,
and then defining [various functions](#docs:lts:sql:functions:window_functions) that can be computed for each row
using the nearby values.
These functions include all the aggregate functions (such as `sum` and `avg`)
as well as some window-specific functions (such as `rank()` and `nth_value(<expression>, <N>)`).

Some window functions depend only on the partition boundary and the ordering,
but a few (including all the aggregates) also use a *frame*.
Frames are specified as a distance on either side (*preceding* or *following*) of the *current row*.
The distance can be specified either as a number of *rows* or as a *range* of values
computed from the partition's ordering value and an offset.

![](../images/blog/windowing/framing.svg)


Framing is the most confusing part of the windowing environment,
so let's look at a very simple example and ignore the partitioning and ordering for a moment.

```sql
SELECT points,
    sum(points) OVER (
        ROWS BETWEEN 1 PRECEDING
                 AND 1 FOLLOWING) AS we
FROM results;
```

This query computes the `sum` of each point and the points on either side of it:

![](../images/blog/windowing/moving-sum.jpg)


Notice that at the edge of the partition, there are only two values added together.
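
To make the framing concrete, here is a minimal TypeScript sketch of what `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING` means for `sum`. The `movingSum` helper and the sample values are illustrative assumptions. They are not DuckDB code and not the data from the figure.

```typescript
// Moving sum over a ROWS frame: `preceding` rows before and
// `following` rows after the current row.
function movingSum(values: number[], preceding: number, following: number): number[] {
    return values.map((_, i) => {
        // Clamp the frame to the partition boundaries.
        const begin = Math.max(0, i - preceding);
        const end = Math.min(values.length - 1, i + following);
        let total = 0;
        for (let j = begin; j <= end; j++) {
            total += values[j];
        }
        return total;
    });
}

// Hypothetical sample data.
const points = [4, 8, 15, 16, 23];
const framed = movingSum(points, 1, 1);
console.log(framed); // [12, 27, 39, 54, 39]
```

The first and last results only add two values because the frame is cut off at the partition edge.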

##### Power Generation Example

Now let's look at a concrete example of a window function query.
Suppose we have some power plant generation data:

| Plant     | Date       |    MWh |
| :-------- | :--------- | -----: |
| Boston    | 2019-01-02 | 564337 |
| Boston    | 2019-01-03 | 507405 |
| Boston    | 2019-01-04 | 528523 |
| Boston    | 2019-01-05 | 469538 |
| Boston    | 2019-01-06 | 474163 |
| Boston    | 2019-01-07 | 507213 |
| Boston    | 2019-01-08 | 613040 |
| Boston    | 2019-01-09 | 582588 |
| Boston    | 2019-01-10 | 499506 |
| Boston    | 2019-01-11 | 482014 |
| Boston    | 2019-01-12 | 486134 |
| Boston    | 2019-01-13 | 531518 |
| Worcester | 2019-01-02 | 118860 |
| Worcester | 2019-01-03 | 101977 |
| Worcester | 2019-01-04 | 106054 |
| Worcester | 2019-01-05 |  92182 |
| Worcester | 2019-01-06 |  94492 |
| Worcester | 2019-01-07 |  99932 |
| Worcester | 2019-01-08 | 118854 |
| Worcester | 2019-01-09 | 113506 |
| Worcester | 2019-01-10 |  96644 |
| Worcester | 2019-01-11 |  93806 |
| Worcester | 2019-01-12 |  98963 |
| Worcester | 2019-01-13 | 107170 |

The data is noisy, so we want to compute a 7 day moving average for each plant.
To do this, we can use this window query:

```sql
SELECT "Plant", "Date",
    avg("MWh") OVER (
        PARTITION BY "Plant"
        ORDER BY "Date" ASC
        RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
                  AND INTERVAL 3 DAYS FOLLOWING)
        AS "MWh 7-day Moving Average"
FROM "Generation History"
ORDER BY 1, 2;
```

This query computes the seven day moving average of the power generated by each power plant on each day.
The `OVER` clause is the way that SQL specifies that a function is to be computed in a window.
It partitions the data by `Plant` (to keep the different power plants' data separate),
orders each plant's partition by `Date` (to put the energy measurements next to each other),
and uses a `RANGE` frame of three days on either side of each day for the `avg`
(to handle any missing days).
Here is the result:

| Plant     | Date       | MWh 7-day<br>Moving Average |
| :-------- | :--------- | --------------------------: |
| Boston    | 2019-01-02 |                   517450.75 |
| Boston    | 2019-01-03 |                   508793.20 |
| Boston    | 2019-01-04 |                   508529.83 |
| Boston    | 2019-01-05 |                   523459.85 |
| Boston    | 2019-01-06 |                   526067.14 |
| Boston    | 2019-01-07 |                   524938.71 |
| Boston    | 2019-01-08 |                   518294.57 |
| Boston    | 2019-01-09 |                   520665.42 |
| Boston    | 2019-01-10 |                   528859.00 |
| Boston    | 2019-01-11 |                   532466.66 |
| Boston    | 2019-01-12 |                   516352.00 |
| Boston    | 2019-01-13 |                   499793.00 |
| Worcester | 2019-01-02 |                   104768.25 |
| Worcester | 2019-01-03 |                   102713.00 |
| Worcester | 2019-01-04 |                   102249.50 |
| Worcester | 2019-01-05 |                   104621.57 |
| Worcester | 2019-01-06 |                   103856.71 |
| Worcester | 2019-01-07 |                   103094.85 |
| Worcester | 2019-01-08 |                   101345.14 |
| Worcester | 2019-01-09 |                   102313.85 |
| Worcester | 2019-01-10 |                   104125.00 |
| Worcester | 2019-01-11 |                   104823.83 |
| Worcester | 2019-01-12 |                   102017.80 |
| Worcester | 2019-01-13 |                    99145.75 |
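
The difference between a `RANGE` frame and a `ROWS` frame is that the frame is chosen by comparing ordering *values*, not by counting rows. The following TypeScript sketch illustrates this for a single partition using the first seven Boston rows from the table above. The `Reading` type and the `rangeMovingAverage` helper are hypothetical names, and the quadratic scan is chosen for clarity only. It is not how DuckDB evaluates the frame.

```typescript
interface Reading {
    date: string; // ISO date, e.g. "2019-01-02"
    mwh: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Average of all readings whose date lies within `days` of each row's date.
// Assumes that `readings` holds a single partition (one plant).
function rangeMovingAverage(readings: Reading[], days: number): number[] {
    // Date-only ISO strings are parsed as UTC, so day differences are exact.
    const times = readings.map((r) => Date.parse(r.date));
    return readings.map((_, i) => {
        let total = 0;
        let count = 0;
        for (let j = 0; j < readings.length; j++) {
            if (Math.abs(times[j] - times[i]) <= days * MS_PER_DAY) {
                total += readings[j].mwh;
                count++;
            }
        }
        return total / count;
    });
}

const boston: Reading[] = [
    { date: "2019-01-02", mwh: 564337 },
    { date: "2019-01-03", mwh: 507405 },
    { date: "2019-01-04", mwh: 528523 },
    { date: "2019-01-05", mwh: 469538 },
    { date: "2019-01-06", mwh: 474163 },
    { date: "2019-01-07", mwh: 507213 },
    { date: "2019-01-08", mwh: 613040 },
];
const averages = rangeMovingAverage(boston, 3);
console.log(averages[0]); // 517450.75, averaged over only four days
```

Because the frame is defined by dates, a missing day simply contributes no row, and the average is taken over the values that are present.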

You can request multiple different `OVER` clauses in the same `SELECT`, and each will be computed separately.
Often, however, you want to use the same window for multiple functions,
and you can do this by using a `WINDOW` clause to define a *named* window:

```sql
SELECT "Plant", "Date",
    avg("MWh") OVER seven AS "MWh 7-day Moving Average"
FROM "Generation History"
WINDOW seven AS (
    PARTITION BY "Plant"
    ORDER BY "Date" ASC
    RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
              AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;
```

This would be useful, for example,
if one also wanted the 7-day moving `min` and `max` to show the bounds of the data.

#### Under the Feathers

That is a long list of complicated functionality!
Making it all work relatively quickly has many pieces,
so let's have a look at how they all get implemented in DuckDB.

##### Pipeline Breaking

The first thing to notice is that windowing is a "pipeline breaker".
That is, the `Window` operator has to read all of its inputs before it can start computing a function.
This means that if there is some other way to compute something,
it may well be faster to use a different technique.

One common analytic task is to find the last value in some group.
For example, suppose we want the last recorded power output for each plant.
It is tempting to use the `rank()` window function with a reverse sort for this task:

```sql
SELECT "Plant", "MWh"
FROM (
    SELECT "Plant", "MWh",
        rank() OVER (
            PARTITION BY "Plant"
            ORDER BY "Date" DESC) AS r
    FROM "Generation History") t
WHERE r = 1;
```

but this requires materialising the entire table, partitioning it, sorting the partitions,
and then pulling out a single row from those partitions.
A much faster way to do this is to use a self join to filter the table
to contain only the last (`max`) value of the `Date` field:

```sql
SELECT gh."Plant", "MWh"
FROM "Generation History" gh,
    (SELECT "Plant", max("Date") AS "Date"
     FROM "Generation History" GROUP BY 1) lasts
WHERE gh."Plant" = lasts."Plant"
  AND gh."Date" = lasts."Date";
```

This join query requires two scans of the table, but the only materialised data is the filtering table
(which is probably much smaller than the original table), and there is no sorting at all.
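
The shape of the join plan can be sketched in a few lines of TypeScript: one scan builds a small hash table of the latest date per plant, and a second scan filters the rows against it. The `Row` type and the `lastInGroup` helper are hypothetical names used for illustration only. The sample rows are taken from the generation table above.

```typescript
interface Row {
    plant: string;
    date: string; // ISO dates compare correctly as strings
    mwh: number;
}

function lastInGroup(rows: Row[]): Row[] {
    // First scan: the "lasts" subquery, max(date) grouped by plant.
    const lasts = new Map<string, string>();
    for (const row of rows) {
        const current = lasts.get(row.plant);
        if (current === undefined || row.date > current) {
            lasts.set(row.plant, row.date);
        }
    }
    // Second scan: keep only the rows that join with the filtering table.
    return rows.filter((row) => lasts.get(row.plant) === row.date);
}

const history: Row[] = [
    { plant: "Worcester", date: "2019-01-12", mwh: 98963 },
    { plant: "Boston", date: "2019-01-13", mwh: 531518 },
    { plant: "Boston", date: "2019-01-12", mwh: 486134 },
    { plant: "Worcester", date: "2019-01-13", mwh: 107170 },
];
const latest = lastInGroup(history);
console.log(latest.map((r) => `${r.plant}: ${r.mwh}`));
// ["Boston: 531518", "Worcester: 107170"]
```

Nothing is sorted, and the only materialised structure has one entry per plant.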

This type of query showed up [in a user's blog](https://bwlewis.github.io/duckdb_and_r/last/last.html)
and we found that the join query was over 20 times faster on their data set:

![](../images/blog/windowing/last-in-group.jpg)


Of course most analytic tasks that use windowing *do* require using the `Window` operator,
and DuckDB uses a collection of techniques to make the performance as fast as possible.

##### Partitioning and Sorting

At one time, windowing was implemented by sorting on both the partition and the ordering fields
and then finding the partition boundaries.
This is resource intensive, both because the entire relation must be sorted,
and because sorting is `O(N log N)` in the size of the relation.
Fortunately, there are faster ways to implement this step.

To reduce resource consumption, DuckDB uses the partitioning scheme from Leis et al.'s
[*Efficient Processing of Window Functions in Analytical SQL Queries*](http://www.vldb.org/pvldb/vol8/p1058-leis.pdf)
and breaks the partitions up into 1024 chunks using `O(N)` hashing.
The chunks still need to be sorted on all the fields because there may be hash collisions,
but each partition can now be 1024 times smaller, which reduces the runtime significantly.
Moreover, the partitions can easily be extracted and processed in parallel.
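
As a rough illustration of this scheme, the TypeScript sketch below hashes rows into a small number of buckets in one `O(N)` pass and then sorts each bucket independently. The hash function, the bucket count of 4, and the names `hashString` and `partitionAndSort` are assumptions made for the example. DuckDB's actual hashing and chunk layout are different.

```typescript
interface Row {
    plant: string;
    date: string;
}

// A simple string hash, chosen for illustration only.
function hashString(s: string): number {
    let h = 0;
    for (let i = 0; i < s.length; i++) {
        h = (Math.imul(h, 31) + s.charCodeAt(i)) >>> 0;
    }
    return h;
}

function compareStrings(a: string, b: string): number {
    return a < b ? -1 : a > b ? 1 : 0;
}

function partitionAndSort(rows: Row[], bucketCount: number): Row[][] {
    const buckets: Row[][] = Array.from({ length: bucketCount }, (): Row[] => []);
    // O(N) pass: all rows with the same partition key land in the same bucket.
    for (const row of rows) {
        buckets[hashString(row.plant) % bucketCount].push(row);
    }
    // Different keys can collide in a bucket, so each bucket is still sorted
    // on the partition key first and the ordering key second.
    for (const bucket of buckets) {
        bucket.sort((a, b) => compareStrings(a.plant, b.plant) || compareStrings(a.date, b.date));
    }
    return buckets;
}

const buckets = partitionAndSort(
    [
        { plant: "Worcester", date: "2019-01-03" },
        { plant: "Boston", date: "2019-01-03" },
        { plant: "Worcester", date: "2019-01-02" },
        { plant: "Boston", date: "2019-01-02" },
    ],
    4,
);
console.log(buckets.filter((b) => b.length > 0));
```

Each bucket can be sorted and processed by a different thread, because no partition spans two buckets.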

Sorting in DuckDB recently got a [big performance boost](https://duckdb.org/2021/08/27/external-sorting),
along with the ability to work on partitions that were larger than memory.
This functionality has also been added to the `Window` operator,
resulting in a 33% improvement in the last-in-group example:

![](../images/blog/windowing/last-in-group-sort.jpg)


As a final optimization, even though you can request multiple window functions,
DuckDB will collect functions that use the same partitioning and ordering,
and share the data layout between those functions.

##### Aggregation

Most of the [general-purpose window functions](#docs:lts:sql:functions:window_functions) are straightforward to compute,
but windowed aggregate functions can be expensive because they need to look at multiple values for each row.
They often need to look at the same value multiple times, or repeatedly look at a large number of values,
so over the years several approaches have been taken to improve performance.

###### Naïve Windowed Aggregation

Before explaining how DuckDB implements windowed aggregation,
we need to take a short detour through how ordinary aggregates are implemented.
Aggregate "functions" are implemented using three required operations and one optional operation:
* *Initialize* – Creates a state that will be updated.
For `sum`, this is the running total, starting at `NULL` (because a `sum` of zero items is `NULL`, not zero).
* *Update* – Updates the state with a new value. For `sum`, this adds the value to the state.
* *Finalize* – Produces the final aggregate value from the state.
For `sum`, this just copies the running total.
* *Combine* – Combines two states into a single state.
Combine is optional, but when present it allows the aggregate to be computed in parallel.
For `sum`, this produces a new state with the sum of the two input values.

The simplest way to compute an individual windowed aggregate value is to *initialize* a state,
*update* the state with all the values in the window frame,
and then use *finalize* to produce the value of the windowed aggregate.
This naïve algorithm will always work, but it is quite inefficient.
For example, a running total will re-add all the values from the start of the partition
for each running total, and this has a run time of `O(N^2)`.
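
The operations and the naïve algorithm can be sketched in TypeScript as follows. The `Aggregate` interface, the `sumAggregate` object and the `naiveWindow` helper are hypothetical names that mirror the description above. They are not DuckDB's internal C++ interface.

```typescript
// The three required operations plus the optional combine.
interface Aggregate<S> {
    initialize(): S;
    update(state: S, value: number): void;
    finalize(state: S): number | null;
    combine?(target: S, source: S): void;
}

interface SumState {
    total: number | null; // null models SQL NULL for an empty input
}

const sumAggregate: Aggregate<SumState> = {
    initialize: () => ({ total: null }),
    update: (state, value) => {
        state.total = (state.total ?? 0) + value;
    },
    finalize: (state) => state.total,
    combine: (target, source) => {
        if (source.total !== null) {
            target.total = (target.total ?? 0) + source.total;
        }
    },
};

// Naive windowed aggregation: a fresh state per row, updated with every
// value in that row's frame. Frames are half-open [begin, end) index pairs.
function naiveWindow<S>(
    agg: Aggregate<S>,
    values: number[],
    frames: Array<[number, number]>,
): Array<number | null> {
    return frames.map(([begin, end]) => {
        const state = agg.initialize();
        for (let i = begin; i < end; i++) {
            agg.update(state, values[i]);
        }
        return agg.finalize(state);
    });
}

const inputs = [1, 2, 3, 4];
// Running total: every frame starts at the beginning of the partition.
const runningFrames = inputs.map((_, i): [number, number] => [0, i + 1]);
const runningTotals = naiveWindow(sumAggregate, inputs, runningFrames);
console.log(runningTotals); // [1, 3, 6, 10]
```

For `N` rows, the running total performs `1 + 2 + ... + N` updates, which is where the quadratic run time comes from.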

To improve on this, some databases add additional
["moving state" operations](https://www.postgresql.org/docs/14/sql-createaggregate.html)
that can add or remove individual values incrementally.
This reduces computation in some common cases,
but it can only be used for certain aggregates.
For example, it doesn't work for `min` because you don't know if there are multiple duplicate minima.
Moreover, if the frame boundaries move around a lot, it can still degenerate to an `O(N^2)` run time.
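
For an invertible aggregate like `sum`, the "moving state" idea looks roughly like the TypeScript sketch below. The `slidingSum` helper is a hypothetical name, and the frame is assumed to be the last `width` rows up to and including the current row.

```typescript
// Sliding sum over the last `width` rows ending at the current row.
function slidingSum(values: number[], width: number): number[] {
    const result: number[] = [];
    let total = 0;
    for (let i = 0; i < values.length; i++) {
        total += values[i]; // the value entering the frame
        if (i >= width) {
            total -= values[i - width]; // the value leaving the frame
        }
        result.push(total);
    }
    return result;
}

const sliding = slidingSum([3, 1, 4, 1, 5], 3);
console.log(sliding); // [3, 4, 8, 6, 10]
```

Each row costs one addition and at most one subtraction. A `min` state that only stores the current minimum has no equivalent of the subtraction step: once the minimum leaves the frame, the state alone cannot say what the new minimum is.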

###### Segment Tree Aggregation

Instead of adding more functions, DuckDB uses the *segment tree* approach from Leis et al. above.
This works by building a tree on top of the entire partition with the aggregated values at the bottom.
Values are combined into states at nodes above them in the tree until there is a single root:

![](../images/blog/windowing/segment-tree.png)


To compute a value, the algorithm generates states for the ragged ends of the frame,
*combines* states in the tree above the values in the frame,
and *finalizes* the result from the last remaining state.
So in the example above (Figure 5 from Leis et al.) only three values need to be added instead of seven.
This technique can be used for all *combinable* aggregates.
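
A compact way to see the technique is the binary segment tree below, written in TypeScript for `sum` only. It is a sketch under simplifying assumptions: the fan-out is fixed at 2, states are plain numbers, and the `SegmentTree` class is a hypothetical name. DuckDB's tree stores aggregate states and need not use this layout.

```typescript
class SegmentTree {
    private readonly size: number;
    private readonly nodes: number[];

    constructor(values: number[]) {
        this.size = values.length;
        this.nodes = new Array<number>(2 * this.size).fill(0);
        // Leaves: the input values sit at the bottom of the tree.
        for (let i = 0; i < this.size; i++) {
            this.nodes[this.size + i] = values[i];
        }
        // Internal nodes: each one combines its two children.
        for (let i = this.size - 1; i >= 1; i--) {
            this.nodes[i] = this.nodes[2 * i] + this.nodes[2 * i + 1];
        }
    }

    // Sum over the half-open frame [begin, end).
    query(begin: number, end: number): number {
        let total = 0;
        let lo = begin + this.size;
        let hi = end + this.size;
        while (lo < hi) {
            // A ragged end is a node whose sibling is outside the frame,
            // so it is added directly instead of through its parent.
            if (lo & 1) total += this.nodes[lo++];
            if (hi & 1) total += this.nodes[--hi];
            lo >>= 1;
            hi >>= 1;
        }
        return total;
    }
}

const tree = new SegmentTree([5, 3, 8, 6, 2, 7, 4, 1]);
console.log(tree.query(1, 8)); // 31
```

A frame query touches at most two nodes per tree level, so it costs `O(log N)` combines regardless of how wide the frame is.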

###### General Windowed Aggregation

The biggest drawback of segment trees is the need to manage a potentially large number of intermediate states.
For the simple states used for standard distributive aggregates like `sum`,
this is not a problem because the states are small,
the tree keeps the number of states logarithmically low,
and the state used to compute each value is also cheap.

For some aggregates, however, the state is not small.
Typically these are so-called *holistic* aggregates,
where the value depends on all the values of the frame.
Examples of such aggregates are `mode` and `quantile`,
where each state may have to contain a copy of *all* the values seen so far.
While segment trees *can* be used to implement moving versions of any combinable aggregate,
this can be quite expensive for large, complex states –
and this was not the original goal of the algorithm.

To solve this problem, we use the approach from Wesley and Xu's
[*Incremental Computation of Common Windowed Holistic Aggregates*](http://www.vldb.org/pvldb/vol9/p1221-wesley.pdf),
which generalises segment trees to aggregate-specific data structures.
The aggregate can define a fifth optional *window* operation,
which will be passed the bottom of the tree and the bounds of the current and previous frame.
The aggregate can then create an appropriate data structure for its implementation.

For example, the `mode` function maintains a hash table of counts that it can update efficiently,
and the `quantile` function maintains a partially sorted list of frame indexes.
Moreover, the `quantile` functions can take an array of quantile values,
which further increases performance by sharing the partially ordered results
among the different quantile values.
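
The hash table idea for `mode` can be sketched in TypeScript as follows. This is an illustration only: the `movingMode` helper is a hypothetical name, the frame is assumed to be the last `width` rows ending at the current row, and ties are broken in favour of the smallest value, which may differ from DuckDB's behaviour.

```typescript
function movingMode(values: number[], width: number): number[] {
    const counts = new Map<number, number>();
    const result: number[] = [];
    for (let i = 0; i < values.length; i++) {
        // Count the value entering the frame.
        counts.set(values[i], (counts.get(values[i]) ?? 0) + 1);
        // Uncount the value leaving the frame.
        if (i >= width) {
            const leaving = values[i - width];
            const remaining = (counts.get(leaving) ?? 0) - 1;
            if (remaining > 0) {
                counts.set(leaving, remaining);
            } else {
                counts.delete(leaving);
            }
        }
        // Pick the most frequent value among the distinct values in the frame.
        let best = values[i];
        let bestCount = 0;
        counts.forEach((count, value) => {
            if (count > bestCount || (count === bestCount && value < best)) {
                best = value;
                bestCount = count;
            }
        });
        result.push(best);
    }
    return result;
}

const modes = movingMode([1, 2, 2, 3, 3, 3, 2], 3);
console.log(modes); // [1, 1, 2, 2, 3, 3, 3]
```

The counts are updated with one insertion and one removal per row, so the state never has to be rebuilt from the whole frame.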

Because these aggregates can be used in a windowing context,
the moving average example above can be easily modified to produce a moving inter-quartile range:

```sql
SELECT "Plant", "Date",
    quantile_cont("MWh", [0.25, 0.5, 0.75]) OVER seven
        AS "MWh 7-day Moving IQR"
FROM "Generation History"
WINDOW seven AS (
    PARTITION BY "Plant"
    ORDER BY "Date" ASC
    RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
              AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;
```

Moving quantiles like this are
[more robust to anomalies](https://blogs.sas.com/content/iml/2021/05/26/running-median-smoother.html),
which makes them a valuable tool for data series analysis,
but most database systems do not implement them.
There are some approaches that can be used in some query engines,
but the lack of a general moving aggregation architecture means that these solutions can be
[unnatural](https://docs.oracle.com/cd/E57185_01/HIRUG/ch12s07s08.html)
or [complex](https://ndesmo.github.io/blog/oracle-moving-metrics/).
DuckDB's implementation uses the standard window notation,
which means you don't have to learn new syntax or pull the data out into another tool.

###### Ordered Set Aggregates

Window functions are often closely associated with some special
"[ordered set aggregates](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE)"
defined by the SQL standard.
Some databases implement these functions using the `Window` operator,
but this is rather inefficient because sorting the data (an `O(N log N)` operation) is not required –
it suffices to use
Hoare's `O(N)`
[`FIND`](https://courses.cs.vt.edu/~cs3114/Summer15/Notes/Supplemental/p321-hoare.pdf)
algorithm as used in the STL's
[`std::nth_element`](https://en.cppreference.com/w/cpp/algorithm/nth_element).
DuckDB translates these ordered set aggregates to use the faster `quantile_cont`, `quantile_disc`,
and `mode` regular aggregate functions, thereby avoiding using windowing entirely.
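
To show why a full sort is unnecessary, here is a TypeScript sketch of a Hoare-style selection that finds the k-th smallest value in expected linear time. The `select` and `quantileDisc` helpers are hypothetical names. The index rule in `quantileDisc` follows the usual `percentile_disc` definition (the first value whose position reaches the requested fraction) and is an assumption about the semantics, not a copy of DuckDB's implementation.

```typescript
// Returns the k-th smallest element (0-based) without fully sorting.
function select(values: number[], k: number): number {
    const items = values.slice(); // keep the input untouched
    let lo = 0;
    let hi = items.length - 1;
    while (lo < hi) {
        const pivot = items[lo + ((hi - lo) >> 1)];
        let i = lo;
        let j = hi;
        // Partition: smaller values move left of the pivot, larger values right.
        while (i <= j) {
            while (items[i] < pivot) i++;
            while (items[j] > pivot) j--;
            if (i <= j) {
                [items[i], items[j]] = [items[j], items[i]];
                i++;
                j--;
            }
        }
        // Continue only in the side that contains position k.
        if (k <= j) {
            hi = j;
        } else if (k >= i) {
            lo = i;
        } else {
            return items[k];
        }
    }
    return items[k];
}

function quantileDisc(values: number[], q: number): number {
    const k = Math.max(0, Math.ceil(q * values.length) - 1);
    return select(values, k);
}

// Boston readings for 2019-01-02 to 2019-01-08, in arbitrary order.
const sample = [469538, 564337, 507405, 528523, 474163, 507213, 613040];
console.log(quantileDisc(sample, 0.5)); // 507405
```

Only the side of each partition that contains the requested position is visited again, which is what brings the expected cost down from `O(N log N)` to `O(N)`.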

###### Extensions

This architecture also means that any new aggregates we add
can benefit from the existing windowing infrastructure.
DuckDB is an open source project, and we welcome submissions of useful aggregate functions –
or you can create your own domain-specific ones in your own fork.
At some point we hope to have a UDF architecture that will allow plug-in aggregates,
and the simplicity and power of the interface will let these plugins leverage the notational
simplicity and run time performance that the internal functions enjoy.

#### Conclusion

DuckDB's windowing implementation uses a variety of techniques
to speed up what can be the slowest part of an analytic query.
It is well integrated with the sorting subsystem and the aggregate function architecture,
which makes expressing advanced moving aggregates both natural and efficient.

DuckDB is a free and open-source database management system (MIT licensed).
It aims to be the SQLite for Analytics,
and provides a fast and efficient database system with zero external dependencies.
It is available not just for Python, but also for C/C++, R, Java, and more.

## DuckDB-Wasm: Efficient Analytical SQL in the Browser

**Publication date:** 2021-10-29

**Authors:** André Kohn, Dominik Moritz

**TL;DR:** [DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) is an in-process analytical SQL database for the browser. It is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js. You can try it at [shell.duckdb.org](https://shell.duckdb.org) or on [Observable](https://observablehq.com/@cmudig/duckdb).

![](../images/blog/duckdb_wasm-light.svg)



*DuckDB-Wasm is fast! If you're here for performance numbers, head over to our benchmarks at [shell.duckdb.org/versus](https://shell.duckdb.org/versus).*

#### Efficient Analytics in the Browser

The web browser has evolved into a universal computation platform that even runs in your car. Its rise has been accompanied by increasing requirements for the browser programming language JavaScript.
JavaScript was, first and foremost, designed to be very flexible, which comes at the cost of reduced processing efficiency compared to native languages like C++.
This becomes particularly apparent when considering the execution times of more complex data analysis tasks that often fall behind the native execution by orders of magnitude.
In the past, such analysis tasks have therefore been pushed to servers that tie any client-side processing to additional round-trips over the internet and introduce their own set of scalability problems.

The processing capabilities of browsers were boosted tremendously 4 years ago with the introduction of WebAssembly:

> WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.
>
> The Wasm stack machine is designed to be encoded in a size- and load-time efficient binary format. WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.
> 
> (ref: [https://webassembly.org/](https://webassembly.org/))

Four years later, the WebAssembly revolution is in full progress with first implementations being shipped in four major browsers. It has already brought us game engines, [entire IDEs](https://blog.stackblitz.com/posts/introducing-webcontainers/) and even a browser version of [Photoshop](https://web.dev/ps-on-the-web/). Today, we join the ranks with a first release of the npm library [@duckdb/duckdb-wasm](https://www.npmjs.com/package/@duckdb/duckdb-wasm).

As an in-process analytical database, DuckDB has the rare opportunity to significantly speed up OLAP workloads in the browser. We believe that there is a need for a comprehensive and self-contained data analysis library. DuckDB-Wasm automatically offloads your queries to dedicated worker threads and reads Parquet, CSV and JSON files from either your local filesystem or HTTP servers driven by plain SQL input.
In this blog post, we want to introduce the library and present challenges on our journey towards a browser-native OLAP database.

*DuckDB-Wasm is not yet stable. You will find rough edges and bugs in this release. Please share your thoughts with us [on GitHub](https://github.com/duckdb/duckdb-wasm/discussions).*

#### How to Get Data In?

Let's dive into examples.
DuckDB-Wasm provides a variety of ways to load your data. First, raw SQL value clauses like `INSERT INTO sometable VALUES (1, 'foo'), (2, 'bar')` are easy to formulate and only depend on plain SQL text. Alternatively, SQL statements like `CREATE TABLE foo AS SELECT * FROM 'somefile.parquet'` consult our integrated web filesystem to resolve `somefile.parquet` locally, remotely or from a buffer. The methods `insertCSVFromPath` and `insertJSONFromPath` further provide convenient ways to import CSV and JSON files using additional typed settings like column types. And finally, the method `insertArrowFromIPCStream` (optionally through `insertArrowTable`, `insertArrowBatches` or `insertArrowVectors`) copies raw IPC stream bytes directly into a WebAssembly stream decoder.

The following example presents different options how data can be imported into DuckDB-Wasm:

```ts
// Open a connection first; all inserts below go through it
const c = await db.connect();
// Data can be inserted from an existing arrow.Table
await c.insertArrowTable(existingTable, { name: "arrow_table" });
// ..., from Arrow vectors
await c.insertArrowVectors({
    col1: arrow.Int32Vector.from([1, 2]),
    col2: arrow.Utf8Vector.from(["foo", "bar"]),
}, {
    name: "arrow_vectors"
});
// ..., from a raw Arrow IPC stream
const streamResponse = await fetch(`someapi`);
const streamReader = streamResponse.body.getReader();
const streamInserts = [];
while (true) {
    const { value, done } = await streamReader.read();
    if (done) break;
    streamInserts.push(c.insertArrowFromIPCStream(value, { name: "streamed" }));
}
await Promise.all(streamInserts);

// ..., from CSV files
// (interchangeable: registerFile{Text,Buffer,URL,Handle})
await db.registerFileText(`data.csv`, "1|foo\n2|bar\n");
// ... with typed insert options
await c.insertCSVFromPath('data.csv', {
    schema: 'main',
    name: 'foo',
    detect: false,
    header: false,
    delimiter: '|',
    columns: {
        col1: new arrow.Int32(),
        col2: new arrow.Utf8(),
    }
});

// ..., from JSON documents in row-major format
await db.registerFileText("rows.json", `[
    { "col1": 1, "col2": "foo" },
    { "col1": 2, "col2": "bar" }
]`);
// ... or column-major format
await db.registerFileText("columns.json", `{
    "col1": [1, 2],
    "col2": ["foo", "bar"]
}`);
// ... with typed insert options
await c.insertJSONFromPath('rows.json', { name: 'rows' });
await c.insertJSONFromPath('columns.json', { name: 'columns' });

// ..., from Parquet files
const pickedFile: File = letUserPickFile();
await db.registerFileHandle("local.parquet", pickedFile);
await db.registerFileURL("remote.parquet", "https://origin/remote.parquet");

// ..., by specifying URLs in the SQL text
await c.query(` 
    CREATE TABLE direct AS
        SELECT * FROM 'https://origin/remote.parquet'
`);
// ..., or by executing raw insert statements
await c.query(`INSERT INTO existing_table
    VALUES (1, 'foo'), (2, 'bar')`);
```

#### How to Get Data Out?

Now that we have the data loaded, DuckDB-Wasm can run queries in two different ways that differ in how the result is materialized. First, the method `query` runs a query to completion and returns the results as a single `arrow.Table`. Second, the method `send` fetches query results lazily through an `arrow.RecordBatchStreamReader`. Both methods are generic and allow for typed results in TypeScript:

```ts
// Either materialize the query result
await conn.query<{ v: arrow.Int32 }>(` 
    SELECT * FROM generate_series(1, 100) t(v)
`);
// ..., or fetch the result chunks lazily
for await (const batch of await conn.send<{ v: arrow.Int32 }>(` 
    SELECT * FROM generate_series(1, 100) t(v)
`)) {
    // ...
} 
```

Alternatively, you can prepare statements for parameterized queries using:

```ts
// Prepare query
const stmt = await conn.prepare<{ v: arrow.Int32 }>(
    `SELECT (v + ?) AS v FROM generate_series(0, 10000) t(v);`
);
// ... and run the query with materialized results
await stmt.query(234);
// ... or result chunks
for await (const batch of await stmt.send(234)) {
    // ...
}
```

#### Looks like Arrow to Me

DuckDB-Wasm uses [Arrow](https://arrow.apache.org) as the data protocol for data import and all query results. Arrow is a database-friendly columnar format that is organized in chunks of column vectors, called record batches, and that supports zero-copy reads with only a small overhead. The npm library `apache-arrow` implements the Arrow format in the browser and is already used by other data processing frameworks, like [Arquero](https://github.com/uwdata/arquero). Arrow therefore not only spares us the implementation of the SQL type logic in JavaScript, it also makes us compatible with existing tools.

_Why not use plain JavaScript objects?_

WebAssembly is isolated and memory-safe. This isolation is part of its DNA and drives fundamental design decisions in DuckDB-Wasm. For example, WebAssembly introduces a barrier towards the traditional JavaScript heap. Crossing this barrier is difficult as JavaScript has to deal with native function calls, memory ownership and serialization performance. Languages like C++ make this worse as they rely on smart pointers that are not available through the FFI. They leave us with the choice to either pass memory ownership to static singletons within the WebAssembly instance or maintain the memory through C-style APIs in JavaScript, a language that is too dynamic for sound implementations of the [RAII idiom](https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization). The memory isolation forces us to serialize data before we can pass it to the WebAssembly instance. Browsers can serialize JavaScript objects natively to and from JSON using the functions `JSON.stringify` and `JSON.parse` but this is slower compared to, for example, copying raw native arrays.

#### Web Filesystem

DuckDB-Wasm integrates a dedicated filesystem for WebAssembly. DuckDB itself is built on top of a virtual filesystem that decouples higher level tasks, such as reading a Parquet file, from low-level filesystem APIs that are specific to the operating system. We leverage this abstraction in DuckDB-Wasm to tailor filesystem implementations to the different WebAssembly environments.

The following figure shows our current web filesystem in action. The sequence diagram presents a user running a SQL query that scans a single Parquet file. The query is first offloaded to a dedicated web worker through a JavaScript API. There, it is passed to the WebAssembly module that processes the query until the execution hits the `parquet_scan` table function. This table function then reads the file using a buffered filesystem which, in turn, issues paged reads on the web filesystem. This web filesystem then uses an environment-specific runtime to read the file from several possible locations.

<p align="center">
    ![](../images/blog/webfs-light.svg)

    
</p>

Depending on the context, the Parquet file may either reside on the local device, on a remote server or in a buffer that was registered by the user upfront. We deliberately treat all three cases equally to unify the retrieval and processing of external data. This does not only simplify the analysis, it also enables more advanced features like partially consuming structured file formats. Parquet files, for example, consist of multiple row groups that store data in a column-major fashion. As a result, we may not need to download the entire file for a query but only the required bytes.

A query like `SELECT count(*) FROM parquet_scan(...)`, for example, can be evaluated on the file metadata alone and will finish in milliseconds even on remote files that are several terabytes large. Another, more general example is paging scans with `LIMIT` and `OFFSET` qualifiers such as `SELECT * FROM parquet_scan(...) LIMIT 20 OFFSET 40`, or queries with selective filter predicates where entire row groups can be skipped based on metadata statistics. These partial file reads are no groundbreaking novelty and could be implemented in JavaScript today, but with DuckDB-Wasm, these optimizations are now driven by the semantics of SQL queries instead of fine-tuned application logic.
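
To give a feeling for what a paged partial read looks like on the wire, the TypeScript sketch below turns a byte range requested by a reader into page-aligned HTTP `Range` header values. The `pageRanges` helper, the page size of 256 bytes and the file size are assumptions made for the example. The actual paging and buffering strategy of DuckDB-Wasm is not shown here.

```typescript
// Split a read of `length` bytes at `offset` into page-aligned
// HTTP Range header values, clamped to the size of the file.
function pageRanges(offset: number, length: number, pageSize: number, fileSize: number): string[] {
    const ranges: string[] = [];
    if (length <= 0 || offset >= fileSize) {
        return ranges;
    }
    const lastByte = Math.min(offset + length, fileSize) - 1;
    const firstPage = Math.floor(offset / pageSize);
    const lastPage = Math.floor(lastByte / pageSize);
    for (let page = firstPage; page <= lastPage; page++) {
        const begin = page * pageSize;
        const end = Math.min(begin + pageSize, fileSize) - 1;
        // HTTP byte ranges are inclusive on both ends.
        ranges.push(`bytes=${begin}-${end}`);
    }
    return ranges;
}

// Reading the last 10 bytes of a hypothetical 1000-byte file
// touches only the final page.
const tailRanges = pageRanges(990, 10, 256, 1000);
console.log(tailRanges); // ["bytes=768-999"]
```

A reader that only needs the metadata at the end of a file therefore requests a single page, no matter how large the file is.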

*Note: The common denominator among the available File APIs is unfortunately not large. This limits the features that we can provide in the browser. For example, local persistence of DuckDB databases would be a feature with significant impact but requires a way to read and write synchronously into either user-provided files or IndexedDB. We might be able to bypass these limitations in the future but this is the subject of ongoing research.*

#### Advanced Features

WebAssembly 1.0 has landed in all major browsers. The WebAssembly Community Group fixed the design of this first version back in November 2017, which is now referred to as WebAssembly MVP. Since then, the development has been ongoing with eight additional features that have been added to the standard and at least five proposals that are currently in progress.

The rapid pace of this development presents challenges and opportunities for library authors. On the one hand, the different features find their way into the browsers at different speeds, which leads to a fractured space of post-MVP functionality. On the other hand, features can bring across-the-board performance improvements and are therefore indispensable when aiming for maximum performance.

The most promising feature for DuckDB-Wasm is [exception handling](https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md) which is already enabled by default in Chrome 95. DuckDB and DuckDB-Wasm are written in C++ and use exceptions for faulty situations. DuckDB does not use exceptions for general control flow but to automatically propagate errors upwards to the top-level plan driver. In native environments, these exceptions are implemented as "zero-cost exceptions" as they induce no overhead until they are thrown. With the WebAssembly MVP, however, that is no longer possible as the compiler toolchain Emscripten has to emulate exceptions through JavaScript. Without WebAssembly exceptions, DuckDB-Wasm calls throwing functions through a JavaScript hook that can catch exceptions emulated through JavaScript `aborts`. An example of these hook calls is shown in the following figure. Both stack traces originate from a single paged read of a Parquet file in DuckDB-Wasm. The left side shows a stack trace with the WebAssembly MVP and requires multiple calls through the functions `wasm-to-js-i*`. The right stack trace uses WebAssembly exceptions without any hook calls.

<p align="center">
    ![](../images/blog/wasm-eh.png)

</p>

This fractured feature space is a temporary challenge that will be resolved once high-impact features like exception handling, SIMD and bulk-memory operations are available everywhere. In the meantime, we will ship multiple WebAssembly modules that are compiled for different feature sets and adaptively pick the best bundle for you using dynamic browser checks.

The following example shows how the asynchronous version of DuckDB-Wasm can be instantiated using either manual or JsDelivr bundles:

```ts
// Import the ESM bundle (supports tree-shaking)
import * as duckdb from '@duckdb/duckdb-wasm/dist/duckdb-esm.js';

// Either bundle them manually, for example as Webpack assets
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb.wasm';
import duckdb_wasm_next from '@duckdb/duckdb-wasm/dist/duckdb-next.wasm';
import duckdb_wasm_next_coi from '@duckdb/duckdb-wasm/dist/duckdb-next-coi.wasm';
const WEBPACK_BUNDLES: duckdb.DuckDBBundles = {
    asyncDefault: {
        mainModule: duckdb_wasm,
        mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-async.worker.js', import.meta.url).toString(),
    },
    asyncNext: {
        mainModule: duckdb_wasm_next,
        mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-async-next.worker.js', import.meta.url).toString(),
    },
    asyncNextCOI: {
        mainModule: duckdb_wasm_next_coi,
        mainWorker: new URL(
            '@duckdb/duckdb-wasm/dist/duckdb-browser-async-next-coi.worker.js',
            import.meta.url,
        ).toString(),
        pthreadWorker: new URL(
            '@duckdb/duckdb-wasm/dist/duckdb-browser-async-next-coi.pthread.worker.js',
            import.meta.url,
        ).toString(),
    },
};
// ..., or load the bundles from jsdelivr
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();

// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
```

*You can also test the features and selected bundle in your browser using the web shell command `.features`.*

#### Multithreading

In 2018, the Spectre and Meltdown vulnerabilities sent crippling shockwaves through the internet. Today, we are facing the repercussions of these events, in particular in software that runs arbitrary user code – such as web browsers. Shortly after the publications, all major browser vendors restricted the use of `SharedArrayBuffers` to prevent dangerous timing attacks. `SharedArrayBuffers` are raw buffers that can be shared among web workers for global state and an alternative to the browser-specific message passing. These restrictions had detrimental effects on WebAssembly modules since `SharedArrayBuffers` are necessary for the implementation of POSIX threads in WebAssembly.

Without `SharedArrayBuffers`, WebAssembly modules can run in a dedicated web worker to unblock the main event loop but won't be able to spawn additional workers for parallel computations within the same instance. By default, we therefore cannot unleash the parallel query execution of DuckDB in the web. However, browser vendors have recently started to reenable `SharedArrayBuffers` for websites that are [cross-origin-isolated](https://web.dev/coop-coep/). A website is cross-origin-isolated if it ships the main document with the following HTTP headers:

```text
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
```

These headers instruct browsers to A) isolate the top-level document from other top-level documents outside its own origin and B) prevent the document from making arbitrary cross-origin requests unless the requested resource explicitly opts in. Both restrictions have far-reaching implications for a website, since many third-party data sources do not provide the headers today and the top-level isolation currently hinders communication with, for example, OAuth pop-ups ([there are plans to lift that](https://github.com/whatwg/html/issues/6364)).
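
For local experiments, the two headers can be attached to every response of a small development server. The following is a minimal sketch using only Python's standard library; the names `CrossOriginIsolatedHandler` and `make_server` are made up for this illustration and are not part of DuckDB-Wasm. It is meant for testing on your own machine, not as a production setup.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CrossOriginIsolatedHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory and adds both isolation headers."""

    def end_headers(self):
        # Add the two headers to every response before the header block is closed
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        super().end_headers()


def make_server(port=8000):
    # Hypothetical helper: binds to localhost only
    return ThreadingHTTPServer(("127.0.0.1", port), CrossOriginIsolatedHandler)
```

Calling `make_server().serve_forever()` starts the server. Any resource the page loads from another origin still has to opt in on its side, as described above.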

*We therefore assume that DuckDB-Wasm will find the majority of users on non-isolated websites. We are, however, experimenting with dedicated bundles for isolated sites (using the suffix `-next-coi`) and will closely monitor the future needs of our users.*

#### Web Shell

We further host a web shell powered by DuckDB-Wasm alongside the library release at [shell.duckdb.org](https://shell.duckdb.org).
Use the following shell commands to query remote TPC-H files at scale factor 0.01.
When querying your own, make sure to properly set CORS headers since your browser will otherwise block these requests.
You can alternatively use the `.files` command to register files from the local filesystem.

```sql
.timer on

SELECT count(*)
FROM 'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/lineitem.parquet';

SELECT count(*)
FROM 'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/customer.parquet';

SELECT avg(c_acctbal)
FROM 'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/customer.parquet';

SELECT *
FROM 'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/orders.parquet'
LIMIT 10;

SELECT n_name, avg(c_acctbal)
FROM
    'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/customer.parquet',
    'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/nation.parquet'
WHERE c_nationkey = n_nationkey
GROUP BY n_name;

SELECT *
FROM
    'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/region.parquet',
    'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/nation.parquet'
WHERE r_regionkey = n_regionkey;
```

#### Evaluation

The following table teases the execution times of some TPC-H queries at scale factor 0.5 using the libraries [DuckDB-Wasm](https://www.npmjs.com/package/@duckdb/duckdb-wasm), [sql.js](https://github.com/sql-js/sql.js/), [Arquero](https://github.com/uwdata/arquero) and [Lovefield](https://github.com/google/lovefield). You can find a more in-depth discussion with all TPC-H queries, additional scale factors and microbenchmarks on the [“DuckDB-Wasm versus X” page](https://shell.duckdb.org/versus).

| Query | DuckDB-Wasm |  sql.js |  Arquero | Lovefield |
| ----: | ----------: | ------: | -------: | --------: |
|     1 | **0.855 s** | 8.441 s | 24.031 s |  12.666 s |
|     3 | **0.179 s** | 1.758 s | 16.848 s |   3.587 s |
|     4 | **0.151 s** | 0.384 s |  6.519 s |   3.779 s |
|     5 | **0.197 s** | 1.965 s | 18.286 s |  13.117 s |
|     6 | **0.086 s** | 1.294 s |  1.379 s |   5.253 s |
|     7 | **0.319 s** | 2.677 s |  6.013 s |  74.926 s |
|     8 | **0.236 s** | 4.126 s |  2.589 s |  18.983 s |
|    10 | **0.351 s** | 1.238 s | 23.096 s |  18.229 s |
|    12 | **0.276 s** | 1.080 s | 11.932 s |  10.372 s |
|    13 | **0.194 s** | 5.887 s | 16.387 s |   9.795 s |
|    14 | **0.086 s** | 1.194 s |  6.332 s |   6.449 s |
|    16 | **0.137 s** | 0.453 s |  0.294 s |   5.590 s |
|    19 | **0.377 s** | 1.272 s | 65.403 s |   9.977 s |

#### Future Research

We believe that WebAssembly unveils hitherto dormant potential for shared query processing between clients and servers. Pushing computation closer to the client can eliminate costly round-trips to the server and thus increase interactivity and scalability of in-browser analytics. We further believe that the release of DuckDB-Wasm could be the first step towards a more universal data plane spanning across multiple layers including traditional database servers, clients, CDN workers and computational storage. As an in-process analytical database, DuckDB might be the ideal driver for distributed query plans that increase the scalability and interactivity of SQL databases at low costs.

## Fast Moving Holistic Aggregates

**Publication date:** 2021-11-12

**Author:** Richard Wesley

**TL;DR:** DuckDB, a free and open-source analytical data management system, has a windowing API that can compute complex moving aggregates like interquartile ranges and median absolute deviation much faster than the conventional approaches.

In a [previous post](https://duckdb.org/2021/10/13/windowing),
we described the DuckDB windowing architecture and mentioned the support for
some advanced moving aggregates.
In this post, we will compare the performance of various possible moving implementations of these functions
and explain how DuckDB's performant implementations work.

#### What Is an Aggregate Function?

When people think of aggregate functions, they typically have something simple in mind such as `SUM` or `AVG`.
But more generally, what an aggregate function does is _summarise_ a set of values into a single value.
Such summaries can be arbitrarily complex, and involve any data type.
For example, DuckDB provides aggregates for concatenating strings (`STRING_AGG`)
and constructing lists (`LIST`).
In SQL, aggregated sets come from either a `GROUP BY` clause or an `OVER` windowing specification.

##### Holistic Aggregates

All of the basic SQL aggregate functions like `SUM` and `MAX` can be computed
by reading values one at a time and throwing them away.
But there are some functions that potentially need to keep track of all the values before they can produce a result.
These are called _holistic_ aggregates, and they require more care when implementing.

For some aggregates (like `STRING_AGG`) the order of the values can change the result.
This is not a problem for windowing because `OVER` clauses can specify an ordering,
but in a `GROUP BY` clause, the values are unordered.
To handle this, order-sensitive aggregates can include a `WITHIN GROUP(ORDER BY <expr>)` clause
to specify the order of the values.
Because the values must all be collected and sorted,
aggregates that use the `WITHIN GROUP` clause are holistic.

##### Statistical Holistic Aggregates

Because sorting the arguments to a windowed aggregate can be specified with the `OVER` clause,
you might wonder if there are any other kinds of holistic aggregates that do not use sorting,
or which use an ordering different from the one in the `OVER` clause.
It turns out that there are a number of important statistical functions that
turn into holistic aggregates in SQL.
In particular, here are the statistical holistic aggregates that DuckDB currently supports:

| Function                        | Description                                                                         |
| :------------------------------ | :---------------------------------------------------------------------------------- |
| `mode(x)`                       | The most common value in a set                                                      |
| `median(x)`                     | The middle value of a set                                                           |
| `quantile_disc(x, <frac>)`      | The exact value corresponding to a fractional position.                             |
| `quantile_cont(x, <frac>)`      | The interpolated value corresponding to a fractional position.                      |
| `quantile_disc(x, [<frac>...])` | A list of the exact values corresponding to a list of fractional positions.         |
| `quantile_cont(x, [<frac>...])` | A list of the interpolated values corresponding to a list of fractional positions.  |
| `mad(x)`                        | The median of the absolute values of the differences of each value from the median. |

Where things get really interesting is when we try to compute moving versions of these aggregates.
For example, computing a moving `AVG` is fairly straightforward:
You can subtract values that have left the frame and add in the new ones,
or use the segment tree approach from the [previous post on windowing](https://duckdb.org/2021/10/13/windowing).
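
As a rough illustration of the "subtract and add" idea, here is a minimal Python sketch of a moving average that keeps a running total. The function name `moving_avg` and the frame parameters `before` and `after` are made up for this example, and it ignores the floating-point drift that a long-running sum can accumulate.

```python
def moving_avg(values, before=1, after=1):
    """Moving average over the frame [row - before, row + after], clipped to the data."""
    n = len(values)
    result = []
    total = 0.0
    lo, hi = 0, -1  # The current frame is values[lo..hi]; it starts out empty
    for row in range(n):
        new_lo = max(row - before, 0)
        new_hi = min(row + after, n - 1)
        # Add the values that entered the frame on the right
        while hi < new_hi:
            hi += 1
            total += values[hi]
        # Subtract the values that left the frame on the left
        while lo < new_lo:
            total -= values[lo]
            lo += 1
        result.append(total / (hi - lo + 1))
    return result
```

Each value is added once and subtracted at most once, so the work per row does not depend on the frame size.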

##### Python Example

Computing a moving median is not as easy.
Let's look at a simple example of how we might implement moving `median` in Python
for the following string data, using a frame that includes one element from each side:

![](../images/blog/holistic/python.svg)


For this example we are using strings so we don't have to worry about interpolating values.

```python
data = ('a', 'b', 'c', 'd', 'c', 'b',)
w = len(data)
for row in range(w):
    l = max(row - 1, 0)       # First index of the frame
    r = min(row + 1, w-1)     # Last index of the frame
    frame = list(data[l:r+1]) # Copy the frame values
    frame.sort()              # Sort the frame values
    n = (r - l) // 2          # Middle index of the frame
    median = frame[n]         # The median is the middle value
    print(row, data[row], median)
```

Each frame has a different set of values to aggregate and we can't change the order in the table,
so we have to copy them each time before we sort.
Sorting is slow, and there is a lot of repetition.

All of these holistic aggregates have similar problems
if we just reuse the simple implementations for moving versions.
Fortunately, there are much faster approaches for all of them.

#### Moving Holistic Aggregation

In the [previous post on windowing](https://duckdb.org/2021/10/13/windowing),
we explained the component operations used to implement a generic aggregate function
(initialize, update, finalize, combine and window).
In the rest of this post, we will dig into how they can be implemented for these complex aggregates.

##### Quantile

The `quantile` aggregate variants all extract the value(s) at a given fraction (or fractions) of the way
through the ordered list of values in the set.
The simplest variant is the `median` function, which we met in the introduction, which uses a fraction of `0.5`.
There are other variants depending on whether the values are
quantitative (i.e., they have a distance and the values can be interpolated)
or merely ordinal (i.e., they can be ordered, but ties have to be broken).
Still other variants depend on whether the fraction is a single value or a list of values,
but they can all be implemented in similar ways.

A common way to implement `quantile` that we saw in the Python example is to collect all the values into the state,
sort them, and then read out the values at the requested positions.
(This is probably why the SQL standard refers to it as an "ordered-set aggregate".)
States can be combined by concatenation,
which lets us group in parallel and build segment trees for windowing.

This approach is very time-consuming because sorting is `O(N log N)`,
but happily for `quantile` we can use a related algorithm called `QuickSelect`,
which can find a positional value in only `O(N)` time by _partially sorting_ the array.
You may have run into this algorithm if you have ever used the
`std::nth_element` algorithm in the C++ standard library.
This works well for grouped quantiles, but for moving quantiles
the segment tree approach ends up being about 5% slower than just starting from scratch for each value.
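
To make the selection step concrete, here is a minimal Python sketch of QuickSelect with a random pivot. It is an illustration of the general algorithm, not DuckDB's implementation (which relies on `std::nth_element`), and the function name `quickselect` is our own. The linear running time holds in expectation; this simple partitioning scheme degrades on inputs with many duplicates.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) without fully sorting."""
    items = list(values)  # Work on a copy so the input stays untouched
    lo, hi = 0, len(items) - 1
    while True:
        if lo == hi:
            return items[lo]
        # Move a random pivot to the end, then partition (Lomuto scheme)
        p = random.randint(lo, hi)
        items[p], items[hi] = items[hi], items[p]
        pivot = items[hi]
        store = lo
        for i in range(lo, hi):
            if items[i] < pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        items[store], items[hi] = items[hi], items[store]
        # The pivot is now in its final position; continue on one side only
        if k == store:
            return items[store]
        elif k < store:
            hi = store - 1
        else:
            lo = store + 1
```

Unlike a full sort, each round discards the side of the partition that cannot contain position `k`, which is where the savings come from.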

To really improve the performance of moving quantiles,
we note that the partial order probably does not change much between frames.
If we maintain a list of indirect indices into the window and call `nth_element` on the indices,
we can reorder the partially ordered indices instead of the values themselves.
In the common case where the frame has the same size,
we can even check to see whether the new value disrupts the partial ordering at all,
and skip the reordering!
With this approach, we can obtain a significant performance boost of 1.5-10 times.

In this example, we have a 3-element frame (green) that moves one space to the right for each value:

![](../images/blog/holistic/median.svg)


The median values in orange must be computed from scratch.
Notice that in the example, this only happens at the start of the window.
The median values in white are computed using the existing partial ordering.
In the example, this happens when the frame changes size.
Finally, the median values in blue do not require reordering
because the new value is the same as the old value.
With this algorithm, we can create a faster implementation of single-fraction `quantile` without sorting.
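
The "skip" check can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: it computes a discrete median, it falls back to a plain `sorted` call where DuckDB would reorder indirect indices with `nth_element`, and the name `moving_median_with_reuse` is made up. The point is only the test that decides whether the previous median is still valid when a same-sized frame slides by one row.

```python
def moving_median_with_reuse(data, before=1, after=1):
    """Discrete moving median that skips recomputation when the result cannot change.

    Returns the list of medians and the number of frames that were recomputed.
    """
    n = len(data)
    medians = []
    recomputed = 0
    prev = None  # (first index, last index, median) of the previous frame
    for row in range(n):
        l = max(row - before, 0)
        r = min(row + after, n - 1)
        reuse = False
        if prev is not None:
            pl, pr, m = prev
            # The frame kept its size and slid by exactly one position
            if l == pl + 1 and r == pr + 1:
                old, new = data[pl], data[r]
                # Swapping a value for another one on the same side of the
                # median leaves the middle position untouched
                reuse = old == new or (old < m and new < m) or (old > m and new > m)
        if not reuse:
            frame = sorted(data[l:r + 1])  # Stand-in for the partial reordering
            m = frame[(r - l) // 2]
            recomputed += 1
        medians.append(m)
        prev = (l, r, m)
    return medians, recomputed
```

On data where consecutive values fall on the same side of the current median, most frames take the cheap path; on adversarial data every frame is recomputed, and the result is the same as the simple version either way.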

##### InterQuartile Ranges (IQR)

We can extend this implementation to _lists_ of fractions by leveraging the fact that each call to `nth_element`
partially orders the values, which further improves performance.
The "reuse" trick can be generalised to distinguish between fractions that are undisturbed
and ones that need to be recomputed.

A common application of multiple fractions is computing
[interquartile ranges](https://en.wikipedia.org/wiki/Interquartile_range)
by using the fraction list `[0.25, 0.5, 0.75]`.
This is the fraction list we use for the multiple fraction benchmarks.
Combined with moving `MIN` and `MAX`,
this moving aggregate can be used to generate the data for a moving box-and-whisker plot.

##### Median Absolute Deviation (MAD)

Maintaining the partial ordering can also be used to boost the performance of the
[median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation)
(or `mad`) aggregate.
Unfortunately, the second partial ordering can't use the single value trick
because the "function" being used to partially order the values will have changed if the data median changes.
Still, the values are probably not far off,
which again improves the performance of `nth_element`.
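
For reference, the definition itself is short. The following Python sketch computes `mad` for a single set of numeric values using the standard library's `statistics.median`; it shows the two nested medians, not the moving implementation. Note how the second median is taken over distances from `center`, which is why a change in the data median changes the ordering used by the second step.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    center = median(values)                          # First median: over the data
    deviations = [abs(v - center) for v in values]   # Distance of each value from it
    return median(deviations)                        # Second median: over the distances
```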

##### Mode

The `mode` aggregate returns the most common value in a set.
One common way to implement it is to accumulate all the values in the state,
sort them and then scan for the longest run.
These states can be combined by merging,
which lets us compute the mode in parallel and build segment trees for windowing.

Once again, this approach is very time-consuming because sorting is `O(N log N)`.
It may also use more memory than necessary because it keeps _all_ the values
instead of keeping only the unique values.
If there are heavy-hitters in the list,
(which is typically what `mode` is being used to find)
this can be significant.

Another way to implement `mode` is to use a hash map for the state that maps values to counts.
Hash tables are typically `O(N)` for accumulation, which is an improvement on sorting,
and they only need to store unique values.
If the state also tracks the most frequent value and its count seen so far,
we can just return that value when we finalize the aggregate.
States can be combined by merging,
which lets us group in parallel and build segment trees for windowing.

Unfortunately, as the benchmarks below demonstrate, this segment tree approach for windowing is quite slow!
The overhead of merging the hash tables for the segment trees makes this approach about 5% slower
than just building a new hash table for each row in the window.
But for a moving `mode` computation,
we can instead make a single hash table and update it every time the frame moves,
removing the old values, adding the new values, and updating the value/count pair.
At times the current mode value may have its count decremented,
but when that happens we can rescan the table to find the new mode.

In this example, the 4-element frame (green) moves one space to the right for each value:

![](../images/blog/holistic/mode.svg)


When the mode is unchanged (blue) it can be used directly.
When the mode becomes ambiguous (orange), we must rescan the table.
This approach is much faster,
and in the benchmarks it comes in between 15 and 55 times faster than the other two.
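
The single-hash-table idea can be sketched in Python as follows. This is a simplified illustration, not DuckDB's code: it assumes a trailing frame of fixed `width`, the name `moving_mode` is made up, and when several values are tied for the highest count it returns whichever one it happens to find.

```python
from collections import defaultdict


def moving_mode(data, width):
    """Mode of the trailing frame of `width` rows, for every row of `data`."""
    counts = defaultdict(int)    # value -> number of occurrences in the frame
    mode, mode_count = None, 0   # Current value/count pair
    result = []
    for i, value in enumerate(data):
        # Add the value that enters the frame
        counts[value] += 1
        if counts[value] > mode_count:
            mode, mode_count = value, counts[value]
        # Remove the value that leaves the frame
        if i >= width:
            old = data[i - width]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
            if old == mode:
                # The mode lost an occurrence, so rescan the table
                mode, mode_count = max(counts.items(), key=lambda kv: kv[1])
        result.append(mode)
    return result
```

The rescan only touches the distinct values in the frame, and it is only needed when the departing value is the current mode.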

#### Microbenchmarks

To benchmark the various implementations, we run moving window queries against a 10M table of integers:

```sql
CREATE TABLE rank100 AS
    SELECT b % 100 AS a, b FROM range(10000000) tbl(b);
```

The results are then re-aggregated down to one row to remove the impact of streaming the results.
The frames are 100 elements wide, and the test is repeated with a fixed trailing frame:

```sql
SELECT quantile_cont(a, [0.25, 0.5, 0.75]) OVER (
    ORDER BY b ASC
    ROWS BETWEEN 100 PRECEDING AND CURRENT ROW) AS iqr
FROM rank100;
```

and a variable frame that moves pseudo-randomly around the current value:

```sql
SELECT quantile_cont(a, [0.25, 0.5, 0.75]) OVER (
    ORDER BY b ASC
    ROWS BETWEEN mod(b * 47, 521) PRECEDING AND 100 - mod(b * 47, 521) FOLLOWING) AS iqr
FROM rank100;
```

The two examples here are the interquartile range queries;
the other queries use the single argument aggregates `median`, `mad` and `mode`.

As a final step, we ran the same query with `count(*)`,
which has the same overhead as the other benchmarks, but is trivial to compute
(it just returns the frame size).
That overhead was subtracted from the run times to give the algorithm timings:

![](../images/blog/holistic/benchmarks.svg)


As can be seen, there is a substantial benefit from implementing the window operation
for all of these aggregates, often on the order of a factor of ten.

An unexpected finding was that the segment tree approach for these complex states
is always slower (by about 5%) than simply creating the state for each output row.
This suggests that when writing combinable complex aggregates,
it is well worth benchmarking the aggregate
and then considering providing a window operation instead of deferring to the segment tree machinery.

#### Conclusion

DuckDB's aggregate API enables aggregate functions to define a windowing operation
that can significantly improve the performance of moving window computations for complex aggregates.
This functionality has been used to significantly speed up windowing for several statistical aggregates,
such as mode, interquartile ranges and median absolute deviation.

DuckDB is a free and open-source database management system (MIT licensed).
It aims to be the SQLite for Analytics,
and provides a fast and efficient database system with zero external dependencies.
It is available not just for Python, but also for C/C++, R, Java, and more.

## DuckDB – Lord of the Enums: The Fellowship of the Categorical and Factors

**Publication date:** 2021-11-26

**Author:** Pedro Holanda

![](../images/blog/duck-lotr.png)


String types are one of the most commonly used types. However, often string columns have a limited number of distinct values. For example, a country column will never have more than a few hundred unique entries. Storing a data type as a plain string causes a waste of storage and compromises query performance. A better solution is to dictionary encode these columns. In dictionary encoding, the data is split into two parts: the category and the values. The category stores the actual strings, and the values stores a reference to the strings. This encoding is depicted below.

![](../images/blog/dictionary-encoding.png)
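
The same idea in a few lines of plain Python, as a minimal sketch: `dictionary_encode` and `dictionary_decode` are names invented for this example, the integer list `codes` plays the role of the "values" part described above, and categories are numbered in order of first appearance (a choice of this sketch; other systems may order them differently).

```python
def dictionary_encode(strings):
    """Split a list of strings into distinct categories and integer codes."""
    categories = []  # Each distinct string, stored once
    position = {}    # string -> index into categories
    codes = []       # One small integer per input value
    for s in strings:
        if s not in position:
            position[s] = len(categories)
            categories.append(s)
        codes.append(position[s])
    return categories, codes


def dictionary_decode(categories, codes):
    """Translate the codes back into the original strings."""
    return [categories[code] for code in codes]
```

However many rows there are, each distinct string is stored only once, and the per-row cost is a single integer.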


In the old times, users would manually perform dictionary encoding by creating lookup tables and translating their ids back with join operations. Environments like Pandas and R support these types more elegantly. [Pandas Categorical](https://pandas.pydata.org/docs/reference/api/pandas.Categorical.html) and [R Factors](https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Factors) are types that allow for columns of strings with many duplicate entries to be efficiently stored through dictionary encoding. 

Dictionary encoding not only allows immense storage savings but also allows systems to operate on numbers instead of on strings, drastically boosting query performance. By lowering RAM usage, `ENUM`s also allow DuckDB to scale to significantly larger datasets.

To allow DuckDB to fully integrate with these encoded structures, we implemented Enum Types. This blog post will show code snippets of how to use `ENUM` types from both SQL API and Python/R clients, and will demonstrate the performance benefits of the enum types over using regular strings. To the best of our knowledge, DuckDB is the first RDBMS that natively integrates with Pandas categorical columns and R factors.

#### SQL

Our Enum SQL syntax is heavily inspired by [Postgres](https://www.postgresql.org/docs/9.1/datatype-enum.html). Below, we depict how to create and use the `ENUM` type.

```sql
CREATE TYPE lotr_race AS ENUM ('Mayar', 'Hobbit', 'Orc');

CREATE TABLE character (
    name text,
    race lotr_race
);

INSERT INTO character VALUES ('Frodo Quackins','Hobbit'), ('Quackalf ', 'Mayar');

-- We can perform a normal string comparison
-- Note that 'Hobbit' will be cast to a lotr_race
-- hence this comparison is actually a fast integer comparison
SELECT name FROM character WHERE race = 'Hobbit';
----
Frodo Quackins
```

`ENUM` columns behave exactly the same as normal `VARCHAR` columns. They can be used in string functions (such as `LIKE` or `substring`), they can be compared, ordered, etc. The only exception is that `ENUM` columns can only hold the values that are specified in the enum definition. Inserting a value that is not part of the enum definition will result in an error.

DuckDB `ENUM`s are currently static (i.e., values cannot be added or removed after the `ENUM` definition). However, `ENUM` updates are on the roadmap for the next version.

See [the documentation](#docs:lts:sql:data_types:enum) for more information.

#### Python

##### Setup

First we need to install DuckDB and Pandas. The installation process of both libraries in Python is straightforward:

```bash
# Python Install
pip install duckdb
pip install pandas 
```

##### Usage

Pandas columns from the categorical type are directly converted to DuckDB's `ENUM` types:

```python
import pandas as pd
import duckdb

# Our unencoded data.
data = ['Hobbit', 'Elf', 'Elf', 'Man', 'Mayar', 'Hobbit', 'Mayar']

# 'pd.Categorical' automatically encodes the data as a categorical column
df_in = pd.DataFrame({'races': pd.Categorical(data),})

# We can query this dataframe as we would any other
# The conversion from categorical columns to enums happens automatically
df_out = duckdb.execute("SELECT * FROM df_in").df()
```

#### R

##### Setup

We only need to install DuckDB in our R client, and we are ready to go.

```R
# R Install
install.packages("duckdb")
```

##### Usage

Similar to our previous example with Pandas, R Factor columns are also automatically converted to DuckDB's `ENUM` types.

```r
library("duckdb")

con <- dbConnect(duckdb::duckdb())
on.exit(dbDisconnect(con, shutdown = TRUE))

# Our unencoded data.
data <- c('Hobbit', 'Elf', 'Elf', 'Man', 'Mayar', 'Hobbit', 'Mayar')

# Our R dataframe holding an encoded version of our data column
# 'as.factor' automatically encodes it.
df_in <- data.frame(races=as.factor(data))


duckdb::duckdb_register(con, "characters", df_in)
df_out <- dbReadTable(con, "characters")
```

#### Benchmark Comparison

To demonstrate the performance of DuckDB when running operations on categorical columns of Pandas DataFrames, we present a number of benchmarks. The source code for the benchmarks is available on [GitHub](https://raw.githubusercontent.com/duckdb/duckdb-web/main/_posts/benchmark_scripts/enum.py). In our benchmarks we always consume and produce Pandas DataFrames.

##### Dataset

Our dataset is composed of one dataframe with 4 columns and 10 million rows. The first two columns are named `race` and `subrace` representing races. They are both categorical, with the same categories but different values. The other two columns `race_string` and `subrace_string` are the string representations of `race` and `subrace`.

```python
import numpy as np
import pandas as pd

def generate_df(size):
  race_categories = ['Hobbit', 'Elf', 'Man', 'Mayar']
  # Draw two independent random columns from the same set of categories
  race = np.random.choice(race_categories, size)
  subrace = np.random.choice(race_categories, size)
  return pd.DataFrame({'race': pd.Categorical(race),
                       'subrace': pd.Categorical(subrace),
                       'race_string': race,
                       'subrace_string': subrace,})

size = pow(10,7) # 10,000,000 rows
df = generate_df(size)

##### Grouped Aggregation

In our grouped aggregation benchmark, we do a count of how many characters for each race we have in the `race` or `race_string` column of our table.

```python
def duck_categorical(df):
  return con.execute("SELECT race, count(*) FROM df GROUP BY race").df()

def duck_string(df):
  return con.execute("SELECT race_string, count(*) FROM df GROUP BY race_string").df()

def pandas(df):
  return df.groupby(['race']).agg({'race': 'count'})

def pandas_string(df):
  return df.groupby(['race_string']).agg({'race_string': 'count'})
```

The table below depicts the timings of this operation. We can see the benefits of performing grouping on encoded values over strings, with DuckDB being 4× faster when grouping small unsigned values.

| Name                 | Time (s) |
| :------------------- | -------: |
| DuckDB (Categorical) |     0.01 |
| DuckDB (String)      |     0.04 |
| Pandas (Categorical) |     0.06 |
| Pandas (String)      |     0.40 |


##### Filter

In our filter benchmark, we do a count of how many Hobbit characters we have in the `race` or `race_string` column of our table.

```python
def duck_categorical(df):
  return con.execute("SELECT count(*) FROM df WHERE race = 'Hobbit'").df()

def duck_string(df):
  return con.execute("SELECT count(*) FROM df WHERE race_string = 'Hobbit'").df()

def pandas(df):
  filtered_df = df[df.race == "Hobbit"]
  return filtered_df.agg({'race': 'count'})

def pandas_string(df):
  filtered_df = df[df.race_string == "Hobbit"]
  return filtered_df.agg({'race_string': 'count'})
```

For the DuckDB enum type, DuckDB converts the string `Hobbit` to a value in the `ENUM`, which returns an unsigned integer. We can then do fast numeric comparisons, instead of expensive string comparisons, which results in greatly improved performance.

| Name                 | Time (s) |
| :------------------- | -------: |
| DuckDB (Categorical) |    0.003 |
| DuckDB (String)      |    0.023 |
| Pandas (Categorical) |    0.158 |
| Pandas (String)      |    0.440 |
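
Conceptually, the filter translates the string literal to its integer code once and then only compares integers. The following plain-Python sketch illustrates that idea on an already dictionary-encoded column; `count_matches` is a name invented for this example, and returning `0` for a literal that is not in the dictionary is a choice of this sketch rather than a description of DuckDB's behavior.

```python
def count_matches(categories, codes, literal):
    """Count the rows of a dictionary-encoded column that equal `literal`."""
    try:
        # One string lookup for the whole query
        target = categories.index(literal)
    except ValueError:
        return 0  # Sketch-only choice: an unknown literal matches no row
    # Every row is then checked with an integer comparison
    return sum(1 for code in codes if code == target)
```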


##### Enum – Enum Comparison

In this benchmark, we perform an equality comparison of our two race columns: `race` and `subrace`, or `race_string` and `subrace_string`.

```python
def duck_categorical(df):
  return con.execute("SELECT count(*) FROM df WHERE race = subrace").df()

def duck_string(df):
  return con.execute("SELECT count(*) FROM df WHERE race_string = subrace_string").df()

def pandas(df):
  filtered_df = df[df.race == df.subrace]
  return filtered_df.agg({'race': 'count'})

def pandas_string(df):
  filtered_df = df[df.race_string == df.subrace_string]
  return filtered_df.agg({'race_string': 'count'})
```

DuckDB `ENUM`s can be compared directly on their encoded values. This results in a time difference similar to the previous case, again because we are able to compare numeric values instead of strings.

| Name                 | Time (s) |
| :------------------- | -------: |
| DuckDB (Categorical) |    0.005 |
| DuckDB (String)      |    0.040 |
| Pandas (Categorical) |    0.130 |
| Pandas (String)      |    0.550 |

##### Storage

In this benchmark, we compare the storage savings of storing `ENUM` Types vs Strings.

```python
race_categories = ['Hobbit', 'Elf', 'Man','Mayar']
race = np.random.choice(race_categories, size)
categorical_race = pd.DataFrame({'race': pd.Categorical(race),})
string_race = pd.DataFrame({'race': race,})
con = duckdb.connect('duck_cat.db')
con.execute("CREATE TABLE character AS SELECT * FROM categorical_race")
con = duckdb.connect('duck_str.db')
con.execute("CREATE TABLE character AS SELECT * FROM string_race")
```

The table below depicts the DuckDB file size differences when storing the same column as either an Enum or a plain string. Since the dictionary-encoding does not repeat the string values, we can see a reduction of one order of magnitude in size.

| Name                 | Size (MB) |
| :------------------- | --------: |
| DuckDB (Categorical) |        11 |
| DuckDB (String)      |       102 |

#### What about the Sequels?

There are three main directions we will pursue in the following versions of DuckDB related to `ENUM`s.

1. Automatic Storage Encoding: As described in the introduction, users frequently define database columns as Strings when in reality they are `ENUM`s. Our idea is to automatically detect and dictionary-encode these columns, without any input of the user and in a way that is completely invisible to them.
2. `ENUM` Updates: As said in the introduction, our `ENUM`s are currently static. We will allow the insertion and removal of `ENUM` categories.
3. Integration with other Data Formats: We want to expand our integration with data formats that implement `ENUM`-like structures.

#### Feedback

If you encounter any problems when using our `ENUM`s, please open an issue in our [issue tracker](https://github.com/duckdb/duckdb/issues)!

## DuckDB Quacks Arrow: A Zero-Copy Data Integration between Apache Arrow and DuckDB

**Publication date:** 2021-12-03

**Authors:** Pedro Holanda, Jonathan Keane

**TL;DR:** The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger than memory datasets in Python and R using either SQL or relational APIs.

This post is a collaboration with and cross-posted on the [Arrow blog](https://arrow.apache.org/blog/2021/12/03/arrow-duckdb/).

Part of [Apache Arrow](https://arrow.apache.org) is an in-memory data format optimized for analytical libraries. Like Pandas and R Dataframes, it uses a columnar data model. But the Arrow project contains more than just the format: The Arrow C++ library, which is accessible in Python, R, and Ruby via bindings, has additional features that allow you to compute efficiently on datasets. These additional features are on top of the implementation of the in-memory format described above. The datasets may span multiple files in Parquet, CSV, or other formats, and files may even be on remote or cloud storage like HDFS or Amazon S3. The Arrow C++ query engine supports the streaming of query results, has an efficient implementation of complex data types (e.g., Lists, Structs, Maps), and can perform important scan optimizations like Projection and Filter Pushdown.

[DuckDB](https://www.duckdb.org) is a new analytical data management system that is designed to run complex SQL queries within other processes. DuckDB has bindings for R and Python, among others. DuckDB can query Arrow datasets directly and stream query results back to Arrow. This integration allows users to query Arrow data using DuckDB's SQL Interface and API, while taking advantage of DuckDB's parallel vectorized execution engine, without requiring any extra data copying. Additionally, this integration takes full advantage of Arrow's projection and filter pushdown while scanning datasets.

This integration is unique because it uses zero-copy streaming of data between DuckDB and Arrow and vice versa so that you can compose a query using both together. This results in three main benefits:

1. **Larger Than Memory Analysis:** Since both libraries support streaming query results, we are capable of executing on data without fully loading it from disk. Instead, we can execute one batch at a time. This allows us to execute queries on data that is bigger than memory.
2. **Complex Data Types:** DuckDB can efficiently process complex data types that can be stored in Arrow vectors, including arbitrarily nested structs, lists, and maps.
3. **Advanced Optimizer:** DuckDB's state-of-the-art optimizer can push down filters and projections directly into Arrow scans. As a result, only relevant columns and partitions will be read, allowing the system to e.g., take advantage of partition elimination in Parquet files. This significantly accelerates query execution.

If you are only interested in the benchmarks, you can jump ahead to the [benchmark section below](#::Benchmark Comparison).

#### Quick Tour

Before diving into the details of the integration, in this section we provide a quick motivating example of how powerful and simple to use the DuckDB-Arrow integration is. With a few lines of code, you can already start querying Arrow datasets. Say you want to analyze the infamous [NYC Taxi Dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) and figure out if groups tip more or less than single riders.

##### R

Both Arrow and DuckDB support dplyr pipelines for people more comfortable with using dplyr for their data analysis. The Arrow package includes two helper functions that allow us to pass data back and forth between Arrow and DuckDB (`to_duckdb()` and `to_arrow()`).
This is especially useful in cases where something is supported in one of Arrow or DuckDB but not the other. For example, if you find a complex dplyr pipeline where the SQL translation doesn't work with DuckDB, use `to_arrow()` before the pipeline to use the Arrow engine. Or, if you have a function (e.g., windowed aggregates) that isn't yet implemented in Arrow, use `to_duckdb()` to use the DuckDB engine. All while not paying any cost to (re)serialize the data when you pass it back and forth!

```R
library(duckdb)
library(arrow)
library(dplyr)

# Open dataset using year, month folder partition
ds <- arrow::open_dataset("nyc-taxi", partitioning = c("year", "month"))

ds %>%
  # Look only at 2015 on, where the number of passenger is positive, the trip distance is
  # greater than a quarter mile, and where the fare amount is positive
  filter(year > 2014 & passenger_count > 0 & trip_distance > 0.25 & fare_amount > 0) %>%
  # Pass off to DuckDB
  to_duckdb() %>%
  group_by(passenger_count) %>%
  mutate(tip_pct = tip_amount / fare_amount) %>%
  summarise(
    fare_amount = mean(fare_amount, na.rm = TRUE),
    tip_amount = mean(tip_amount, na.rm = TRUE),
    tip_pct = mean(tip_pct, na.rm = TRUE)
  ) %>%
  arrange(passenger_count) %>%
  collect()
```

##### Python

The workflow in Python is as simple as it is in R. In this example we use DuckDB's Relational API.

```python
import duckdb
import pyarrow as pa
import pyarrow.dataset as ds

# Open dataset using year, month folder partition
nyc = ds.dataset('nyc-taxi/', partitioning=["year", "month"])

# We transform the nyc dataset into a DuckDB relation
nyc = duckdb.arrow(nyc)

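# Run same query again
# (the chain is wrapped in parentheses so it can span several lines,
# and the filter uses SQL's AND rather than the bitwise & operator)
(
    nyc.filter("year > 2014 AND passenger_count > 0 AND trip_distance > 0.25 AND fare_amount > 0")
    .aggregate("passenger_count, avg(fare_amount), avg(tip_amount), avg(tip_amount / fare_amount) AS tip_pct", "passenger_count")
    .arrow()
)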
```

#### DuckDB and Arrow: The Basics

In this section, we will look at some basic examples of the code needed to read and output Arrow tables in both Python and R.

##### Setup

First we need to install DuckDB and Arrow. The installation process for both libraries is shown below.

Python:

```bash
pip install duckdb
pip install pyarrow
```

R:

```R
install.packages("duckdb")
install.packages("arrow")
```

To execute the sample examples in this section, we need to download the following custom Parquet files:

* [`integers.parquet`](https://duckdb.org/data/integers.parquet)
* [`lineitemsf1.snappy.parquet`](https://blobs.duckdb.org/data/lineitemsf1.snappy.parquet)

###### Python

There are two ways in Python of querying data from Arrow.

1. Through the Relational API:

    ```python
    import duckdb
    import pyarrow.parquet as pq

    # Reads Parquet File to an Arrow Table
    arrow_table = pq.read_table('integers.parquet')

    # Transforms Arrow Table -> DuckDB Relation
    rel_from_arrow = duckdb.arrow(arrow_table)

    # we can run a SQL query on this and print the result
    print(rel_from_arrow.query('arrow_table', 'SELECT sum(data) FROM arrow_table WHERE data > 50').fetchone())

    # Transforms DuckDB Relation -> Arrow Table
    arrow_table_from_duckdb = rel_from_arrow.arrow()
    ```

2. By using replacement scans and querying the object directly with SQL:

    ```python
    import duckdb
    import pyarrow.parquet as pq

    # Reads Parquet File to an Arrow Table
    arrow_table = pq.read_table('integers.parquet')

    # Gets Database Connection
    con = duckdb.connect()

    # we can run a SQL query on this and print the result
    print(con.execute('SELECT sum(data) FROM arrow_table WHERE data > 50').fetchone())

    # Transforms Query Result from DuckDB to Arrow Table
    # We can directly read the arrow object through DuckDB's replacement scans.
    con.execute("SELECT * FROM arrow_table").fetch_arrow_table()
    ```

It is possible to transform both DuckDB Relations and Query Results back to Arrow.

###### R

In R, you can interact with Arrow data in DuckDB by registering the table as a view (an alternative is to use dplyr as shown above).

```r
library(duckdb)
library(arrow)
library(dplyr)

# Reads Parquet File to an Arrow Table
arrow_table <- arrow::read_parquet("integers.parquet", as_data_frame = FALSE)

# Gets Database Connection
con <- dbConnect(duckdb::duckdb())

# Registers arrow table as a DuckDB view
arrow::to_duckdb(arrow_table, table_name = "arrow_table", con = con)

# we can run a SQL query on this and print the result
print(dbGetQuery(con, "SELECT sum(data) FROM arrow_table WHERE data > 50"))

# Transforms Query Result from DuckDB to Arrow Table
result <- dbSendQuery(con, "SELECT * FROM arrow_table", arrow = TRUE)
arrow_result <- duckdb::duckdb_fetch_arrow(result)
```

##### Streaming Data from/to Arrow

In the previous section, we depicted how to interact with Arrow tables. However, Arrow also allows users to interact with the data in a streaming fashion, either consuming it (e.g., from an Arrow Dataset) or producing it (e.g., returning a RecordBatchReader). And of course, DuckDB is able to consume Datasets and produce RecordBatchReaders. This example uses the NYC Taxi Dataset, stored in Parquet files partitioned by year and month, which we can download through the Arrow R package:

```R
arrow::copy_files("s3://ursa-labs-taxi-data", "nyc-taxi")
```

###### Python

```python
import duckdb
import pyarrow.dataset as ds

# Reads dataset partitioning it in year/month folder
nyc_dataset = ds.dataset('nyc-taxi/', partitioning=["year", "month"])

# Gets Database Connection
con = duckdb.connect()

query = con.execute("SELECT * FROM nyc_dataset")
# DuckDB's queries can now produce a Record Batch Reader
record_batch_reader = query.fetch_record_batch()
# Which means we can stream the whole query per batch.
# This retrieves the first batch
chunk = record_batch_reader.read_next_batch()
```

###### R

```r
# Reads dataset partitioning it in year/month folder
nyc_dataset = open_dataset("nyc-taxi/", partitioning = c("year", "month"))

# Gets Database Connection
con <- dbConnect(duckdb::duckdb())

# We can use the same function as before to register our arrow dataset
duckdb::duckdb_register_arrow(con, "nyc", nyc_dataset)

res <- dbSendQuery(con, "SELECT * FROM nyc", arrow = TRUE)
# DuckDB's queries can now produce a Record Batch Reader
record_batch_reader <- duckdb::duckdb_fetch_record_batch(res)

# Which means we can stream the whole query per batch.
# This retrieves the first batch
cur_batch <- record_batch_reader$read_next_batch()
```

The preceding R code shows in low-level detail how the data is streaming. We provide the helper `to_arrow()` in the Arrow package which is a wrapper around this that makes it easy to incorporate this streaming into a dplyr pipeline.

> In Arrow 6.0.0, `to_arrow()` currently returns the full table, but will allow full streaming in our upcoming 7.0.0 release.

#### Benchmark Comparison

Here we demonstrate in a simple benchmark the performance difference between querying Arrow datasets with DuckDB and querying Arrow datasets with Pandas.
For both the Projection and Filter pushdown comparison, we will use Arrow tables. That is due to Pandas not being capable of consuming Arrow stream objects.

For the NYC Taxi benchmarks, we used a server in the SciLens cluster and for the TPC-H benchmarks, we used a MacBook Pro with an M1 CPU. In both cases, parallelism in DuckDB was used (which is now on by default).

For the comparison with Pandas, note that DuckDB runs in parallel, while Pandas only supports single-threaded execution. Besides that, one should note that we are comparing automatic optimizations. DuckDB's query optimizer can automatically push down filters and projections. This automatic optimization is not supported in Pandas, but it is possible for users to perform some of these projection and filter pushdowns manually by specifying them in the `read_parquet()` call.

##### Projection Pushdown

In this example we run a simple aggregation on two columns of our lineitem table.

```python
# DuckDB
import duckdb
import pyarrow.parquet as pq

lineitem = pq.read_table('lineitemsf1.snappy.parquet')
con = duckdb.connect()

# Transforms Query Result from DuckDB to Arrow Table
con.execute("""SELECT sum(l_extendedprice * l_discount) AS revenue
                FROM
                lineitem;""").fetch_arrow_table()
```

```python
# Pandas
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

arrow_table = pq.read_table('lineitemsf1.snappy.parquet')

# Converts an Arrow table to a Dataframe
df = arrow_table.to_pandas()

# Runs aggregation
res = pd.DataFrame({'sum': [(df.l_extendedprice * df.l_discount).sum()]})

# Creates an Arrow Table from a Dataframe
new_table = pa.Table.from_pandas(res)
```

| Name   | Time (s) |
| ------ | -------: |
| DuckDB |     0.19 |
| Pandas |     2.13 |

The lineitem table is composed of 16 columns. However, to execute this query, only two columns, `l_extendedprice` and `l_discount`, are necessary. Since DuckDB can push down the projection of these columns, it is capable of executing this query about one order of magnitude faster than Pandas.
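
The idea behind projection pushdown can be sketched with a toy column store in plain Python. This is an illustration only, not how DuckDB or Arrow implement scans: the `ColumnarTable` class, its `scan` method, and the filler column names are hypothetical.

```python
class ColumnarTable:
    """Toy column store: each column is a separate list, as in Arrow or Parquet."""

    def __init__(self, columns):
        self.columns = columns
        self.columns_read = []  # records which columns the last scan touched

    def scan(self, projection):
        # Projection pushdown: only the requested columns are fetched.
        self.columns_read = list(projection)
        return {name: self.columns[name] for name in projection}

# A lineitem-like table with 16 columns, of which the query needs only two.
lineitem = ColumnarTable({f"col_{i}": [0.0, 0.0, 0.0] for i in range(14)})
lineitem.columns["l_extendedprice"] = [100.0, 200.0, 300.0]
lineitem.columns["l_discount"] = [0.05, 0.10, 0.0]

batch = lineitem.scan(["l_extendedprice", "l_discount"])
revenue = sum(p * d for p, d in zip(batch["l_extendedprice"], batch["l_discount"]))

print(f"read {len(lineitem.columns_read)} of {len(lineitem.columns)} columns")
```

Because the other 14 columns are never touched, the cost of the scan depends on the width of the query rather than the width of the table.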

##### Filter Pushdown

For our filter pushdown we repeat the same aggregation used in the previous section, but add filters on three columns (`l_shipdate`, `l_discount`, and `l_quantity`).

```python
# DuckDB
import duckdb
import pyarrow.parquet as pq

lineitem = pq.read_table('lineitemsf1.snappy.parquet')

# Get database connection
con = duckdb.connect()

# Transforms Query Result from DuckDB to Arrow Table
con.execute("""SELECT sum(l_extendedprice * l_discount) AS revenue
        FROM
            lineitem
        WHERE
            l_shipdate >= CAST('1994-01-01' AS date)
            AND l_shipdate < CAST('1995-01-01' AS date)
            AND l_discount BETWEEN 0.05
            AND 0.07
            AND l_quantity < 24; """).fetch_arrow_table()
```

```python
# Pandas
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

arrow_table = pq.read_table('lineitemsf1.snappy.parquet')

# Converts an Arrow table to a Dataframe
df = arrow_table.to_pandas()

# Applies the same filters as the SQL query above
filtered_df = df[
        (df.l_shipdate >= "1994-01-01") &
        (df.l_shipdate < "1995-01-01") &
        (df.l_discount >= 0.05) &
        (df.l_discount <= 0.07) &
        (df.l_quantity < 24)]

res = pd.DataFrame({'sum': [(filtered_df.l_extendedprice * filtered_df.l_discount).sum()]})
new_table = pa.Table.from_pandas(res)
```

| Name   | Time (s) |
| ------ | -------: |
| DuckDB |     0.04 |
| Pandas |     2.29 |

The difference between DuckDB and Pandas is now more drastic: DuckDB is roughly two orders of magnitude faster. Again, since both the filter and projection are pushed down to Arrow, DuckDB reads less data than Pandas, which can't automatically perform this optimization.

##### Streaming

As demonstrated before, DuckDB is capable of consuming and producing Arrow data in a streaming fashion. In this section we run a simple benchmark to showcase the benefits in speed and memory usage when comparing it to full materialization and Pandas. This example uses the full NYC taxi dataset, which you can download with the `arrow::copy_files()` call shown in the streaming section above.

```python
# DuckDB
import duckdb
import pyarrow.dataset as ds

# Open dataset using year, month folder partition
nyc = ds.dataset('nyc-taxi/', partitioning=["year", "month"])

# Get database connection
con = duckdb.connect()

# Run query that selects part of the data
query = con.execute("SELECT total_amount, passenger_count, year FROM nyc where total_amount > 100 and year > 2014")

# Create Record Batch Reader from Query Result.
# "fetch_record_batch()" also accepts an extra parameter related to the desired produced chunk size.
record_batch_reader = query.fetch_record_batch()

# Retrieve all batch chunks
# (the reader raises StopIteration at the end of the stream, so we iterate over it)
for chunk in record_batch_reader:
    pass
```

```python
# Pandas
import pyarrow as pa
import pyarrow.dataset as ds

# We must exclude one of the columns of the NYC dataset due to an unimplemented cast in Arrow.
working_columns = ["vendor_id","pickup_at","dropoff_at","passenger_count","trip_distance","pickup_longitude",
    "pickup_latitude","store_and_fwd_flag","dropoff_longitude","dropoff_latitude","payment_type",
    "fare_amount","extra","mta_tax","tip_amount","tolls_amount","total_amount","year", "month"]

# Open dataset using year, month folder partition
nyc_dataset = ds.dataset('nyc-taxi/', partitioning=["year", "month"])
# Generate a scanner to skip problematic column
dataset_scanner = nyc_dataset.scanner(columns=working_columns)

# Materialize dataset to an Arrow Table
nyc_table = dataset_scanner.to_table()

# Generate Dataframe from Arrow Table
nyc_df = nyc_table.to_pandas()

# Apply Filter
filtered_df = nyc_df[
    (nyc_df.total_amount > 100) &
    (nyc_df.year > 2014)]

# Apply Projection
res = filtered_df[["total_amount", "passenger_count","year"]]

# Transform Result back to an Arrow Table
new_table = pa.Table.from_pandas(res)
```

| Name   | Time (s) | Peak memory usage (GB) |
| ------ | -------: | ---------------------: |
| DuckDB |     0.05 |                    0.3 |
| Pandas |   146.91 |                    248 |

The difference in times between DuckDB and Pandas is a combination of all the integration benefits we explored in this article. In DuckDB the filter pushdown is applied to perform partition elimination (i.e., we skip reading the Parquet files where the year is <= 2014). The filter pushdown is also used to eliminate unrelated row groups (i.e., row groups where the total amount is always <= 100). Due to our projection pushdown, Arrow only has to read the columns of interest from the Parquet files, which allows it to read only 4 out of 20 columns. On the other hand, Pandas is not capable of automatically pushing down any of these optimizations, which means that the full dataset must be read. **This results in the 4 orders of magnitude difference in query execution time.**
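
The two pruning steps described above, partition elimination and row-group elimination, can be sketched in plain Python. This is a simplified model under stated assumptions: the `RowGroup` class and the `scan` function are hypothetical, and the `max_total_amount` property stands in for the min/max statistics that Parquet keeps per row group.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    year: int            # partition key, taken from the folder name
    total_amount: list   # column values stored in this row group

    @property
    def max_total_amount(self):
        # Stand-in for per-row-group statistics stored in file metadata.
        return max(self.total_amount)

def scan(row_groups, min_year, min_amount):
    """Return matching values and the number of row groups actually read."""
    matches, groups_read = [], 0
    for group in row_groups:
        if group.year <= min_year:
            continue  # partition elimination: the whole file is skipped
        if group.max_total_amount <= min_amount:
            continue  # statistics show that no row can match
        groups_read += 1
        matches.extend(v for v in group.total_amount if v > min_amount)
    return matches, groups_read

row_groups = [
    RowGroup(2014, [150, 20]),
    RowGroup(2015, [10, 30]),
    RowGroup(2015, [120, 40]),
    RowGroup(2016, [101]),
]
matches, groups_read = scan(row_groups, min_year=2014, min_amount=100)
print(matches, groups_read)
```

Only two of the four row groups are read here: one is skipped because of its partition, another because its statistics rule out any match.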

In the table above, we also depict the comparison of peak memory usage between DuckDB (Streaming) and Pandas (Fully-Materializing). In DuckDB, we only need to load the row group of interest into memory. Hence our memory usage is low. We also have constant memory usage since we only have to keep one of these row groups in memory at a time. Pandas, on the other hand, has to fully materialize all Parquet files when executing the query. Because of this, we see a constant steep increase in its memory consumption. **The total difference in memory consumption of the two solutions is around 3 orders of magnitude.**
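
The memory behavior of the two approaches can be illustrated with a small pure-Python sketch. The batch source `read_batches` is hypothetical and counts rows rather than bytes, so it only models the shape of the difference, not the measurements above.

```python
def read_batches(num_batches, batch_size):
    """Hypothetical batch source: yields one list of values at a time."""
    for b in range(num_batches):
        start = b * batch_size
        yield list(range(start, start + batch_size))

def streaming_sum(batches):
    """Aggregate while holding a single batch in memory."""
    total, peak_rows = 0, 0
    for batch in batches:
        peak_rows = max(peak_rows, len(batch))
        total += sum(batch)
    return total, peak_rows

def materialized_sum(batches):
    """Load every batch into one table before aggregating."""
    table = [value for batch in batches for value in batch]
    return sum(table), len(table)

stream_total, stream_peak = streaming_sum(read_batches(100, 1_000))
full_total, full_peak = materialized_sum(read_batches(100, 1_000))

print(f"streaming peak: {stream_peak} rows, materialized peak: {full_peak} rows")
```

Both functions return the same total, but the streaming version never holds more than one batch, so its peak stays constant as the number of batches grows.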

#### Conclusion and Feedback

In this blog post, we mainly showcased how to execute queries on Arrow datasets with DuckDB. There are additional libraries that can also consume the Arrow format but they have different purposes and capabilities.

If you encounter any problems when using our integration, please open an issue in either [DuckDB's issue tracker](https://github.com/duckdb/duckdb/issues) or [Arrow's issue tracker](https://issues.apache.org/jira/projects/ARROW/), depending on which library has a problem.

## DuckDB Time Zones: Supporting Calendar Extensions

**Publication date:** 2022-01-06

**Author:** Richard Wesley

**TL;DR:** The DuckDB ICU extension now provides time zone support.

Time zone support is a common request for temporal analytics, but the rules are complex and somewhat arbitrary. 
The most well supported library for locale-specific operations is the [International Components for Unicode (ICU)](https://icu.unicode.org).
DuckDB already provided collated string comparisons using ICU via an extension (to avoid dependencies),
and we have now connected the existing ICU calendar and time zone functions to the main code 
via the new `TIMESTAMP WITH TIME ZONE` (or `TIMESTAMPTZ` for short) data type. The ICU extension is pre-bundled in DuckDB's Python client and can be optionally installed in the remaining clients.

In this post, we will describe how time works in DuckDB and what time zone functionality has been added.

#### What Is Time?

>People assume that time is a strict progression of cause to effect,
>but actually from a non-linear, non-subjective viewpoint
>it’s more like a big ball of wibbly wobbly timey wimey stuff.  
> -- Doctor Who: Blink

Time in databases can be very confusing because the way we talk about time is itself confusing.
Local time, GMT, UTC, time zones, leap years, proleptic Gregorian calendars – it all looks like a big mess.
But if you step back, modeling time is actually fairly simple, and can be reduced to two pieces: instants and binning.

##### Instants

You will often hear people (and documentation) say that database time is stored in UTC.
This is sort of right, but it is more accurate to say that databases store *instants*.
An instant is a point in universal time, and they are usually given as a count of some time increment from a fixed point in time (called the *epoch*).
In DuckDB, the fixed point is the Unix epoch `1970-01-01 00:00:00 +00:00`, and the increment is microseconds (µs).
(Note that to avoid confusion we will be using ISO-8601 y-m-d notation in this post to denote instants.)
In other words, a `TIMESTAMP` column contains instants.
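
As a rough illustration in Python (not DuckDB code), an instant can be computed as a microsecond count from the Unix epoch with the standard `datetime` module. The helper names `to_micros` and `from_micros` are ours.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_micros(dt):
    """Microseconds between the Unix epoch and an offset-aware datetime."""
    return (dt - EPOCH) // timedelta(microseconds=1)

def from_micros(micros):
    """Inverse of to_micros: rebuild the UTC datetime for an instant."""
    return EPOCH + timedelta(microseconds=micros)

# The same instant written with two different UTC offsets.
local = datetime(2021, 7, 31, 7, 20, 15, tzinfo=timezone(timedelta(hours=-7)))
utc = datetime(2021, 7, 31, 14, 20, 15, tzinfo=timezone.utc)

print(to_micros(local) == to_micros(utc))
```

The two spellings differ as text but map to the same integer, which is what "storing instants" means in practice.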

There are three other temporal types in SQL:

* `DATE` – an integral count of days from a fixed date. In DuckDB, the fixed date is `1970-01-01`, again in UTC.
* `TIME` – a (positive) count of microseconds up to a single day
* `INTERVAL` – a set of fields for counting time differences. In DuckDB, intervals count months, days and microseconds. (Months are not completely well-defined, but when present, they represent 30 days.)

None of these other temporal types except `TIME` can have a `WITH TIME ZONE` modifier (and shorter `TZ` suffix),
but to understand what that modifier means, we first need to talk about *temporal binning*.

##### Temporal Binning

Instants are pretty straightforward – they are just a number – but binning is the part that trips people up.
Binning is probably a familiar idea if you have worked with continuous data:
You break up a set of values into ranges and map each value to the range (or *bin*) that it falls into.
Temporal binning is just doing this to instants:

![](../images/blog/timezones/tz-instants-light.svg)



Temporal binning systems are often called *calendars*,
but we are going to avoid that term for now because calendars are usually associated with dates,
and temporal binning also includes rules for time.
These time rules are called *time zones*, and they also impact where the day boundaries used by the calendar fall.
For example, here is what the binning for a second time zone looks like at the epoch:

![](../images/blog/timezones/tz-timezone-light.svg)



The most confusing thing about temporal binning is that there is more than one way to bin time,
and it is not always obvious what binning should be used.
For example, what I mean by "today" is a bin of instants often determined by where I live.
Every instant that is part of my "today" goes in that bin.
But notice that I qualified "today" with "where I live", 
and that qualification determines what binning system is being used.
But "today" could also be determined by "where the events happened",
which would require a different binning to be applied.
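
A small Python sketch shows how the same instant lands in different "day" bins under two binning systems. It uses fixed UTC offsets from the standard library as a simplification; real time zones also carry daylight saving rules. The helper `day_bin` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def day_bin(instant, zone):
    """Bin an instant into a calendar date under the given fixed-offset zone."""
    return instant.astimezone(zone).date()

utc = timezone.utc
west = timezone(timedelta(hours=-8))  # a fixed UTC-08:00 zone, no DST rules

# Three hours after the epoch.
instant = datetime(1970, 1, 1, 3, 0, tzinfo=utc)

print(day_bin(instant, utc).isoformat())
print(day_bin(instant, west).isoformat())
```

The instant itself never changes; only the bin it is assigned to depends on the zone.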

The biggest temporal binning problem most people run into occurs when daylight savings time changes.
This example contains a daylight savings time change where the "hour" bin is two hours long!
To distinguish the two hours, we needed to include another bin containing the offset from UTC:

![](../images/blog/timezones/tz-daylight-light.svg)



As this example shows, in order to bin the instants correctly, we need to know the binning rules that apply.
It also shows that we can't just use the built-in binning operations,
because they don't understand daylight savings time.

##### Naïve Timestamps

Instants are sometimes created from a string format using a local binning system instead of an instant.
This results in the instants being offset from UTC, which can cause problems with daylight savings time.
These are called *naïve* timestamps, and they may constitute a data cleaning problem.

Cleaning naïve timestamps requires determining the offset for each timestamp and then updating the value to be an instant.
For most values, this can be done with an inequality join against a table containing the correct offsets,
but the ambiguous values may need to be fixed by hand.
It may also be possible to correct the ambiguous values by assuming that they were inserted in order
and looking for "backwards jumps" using window functions.
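
The lookup against an offsets table can be sketched in Python, with `bisect` playing the role of the inequality join. The table below is a hypothetical, hand-written one covering US Pacific time in 2021 only, and `naive_to_instant` is our own helper, not a DuckDB function.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

# Hypothetical offsets table, keyed by local wall-clock time.
OFFSETS = [
    (datetime(2021, 1, 1, 0, 0), timedelta(hours=-8)),   # standard time
    (datetime(2021, 3, 14, 3, 0), timedelta(hours=-7)),  # daylight time begins
    (datetime(2021, 11, 7, 2, 0), timedelta(hours=-8)),  # standard time resumes
]
STARTS = [start for start, _ in OFFSETS]

# Local times in this range occurred twice when the clocks fell back.
AMBIGUOUS_START = datetime(2021, 11, 7, 1, 0)
AMBIGUOUS_END = datetime(2021, 11, 7, 2, 0)

def naive_to_instant(naive_local):
    """Return the UTC instant for a naive local timestamp, or None if ambiguous."""
    if AMBIGUOUS_START <= naive_local < AMBIGUOUS_END:
        return None  # cannot be resolved from the value alone
    row = bisect_right(STARTS, naive_local) - 1  # last row with start <= value
    if row < 0:
        raise ValueError("timestamp precedes the offsets table")
    offset = OFFSETS[row][1]
    return (naive_local - offset).replace(tzinfo=timezone.utc)

summer = naive_to_instant(datetime(2021, 7, 31, 7, 20, 15))
winter = naive_to_instant(datetime(2021, 1, 15, 12, 0, 0))
unclear = naive_to_instant(datetime(2021, 11, 7, 1, 30, 0))
print(summer, winter, unclear)
```

Values that fall in the repeated hour come back as `None`, which marks them for the manual or order-based repair described above.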

A simple way to avoid this situation going forward is to add the UTC offset to non-UTC strings: `2021-07-31 07:20:15 -07:00`.
The DuckDB `VARCHAR` cast operation parses these offsets correctly and will generate the corresponding instant.

#### Time Zone Data Types

The SQL standard defines temporal data types qualified by `WITH TIME ZONE`.
This terminology is confusing because it seems to imply that the time zone will be stored with the value,
but what it really means is "bin this value using the session's `TimeZone` setting".
Thus a `TIMESTAMPTZ` column also stores instants, 
but expresses a "hint" that it should use a specific binning system.

There are a number of operations that can be performed on instants without a binning system:

* Comparing;
* Sorting;
* Increment (µs) difference;
* Casting to and from regular `TIMESTAMP`s.

These common operations have been implemented in the main DuckDB code base,
while the binning operations have been delegated to extensions such as ICU.

One small difference between the display of the new `WITH TIME ZONE` types and the older types
is that the new types will be displayed with a `+00` UTC offset.
This is simply to make the type differences visible in command line interfaces and for testing.
Properly formatting a `TIMESTAMPTZ` for display in a locale requires using a binning system.

#### ICU Temporal Binning

DuckDB already uses an ICU extension for collating strings for a particular locale,
so it was natural to extend it to expose the ICU calendar and time zone functionality.

##### ICU Time Zones

The first step for supporting time zones is to add the `TimeZone` setting that should be applied.
DuckDB extensions can define and validate their own settings, and the ICU extension now does this:

```sql
-- Load the extension
-- This is not needed in Python or R, as the extension is already installed
LOAD icu;

-- Show the current time zone. The default is set to ICU's current time zone.
SELECT * FROM duckdb_settings() WHERE name = 'TimeZone';
```

```text
TimeZone    Europe/Amsterdam    The current time zone   VARCHAR
```

```sql
-- Choose a time zone.
SET TimeZone = 'America/Los_Angeles';

-- Emulate Postgres' time zone table
SELECT name, abbrev, utc_offset 
FROM pg_timezone_names() 
ORDER BY 1 
LIMIT 5;
```

```text
ACT ACT 09:30:00
AET AET 10:00:00
AGT AGT -03:00:00
ART ART 02:00:00
AST AST -09:00:00
```

##### ICU Temporal Binning Functions

Databases like DuckDB and Postgres usually provide some temporal binning functions such as `YEAR` or `DATE_PART`.
These functions are part of a single binning system for the conventional (proleptic Gregorian) calendar and the UTC time zone.
Note that casting to a string is a binning operation because the text produced contains bin values.

Because timestamps that require custom binning have a different data type,
the ICU extension can define additional functions with bindings to `TIMESTAMPTZ`:

* `+` – Add an `INTERVAL` to a timestamp
* `-` – Subtract an `INTERVAL` from a timestamp
* `AGE` – Compute an `INTERVAL` describing the months/days/microseconds between two timestamps (or one timestamp and the current instant).
* `DATE_DIFF` – Count part boundary crossings between two timestamps
* `DATE_PART` – Extract a named timestamp part. This includes the part alias functions such as `YEAR`.
* `DATE_SUB` – Count the number of complete parts between two timestamps
* `DATE_TRUNC` – Truncate a timestamp to the given precision
* `LAST_DAY` – Returns the last day of the month
* `MAKE_TIMESTAMPTZ` – Constructs a `TIMESTAMPTZ` from parts, including an optional final time zone specifier. 

We have not implemented these functions for `TIMETZ` because this type has limited utility, 
but it would not be difficult to add in the future.
We have also not implemented string formatting/casting to `VARCHAR` 
because the type casting system is not yet extensible, 
and the current [ICU build](https://github.com/Mytherin/minimal-icu-collation) we are using does not embed this data.
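
Two of the functions listed above, `DATE_DIFF` and `DATE_SUB`, are easy to confuse. The Python sketch below illustrates the distinction for the month part only. It is a simplification and not the ICU implementation: it works on naive `datetime` values, assumes `start <= end`, and ignores month-end clamping. The function names are hypothetical.

```python
from datetime import datetime

def months_crossed(start, end):
    """Number of month boundaries between two timestamps (DATE_DIFF-style)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def months_completed(start, end):
    """Number of whole months elapsed (DATE_SUB-style); assumes start <= end."""
    months = months_crossed(start, end)
    # The final month only counts if `end` has reached start's position in the month.
    if months > 0 and (end.day, end.time()) < (start.day, start.time()):
        months -= 1
    return months

start = datetime(2021, 1, 31, 23, 0)
end = datetime(2021, 2, 1, 1, 0)

print(months_crossed(start, end), months_completed(start, end))
```

Two hours apart, these timestamps cross one month boundary but complete zero months. Where the boundary falls depends on the binning system, which is why these functions need time zone support.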

##### ICU Calendar Support

ICU can also perform binning operations for some non-Gregorian calendars. 
We have added support for these calendars via a `Calendar` setting and the `icu_calendar_names` table function:

```sql
LOAD icu;

-- Show the current calendar. The default is set to ICU's current locale.
SELECT * FROM duckdb_settings() WHERE name = 'Calendar';
```

```text
Calendar    gregorian   The current calendar    VARCHAR
```

```sql
-- List the available calendars
SELECT DISTINCT name FROM icu_calendar_names()
ORDER BY 1 DESC LIMIT 5;
```

```text
roc
persian
japanese
iso8601
islamic-umalqura
```

```sql
-- Choose a calendar
SET Calendar = 'japanese';

-- Extract the current Japanese era number using Tokyo time
SET TimeZone = 'Asia/Tokyo';

SELECT
     era('2019-05-01 00:00:00+10'::TIMESTAMPTZ),
     era('2019-05-01 00:00:00+09'::TIMESTAMPTZ);
```

```text
235  236
```

##### Caveats

ICU has some differences in behavior and representation from the DuckDB implementation. These are hopefully minor issues that should only be of concern to serious time nerds.

* ICU represents instants as millisecond counts using a `DOUBLE`. This makes it lose accuracy far from the epoch (e.g., around the first millennium)
* ICU uses the Julian calendar for dates before the Gregorian change on `1582-10-15` instead of the proleptic Gregorian calendar. This means that dates prior to the changeover will differ, although ICU will give the date as actually written at the time.
* ICU computes ages by using part increments instead of using the length of the earlier month like DuckDB and Postgres.
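
The first caveat can be checked with ordinary floating-point arithmetic in Python, whose `float` is a 64-bit double. This sketch only models the representation issue, not ICU itself. The helper `via_double_milliseconds` is ours, and `math.ulp` requires Python 3.9 or later.

```python
import math

def via_double_milliseconds(micros):
    """Round-trip a microsecond count through a float64 millisecond count."""
    millis = micros / 1000.0
    return round(millis * 1000.0)

near_epoch = 1_000_001                    # about one second after the epoch
far_from_epoch = -30_610_224_000_000_001  # roughly the year 1000

# Gap between adjacent float64 values at that magnitude, in milliseconds.
spacing_ms = math.ulp(far_from_epoch / 1000.0)

print(via_double_milliseconds(near_epoch) == near_epoch)
print(via_double_milliseconds(far_from_epoch) == far_from_epoch)
print(spacing_ms)
```

Near the epoch the round trip is exact. Around the first millennium, adjacent doubles are several microseconds apart, so a single microsecond can no longer be represented.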

#### Future Work

Temporal analysis is a large area, and while the ICU time zone support is a big step forward, there is still much that could be done.
Some of these items are core DuckDB improvements that could benefit all temporal binning systems and some expose more ICU functionality.
There is also the prospect for writing other custom binning systems via extensions.

##### DuckDB Features

Here are some general projects that all binning systems could benefit from:

* Add a `DATE_ROLL` function that emulates the ICU calendar `roll` operation for "rotating" around a containing bin;
* Make casting operations extensible so extensions can add their own support.

##### ICU Functionality

ICU is a very rich library with a long pedigree, and there is much that could be done with the existing library:

* Create a more general `MAKE_TIMESTAMPTZ` variant that takes a `STRUCT` with the parts. This could be useful for some non-Gregorian calendars.
* Extend the embedded data to contain locale temporal information (such as month names) and support formatting (`to_char`) and parsing (`to_timestamp`) of local dates. One issue here is that the ICU date formatting language is more sophisticated than the Postgres language, so multiple functions might be required (e.g., `icu_to_char`).
* Extend the binning functions to take per-row calendar and time zone specifications to support row-level temporal analytics such as "what time of day did this happen"?

##### Separation of Concerns

Because the time zone data type is defined in the main code base, but the calendar operations are provided by an extension,
it is now possible to write application-specific extensions with custom calendar and time zone support such as:

* Financial 4-4-5 calendars;
* ISO week-based years;
* Table-driven calendars;
* Astronomical calendars with leap seconds;
* Fun calendars, such as Shire Reckoning and French Republican!

#### Conclusion and Feedback

In this blog post, we described the new DuckDB time zone functionality as implemented via the ICU extension.
We hope that the functionality provided can enable temporal analytic applications involving time zones.
We also look forward to seeing any custom calendar extensions that our users dream up!

Last but not least, if you encounter any problems when using our integration, please open an issue in DuckDB's issue tracker!

## Parallel Grouped Aggregation in DuckDB

**Publication date:** 2022-03-07

**Authors:** Hannes Mühleisen, Mark Raasveldt

**TL;DR:** DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups.

Grouped aggregation is a core data analysis operation. It is particularly important for large-scale data analysis (“OLAP”) because it is useful for computing statistical summaries of huge tables. DuckDB contains a highly optimized parallel aggregation capability for fast and scalable summarization.

Jump [straight to the benchmarks](#::experiments)?

#### Introduction

`GROUP BY` changes the result set cardinality – instead of returning the same number of rows as the input (like a normal `SELECT`), `GROUP BY` returns as many rows as there are groups in the data. Consider this (weirdly familiar) example query:

```sql
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_extendedprice),
    avg(l_quantity)
FROM
    lineitem
GROUP BY
    l_returnflag,
    l_linestatus;
```

`GROUP BY` is followed by two column names, `l_returnflag` and `l_linestatus`. Those are the columns to compute the groups on, and the resulting table will contain all combinations of values of those columns that occur in the data. We refer to the columns in the `GROUP BY` clause as the “grouping columns” and all occurring combinations of values therein as “groups”. The `SELECT` clause contains four (not five) expressions: references to the two grouping columns, and two aggregates: the `sum` over `l_extendedprice` and the `avg` over `l_quantity`. We refer to those as the “aggregates”. If executed, the result of this query looks something like this:

| l_returnflag | l_linestatus | sum(l_extendedprice) | avg(l_quantity) |
| ------------ | ------------ | -------------------- | --------------: |
| N            | O            | 114935210409.19      |            25.5 |
| R            | F            | 56568041380.9        |           25.51 |
| A            | F            | 56586554400.73       |           25.52 |
| N            | F            | 1487504710.38        |           25.52 |

In general, SQL allows only columns that are mentioned in the `GROUP BY` clause to be part of the `SELECT` expressions directly; all other columns need to be wrapped in one of the aggregate functions like `sum` or `avg`. There are [many more aggregate functions](#docs:lts:sql:functions:aggregates) depending on which SQL system you use.

How should a query processing engine compute such an aggregation? There are many design decisions involved, and we will discuss those below and in particular the decisions made by DuckDB. The main issue when computing grouping results is that the groups can occur in the input table in any order. Were the input already sorted on the grouping columns, computing the aggregation would be trivial, as we could just compare the current values for the grouping columns with the previous ones. If a change occurs, the next group begins and a new aggregation result needs to be computed. Since the sorted case is easy, one straightforward way of computing grouped aggregates is to sort the input table on the grouping columns first, and then use the trivial approach. But sorting the input is unfortunately still a computationally expensive operation [despite our best efforts](https://duckdb.org/2021/08/27/external-sorting). In general, sorting has a computational complexity of `O(nlogn)` with n being the number of rows sorted.
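As a rough illustration of the sort-based approach, the following Python sketch sorts a handful of made-up rows on the grouping columns and then starts a new group whenever the key changes. This is not DuckDB code: the function name `sort_based_aggregate` and the sample rows are assumptions made purely for this example.

```python
def sort_based_aggregate(rows):
    """Compute sum(price) and avg(quantity) per (flag, status) group.

    rows: iterable of (flag, status, price, quantity) tuples (hypothetical layout).
    """
    result = []
    current_key = None
    total = qty_sum = count = 0
    # Sorting is the expensive part: O(n log n) in the number of rows.
    for flag, status, price, qty in sorted(rows, key=lambda r: (r[0], r[1])):
        key = (flag, status)
        if key != current_key:
            # The grouping values changed, so the previous group is complete.
            if current_key is not None:
                result.append((*current_key, total, qty_sum / count))
            current_key, total, qty_sum, count = key, 0.0, 0.0, 0
        total += price
        qty_sum += qty
        count += 1
    if current_key is not None:
        result.append((*current_key, total, qty_sum / count))
    return result


rows = [
    ("N", "O", 100.0, 10),
    ("R", "F", 50.0, 20),
    ("N", "O", 200.0, 30),
    ("A", "F", 75.0, 40),
]
print(sort_based_aggregate(rows))
```

After the sort, the loop itself is a single linear pass that only ever keeps the state of one group in memory.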

#### Hash Tables for Aggregation

A better way is to use a hash table. Hash tables are a [foundational data structure in computing](https://en.wikipedia.org/wiki/Hash_table) that allow us to find entries with a computational complexity of `O(1)`. A full discussion on how hash tables work is far beyond the scope of this post. Below we try to focus on a very basic description and considerations related to aggregate computation.

![](../images/blog/aggregates/aggr-bench-nlogn.svg)

<figcaption align="center"><b>O(n) plotted against O(nlogn) to illustrate scaling behavior</b></figcaption>

To add `n` rows to a hash table we are looking at a complexity of `O(n)`, much, much better than `O(nlogn)` for sorting, especially when `n` goes into the billions. The figure above illustrates how the complexity develops as the table size increases. Another big advantage is that we do not have to make a sorted copy of the input first, which is going to be just as large as the input. Instead, the hash table will have at most as many entries as there are groups, which can be (and usually are) dramatically fewer than input rows. The overall process is thus this: scan the input table, and for each row, update the hash table accordingly. Once the input is exhausted, we scan the hash table to provide rows to upstream operators or the query result directly.
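The scan-and-update process can be sketched in a few lines of Python, with the built-in `dict` standing in for the aggregate hash table. The function name `hash_aggregate`, the state layout and the sample rows are assumptions for illustration and do not reflect how DuckDB stores its aggregate states.

```python
def hash_aggregate(rows):
    """Compute sum(price) and avg(quantity) per (flag, status) group in one pass."""
    # group key -> [sum of price, sum of quantity, row count]
    states = {}
    for flag, status, price, qty in rows:
        state = states.setdefault((flag, status), [0.0, 0.0, 0])
        state[0] += price
        state[1] += qty
        state[2] += 1
    # Once the input is exhausted, scan the table to produce the result rows.
    return {key: (s[0], s[1] / s[2]) for key, s in states.items()}


rows = [
    ("N", "O", 100.0, 10),
    ("R", "F", 50.0, 20),
    ("N", "O", 200.0, 30),
    ("A", "F", 75.0, 40),
]
print(hash_aggregate(rows))
```

Note that the table holds one entry per group (three here), not one per input row.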


##### Collision Handling

So, hash table it is then! We build a hash table on the input with the groups as keys and the aggregates as the entries. Then, for every input row, we compute a hash of the group values, find the entry in the hash table, and either create or update the aggregate states with the values from the row? It's unfortunately not that simple: two rows with *different* values for the grouping columns may result in a hash that points to the *same* hash table entry, which would lead to incorrect results.

There are two main approaches to [work around this problem](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution): “chaining” and “linear probing”. With chaining, we do not keep the aggregate values in the hash table directly, but rather keep a list of group values and aggregates. If the grouping values point to a hash table entry with an empty list, the new group and the aggregates are simply added. If the grouping values point to an existing list, we check for every list entry whether the grouping values match. If so, we update the aggregates for that group. If not, we create a new list entry. In linear probing there are no such lists, but on finding an existing entry, we will compare the grouping values, and if they match we will update the entry. If they do not match, we move one entry down in the hash table and try again. This process finishes when either a matching group entry has been found or an empty hash table entry is found. While theoretically equivalent, computer hardware architecture will favor linear probing because of cache locality. Because linear probing walks the hash table entries *linearly*, the next entry will very likely be in the CPU cache and hence access is faster. Chaining will generally lead to random access and much worse performance on modern hardware architectures. We have therefore adopted linear probing for our aggregate hash table.
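To make the probing loop concrete, here is a minimal Python sketch of a fixed-size linear probing table that keeps one running sum per group. The class name `LinearProbingSumTable`, the wrap-around at the end of the slot array and the use of small integer keys (whose Python hash is the integer itself) are assumptions for this example, not a description of DuckDB's implementation.

```python
class LinearProbingSumTable:
    """Fixed-capacity hash table that keeps a running sum per group key."""

    def __init__(self, capacity=8):
        # Each slot is either None (empty) or a [key, running_sum] pair.
        self.slots = [None] * capacity

    def add(self, key, value):
        capacity = len(self.slots)
        index = hash(key) % capacity
        for _ in range(capacity):
            slot = self.slots[index]
            if slot is None:
                # Empty slot: this group has not been seen before.
                self.slots[index] = [key, value]
                return
            if slot[0] == key:
                # Same group: update the aggregate in place.
                slot[1] += value
                return
            # Different group in this slot: move one entry down
            # (wrapping around at the end is an assumption of this sketch).
            index = (index + 1) % capacity
        raise RuntimeError("hash table is full")


table = LinearProbingSumTable(capacity=4)
# Keys 1, 5 and 9 all map to slot 1 (key % 4 == 1), so 5 and 9 have to probe.
for key, value in [(1, 10), (5, 20), (1, 5), (9, 1)]:
    table.add(key, value)
print(table.slots)
```

The three colliding groups end up in neighboring slots, which is what makes the probe sequence cache-friendly.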

Both chaining and linear probing will degrade in theoretical lookup performance from O(1) to O(n) with respect to the hash table size if there are too many collisions, i.e., too many groups hashing to the same hash table entry. A common solution to this problem is to resize the hash table once the “fill ratio” exceeds some threshold, e.g., 75% is the default for Java’s `HashMap`. This is particularly important as we do not know the number of groups in the result before starting the aggregation. Neither do we assume to know the number of rows in the input table. We thus start with a fairly small hash table and resize it once the fill ratio exceeds a threshold. The basic hash table structure is shown in the figure below: the table has slots numbered 0-4. There are already three groups in the table, with group keys 12, 5 and 2. Each group has aggregate values (e.g., from a `SUM`) of 43 etc.

![](../images/blog/aggregates/aggr-ht-naive.png)

<figcaption align="center"><b>Basic Aggregate Hash Table Structure</b></figcaption>

A big challenge with resizing a partially filled hash table is that after the resize, all the groups are in the wrong place and we would have to move everything, which would be very expensive.

![](../images/blog/aggregates/aggr-ht-twopart.png)

<figcaption align="center"><b>Two-Part Aggregate Hash Table</b></figcaption>

To support resize efficiently, we have implemented a two-part aggregate hash table consisting of a separately-allocated pointer array which points into payload blocks that contain grouping values and aggregate states for each group. The pointers are not actual pointers but symbolic: they refer to a block ID and a row offset within said block. This is shown in the figure above, where the hash table entries are split over two payload blocks. On resize, we throw away the pointer array and allocate a bigger one. Then, we read all payload blocks again, hash the group values, and re-insert pointers to them into the new pointer array. The group data thus remains unchanged, which greatly reduces the cost of resizing the hash table. This can be seen in the figure below, where we double the pointer array size but the payload blocks remain unchanged.

![](../images/blog/aggregates/aggr-ht-resize.png)

<figcaption align="center"><b>Resizing Two-Part Aggregate Hash Table</b></figcaption>

The naive two-part hash table design would require a re-hashing of *all* group values on resize, which can be quite expensive especially for string values. To speed this up, we also write the raw hash of the group values to the payload blocks for every group. Then, during resize, we don’t have to re-hash the groups but can just read them from the payload blocks, compute the new offset into the pointer array, and insert there. 
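The following Python sketch illustrates the idea of a two-part table with cached hashes. It is a simplification under stated assumptions: a single append-only list stands in for the payload blocks, a row index stands in for the block ID and row offset, and the class name `TwoPartSumTable`, the 75% fill threshold and the doubling policy are choices made for this example only.

```python
class TwoPartSumTable:
    """Toy two-part table: a pointer array plus append-only payload rows."""

    def __init__(self, capacity=4, max_fill=0.75):
        self.pointers = [None] * capacity  # slot -> row index in self.payload
        self.payload = []                  # rows: [cached_hash, key, running_sum]
        self.max_fill = max_fill

    def _find_slot(self, h, key):
        """Linear probing: return the slot holding `key`, or the first empty slot."""
        index = h % len(self.pointers)
        while self.pointers[index] is not None:
            if self.payload[self.pointers[index]][1] == key:
                return index
            index = (index + 1) % len(self.pointers)
        return index

    def _resize(self):
        """Double the pointer array; the payload rows stay where they are."""
        new_pointers = [None] * (2 * len(self.pointers))
        for row_id, row in enumerate(self.payload):
            index = row[0] % len(new_pointers)  # reuse the cached hash, no re-hashing
            while new_pointers[index] is not None:
                index = (index + 1) % len(new_pointers)
            new_pointers[index] = row_id
        self.pointers = new_pointers

    def add(self, key, value):
        h = hash(key)
        index = self._find_slot(h, key)
        if self.pointers[index] is not None:  # existing group: update its state
            self.payload[self.pointers[index]][2] += value
            return
        if len(self.payload) + 1 > self.max_fill * len(self.pointers):
            self._resize()
            index = self._find_slot(h, key)
        self.pointers[index] = len(self.payload)
        self.payload.append([h, key, value])


table = TwoPartSumTable(capacity=4)
for key, value in [(1, 10), (5, 20), (9, 1), (1, 5)]:
    table.add(key, value)
rows_before = [row[:] for row in table.payload]
table.add(13, 7)  # a fourth distinct group exceeds the fill threshold and triggers a resize
print(len(table.pointers), table.payload)
```

Only the pointer array is rebuilt during the resize; the first three payload rows are byte-for-byte the same before and after.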

![](../images/blog/aggregates/aggr-ht-hashcache.png)

<figcaption align="center"><b>Optimization: Adding Hashes to Payload</b></figcaption>

The two-part hash table has a big drawback when looking up entries: there is no ordering between the pointer array and the group entries in the payload blocks. Hence, following the pointer creates random access in the memory hierarchy. This will lead to unnecessary stalls in the computation. To mitigate this issue, we extend the memory layout of the pointer array to include some (1 or 2) bytes from the group hash in addition to the pointer to the payload value. This way, linear probing can first compare the hash bits in the pointer array with the current group hash and decide whether it’s worth following the payload pointer or not. This can potentially continue for every group in the pointer chain. Only when the hash bits match do we have to actually follow the pointer and compare the actual groups. This optimization greatly reduces the number of times the pointer to the payload blocks has to be followed and thereby reduces the number of random accesses into memory, which are directly related to overall performance. It has the nice side-effect of also greatly reducing full group comparisons, which can also be expensive, e.g., when aggregating on groups that contain strings.
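The effect of storing a few hash bits next to each pointer can be demonstrated with a small Python sketch that counts how often the payload has to be consulted. Everything here is illustrative: `mix_hash` is a simple multiplicative hash chosen only to make the example deterministic, the top byte of the 64-bit hash serves as the stored bits, and the table is fixed-size with no resize logic.

```python
MASK64 = (1 << 64) - 1


def mix_hash(key):
    # Multiplicative (Fibonacci-style) hash; a stand-in for a real hash function.
    return (key * 0x9E3779B97F4A7C15) & MASK64


class SaltedSumTable:
    def __init__(self, capacity=64, use_salt=True):
        self.pointers = [None] * capacity  # slot -> (salt, row index)
        self.payload = []                  # rows: [key, running_sum]
        self.use_salt = use_salt
        self.payload_lookups = 0           # how often we had to follow a pointer

    def add(self, key, value):
        h = mix_hash(key)
        salt = h >> 56                     # top byte of the hash
        index = h % len(self.pointers)     # low bits select the slot
        while self.pointers[index] is not None:
            slot_salt, row_id = self.pointers[index]
            # With salting, only follow the pointer when the stored bits match.
            if not self.use_salt or slot_salt == salt:
                self.payload_lookups += 1
                row = self.payload[row_id]
                if row[0] == key:
                    row[1] += value
                    return
            index = (index + 1) % len(self.pointers)
        if len(self.payload) + 1 >= len(self.pointers):
            raise RuntimeError("table full (this sketch does not resize)")
        self.pointers[index] = (salt, len(self.payload))
        self.payload.append([key, value])


def run(use_salt):
    table = SaltedSumTable(capacity=64, use_salt=use_salt)
    for _ in range(3):
        for key in range(0, 4000, 100):  # 40 groups with many slot collisions
            table.add(key, 1)
    return table


salted, unsalted = run(True), run(False)
print(salted.payload_lookups, unsalted.payload_lookups)
```

Both variants produce the same sums and the same pointer array; the salted variant can only skip payload lookups that would have ended in a failed group comparison anyway.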

![](../images/blog/aggregates/aggr-ht-salting.png)

<figcaption align="center"><b>Optimization: Adding Hash Bits to Pointer Array</b></figcaption>

Another (smaller) optimization here concerns the width of the pointer array entries. For small hash tables with few entries, we do not need many bits to encode the payload block offset pointers. DuckDB supports both 4 byte and 8 byte pointer array entries. 


For most aggregate queries, the vast majority of query processing time is spent looking up hash table entries, which is why it's worth spending time on optimizing them. If you’re curious, the code for all this is in the DuckDB repo, `aggregate_hashtable.cpp`. There is another optimization for when we know that there are only a few distinct groups from column statistics, the perfect hash aggregate, but that’s for another post. But we’re not done here just yet.


#### Parallel Aggregation

While we now have an aggregate hash table design that should do fairly well for grouped aggregations, we still have not considered the fact that DuckDB automatically parallelizes all queries to use multiple hardware threads (“CPUs”). How does parallelism work together with hash tables? In general, the answer is unfortunately: “Badly”. Hash tables are delicate structures that don’t handle parallel modifications well. For example, imagine one thread would want to resize the hash table while another wants to add some new group data to it. Or how should we handle multiple threads inserting new groups at the same time for the same entry? One could use locks to make sure that only one thread at a time is using the table, but this would mostly defeat parallelizing the query. There has been plenty of research into concurrency-friendly hash tables but the short summary is that it's still an open issue. 

It is possible to let each thread read data from downstream operators and build individual, local hash tables and merge those together later from a single thread. This works quite nicely if there are few groups like in the example at the top of this post. If there are few groups, a single thread can merge many thread-local hash tables without creating a bottleneck. However, it’s entirely possible that there are as many groups as there are input rows. This tends to happen a lot when someone groups on a column that would be a candidate for a primary key, e.g., `observation_number`, `timestamp` etc. What is thus needed is a parallel merge of the thread-local hash tables. We adopt a method from [Leis et al.](https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf): Each thread builds not one, but multiple *partitioned* hash tables based on a radix-partitioning on the group hash.

![](../images/blog/aggregates/aggr-ht-parallel.png)

<figcaption align="center"><b>Partitioning Hash Tables for Parallelized Merging</b></figcaption>

The key observation here is that if two groups have a different hash value, they cannot possibly be the same. Because of this property, it is possible to use the hash values to create fully independent partitions of the groups without requiring any communication between threads as long as all the threads use the same partitioning scheme (see Phase 1 in the above diagram). 

After all the local hash tables have been constructed, we assign individual partitions to each worker thread and merge the hash tables within that partition together (Phase 2). Because the partitions were created using the radix partitioning scheme on the hash, all worker threads can independently merge the hash tables within their respective partitions. The result is correct because each group goes into a single partition and that partition only. 

One interesting detail is that we never need to build a final (possibly giant) hash table that holds all the groups because the radix group partitioning ensures that each group is localized to a partition.
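The two phases can be sketched in Python as follows. This is a toy model under several assumptions: plain `dict` objects stand in for the aggregate hash tables, the lowest two bits of Python's built-in `hash` are used as the radix, and a `ThreadPoolExecutor` plays the role of the worker threads (CPython threads will not actually speed this up; the point is only that no step requires communication between workers).

```python
from concurrent.futures import ThreadPoolExecutor

RADIX_BITS = 2                      # assumption: 2 bits, i.e., 4 partitions
NUM_PARTITIONS = 1 << RADIX_BITS


def partition_of(key):
    # Every thread uses the same scheme, so a group always lands in the same partition.
    return hash(key) & (NUM_PARTITIONS - 1)


def build_local(chunk):
    """Phase 1: one worker aggregates its chunk into one table per partition."""
    tables = [dict() for _ in range(NUM_PARTITIONS)]
    for key, value in chunk:
        table = tables[partition_of(key)]
        table[key] = table.get(key, 0) + value
    return tables


def merge_partition(partition_tables):
    """Phase 2: one worker merges all thread-local tables of a single partition."""
    merged = {}
    for table in partition_tables:
        for key, partial_sum in table.items():
            merged[key] = merged.get(key, 0) + partial_sum
    return merged


def parallel_sum(chunks):
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(build_local, chunks))
        by_partition = [[tables[p] for tables in local] for p in range(NUM_PARTITIONS)]
        # The result stays partitioned; no single global table is ever built.
        return list(pool.map(merge_partition, by_partition))


chunks = [
    [(1, 10), (2, 20), (5, 1)],   # rows seen by the first worker
    [(1, 5), (6, 2), (2, 3)],     # rows seen by the second worker
]
result = parallel_sum(chunks)
print(result)
```

Each group key appears in exactly one of the merged partitions, so the partitions can be handed to the next operator one after another.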

There are two additional optimizations for the parallel partitioned hash table strategy:

1) We only start partitioning once a single thread’s aggregate hash table exceeds a fixed limit, currently set to 10 000 entries. This is because using a partitioned hash table is not free. For every row added, we have to figure out which partition it should go into, and we have to merge everything back together at the end. For this reason, we will not start partitioning until the parallelization benefit outweighs the cost. Since the partitioning decision is individual to each thread, it is possible that only some threads start partitioning. If that is the case, we will need to partition the hash tables of the threads that have not done so before starting to merge them. This is a fully thread-local operation, however, and does not interfere with parallelism.
2) We will stop adding values to a hash table once its pointer array exceeds a certain threshold. Every thread then builds multiple sets of potentially partitioned hash tables. This is because we do not want the pointer array to become arbitrarily large. While this potentially creates duplicate entries for the same group in multiple hash tables, this is not problematic because we merge them all later anyway. This optimization works particularly well on data sets that have many distinct groups, but have group values that are clustered in the input in some manner, for example, when grouping by day in a data set that is ordered on date.

There are some kinds of aggregates which cannot use the parallel and partitioned hash table approach. While it is trivial to parallelize a sum, because the overall sum is just the sum of the individual partial sums, this is not possible in the same way for computations like `median`, which DuckDB also supports. For this reason, DuckDB also supports `approx_quantile`, which *is* parallelizable.
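A small Python example with made-up numbers shows the difference. The two lists stand for the values of one group as seen by two different threads; the variable names are chosen for this illustration only.

```python
from statistics import median

chunk_a = [1, 2, 3]             # values of one group seen by a first thread
chunk_b = [4, 5, 6, 7, 100]     # values of the same group seen by a second thread
all_values = chunk_a + chunk_b

# sum: combining the partial sums gives exactly the overall sum.
combined_sum = sum([sum(chunk_a), sum(chunk_b)])

# avg: also works, as long as every thread keeps a (sum, count) state.
partials = [(sum(chunk), len(chunk)) for chunk in (chunk_a, chunk_b)]
total, count = map(sum, zip(*partials))
combined_avg = total / count

# median: combining the partial medians does not give the overall median.
median_of_medians = median([median(chunk_a), median(chunk_b)])
true_median = median(all_values)

print(combined_sum, combined_avg, median_of_medians, true_median)
```

The partial medians (2 and 6) simply do not carry enough information to recover the overall median of 4.5; an exact median needs to see all values of a group together.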


<a name="experiments"></a>

#### Experiments

Putting all this together, it’s now time for some performance experiments. We will compare DuckDB’s aggregation operator as described above with the same operator in various Python data wrangling libraries. The other contenders are Pandas, Polars and Arrow. Those are chosen since they can all execute an aggregation operator on Pandas DataFrames without converting into some other storage format first, just like DuckDB. 

For our benchmarks, we generate a synthetic dataset with a pre-defined number of groups over two integer columns and some random integer data to aggregate. The entire dataset is shuffled before the experiments to prevent taking advantage of the clustered nature of the synthetically generated data. For each group, we compute two aggregates, the sum of the data column and a simple count. The SQL version of this aggregation would be `SELECT g1, g2, sum(d), count(*) FROM dft GROUP BY g1, g2 LIMIT 1;`. In the experiments below, we vary the dataset size and the number of groups in them. This should nicely show the scaling behavior of the aggregation.

Because we are not interested in measuring the result set materialization time, which would be significant for millions of groups, we follow the aggregation with an operator that only retrieves the first row. This does not change the complexity of the aggregation at all: it needs to collect all data before producing even the first result row, since there might be data in the very last input row that changes the first result row. Of course this would be fairly unrealistic in practice, but it should nicely isolate the behavior of the aggregation operator only, since a `head(1)` operation on three columns should be fairly cheap and constant in execution time.


![](../images/blog/aggregates/aggr-bench-rows-fewgroups.svg)

<figcaption align="center"><b>Varying row count for 1000 groups</b></figcaption>

We measure the elapsed wall clock time required to complete each aggregation. To account for minor variation, we repeat each measurement three times and report the median time required. All experiments were run on a 2021 MacBook Pro with a ten-core M1 Max processor and 64 GB of RAM. Our data generation benchmark script [is available online](https://gist.github.com/hannes/e2599ae338d275c241c567934a13d422) and we invite interested readers to re-run the experiment on their machines. 
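The measurement protocol (three repetitions, median wall clock time) is easy to reproduce. The helper below is a minimal sketch and not the authors' benchmark script, which is linked above; `median_wall_time` and the toy `aggregate` workload are hypothetical names introduced for this example.

```python
import statistics
import time


def median_wall_time(fn, repetitions=3):
    """Run `fn` several times and return the median elapsed wall clock time in seconds."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def aggregate():
    # Toy workload: sum 100,000 integers into 1,000 groups.
    sums = {}
    for i in range(100_000):
        sums[i % 1000] = sums.get(i % 1000, 0) + i
    return sums


elapsed = median_wall_time(aggregate)
print(f"median of 3 runs: {elapsed:.4f} s")
```

Reporting the median rather than the mean keeps a single slow outlier run from distorting the result.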


![](../images/blog/aggregates/aggr-bench-rows-manygroups.svg)

<figcaption align="center"><b>Varying both row count and group count</b></figcaption>

Now let's discuss some results. We start with varying the number of rows in the table between one million and 100 million. We repeat the experiment for both a fixed (small) group count of 1000 and when the number of groups is equal to the number of rows. Results are plotted as a *log-log plot*. We can see how DuckDB consistently outperforms the other systems, with the single-threaded Pandas being slowest, and Polars and Arrow being generally similar.

![](../images/blog/aggregates/aggr-bench-groups.svg)

<figcaption align="center"><b>Varying group count for 100M rows</b></figcaption>


For the next experiment, we fix the number of rows at 100M (the largest size we experimented with) and show the full behavior when increasing the group count. We can see again how DuckDB consistently exhibits good scaling behavior when increasing the group count, because it can effectively parallelize all phases of aggregation as outlined above. If you are interested in how we generated those plots, the plotting [script is available, too](https://gist.github.com/hannes/9b0e47625290b8af78de88e1d26441c0).



#### Conclusion

Data analysis pipelines using mostly aggregation spend the vast majority of their execution time in the aggregate hash table, which is why it is worth spending an ungodly amount of human time optimizing them. We have some ideas for future work on this. For example, we would like to extend [our work when comparing sorting keys](https://duckdb.org/2021/08/27/external-sorting) to comparing groups in the aggregate hash table. We also would like to add the capability to dynamically choose the number of partitions a thread uses based on observation of the created hash table, e.g., if partitions are imbalanced we could use more bits for partitioning. Another large area of future work is to make our aggregate hash table work with out-of-core operations, where an individual hash table no longer fits in memory. This is particularly problematic when merging. And of course there are always opportunities to fine-tune an aggregation operator, and we are continuously improving DuckDB's aggregation operator.

If you want to work on cutting edge data engineering like this that will be used by thousands of people, consider contributing to DuckDB or join us at DuckDB Labs in Amsterdam!

## Friendlier SQL with DuckDB

**Publication date:** 2022-05-04

**Author:** Alex Monahan

**TL;DR:** DuckDB offers several extensions to the SQL syntax. For a full list of these features, see the [Friendly SQL documentation page](/docs/guides/sql_features/friendly_sql).

![](../images/blog/duck_chewbacca.png)


An elegant user experience is a key design goal of DuckDB. This goal guides much of DuckDB's architecture: it is simple to install, seamless to integrate with other data structures like Pandas, Arrow, and R Dataframes, and requires no dependencies. Parallelization occurs automatically, and if a computation exceeds available memory, data is gracefully buffered out to disk. And of course, DuckDB's processing speed makes it easier to get more work accomplished.

However, SQL is not famous for being user-friendly. DuckDB aims to change that! DuckDB includes both a Relational API for dataframe-style computation, and a highly Postgres-compatible version of SQL. If you prefer dataframe-style computation, we would love your feedback on [our roadmap](https://github.com/duckdb/duckdb/issues/2000). If you are a SQL fan, read on to see how DuckDB is bringing together both innovation and pragmatism to make it easier to write SQL in DuckDB than anywhere else. Please reach out on [GitHub](https://github.com/duckdb/duckdb/discussions) or [Discord](https://discord.gg/vukK4xp7Rd) and let us know what other features would simplify your SQL workflows. Join us as we teach an old dog new tricks!

#### `SELECT * EXCLUDE`

A traditional SQL `SELECT` query requires that requested columns be explicitly specified, with one notable exception: the `*` wildcard. `SELECT *` allows SQL to return all relevant columns. This adds tremendous flexibility, especially when building queries on top of one another. However, we are often interested in *almost* all columns. In DuckDB, simply specify which columns to `EXCLUDE`:

```sql
SELECT * EXCLUDE (jar_jar_binks, midichlorians) FROM star_wars;
```

Now we can avoid repeatedly typing all columns, improve code readability, and retain flexibility as additional columns are added to underlying tables.

DuckDB's implementation of this concept can even handle exclusions from multiple tables within a single statement:

```sql
SELECT 
    sw.* EXCLUDE (jar_jar_binks, midichlorians),
    ff.* EXCLUDE cancellation
FROM star_wars sw, firefly ff;
```

#### `SELECT * REPLACE`

Similarly, we often wish to use all of the columns in a table, aside from a few small adjustments. This would also prevent the use of `*` and require a list of all columns, including those that remain unedited. In DuckDB, easily apply changes to a small number of columns with `REPLACE`:

```sql
SELECT 
    * REPLACE (movie_count+3 AS movie_count, show_count*1000 AS show_count)
FROM star_wars_owned_by_disney;
```

This allows views, CTEs, or subqueries to be built on one another in a highly concise way, while remaining adaptable to new underlying columns.

#### `GROUP BY ALL`

A common cause of repetitive and verbose SQL code is the need to specify columns in both the `SELECT` clause and the `GROUP BY` clause. In theory this adds flexibility to SQL, but in practice it rarely adds value. DuckDB now offers the `GROUP BY` we all expected when we first learned SQL – just `GROUP BY ALL` columns in the `SELECT` clause that aren't wrapped in an aggregate function!

```sql
SELECT
    systems,
    planets,
    cities,
    cantinas,
    sum(scum + villainy) AS total_scum_and_villainy
FROM star_wars_locations
GROUP BY ALL;
-- GROUP BY systems, planets, cities, cantinas
```

Now changes to a query can be made in only one place instead of two! Plus this prevents many mistakes where columns are removed from a `SELECT` list, but not from the `GROUP BY`, causing duplication.

Not only does this dramatically simplify many queries, it also makes the above `EXCLUDE` and `REPLACE` clauses useful in far more situations. Imagine if we wanted to adjust the above query by no longer considering the level of scum and villainy in each specific cantina:

```sql
SELECT
    * EXCLUDE (cantinas, booths, scum, villainy),
    sum(scum + villainy) AS total_scum_and_villainy
FROM star_wars_locations
GROUP BY ALL;
-- GROUP BY systems, planets, cities
```

Now that is some concise and flexible SQL! How many of your `GROUP BY` clauses could be re-written this way?

#### `ORDER BY ALL`

Another common cause for repetition in SQL is the `ORDER BY` clause. DuckDB and other RDBMSs have previously tackled this issue by allowing queries to specify the positions of the columns to `ORDER BY` (for example, `ORDER BY 1, 2, 3`). However, frequently the goal is to order by all columns in the query from left to right, and maintaining that numeric list when adding or removing columns can be error prone. In DuckDB, simply `ORDER BY ALL`:

```sql
SELECT
    age,
    sum(civility) AS total_civility
FROM star_wars_universe
GROUP BY ALL
ORDER BY ALL;
-- ORDER BY age, total_civility
```

This is particularly useful when building summaries, as many other client tools automatically sort results in this manner. DuckDB also supports `ORDER BY ALL DESC` to sort each column in reverse order, and options to specify `NULLS FIRST` or `NULLS LAST`.

#### Column Aliases in `WHERE` / `GROUP BY` / `HAVING`

In many SQL dialects, it is not possible to use an alias defined in a `SELECT` clause anywhere but in the `ORDER BY` clause of that statement. This commonly leads to verbose CTEs or subqueries in order to utilize those aliases. In DuckDB, a non-aggregate alias in the `SELECT` clause can be immediately used in the `WHERE` and `GROUP BY` clauses, and aggregate aliases can be used in the `HAVING` clause, even at the same query depth. No subquery needed!

```sql
SELECT
    only_imperial_storm_troopers_are_so_precise AS nope,
    turns_out_a_parsec_is_a_distance AS very_speedy,
    sum(mistakes) AS total_oops
FROM oops
WHERE
    nope = 1
GROUP BY
    nope,
    very_speedy
HAVING
    total_oops > 0;
```

#### Case Insensitivity While Maintaining Case

DuckDB allows queries to be case insensitive, while maintaining the specified case as data flows into and out of the system. This simplifies queries within DuckDB while ensuring compatibility with external libraries.

```sql
CREATE TABLE mandalorian AS SELECT 1 AS "THIS_IS_THE_WAY";
SELECT this_is_the_way FROM mandalorian;
```  

| THIS_IS_THE_WAY |
| --------------: |
|               1 |

#### Friendly Error Messages

Regardless of expertise, and despite DuckDB's best efforts to understand our intentions, we all make mistakes in our SQL queries. Many RDBMSs leave you trying to use the force to detect an error. In DuckDB, if you make a typo on a column or table name, you will receive a helpful suggestion about the most similar name. Not only that, you will receive an arrow that points directly to the offending location within your query.

```sql
SELECT * FROM star_trek;
```

```console
Error: Catalog Error: Table with name star_trek does not exist!
Did you mean "star_wars"?
LINE 1: SELECT * FROM star_trek;
                      ^
```

(Don't worry, ducks and duck-themed databases still love some Trek as well).

DuckDB's suggestions are even context specific. Here, we receive a suggestion to use the most similar column from the table we are querying.

```sql
SELECT long_ago FROM star_wars;
```

```console
Error: Binder Error: Referenced column "long_ago" not found in FROM clause!
Candidate bindings: "star_wars.long_long_ago"
LINE 1: SELECT long_ago FROM star_wars;
               ^
```

#### String Slicing

Even as SQL fans, we know that SQL can learn a thing or two from newer languages. Instead of using bulky `SUBSTRING` functions, you can slice strings in DuckDB using bracket syntax. As a note, SQL is required to be 1-indexed, so that is a slight difference from other languages (although it keeps DuckDB internally consistent and similar to other DBs). 

```sql
SELECT 'I love you! I know'[:-3] AS nearly_soloed;
```  


| nearly_soloed   |
| :-------------- |
| I love you! I k |

#### Simple List and Struct Creation

DuckDB provides nested types to allow more flexible data structures than the purely relational model would allow, while retaining high performance. To make them as easy as possible to use, creating a `LIST` (array) or a `STRUCT` (object) uses simpler syntax than other SQL systems. Data types are automatically inferred.

```sql
SELECT
    ['A-Wing', 'B-Wing', 'X-Wing', 'Y-Wing'] AS starfighter_list,
    {name: 'Star Destroyer', common_misconceptions: 'Can''t in fact destroy a star'} AS star_destroyer_facts;
```

#### List Slicing

Bracket syntax may also be used to slice a `LIST`. Again, note that this is 1-indexed for SQL compatibility.

```sql
SELECT 
    starfighter_list[2:2] AS dont_forget_the_b_wing 
FROM (SELECT ['A-Wing', 'B-Wing', 'X-Wing', 'Y-Wing'] AS starfighter_list);
```  


| dont_forget_the_b_wing |
| :--------------------- |
| [B-Wing]               |

#### Struct Dot Notation

Use convenient dot notation to access the value of a specific key in a DuckDB `STRUCT` column. If keys contain spaces, double quotes can be used.

```sql
SELECT 
    planet.name,
    planet."Amount of sand" 
FROM (SELECT {name: 'Tatooine', 'Amount of sand': 'High'} AS planet);
```

#### Trailing Commas

Have you ever removed your final column from a SQL `SELECT` and been met with an error, only to find you needed to remove the trailing comma as well!? Never? Ok, Jedi... On a more serious note, this feature is an example of DuckDB's responsiveness to the community. In under 2 days from seeing this issue in a tweet (not even about DuckDB!), this feature was already built, tested, and merged into the primary branch. You can include trailing commas in many places in your query, and we hope this saves you from the most boring but frustrating of errors! 

```sql
SELECT
    x_wing,
    proton_torpedoes,
    --targeting_computer
FROM luke_whats_wrong
GROUP BY
    x_wing,
    proton_torpedoes,
;
```

#### Function Aliases from Other Databases

For many functions, DuckDB supports multiple names in order to align with other database systems. After all, ducks are pretty versatile – they can fly, swim, and walk! Most commonly, DuckDB supports PostgreSQL function names, but many SQLite names are supported, as well as some from other systems. If you are migrating your workloads to DuckDB and a different function name would be helpful, please reach out – they are very easy to add as long as the behavior is the same! See our [functions documentation](#docs:lts:sql:functions:overview) for details.

```sql
SELECT
    'Use the Force, Luke'[:13] AS sliced_quote_1,
    substr('I am your father', 1, 4) AS sliced_quote_2,
    substring('Obi-Wan Kenobi, you''re my only hope', 17, 100) AS sliced_quote_3;
```

#### Auto-Increment Duplicate Column Names

As you are building a query that joins similar tables, you'll often encounter duplicate column names. If the query is the final result, DuckDB will simply return the duplicated column names without modifications. However, if the query is used to create a table, or nested in a subquery or Common Table Expression (where duplicate columns are forbidden by other databases!), DuckDB will automatically assign new names to the repeated columns to make query prototyping easier.

```sql
SELECT
    *
FROM (
    SELECT
        s1.tie_fighter,
        s2.tie_fighter
    FROM squadron_one s1
    CROSS JOIN squadron_two s2
    ) theyre_coming_in_too_fast;
```  

| tie_fighter | tie_fighter:1 |
| :---------- | :------------ |
| green_one   | green_two     |

#### Implicit Type Casts

DuckDB believes in using specific data types for performance, but attempts to automatically cast between types whenever necessary. For example, when joining between an integer and a varchar, DuckDB will automatically cast them to be the same type and complete the join successfully. A `List` or `IN` expression may also be created with a mixture of types, and they will be automatically cast as well. Also, `INTEGER` and `BIGINT` are interchangeable, and thanks to DuckDB's new storage compression, a `BIGINT` usually doesn't even take up any extra space! Now you can store your data as the optimal data type, but use it easily for the best of both!

```sql
CREATE TABLE sith_count_int AS SELECT 2::INTEGER AS sith_count;
CREATE TABLE sith_count_varchar AS SELECT 2::VARCHAR AS sith_count;

SELECT
    * 
FROM sith_count_int s_int 
JOIN sith_count_varchar s_char 
  ON s_int.sith_count = s_char.sith_count;
```

| sith_count | sith_count |
| ---------: | ---------: |
|          2 |          2 |

#### Other Friendly Features

There are many other features of DuckDB that make it easier to analyze data with SQL!  

DuckDB [makes working with time easier in many ways](https://duckdb.org/2022/01/06/time-zones), including by accepting multiple different syntaxes (from other databases) for the [`INTERVAL` data type](#docs:lts:sql:data_types:interval) used to specify a length of time.  

DuckDB also implements multiple SQL clauses outside of the traditional core clauses including the [`SAMPLE` clause](#docs:lts:sql:query_syntax:sample) for quickly selecting a random subset of your data and the [`QUALIFY` clause](#docs:lts:sql:query_syntax:qualify) that allows filtering of the results of window functions (much like a `HAVING` clause does for aggregates).  

The [`DISTINCT ON` clause](#docs:lts:sql:query_syntax:select::distinct-on-clause) allows DuckDB to select unique combinations of a subset of the columns in a `SELECT` clause, while returning the first row of data for columns not checked for uniqueness.

#### Ideas for the Future

In addition to what has already been implemented, several other improvements have been suggested. Let us know if one would be particularly useful – we are flexible with our roadmap! If you would like to contribute, we are very open to PRs and you are welcome to reach out on [GitHub](https://github.com/duckdb/duckdb) or [Discord](https://discord.gg/vukK4xp7Rd) ahead of time to talk through a new feature's design. 

* Choose columns via regex
  * Decide which columns to select with a pattern rather than specifying columns explicitly
  * ClickHouse supports this with the [`COLUMNS` expression](https://clickhouse.com/docs/en/sql-reference/statements/select/#columns-expression) 
* Incremental column aliases
  * Refer to previously defined aliases in subsequent calculated columns rather than re-specifying the calculations
* Dot operators for JSON types
  * The JSON extension is brand new ([see our documentation!](#docs:lts:data:json:overview)) and already implements friendly `->` and `->>` syntax

Thanks for checking out DuckDB! May the Force be with you...

## Range Joins in DuckDB

**Publication date:** 2022-05-27

**Author:** Richard Wesley

**TL;DR:** DuckDB has fully parallelized range joins that can efficiently join millions of range predicates.

Range intersection joins are an important operation in areas such as
[temporal analytics](https://www2.cs.arizona.edu/~rts/tdbbook.pdf),
and occur when two inequality conditions are present in a join predicate.
Database implementations often rely on slow `O(N^2)` algorithms that compare every pair of rows
for these operations.
Instead, DuckDB leverages its fast sorting logic to implement two highly optimized parallel join operators
for these kinds of range predicates, resulting in 20-30× faster queries.
With these operators, DuckDB can be used effectively in more time-series-oriented use cases.

#### Introduction

Joining tables row-wise is one of the fundamental and distinguishing operations of the relational model.
A join connects two tables horizontally using some Boolean condition called a _predicate_.
This sounds straightforward, but how fast the join can be performed depends on the expressions in the predicate.
This has led to the creation of different join algorithms that are optimized for different predicate types.

In this post, we will explain several join algorithms and their capabilities.
In particular, we will describe a newly added "range join" algorithm
that makes connecting tables on overlapping time intervals or multiple ordering conditions much faster.

##### Flight Data

No, this part isn't about ducks, but about air group flight statistics from the Battlestar Galactica reboot.
We have a couple of tables we will be using: `Pilots`, `Crafts`, `Missions` and `Battles`.
Some data was lost when the fleet dispersed, but hopefully this is enough to provide some "real life" examples!

The `Pilots` table contains the pilots and their data that does not change (name, call sign, serial number):

|   id | callsign | name             | serial |
| ---: | :------- | :--------------- | -----: |
|    1 | Apollo   | Lee Adama        | 234567 |
|    2 | Starbuck | Kara Thrace      | 462753 |
|    3 | Boomer   | Sharon Valeri    | 312743 |
|    4 | Kat      | Louanne Katraine | 244977 |
|    5 | Hotdog   | Brendan Costanza | 304871 |
|    6 | Husker   | William Adama    | 204971 |
|  ... | ...      | ...              |    ... |

The `Crafts` table contains all the various fighting craft
(ignoring the ["Ship Of Theseus"](https://en.wikipedia.org/wiki/Ship_of_Theseus) problem of recycled parts!):

|   id | type      | tailno |
| ---: | :-------- | :----- |
|    1 | Viper     | N7242C |
|    2 | Viper     | 2794NC |
|    3 | Raptor    | 312    |
|    4 | Blackbird | N9999C |
|  ... | ...       | ...    |

The `Missions` table contains all the missions flown by pilots.
Missions have a `begin` and `end` time logged with the flight deck.
We will use some common pairings
(and an unusual mission at the end where Commander Adama flew his old Viper):

|  pid |  cid | begin               | end                 |
| ---: | ---: | :------------------ | :------------------ |
|    2 |    2 | 3004-05-04 13:22:12 | 3004-05-04 15:05:49 |
|    1 |    2 | 3004-05-04 10:00:00 | 3004-05-04 18:19:12 |
|    3 |    3 | 3004-05-04 13:33:52 | 3004-05-05 19:12:21 |
|    6 |    1 | 3008-03-20 08:14:37 | 3008-03-20 10:21:15 |
|  ... |  ... | ...                 | ...                 |

The `Battles` table contains the time window of each
[battle with the Cylons](<https://en.battlestarwikiclone.org/wiki/Colonial_battles_chronology_(RDM)>).

| battle               | begin               | end                 |
| :------------------- | :------------------ | :------------------ |
| Fall of the Colonies | 3004-05-04 13:21:45 | 3004-05-05 02:47:16 |
| Red Moon             | 3004-05-28 07:55:27 | 3004-05-28 08:12:19 |
| Tylium Asteroid      | 3004-06-09 09:00:00 | 3004-06-09 11:14:29 |
| Resurrection Ship    | 3004-10-28 22:00:00 | 3004-10-28 23:47:05 |
| ...                  | ...                 | ...                 |

These last two tables (`Missions` and `Battles`) are examples of _state tables_.
An object in a state table has a state that runs between two time points.
For the battles, the state is just yes/no.
For the missions, the state is a pilot/craft combination.

##### Equality Predicates

The most common type of join involves comparing one or more pairs of expressions for equality,
often a primary key and a foreign key.
For example, if we want a list of the craft flown by the pilots,
we can join the `Pilots` table to the `Crafts` table through the `Missions` table:

```sql
SELECT callsign, count(*), tailno
FROM Pilots p, Missions m, Crafts c
WHERE p.id = m.pid
  AND c.id = m.cid
GROUP BY ALL
ORDER BY 2 DESC;
```

This will give us a table like:

| callsign | count(\*) | tailno |
| :------- | --------: | :----- |
| Starbuck |       127 | 2794NC |
| Boomer   |        55 | R1234V |
| Apollo   |         3 | N7242C |
| Husker   |         1 | N7242C |
| ...      |       ... | ...    |

##### Range Predicates

The thing to notice in this example is that the conditions joining the tables are equalities connected with `AND`s.
But relational joins can be defined using _any_ Boolean predicate – even ones without equality or `AND`.

One common operation in temporal databases is intersecting two state tables.
Suppose we want to find the time intervals when each pilot was engaged in combat
so we can compute combat hours for seniority.
Vipers are launched quickly, but not before the battle has started,
and there can be malfunctions or pilots may be delayed getting to the flight deck.

```sql
SELECT callsign, battle,
    greatest(m.begin, b.begin) AS begin,
    least(m.end, b.end) AS end
FROM Pilots p, Missions m, Crafts c, Battles b
WHERE m.begin < b.end
  AND b.begin < m.end
  AND p.id = m.pid
  AND c.id = m.cid;
```

This join creates a set of records containing the call sign and period in combat for each pilot.
It handles the case where a pilot returns for a new craft, excludes patrol flights,
and even handles the situation when a patrol flight turns into combat!
This is because intersecting state tables this way produces a _joint state table_ –
an important temporal database operation.
Here are a few rows from the result:

| callsign | battle               | begin               | end                 |
| :------- | :------------------- | :------------------ | :------------------ |
| Starbuck | Fall of the Colonies | 3004-05-04 13:22:12 | 3004-05-04 15:05:49 |
| Apollo   | Fall of the Colonies | 3004-05-04 13:21:45 | 3004-05-04 18:19:12 |
| Boomer   | Fall of the Colonies | 3004-05-04 13:33:52 | 3004-05-05 02:47:16 |
| ...      | ...                  | ...                 | ...                 |

Apollo was already in flight when the first Cylon attack came,
so the query puts his `begin` time for the battle at the start of the battle,
not when he launched for the decommissioning flyby.
Starbuck and Boomer were scrambled after the battle started,
but Boomer did not return until after the battle was effectively over,
so her `end` time is moved back to the official end of the battle.

What is important here is that the join condition between the pilot/mission/craft relation
and the battle table has no equalities in it.
This kind of join is traditionally very expensive to compute,
but as we will see, there are ways of speeding it up.
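
To make the overlap test and the clipped interval concrete, here is a minimal Python sketch of what the query computes for a single mission/battle pair. The `intersect` helper is a made-up name for illustration, and the timestamps are copied from the sample rows above.

```python
from datetime import datetime

def intersect(m_begin, m_end, b_begin, b_end):
    """Return the overlap of two intervals, or None if they do not overlap."""
    # Same test as the SQL predicate: m.begin < b.end AND b.begin < m.end
    if m_begin < b_end and b_begin < m_end:
        # greatest(begins) and least(ends) clip the mission to the battle window
        return max(m_begin, b_begin), min(m_end, b_end)
    return None

ts = datetime.fromisoformat
battle = (ts("3004-05-04 13:21:45"), ts("3004-05-05 02:47:16"))  # Fall of the Colonies
apollo = (ts("3004-05-04 10:00:00"), ts("3004-05-04 18:19:12"))
husker = (ts("3008-03-20 08:14:37"), ts("3008-03-20 10:21:15"))

apollo_combat = intersect(*apollo, *battle)  # clipped to the start of the battle
husker_combat = intersect(*husker, *battle)  # no overlap, so no joined row
```

Neither comparison in `intersect` is an equality, which is why a hash table cannot help with this predicate.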

##### Infinite Time

One common problem with populating state tables is how to represent the open edges.
For example, the begin time for the first state might not be known,
or the current state may not have ended yet.

Often such values are represented by `NULL`s,
but this complicates the intersection query because comparing with `NULL` yields `NULL`.
This issue can be worked around by using `coalesce(end, <large timestamp>)`,
but that adds a computation to every row, most of which don't need it.
Another approach is to just use `<large timestamp>` directly instead of the `NULL`,
which solves the expression computation problem but introduces an arbitrary time value.
This value may give strange results when used in computations.

DuckDB provides a third alternative from Postgres that can be used for these situations:
[infinite time values](https://www.postgresql.org/docs/14/datatype-datetime.html#DATATYPE-DATETIME-SPECIAL-TABLE).
Infinite time values will compare as expected, but arithmetic with them will produce `NULL`s or infinities,
indicating that the computation is not well defined.
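
The difference between the `NULL` and infinity representations can be sketched in Python. This is only an analogy: it uses numeric timestamps and `math.inf` as a stand-in for an infinite timestamp, and the rows and function names are hypothetical.

```python
import math

# Hypothetical state rows: (name, begin, end) with numeric timestamps.
# An open-ended state is stored either as None (NULL) or as +infinity.
with_null = [("past", 0, 10), ("current", 10, None)]
with_inf = [("past", 0, 10), ("current", 10, math.inf)]

def active_at_null(rows, t):
    # NULL ends need an extra branch (the coalesce in SQL) on every row
    return [name for name, begin, end in rows
            if begin <= t and (end is None or t < end)]

def active_at_inf(rows, t):
    # Infinite ends compare as expected, so the plain predicate is enough
    return [name for name, begin, end in rows if begin <= t < end]

null_result = active_at_null(with_null, 25)
inf_result = active_at_inf(with_inf, 25)
```

Both functions find the same state, but only the second one keeps the predicate a pure range comparison.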

#### Common Join Algorithms

To see why these joins can be expensive, let's start by looking at the two most common join algorithms.

##### Hash Joins

Joins with at least one equality condition `AND`ed to the rest of the conditions are called _equi-joins_.
They are usually implemented using a hash table like this:

```python
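# Build phase: index the build side by its join key
hashes = {}
for b in build:
    hashes[b.pk] = b

# Probe phase: emit a pair only when the probe key is in the hash table
result = []
for p in probe:
    if p.fk in hashes:
        result.append((p, hashes[p.fk]))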
```

The expressions from one side (the _build_ side) are computed and hashed,
then the corresponding expressions from the other side (the _probe_ side)
are looked up in the hash table and checked for a match.

We can modify this a bit when only _some_ of the `AND`ed conditions are equalities
by checking the other conditions once we find the equalities in the hash table.
The important point is that we can use a hash table to make the join run time `O(N)`.
This modification is a general technique that can be used with any join algorithm which reduces the possible matches.
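
As a rough illustration of that modification, the sketch below hashes on the equality key and then applies the remaining conditions only to the candidates found in the hash table. The function name, the `residual` parameter, and the sample rows are assumptions made for this example; it also keeps a list per key so that duplicate build keys are handled.

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key, residual=lambda p, b: True):
    """Equi-join on one key, then filter the candidates with `residual`."""
    # Build phase: a key may occur more than once, so keep a list per key
    table = defaultdict(list)
    for b in build:
        table[build_key(b)].append(b)

    # Probe phase: the hash lookup narrows the candidates to rows with equal
    # keys; the remaining conditions are checked only on those candidates
    result = []
    for p in probe:
        for b in table.get(probe_key(p), []):
            if residual(p, b):
                result.append((p, b))
    return result

# Hypothetical rows: (key, value)
build = [(1, 10), (1, 30), (2, 20)]
probe = [(1, 15), (2, 25), (3, 5)]
matches = hash_join(build, probe, lambda b: b[0], lambda p: p[0],
                    residual=lambda p, b: p[1] < b[1])
```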

##### Nested Loop Joins

Since relational joins can be defined using _any_ Boolean predicate – even one without equality or `AND`,
hash joins do not always work.
The join algorithm of last resort in these situations is called a _Nested Loop Join_ (or NLJ for short),
and consists of just comparing every row from the probe side with every row from the build side:

```python
result = []
for p in probe:
    for b in build:
        if compare(p, b):
            result.append((p, b, ))
```

This is `O(M x N)` in the number of rows, which can be very slow if the tables are large.
Even worse, most practical analytic queries (such as the combat hours example above)
will not return anything like this many results, so a lot of effort may be wasted.
But without an algorithm that is tuned for a kind of predicate,
this is what we would have to use.

#### Range Joins

When we have a range comparison (one of `<`, `<=`, `>`, `>=`) as one of the join conditions,
we can take advantage of the ordering it implies by sorting the input relations on some of the join conditions.
Sorting is `O(N log N)`, which suggests that this could be faster than an NLJ,
and indeed this turns out to be the case.

##### Piecewise Merge Join

Before the advent of hash joins, databases would often sort the join inputs to find matches.
For equi-joins, a repeated binary search would then find the matching values on the build side in `O(M log N)` time.
This is called a _Merge Join_, and it runs faster than `O(M x N)`, but not as fast as the `O(N)` time of a hash join.
Still, in the case where we have a single range comparison,
the binary search lets us find the first match for a probe value.
We can then find all the remaining matches by looking after the first one.

If we also sort the probe side, we can even know where to start the search for the next probe value
because it will be after where we found the previous value.
This is how _Piecewise Merge Join_ (PWMJ) works:
We sort the build side so that the values are ordered by the predicate (either `ASC` or `DESC`),
then sort each probe chunk the same way so we can quickly scan through sets of values to find possible matches.
This can be significantly faster than NLJ for these types of queries.
If there are more join conditions, we can then check the generated matches to make sure all conditions are met
because once again the sorting has significantly reduced the number of checks that have to be made.
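
The idea can be sketched in a few lines of Python for a single `<` predicate. This is a simplified illustration that treats the whole probe side as one chunk and uses the standard `bisect` module; the function name and the sample values are made up.

```python
from bisect import bisect_right

def piecewise_merge_join(probe, build):
    """Return all pairs (p, b) with p < b using sorting instead of nested loops."""
    build_sorted = sorted(build)          # sort the build side once
    result = []
    start = 0
    for p in sorted(probe):               # sort the probe chunk the same way
        # First build value strictly greater than p; because the probe values
        # are ascending, the search can resume from the previous position
        start = bisect_right(build_sorted, p, lo=start)
        # Everything from `start` onwards satisfies p < b
        result.extend((p, b) for b in build_sorted[start:])
    return result

pairs = piecewise_merge_join(probe=[5, 1, 9], build=[4, 8, 2])
```

The only comparisons performed are the ones inside the binary search; the matches themselves are read off the sorted build side without further tests.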

##### Inequality Join (IEJoin)

For two range conditions (like the combat hours query), there are even faster algorithms available.
We have recently added a new join called [IEJoin](https://vldb.org/pvldb/vol8/p2074-khayyat.pdf),
which sorts on two predicates to really speed things up.

The way that IEJoin works is to first sort both tables on the values for the first condition
and merge the two sort keys into a combined table that tracks the two input tables' row numbers.
Next, it sorts the positions in the combined table on the second range condition.
It can then quickly scan for matches that pass both conditions.
And just like for hash joins, we can check any remaining conditions
because we have hopefully significantly reduced the number of pairs we have to test.

###### Walk Through

Because the algorithm is a bit tricky, let's step through a small example.
(If you are reading the paper, this is a simplified version of the "Union Arrays" optimization from §4.3,
but I find this version of the algorithm is much easier to understand than the version in §3.1.)
We are going to look at `Qp` from the paper, which is a self join on the table "West":

| West | t_id | time | cost | cores |
| :--- | ---: | ---: | ---: | ----: |
| s1   |  404 |  100 |    6 |     4 |
| s2   |  498 |  140 |   11 |     2 |
| s3   |  676 |   80 |   10 |     1 |
| s4   |  742 |   90 |    5 |     4 |

We are looking for pairs of billing ids where the second id had a shorter time than the first,
but a higher cost:

```sql
SELECT s1.t_id, s2.t_id AS t_id2
FROM west s1, west s2
WHERE s1.time > s2.time
  AND s1.cost < s2.cost;
```

There are two pairs that meet these criteria:

| t_id | t_id2 |
| ---: | ----: |
|  404 |   676 |
|  742 |   676 |

(This is an example of another kind of double range query where we are looking for anomalies.)

First, we sort both input tables on the first condition key (`time`).
(We sort `DESC` because we want the values to satisfy the join condition (`>`) from left to right.)

Because they are sorted the same way,
we can merge the condition keys from the sorted tables into a new table called `L1`
after marking each row with the table it came from (using negative row numbers to indicate the right table):

| L1   |   s2 |   s2 |   s1 |   s1 |   s4 |   s4 |   s3 |   s3 |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| time |  140 |  140 |  100 |  100 |   90 |   90 |   80 |   80 |
| cost |   11 |   11 |    6 |    6 |    5 |    5 |   10 |   10 |
| rid  |    1 |   -1 |    2 |   -2 |    3 |   -3 |    4 |   -4 |

The `rid` column lets us map rows in `L1` back to the original table.

Next, we build a second table `L2` with the second condition key (`cost`) and the row positions (`P`) of `L1`
(not the row numbers from the original tables!).
We sort `L2` on `cost` (`DESC` again this time because now we want the join condition to hold from right to left):

| L2   |   s2 |   s2 |   s3 |   s3 |   s1 |   s1 |   s4 |   s4 |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| cost |   11 |   11 |   10 |   10 |    6 |    6 |    5 |    5 |
| P    |    0 |    1 |    6 |    7 |    2 |    3 |    4 |    5 |

The sorted column of `L1` row positions is called the _permutation array_,
and we can use it to find the corresponding position of the `time` value for a given `cost`.
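
For readers who prefer code, here is a small Python sketch that rebuilds `L1` and the permutation array for the West table. The variable names are chosen for this example, and it relies on Python's `sorted` being stable so that rows with equal keys keep their `L1` order.

```python
# The West table from the example: name -> (t_id, time, cost)
west = {"s1": (404, 100, 6), "s2": (498, 140, 11),
        "s3": (676, 80, 10), "s4": (742, 90, 5)}

# Sort on the first condition key (time) DESC
by_time = sorted(west, key=lambda s: west[s][1], reverse=True)

# L1: each row appears twice, once for the left input (positive rid)
# and once for the right input (negative rid)
l1 = []
for n, name in enumerate(by_time, start=1):
    l1.append((name, +n))
    l1.append((name, -n))

# L2: L1 positions sorted on the second condition key (cost) DESC.
# sorted() is stable, so equal costs keep their L1 order.
permutation = sorted(range(len(l1)),
                     key=lambda pos: west[l1[pos][0]][2], reverse=True)
```

Here `permutation` plays the role of the `P` row of `L2`.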

At this point we have two tables (`L1` and `L2`),
each sorted on one of the join conditions and pointing back to the tables it was derived from.
Moreover, the sort orders have been chosen so that the condition holds from left to right
(resp. right to left).
Since the conditions are transitive,
this means that whenever we have a value that satisfies a condition at a point in the table,
it also satisfies it for everything to the right (resp. left)!

With this setup, we can scan `L2` from left to right
looking for rows that match both conditions using two indexes:

- `i` iterates across `L2` from left to right;
- `off2` tracks `i` and is used to identify `cost`s that satisfy the join condition compared to `i`. (Note that for loose inequalities, this could be to the right of `i`.)

We use a bitmap `B` to track which rows in `L1` the `L2` scan
has already identified as satisfying the `cost` condition compared to the `L2` scan position `i`.

Because we only want matches between one left and one right row, we can skip matches where the `rid`s have the same sign.
To leverage this observation, we only process values of `i` that are in the left hand table (`rid[P[i]]` is positive),
and we only mark bits for rows in the right hand table (`rid[P[i]]` is negative).
In this example, the right side rows are the odd numbered values in `P` (which are conveniently also the odd values of `i`),
which makes them easy to track in the example.

For the other rows, here is what happens:

|    i | off2 | cost[i] | cost[off2] | P[i] | rid[P[i]] | B          | Result     |
| ---: | ---: | ------: | ---------: | ---: | --------: | :--------- | :--------- |
|    0 |    0 |      11 |         11 |    0 |         1 | `00000000` | []         |
|    2 | 0..2 |      10 |     11..10 |    6 |         4 | `01000000` | []         |
|    4 | 2..4 |       6 |      10..6 |    2 |         2 | `01000001` | [{s1, s3}] |
|    6 | 4..6 |       5 |       6..5 |    4 |         3 | `01010001` | [{s4, s3}] |

Whenever we find `cost`s that satisfy the condition to the left of the scan location (between `off2` and `i`),
we use `P[off2]` to mark the bits in `B` corresponding to those positions in `L1` that reference right side rows.
This records that the `cost` condition is satisfied for those rows.
Then whenever we have a position `P[i]` in `L1`,
we can scan `B` to the right to find values that also satisfy the `cost` condition.
This works because everything to the right of `P[i]` in `L1` satisfies the `time` condition
thanks to the sort order of `L1` and the transitivity of the comparison operations.

In more detail:

1. When `i` and `off2` are `0`, the `cost` condition `<` is not satisfied, so nothing happens;
1. When `i` is `1`, we are looking at a row from the right side of the join, so we skip it and move on;
1. When `i` is `2`, we are now looking at a row from the left side, so we bring `off2` forward until the `cost` condition fails, marking `B` where it succeeds at `P[1] = [1]`;
1. We then scan the `time` values in `L1` right from position `P[i=2] = 6` and find no matches in `B`;
1. When `i` is `4`, we bring `off2` forward again, marking `B` at `P[3] = [7]`;
1. We then scan `time` from position `2` and find matches at `[6,7]`, one of which (`7`) is from the right side table;
1. When `i` is `6`, we bring `off2` forward again, marking `B` at `P[5] = [3]`;
1. We then scan `time` from position `4` and again find matches at `[6,7]`;
1. Finally, when `i` runs off the end, we have no new `cost` values, so nothing happens.

What makes this fast is that we only have to check a few bits to find the matches.
When we do need to perform comparisons, we can use the fast radix comparison code from our sorting code,
which doesn't require special templated versions for every data type.
This not only reduces the code size and complexity, it "future-proofs" it against new data types.
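
The whole scan can be condensed into a short Python sketch. It is an illustration under simplifying assumptions, not DuckDB's implementation: it is single-threaded, uses a plain list of booleans for the bitmap `B`, and adds an explicit `time` comparison as a guard against ties. The function name `iejoin_self` is made up for this example.

```python
west = [(404, 100, 6), (498, 140, 11), (676, 80, 10), (742, 90, 5)]  # (t_id, time, cost)

def iejoin_self(rows):
    """Pairs (a, b) of t_ids with a.time > b.time AND a.cost < b.cost."""
    # L1: sorted on time DESC; each row appears as a left (+rid) and a right (-rid) copy
    by_time = sorted(rows, key=lambda r: r[1], reverse=True)
    l1 = []
    for n, row in enumerate(by_time, start=1):
        l1 += [(row, n), (row, -n)]
    # L2: L1 positions sorted on cost DESC (the permutation array P)
    p = sorted(range(len(l1)), key=lambda pos: l1[pos][0][2], reverse=True)
    cost = [l1[pos][0][2] for pos in p]

    bits = [False] * len(l1)   # bitmap B over L1 positions
    result = []
    off2 = 0
    for i in range(len(p)):
        row, rid = l1[p[i]]
        if rid < 0:
            continue           # only left-side rows drive the scan
        # Advance off2 while cost[off2] is strictly greater than cost[i],
        # marking the right-side rows that pass the cost condition
        while off2 < len(p) and cost[off2] > cost[i]:
            if l1[p[off2]][1] < 0:
                bits[p[off2]] = True
            off2 += 1
        # Marked positions to the right of P[i] in L1 have smaller (or equal) times
        for pos in range(p[i] + 1, len(l1)):
            if bits[pos] and row[1] > l1[pos][0][1]:
                result.append((row[0], l1[pos][0][0]))
    return result

pairs = iejoin_self(west)
```

For the West table this yields the same two `t_id` pairs as the SQL query. A real implementation would scan the bitmap word by word rather than bit by bit.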

###### Further Details

That walkthrough is a slightly simplified, single-threaded version of the actual algorithm.
There are a few more details that may be of interest:

- Scanning large, mostly empty bit maps can be slow, so we use the Bloom filter optimization from §4.2.
- The published algorithm assumes that there are no duplicate `L1` values in either table. To handle the general case, we use an [exponential search](https://en.wikipedia.org/wiki/Exponential_search) to find the first `L1` value that satisfies the predicate with respect to the current position and scan right from that point;
- We also adapted the distributed Algorithm 3 from §5 by joining pairs of the sorted blocks generated by the sort code on separate threads. This allows us to fully parallelize the operator by first using parallel sorting and then by breaking up the join into independent pieces;
- Breaking up the pieces for parallel execution also allows us to spool join blocks that are not being processed to disk, making the join scalable.
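
To show why exponential search helps with long runs of duplicates, here is a generic Python sketch of the technique. It is not DuckDB's code: the function name, the ascending sort order, and the sample data are assumptions made for illustration.

```python
from bisect import bisect_left

def exponential_search(values, target, start=0):
    """First index >= start in ascending `values` whose value is >= target."""
    # Double the step until we overshoot the target (or the end of the list)
    bound = 1
    while start + bound < len(values) and values[start + bound] < target:
        bound *= 2
    # The answer lies within the last doubling interval; finish with binary search
    lo = start + bound // 2
    hi = min(start + bound + 1, len(values))
    return bisect_left(values, target, lo, hi)

# A long run of identical keys followed by the values we are looking for
values = [1] * 1000 + [2, 3]
first_two = exponential_search(values, 2)
```

A linear scan would need about a thousand comparisons to get past the run of `1`s, while the doubling phase plus the binary search needs on the order of twenty.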

#### Special Joins

One of the nice things about IEJoin is that it is very general and implements a number of more specialized join types reasonably efficiently.
For example, the state intersection query above is an example of an _interval join_
where we are looking to join on the intersection of two intervals.

Another specialized join that can be accelerated with `IEJoin` is a _band join_.
This can be used to join values that are "close" to each other:

```sql
SELECT r.id, s.id
FROM r, s
WHERE r.value - s.value BETWEEN a AND b;
```

This translates into a double inequality join condition:

```sql
SELECT r.id, s.id
FROM r, s
WHERE s.value + a <= r.value AND r.value <= s.value + b;
```

which is exactly the type of join expression that IEJoin handles.
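
The rewrite from `BETWEEN` to a pair of inequalities can be checked with a few lines of Python. The band limits and the integer value range below are arbitrary choices for this check.

```python
def band_between(r, s, a, b):
    # r.value - s.value BETWEEN a AND b
    return a <= r - s <= b

def band_inequalities(r, s, a, b):
    # s.value + a <= r.value AND r.value <= s.value + b
    return s + a <= r and r <= s + b

# Hypothetical integer values and band limits
values = range(-5, 6)
a, b = -1, 2
equivalent = all(band_between(r, s, a, b) == band_inequalities(r, s, a, b)
                 for r in values for s in values)
```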

#### Performance

So how fast is the IEJoin?
It is so fast that it is difficult to compare it to the previous range join algorithms
because the improvements are so large that the other algorithms do not complete in a reasonable amount of time!

##### Simple Measurements

To give an example, here are the run times for a 100K self join of some employee tax and salary data,
where the goal is to find the 1001 pairs of employees where one has a higher salary but the other has a higher tax rate:

```sql
SELECT
    r.id,
    s.id
FROM Employees r
JOIN Employees s
    ON r.salary < s.salary
    AND r.tax > s.tax;
```

| Algorithm | Time (s) |
| :-------- | -------: |
| NLJ       |   21.440 |
| PWMJ      |   38.698 |
| IEJoin    |    0.280 |

Another example is a self join to find 3772 overlapping events in a 30K event table:

```sql
SELECT
    r.id,
    s.id
FROM events r
JOIN events s
    ON r.start <= s.end
    AND r.end >= s.start
    AND r.id <> s.id;
```

| Algorithm | Time (s) |
| :-------- | -------: |
| NLJ       |    6.985 |
| PWMJ      |    4.780 |
| IEJoin    |    0.226 |

In both cases we see performance improvements of 20-100×,
which is very helpful when you run a lot of queries like these!

##### Optimization Measurements

A third example demonstrates the importance of the join pair filtering and exponential search optimizations.
The data is a state table of
[library circulation data](https://www.opendata.dk/city-of-aarhus/transaktionsdata-fra-aarhus-kommunes-biblioteker)
from another [interval join paper](https://vldb.org/pvldb/vol10/p1346-bouros.pdf),
and the query is a point-in-period temporal query used to generate Figure 4d:

```sql
SELECT x, count(*) AS y
FROM books,
    (SELECT x FROM range('2013-01-01'::TIMESTAMP, '2014-01-01'::TIMESTAMP, INTERVAL 1 DAY) tbl(x)) dates
WHERE checkout <= x AND x <= return
GROUP BY ALL
ORDER BY 1;
```

The result is a count of the number of books checked out at midnight on each day.
These are the runtimes on an 18 core iMac Pro:

| Improvement |     Time |   CPU |
| :---------- | -------: | ----: |
| Unoptimized |   > 30 m | ~100% |
| Filtering   | 119.76 s |  269% |
| Exponential |  11.21 s |  571% |

The query joins a 35M row table with a 365 row table, so most of the data comes from the left hand side.
By avoiding setting bits for the matching rows in the left table, we eliminate almost all `L1` checks.
This dramatically reduced the runtime and improved the CPU utilization.

The data also has a large number of rows corresponding to books that were checked out at the start of the year,
which all have the same `checkout` date.
Searching left linearly in the first block to find the first match for the scan
resulted in repeated runs of ~120K comparisons.
This caused the runtime to be completely dominated by processing the first block.
By reducing the number of comparisons for these rows from an average of ~60K to 16,
the runtime dropped by a factor of 10 and the CPU utilization doubled.

#### Conclusion and Feedback

In this blog post, we explained the new DuckDB range join improvements provided by the new IEJoin operator.
This should greatly improve the response time of state table joins and anomaly detection joins.
We hope this makes your DuckDB experience even better – and please let us know if you run into any problems!
Feel free to reach out on our [GitHub page](https://github.com/duckdb/duckdb), or our [Discord server](https://discord.gg/vukK4xp7Rd).

## Persistent Storage of Adaptive Radix Trees (ART) in DuckDB

**Publication date:** 2022-07-27

**Author:** Pedro Holanda

**TL;DR:** DuckDB uses Adaptive Radix Tree (ART) Indexes to enforce constraints and to speed up query filters. Up to this point, indexes were not persisted, causing issues like loss of indexing information and high reload times for tables with data constraints. We now persist ART Indexes to disk, drastically diminishing database loading times (up to orders of magnitude), and we no longer lose track of existing indexes. This blog post contains a deep dive into the implementation of ART storage, benchmarks, and future work. Finally, to better understand how our indexes are used, I'm asking you to answer the following [survey](https://forms.gle/eSboTEp9qpP7ybz98). It will guide us when defining our future roadmap.

![](../images/blog/ART/pedro-art.jpg)


DuckDB uses [ART Indexes](https://db.in.tum.de/~leis/papers/ART.pdf) to keep primary key (PK), foreign key (FK), and unique constraints. They also speed up point-queries, range queries (with high selectivity), and joins. Before the bleeding edge version (or v0.4.1, depending on when you are reading this post), DuckDB did not persist ART indexes on disk. When storing a database file, only the information about existing PKs and FKs would be stored, with all other indexes being transient and no longer existing after restarting the database. PK and FK indexes would be fully reconstructed when reloading the database, creating the inconvenience of long loading times.

A lot of scientific work has been published regarding ART Indexes, most notably on [synchronization](https://db.in.tum.de/~leis/papers/artsync.pdf), [cache-efficiency](https://dbis.uibk.ac.at/sites/default/files/2018-06/hot-height-optimized.pdf), and [evaluation](https://bigdata.uni-saarland.de/publications/ARCD15.pdf). However, up to this point, no public work exists on serializing and buffer managing an ART Tree. [Some say](https://twitter.com/muehlbau/status/1548024479971807233) that Hyper, the database in Tableau, persists ART indexes, but again, there is no public information on how that is done.

This blog post will describe how DuckDB stores and loads ART indexes. In particular, how the index is lazily loaded (i.e., an ART node is only loaded into memory when necessary). In the [ART Index Section](#::art-index), we go through what an ART Index is, how it works, and some examples. In the [ART in DuckDB Section](#::art-in-duckdb), we explain why we decided to use an ART index in DuckDB, where it is used, and discuss the problems of not persisting ART indexes. In the [ART Storage Section](#::art-storage), we explain how we serialize and buffer manage ART Indexes in DuckDB. In the [Benchmarks Section](#::benchmarks), we compare DuckDB v0.4.0 (before ART Storage) with the bleeding edge version of DuckDB. We demonstrate the difference in the loading costs of PKs and FKs in both versions and the differences between lazily loading an ART index and accessing a fully loaded ART Index. Finally, in the [Road Map section](#::roadmap), we discuss the drawbacks of our current implementations and the plans on the list of ART index goodies for the future.

#### ART Index

Adaptive Radix Trees are, in essence, [Tries](https://en.wikipedia.org/wiki/Trie) that apply vertical and horizontal compression to create compact index structures.

##### [Trie](https://en.wikipedia.org/wiki/Trie)

Tries are tree data structures, where each tree level holds information on part of the dataset. They are commonly exemplified with strings. In the figure below, you can see a Trie representation of a table containing the strings "pedro", "paulo", and "peri". The root node represents the first character "p" with children "a" (from paulo) and "e" (from pedro and peri), and so on.

![](../images/blog/ART/string-trie.png)


To perform lookups on a Trie, you must match each character of the key to the current level of the Trie. For example, if you search for pedro, you must check the root contains the letter p. If it does, you check if any of its children contains the letter e, up to the point you reach a leaf node containing the pointer to the tuple that holds this string. (See figure below).

![](../images/blog/ART/lookup-trie.png)


The main advantage of Tries is that they have O(k) lookups, meaning that in the worst case, the lookup cost will equal the length of the strings.
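
The lookup described above can be sketched in a few lines of Python. This is only an illustration, not DuckDB's implementation: the names `TrieNode`, `trie_insert`, and `trie_lookup` are hypothetical, the sketch uses an empty root node (so "p" lives one level below the root), and the row identifiers are made up. The `steps` counter shows that a lookup never inspects more levels than the key has characters.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps one character to the child TrieNode
        self.row_id = None   # set only on nodes that terminate a key


def trie_insert(root, key, row_id):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.row_id = row_id


def trie_lookup(root, key):
    """Return (row_id, steps); row_id is None when the key is absent."""
    node = root
    steps = 0
    for ch in key:
        steps += 1
        node = node.children.get(ch)
        if node is None:
            return None, steps
    return node.row_id, steps


root = TrieNode()
for row_id, name in enumerate(["pedro", "paulo", "peri"]):
    trie_insert(root, name, row_id)

print(trie_lookup(root, "pedro"))  # (0, 5)
print(trie_lookup(root, "pex"))    # (None, 3)
```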

In reality, Tries can also be used for numeric data types. However, storing them character by character, like strings, would be wasteful. Take, for example, the `UBIGINT` data type. `UBIGINT` is a `uint64_t`, which takes 64 bits (i.e., 8 bytes) of space. The maximum value of a `uint64_t` is `18,446,744,073,709,551,615`, which has 20 decimal digits. Hence, if we represented it digit by digit, like in the example above, we would need 20 levels in the Trie. In practice, Tries are created on a bit fan-out, which tells how many bits are represented per level of the Trie. A `uint64_t` Trie with 8-bit fan-out would have a maximum of 8 levels, each representing a byte.

To have more realistic examples, from this point onwards, all depictions in this post will be with bit representations. In DuckDB, the fan-out is always 8 bits. However, for simplicity, the following examples in this blog post will have a fan-out of 2 bits.
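
To make the notion of fan-out concrete, the following sketch splits an unsigned integer into the chunks that each Trie level would consume. The function `key_chunks` is a hypothetical helper written for this illustration. The 4-bit key width in the second example is also an assumption, chosen only to keep the output short.

```python
def key_chunks(value, total_bits=64, fanout_bits=8):
    """Split an unsigned integer into per-level chunks, most significant first."""
    if total_bits % fanout_bits != 0:
        raise ValueError("total_bits must be a multiple of fanout_bits")
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    levels = total_bits // fanout_bits
    mask = (1 << fanout_bits) - 1
    return [(value >> (fanout_bits * (levels - 1 - i))) & mask for i in range(levels)]


# A uint64_t with an 8-bit fan-out yields 8 levels, one byte each.
print(len(key_chunks(2**64 - 1)))                    # 8

# With a 2-bit fan-out, the 4-bit value 12 (binary 1100) yields two levels.
print(key_chunks(12, total_bits=4, fanout_bits=2))   # [3, 0]
```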

In the example below, we have a Trie that indexes the values 7, 10, and 12. You can also see the binary representation of each value on the table next to them. Each node consists of the bits 0 and 1, with a pointer next to them. This pointer can either be set (represented by `*`) or null (represented by `Ø`). Similar to the string Trie we had before, each level of the Trie will represent two bits, with the pointer next to these bits pointing to their children. Finally, the leaves point to the actual data. 

![](../images/blog/ART/2-bit-trie.png)


One can quickly notice that this Trie representation is wasteful on two different fronts. First, many nodes only have one child (i.e., one path), which could be collapsed by vertical compression (i.e., Radix Tree). Second, many nodes have null pointers, occupying space without holding any information, which could be resolved with horizontal compression.

##### Vertical Compression (i.e., [Radix Trees](https://en.wikipedia.org/wiki/Radix_tree))

The basic idea of vertical compression is that we collapse paths with nodes that only have one child. To support this, nodes store a prefix variable containing the collapsed path to that node. You can see a representation of this in the figure below. For example, one can see that the first four nodes have only one child. These nodes can be collapsed to the third node (i.e., the first one that bifurcates) as a prefix path. When performing lookups, the key must match all values included in the prefix path. 

![](../images/blog/ART/2-bit-collapse-trie.png)


Below you can see the resulting Trie after vertical compression. This Trie variant is commonly known as a Radix Tree. Although a lot of wasted space has already been saved with this Trie variant, we still have many nodes with unset pointers.

![](../images/blog/ART/2-bit-collapse-trie-result.png)
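
As a rough sketch of vertical compression, the code below builds a plain Trie as nested dictionaries and then collapses every chain of single-child nodes into one prefix. The helpers `build_trie` and `compress` are hypothetical, and so are the 8-bit key width and the leaf payload strings. The nested-dictionary layout is a simplification and does not mirror how ART nodes store their prefix.

```python
def build_trie(keys, total_bits=8, fanout_bits=2):
    """Build an uncompressed trie of nested dicts, one level per chunk."""
    root = {}
    levels = total_bits // fanout_bits
    mask = (1 << fanout_bits) - 1
    for key in keys:
        node = root
        for level in range(levels):
            chunk = (key >> (fanout_bits * (levels - 1 - level))) & mask
            if level == levels - 1:
                node[chunk] = f"row of {key}"   # placeholder leaf payload
            else:
                node = node.setdefault(chunk, {})
    return root


def compress(trie):
    """Collapse chains of single-child nodes; keys become tuples of chunks."""
    result = {}
    for chunk, child in trie.items():
        prefix = (chunk,)
        # Follow the path while the child is an inner node with exactly one child.
        while isinstance(child, dict) and len(child) == 1:
            (next_chunk, child), = child.items()
            prefix += (next_chunk,)
        result[prefix] = compress(child) if isinstance(child, dict) else child
    return result


trie = build_trie([7, 10, 12])
print(compress(trie))
# {(0, 0): {(1, 3): 'row of 7', (2, 2): 'row of 10', (3, 0): 'row of 12'}}
```

The two leading all-zero levels are shared by all three keys, so they collapse into the single prefix `(0, 0)`, and each remaining single-child path collapses into its leaf.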





##### Horizontal Compression (i.e., ART)

To fully understand the design decisions behind ART indexes, we must first extend the 2-bit fan-out to 8 bits, the commonly found fan-out for database systems.

![](../images/blog/ART/8-bit-radix-tree.png)


Below you can see the same nodes as before in a Trie node of 8 bits. In reality, these nodes will store (2^8) 256 pointers, with the key being the array position of the pointer. In the case depicted by this example, we have a node with a size of 2048 bytes (256 pointers * 8 bytes) while only actually utilizing 24 bytes (3 pointers * 8 bytes), which means that 2024 bytes are entirely wasted. To avoid this situation, ART indexes are composed of 4 different node types that depend on how full the current node is. Below I quickly describe each node with a graphical representation of them. In the graphical representation, I present a conceptual visualization of the node and an example with keys 0, 4, and 255.



**Node 4**: Node 4 holds up to 4 different keys. Each key is stored in a one-byte array, with one pointer per key. The key and pointer arrays take 36 bytes (4\*1 + 4\*8), which rounds up to 40 bytes with 8-byte alignment. Note that the pointer array is aligned with the key array (e.g., key 0 is in position 0 of the keys array, hence its pointer is in position 0 of the pointers array).

![](../images/blog/ART/art-4.png)


**Node 16**: Node 16 holds up to 16 different keys. Like Node 4, each key is stored in a one-byte array, with one pointer per key. Its total size is 144 bytes (16\*1 + 16\*8). Like Node 4, the pointer array is aligned with the key array.

![](../images/blog/ART/art-16.png)


**Node 48**: Node 48 holds up to 48 different keys. When a key is present in this node, the one-byte array position representing that key will hold an index into the pointer array that points to the child of that key. Its total size is 640 bytes (256\*1 + 48\*8). Note that the pointer array and the key array are not aligned anymore. The key array points to the position in the pointer array where the pointer of that key is stored (e.g., the key 255 in the key array is set to 2 because the position 2 of the pointer array points to the child pertinent to that key).

![](../images/blog/ART/art-48.png)


**Node 256**: Node 256 holds up to 256 different keys, hence all possible values in the distribution. It only has a pointer vector: if the pointer is set, the key exists, and it points to its child. Its total size is 2048 bytes (256 pointers * 8 bytes).

![](../images/blog/ART/art-256.png)


For the example in the previous section, we could use a `Node 4` instead of a `Node 256` to store the keys, since we only have 3 keys present. Hence it would look like the following:

![](../images/blog/ART/art-index-example.png)
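
The node types above can be summarized with a small sketch. It computes the raw size of the key and pointer arrays (ignoring any header or alignment padding), picks the smallest node type for a given number of children, and models the indirection used by Node 48. The names `NODE_LAYOUTS`, `payload_bytes`, `smallest_node_type`, and the `Node48` class are hypothetical, and the `EMPTY` sentinel value is an assumption rather than DuckDB's actual encoding.

```python
POINTER_BYTES = 8

# (capacity, key-array bytes, pointer slots) for each node type described above.
NODE_LAYOUTS = {
    "Node4": (4, 4, 4),
    "Node16": (16, 16, 16),
    "Node48": (48, 256, 48),
    "Node256": (256, 0, 256),
}


def payload_bytes(node_type):
    """Bytes taken by the key array plus the pointer array."""
    _, key_bytes, pointer_slots = NODE_LAYOUTS[node_type]
    return key_bytes + pointer_slots * POINTER_BYTES


def smallest_node_type(num_children):
    for name, (capacity, _, _) in NODE_LAYOUTS.items():
        if num_children <= capacity:
            return name
    raise ValueError("a node with an 8-bit fan-out has at most 256 children")


class Node48:
    EMPTY = 48  # hypothetical sentinel; valid pointer slots are 0..47

    def __init__(self):
        self.child_index = bytearray([self.EMPTY] * 256)  # one byte per possible key
        self.children = []                                # up to 48 child pointers

    def insert(self, key_byte, child):
        slot = self.child_index[key_byte]
        if slot != self.EMPTY:
            self.children[slot] = child   # key already present: replace its child
            return
        if len(self.children) == 48:
            raise OverflowError("Node48 is full; it would grow to a Node256")
        self.child_index[key_byte] = len(self.children)
        self.children.append(child)

    def lookup(self, key_byte):
        slot = self.child_index[key_byte]
        return None if slot == self.EMPTY else self.children[slot]


node = Node48()
for key in (0, 4, 255):
    node.insert(key, f"child-{key}")

print(smallest_node_type(3))      # Node4
print(payload_bytes("Node48"))    # 640
print(node.child_index[255])      # 2
```

As in the Node 48 figure, key 255 maps to position 2 of the pointer array because it was the third key inserted.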


#### ART in DuckDB

When considering which index structure to implement in DuckDB, we wanted a structure that could be used to keep PK/FK/Unique constraints while also being able to speed up range queries and Joins. Database systems commonly implement [Hash-Tables](https://en.wikipedia.org/wiki/Hash_table) for constraint checks and [BP-Trees](https://en.wikipedia.org/wiki/B%2B_tree) for range queries. However, we saw in ART Indexes an opportunity to diminish the code complexity by having one data structure for two use cases. The main characteristics that ART Index provides that we take advantage of are:
1. Compact Structure. Since the ART internal nodes are rather small, they can fit in CPU caches, being a more cache-conscious structure than BP-Trees.
2. Fast Point Queries. The worst case for an ART point query is O(k), which is sufficiently fast for constraint checking.
3. No dramatic regression on insertions. Many Hash-Table variants must be rebuilt when they reach a certain size. In practice, one insert might cause a significant regression in time, with a query suddenly taking orders of magnitude more time to complete, with no apparent reason for the user. In the ART, inserts might cause node growths (e.g., a Node 4 might grow to a Node 16), but these are inexpensive.
4. Ability to run range queries. Although the ART does not run range queries as fast as BP-Trees since it must perform tree traversals, where the BP-Tree can scan leaf nodes sequentially, it still presents an advantage over hash tables since these types of queries can be done (Some might argue that you can use Hash Tables for range queries, but meh). This allows us to efficiently use ART for highly selective range queries and index joins.
5. Maintainability. Using one structure for both constraint checks and range queries instead of two is more code efficient and maintainable.

##### What Is It Used For?

As said previously, ART indexes are mainly used in DuckDB on four fronts.

1. Data Constraints. Primary key, foreign key, and unique constraints are all maintained by an ART index. When inserting a tuple into a table with a constraint, DuckDB effectively tries to perform an insertion into the ART index and fails if the key already exists.

   ```sql
   CREATE TABLE integers (i INTEGER PRIMARY KEY);
   -- Insert unique values into ART
   INSERT INTO integers VALUES (3), (2);
   -- Insert conflicting value in ART will fail
   INSERT INTO integers VALUES (3);

   CREATE TABLE fk_integers (j INTEGER, FOREIGN KEY (j) REFERENCES integers(i));
   -- This insert works normally
   INSERT INTO fk_integers VALUES (2), (3);
   -- This fails after checking the ART in integers
   INSERT INTO fk_integers VALUES (4);
   ```

2. Range Queries. Highly selective range queries on indexed columns will also use the ART index underneath.

   ```sql
   CREATE TABLE integers (i INTEGER PRIMARY KEY);
   -- Insert unique values into ART
   INSERT INTO integers VALUES (3), (2), (1), (8) , (10);
   -- Range queries (if highly selective) will also use the ART index
   SELECT * FROM integers WHERE i >= 8;
   ```

3. Joins. Joins with a small number of matches will also utilize existing ART indexes.

   ```sql
   -- Optionally you can always force index joins with the following pragma
   PRAGMA force_index_join;

   CREATE TABLE t1 (i INTEGER PRIMARY KEY);
   CREATE TABLE t2 (i INTEGER PRIMARY KEY);
   -- Insert unique values into ART
   INSERT INTO t1 VALUES (3), (2), (1), (8), (10);
   INSERT INTO t2 VALUES (3), (2), (1), (8), (10);
   -- Joins will also use the ART index
   SELECT * FROM t1 INNER JOIN t2 ON (t1.i = t2.i);
   ```

4. Indexes over expressions. ART indexes can also be used to quickly look up expressions.

   ```sql
   CREATE TABLE integers (i INTEGER, j INTEGER);

   INSERT INTO integers VALUES (1, 1), (2, 2), (3, 3);

   -- Creates index over the i + j expression
   CREATE INDEX i_index ON integers USING ART((i + j));

   -- Uses ART index point query
   SELECT i FROM integers WHERE i + j = 2;
   ```

#### ART Storage

There are two main constraints when storing ART indexes: 

1. The index must be stored in an order that allows for lazy-loading. Otherwise, we would have to fully load the index, including nodes that might be unnecessary to queries that would be executed in that session.
2. It must not increase the node size. Otherwise, we diminish the cache-conscious effectiveness of the ART index.

##### Post-Order Traversal

To allow for lazy loading, we must store all children of a node, collect the information of where each child is stored, and then, when storing the actual node, we store the disk information of each of its children. To perform this type of operation, we do a post-order traversal.

The post-order traversal is shown in the figure below. The circles in red represent the numeric order where the nodes will be stored. If we start from the root node (i.e., Node 4 with storage order 10), we must first store both children (i.e., Node 16 with storage order 8 and the Leaf with storage order 9). This goes on recursively for each of its children.

![](../images/blog/ART/serialization-order.png)


The figure below shows an actual representation of what this would look like in DuckDB's block format. In DuckDB, data is stored in 256 kB contiguous blocks, with some blocks reserved for metadata and some for actual data. Each block is represented by an `id`. To allow for navigation within a block, blocks are addressed by byte offsets; hence, each block contains roughly 256,000 different offsets.

![](../images/blog/ART/block-storage.png)


In this example, we have `Block 0`, which stores some of our database metadata. In particular, between offsets 100,000 and 100,200 we store information pertinent to one ART index. This will store information on the index (e.g., name, constraints, expression) and the `<Block,Offset>` position of its root node.

For example, let's assume we are doing a lookup of the key whose `row_ids` are stored in the Leaf with storage order 1. We would start by loading the ART root node at `<Block:2, Offset:220>`. By checking the keys stored in that node, we would then see that we must load the Node 16 at `<Block:2, Offset:140>`, and then finally our Leaf at `<Block:0, Offset:0>`. That means that for this lookup, only these 3 nodes were loaded into memory. Subsequent access to these nodes would only require memory access, while access to different nodes (e.g., Leaf storage order 2) would still result in disk access.
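
The post-order storage and the lazy lookup can be sketched as follows. This is a simplified model, not DuckDB's serialization code: a plain Python list stands in for disk, a list index stands in for a `<Block,Offset>` position, and the names `Node`, `serialize`, and `load` as well as the small example tree are hypothetical.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def serialize(node, storage):
    """Post-order: write all children first, then the node with their positions."""
    child_positions = [serialize(child, storage) for child in node.children]
    storage.append((node.name, child_positions))
    return len(storage) - 1   # where this node ended up on "disk"


def load(storage, position, loaded):
    """Read a single node; its children stay on "disk" until requested."""
    loaded.append(position)
    return storage[position]


tree = Node("Node 4", [Node("Node 16", [Node("Leaf A"), Node("Leaf B")]), Node("Leaf C")])
storage = []
root_position = serialize(tree, storage)

# Lazy lookup: follow one path from the root and record which nodes were read.
loaded = []
_, root_children = load(storage, root_position, loaded)
_, node16_children = load(storage, root_children[0], loaded)
leaf_name, _ = load(storage, node16_children[0], loaded)
print(leaf_name, loaded)  # Leaf A [4, 2, 0]
```

Because every parent is written after its children, the root is stored last and already knows where each child lives. The lookup touches only three of the five stored nodes.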

One major problem with implementing this (de)serialization process is that we now have to keep track not only of the memory address of each pointer, but also of whether the node is already in memory and, if not, at which `<Block,Offset>` position it is stored.

If we stored the Block Id and Offset in new variables, it would dramatically increase the ART node sizes, diminishing its effectiveness as a cache-conscious data structure.

Take Node 256 as an example. The cost of holding 256 pointers is 2048 bytes (256 pointers * 8 bytes). Let's say we decide to store the Block Information on a new array like the following: 

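```cpp
#include <cstdint>

class Node {}; // stand-in for the ART node base class

struct BlockPointer {
    uint32_t block_id;
    uint32_t offset;
};

class Node256 : public Node {
    // Pointers to the child nodes
    Node *children[256];
    // On-disk position of each child (the hypothetical extra array)
    BlockPointer block_info[256];
};
```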

Node 256 would grow by 2048 bytes (256 * (4+4)), doubling its current size to 4096 bytes.

##### Pointer Swizzling

To avoid the increase in sizes of ART nodes, we decided to implement [Swizzlable Pointers](https://en.wikipedia.org/wiki/Pointer_swizzling) and use them instead of regular pointers.

The idea is that a pointer does not need all 64 bits to point to a memory address: 48 bits already give you an address space of 256 terabytes, which is enough for current architectures (see the [related discussion on Stack Overflow](https://stackoverflow.com/questions/6716946/why-do-x86-64-systems-have-only-a-48-bit-virtual-address-space) and the [“64-bit computing” Wikipedia page](https://en.wikipedia.org/wiki/64-bit_computing)). Hence we can use the most significant bit as a flag (i.e., the Swizzle Flag).
If the swizzle flag is set, the value in our Swizzlable Pointer is a memory address for the Node. Otherwise, the variable stores the Block Information of where the Node is stored. In the latter case, we use the following 31 bits to store the Block ID and the remaining 32 bits to store the offset.

In the following figure, you can see a visual representation of DuckDB's Swizzlable Pointer.

![](../images/blog/ART/pointer-swizzling.png)
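
The bit layout described above can be sketched with plain integer arithmetic. This follows the description in the text (flag set means memory address, flag clear means 31 bits of block ID followed by 32 bits of offset) and is not taken from DuckDB's source code. The function names `from_address`, `from_block`, and `decode` are hypothetical, and the example address is made up.

```python
FLAG_BIT = 1 << 63             # most significant bit: the swizzle flag
BLOCK_ID_MASK = (1 << 31) - 1  # the following 31 bits
OFFSET_MASK = (1 << 32) - 1    # the remaining 32 bits


def from_address(address):
    """Flag set: the remaining bits hold an in-memory address."""
    if not 0 <= address < (1 << 48):
        raise ValueError("expected an address that fits in 48 bits")
    return FLAG_BIT | address


def from_block(block_id, offset):
    """Flag clear: 31 bits of block ID followed by 32 bits of offset."""
    if not 0 <= block_id <= BLOCK_ID_MASK or not 0 <= offset <= OFFSET_MASK:
        raise ValueError("block_id or offset out of range")
    return (block_id << 32) | offset


def decode(pointer):
    if pointer & FLAG_BIT:
        return ("memory", pointer & (FLAG_BIT - 1))
    return ("disk", (pointer >> 32) & BLOCK_ID_MASK, pointer & OFFSET_MASK)


print(decode(from_block(2, 220)))           # ('disk', 2, 220)
print(decode(from_address(0x7F00DEADBEEF))) # ('memory', 139641271369455)
```

Either way the value fits in a single 64-bit word, so the node does not need a second array to remember where its children live on disk.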


#### Benchmarks

To evaluate the benefits and disadvantages of our current storage implementation, we run a benchmark (Available at this [Colab Link](https://colab.research.google.com/drive/1lidiFNswQfxdmYlsufXUT80IFpyluEF3?usp=sharing)), where we create a table containing 50,000,000 integral elements with a primary key constraint on top of them. 

```python
import duckdb

con = duckdb.connect("vault.db")
con.execute("CREATE TABLE integers (x INTEGER PRIMARY KEY)")
con.execute("INSERT INTO integers SELECT * FROM range(50000000)")
```

We run this benchmark on two different versions of DuckDB, one where the index is not stored (i.e., v0.4.0), which means it is always in memory and fully reconstructed at a database restart, and another one where the index is stored (i.e., bleeding-edge version), using the lazy-loading technique described previously.

##### Storing Time

We first measure the additional cost of serializing our index.

```python
import time

cur_time = time.time()
con.close()
print("Storage time: " + str(time.time() - cur_time))
```

Storage Time

| Name           | Time (s) |
| -------------- | -------: |
| Reconstruction |     8.99 |
| Storage        |    18.97 |


We can see storing the index is 2x more expensive than not storing the index. The reason is that our table consists of one column with 50,000,000 `int32_t` values. However, when storing the ART, we also store 50,000,000 `int64_t` values for their respective `row_ids` in the leaves. This increase in the elements is the main reason for the additional storage cost.


##### Load Time

We now measure the loading time of restarting our database.

```python
cur_time = time.time()
con = duckdb.connect("vault.db") 
print("Load time: " + str(time.time() - cur_time))
```

| Name           | Time (s) |
| -------------- | -------: |
| Reconstruction |     7.75 |
| Storage        |     0.06 |

Here we can see a difference of two orders of magnitude in the time it takes to load the database. This difference is due to the complete reconstruction of the ART index during loading in the `Reconstruction` version. In contrast, in the `Storage` version, only the metadata information about the ART index is loaded at this point.


##### Query Time (Cold)

We now measure the cold query time of running point queries on our index (i.e., the database has just been restarted, which means that in the `Storage` version, the index does not exist in memory yet). We run 5000 point queries, spaced 10,000 elements apart across our distribution. We use this spacing to force the point queries to always load a large number of previously unused nodes.

```python
times = []
for i in range(0, 50000000, 10000):
    cur_time = time.time()
    con.execute("SELECT x FROM integers WHERE x = " + str(i))
    times.append(time.time() - cur_time)
```

![](../images/blog/ART/cold-run-light.png)



In general, each query is 3x more expensive in the persisted storage format. This is due to two main reasons:
1. Creating the nodes. In the storage version, we create the nodes lazily, which means that, for each node, all parameters must be allocated, and values like keys and prefixes must be loaded.
2. Block Pinning. At every node, we must pin and unpin the blocks where they are stored.

##### Query Time (Hot)

In this experiment, we execute the same queries as in the previous section.

![](../images/blog/ART/hot-run-light.png)



The times in both versions are comparable since all the nodes in the storage version are already set in memory.
In conclusion, when stored indexes are in active use, they present similar performance to fully in-memory indexes.

#### Future Work

ART index storage has been a long-standing issue in DuckDB, with multiple users citing it as a missing feature that kept them from using DuckDB. Although storing and lazily loading ART indexes is now possible, there are many future paths we can still pursue to make the ART index more performant. Here I list what I believe are the most important next steps:
1. Caching Pinned Blocks. In our current implementation, blocks are constantly pinned and unpinned, even though blocks can store multiple nodes and are most likely reused continuously through lookups. Smartly caching them will result in drastic savings for queries that trigger node loading.
2. Bulk Loading. Our ART-Index currently does not support bulk loading. This means that nodes will be constantly resized when creating an index over a column since elements will be inserted one by one. If we bulk-load the data, we can know exactly which Nodes we must create for that dataset, hence avoiding these frequent resizes.
3. Bulk Insertion. When performing bulk insertion, a similar problem as with bulk loading would happen. A possible solution would be to create a new ART index with bulk loading and then merge it with the existing ART index.
4. Vectorized Lookups/Inserts. DuckDB utilizes a vectorized execution engine. However, both our ART lookups and inserts still follow a tuple-at-a-time model.
5. Updatable Index Storage. In our current implementation, ART indexes are fully invalidated on disk and stored again. In other words, indexes are completely rewritten at every checkpoint. This causes an unnecessary increase in storage time on subsequent checkpoints. Updating nodes directly on disk instead of entirely rewriting the index could drastically decrease future storage costs.
6. Combined Pointer/Row Id Leaves. Our current leaf node format allows for storing values that are repeated over multiple tuples. However, since ART indexes are commonly used to keep unique key constraints (e.g., Primary Keys), and a unique `row_id` fits in the same pointer size space, a lot of space can be saved by using the child pointers to point to the actual `row_id` instead of creating an actual leaf node that only stores one `row_id`.

#### Roadmap

> It's tough to make predictions, especially about the future  
> – Yogi Berra

ART indexes are a core part of both constraint enforcement and keeping access speed up in DuckDB. And as depicted in the previous section, there are many distinct paths we can take in our bag of ART goodies, with advantages for completely different use cases.

![](../images/blog/ART/want.jpg)


To better understand how our indexes are used, it would be extremely helpful if you could answer the following [survey](https://forms.gle/eSboTEp9qpP7ybz98) created by one of our MSc students.

## Querying Postgres Tables Directly from DuckDB

**Publication date:** 2022-09-30

**Author:** Hannes Mühleisen

**TL;DR:** DuckDB can now directly query tables stored in PostgreSQL and speed up complex analytical queries without duplicating data.

![](../images/blog/elephant-duck.jpg)


#### Introduction

PostgreSQL is the world's most advanced open source database ([self-proclaimed](https://www.postgresql.org)). From its [interesting beginnings as an academic DBMS](https://dsf.berkeley.edu/papers/ERL-M90-34.pdf), it has evolved over the past 30 years into a fundamental workhorse of our digital environment. 

PostgreSQL is designed for traditional [transactional use cases, "OLTP"](https://en.wikipedia.org/wiki/Online_transaction_processing), where rows in tables are created, updated and removed concurrently, and it excels at this. But this design decision makes PostgreSQL far less suitable for [analytical use cases, "OLAP"](https://en.wikipedia.org/wiki/Online_analytical_processing), where large chunks of tables are read to create summaries of the stored data. Yet there are many use cases where both transactional and analytical use cases are important, for example when trying to gain the latest business intelligence insights into transactional data.

There have been [some attempts to build database management systems that do well on both workloads, "HTAP"](https://en.wikipedia.org/wiki/Hybrid_transactional/analytical_processing), but in general many design decisions between OLTP and OLAP systems are hard trade-offs, making this endeavour difficult. Accepting that [one size does not fit all after all](https://cs.brown.edu/~ugur/fits_all.pdf), systems are often separated, with the transactional application data living in a purpose-built system like PostgreSQL, and a copy of the data being stored in an entirely different DBMS. Using a purpose-built analytical system speeds up analytical queries by several orders of magnitude.

Unfortunately, maintaining a copy of the data for analytical purposes can be problematic: The copy will immediately be outdated as new transactions are processed, requiring a complex and non-trivial synchronization setup. Storing two copies of the database will also require twice the storage space. Moreover, OLTP systems like PostgreSQL traditionally use a row-based data representation, while OLAP systems tend to favor a chunked-columnar data representation. You can't have both without maintaining a copy of the data, with all the issues that brings with it. Also, the SQL syntax of whatever OLAP system you're using may differ quite significantly from that of Postgres.

But the design space is not as black and white as it seems. For example, the OLAP performance in systems like DuckDB does not only come from a chunked-columnar on-disk data representation. Much of DuckDB's performance comes from its vectorized query processing engine that is custom-tuned for analytical queries. What if DuckDB was able to somehow *read data stored in PostgreSQL*? While it seems daunting, we have embarked on a quest to make just this possible.

To allow for fast and consistent analytical reads of Postgres databases, we designed and implemented the "Postgres Scanner". This scanner leverages the *binary transfer mode* of the Postgres client-server protocol (see the [Implementation Section](#::implementation) for more details), allowing us to efficiently transform and use the data directly in DuckDB.

Among other things, DuckDB's design is different from conventional data management systems because DuckDB's query processing engine can run on nearly arbitrary data sources without needing to copy the data into its own storage format. For example, DuckDB can currently directly run queries on [Parquet files](#docs:lts:data:parquet:overview), [CSV files](#docs:lts:data:csv:overview), [SQLite files](https://github.com/duckdb/duckdb-sqlite), [Pandas](#docs:lts:guides:python:sql_on_pandas), [R](#docs:lts:clients:r::efficient-transfer) and [Julia](#docs:lts:clients:julia::scanning-dataframes) data frames as well as [Apache Arrow sources](#docs:lts:guides:python:sql_on_arrow). This new extension adds the capability to directly query PostgreSQL tables from DuckDB.

#### Usage

The Postgres Scanner DuckDB extension source code [is available on GitHub](https://github.com/duckdb/duckdb-postgres), but it is directly installable through DuckDB's new binary extension installation mechanism. To install, just run the following SQL query once:

```sql
INSTALL postgres_scanner;
```

Then, whenever you want to use the extension, you need to first load it:

```sql
LOAD postgres_scanner;
```

To make a Postgres database accessible to DuckDB, use the `POSTGRES_ATTACH` command:

```sql
CALL postgres_attach('dbname=myshinydb');
```

`postgres_attach` takes a single required string parameter, which is the [`libpq` connection string](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING). For example, you can pass `'dbname=myshinydb'` to select a different database name. In the simplest case, the parameter is just `''`. There are three additional named parameters to the function:

* `source_schema`: the name of a non-standard schema in Postgres to get tables from. Defaults to `public`.
* `overwrite`: whether we should overwrite existing views in the target schema. Defaults to `false`.
* `filter_pushdown`: whether filter predicates that DuckDB derives from the query should be forwarded to Postgres. Defaults to `false`. See below for a discussion of what this parameter controls.

The tables in the database are registered as views in DuckDB. You can list them with:

```sql
PRAGMA show_tables;
```

Then you can query those views normally using SQL. Again, no data is being copied; this is just a virtual view on the tables in your Postgres database.

If you prefer to not attach all tables, but just query a single table, that is possible using the `POSTGRES_SCAN` and `POSTGRES_SCAN_PUSHDOWN` table-producing functions directly, e.g.

```sql
SELECT * FROM postgres_scan('dbname=myshinydb', 'public', 'mytable');
SELECT * FROM postgres_scan_pushdown('dbname=myshinydb', 'public', 'mytable');
```

Both functions take three unnamed string parameters, the `libpq` connection string (see above), a Postgres schema name and a table name. The schema name is often `public`. As the name suggests, the variant with "pushdown" in the name will perform selection pushdown as described below.

The Postgres scanner will only be able to read actual tables; views are not supported. However, you can of course recreate such views within DuckDB; the syntax should be exactly the same!

#### Implementation

From an architectural perspective, the Postgres Scanner is implemented as a plug-in extension for DuckDB that provides a so-called table scan function (`postgres_scan`) in DuckDB. There are many such functions in DuckDB and in extensions, such as the Parquet and CSV readers, Arrow readers, etc.

The Postgres Scanner uses the standard `libpq` library, which it statically links in. Ironically, this makes the Postgres Scanner easier to install than the other Postgres clients. However, Postgres' normal client-server protocol is [quite slow](https://ir.cwi.nl/pub/26415/p852-muehleisen.pdf), so we spent quite some time optimizing this. As a note, DuckDB's [SQLite Scanner](https://github.com/duckdb/sqlite_scanner) does not face this issue, as SQLite is also an in-process database.

We actually implemented a prototype direct reader for Postgres' database files, but while performance was great, there is the issue that committed but not yet checkpointed data would not be stored in the heap files yet. In addition, if a checkpoint was currently running, our reader would frequently overtake the checkpointer, causing additional inconsistencies. We abandoned that approach since we want to be able to query an actively used Postgres database and believe that consistency is important. Another architectural option would have been to implement a DuckDB Foreign Data Wrapper (FDW) for Postgres similar to [duckdb_fdw](https://github.com/alitrack/duckdb_fdw). While this could improve the protocol situation, deploying a Postgres extension is quite risky on production servers, so we expect few people would be able to do so.

Instead, we use the rarely-used *binary transfer mode* of the Postgres client-server protocol. This format is quite similar to the on-disk representation of Postgres data files and avoids some of the otherwise expensive to-string and from-string conversions. For example, to read a normal `int32` from the protocol message, all we need to do is to swap byte order ([`ntohl`](https://linux.die.net/man/3/ntohl)).
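
To illustrate why the binary format is cheap to decode, the sketch below reads a 4-byte integer in network (big-endian) byte order, which is the job `ntohl` performs in C on little-endian machines. It covers only this one step. The hypothetical helper `read_int32` ignores the surrounding framing of the real protocol messages, and the sample bytes are made up.

```python
import struct


def read_int32(buffer, position=0):
    """Decode a 4-byte signed integer sent in network (big-endian) byte order."""
    (value,) = struct.unpack_from("!i", buffer, position)
    return value


wire_bytes = bytes([0x00, 0x00, 0x01, 0x02])
print(read_int32(wire_bytes))                 # 258
print(int.from_bytes(wire_bytes, "little"))   # 33619968, the result of skipping the byte swap
```

No text parsing is involved: the value is recovered with a fixed-width read and, at most, a byte swap.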

The Postgres scanner connects to PostgreSQL and issues a query to read a particular table using the binary protocol. In the simplest case (see optimizations below), to read a table called `lineitem`, we internally run the query:

```sql
COPY (SELECT * FROM lineitem) TO STDOUT (FORMAT binary);
```

This query will start reading the contents of `lineitem` and write them directly to the protocol stream in binary format.

##### Parallelization

DuckDB supports automatic intra-query parallelization through pipeline parallelism, so we also want to parallelize scans on Postgres tables: Our scan operator opens multiple connections to Postgres, and reads subsets of the table from each. To efficiently split up reading the table, we use Postgres' rather obscure *TID Scan* (Tuple ID) operator, which allows a query to surgically read a specified range of tuple IDs from a table. The Tuple IDs have the form `(page, tuple)`. We parallelize our scan of a Postgres table based on database page ranges expressed in TIDs. Each scan task reads 1000 pages currently. For example, to read a table consisting of 2500 pages, we would start three scan tasks with TID ranges `[(0,0),(999,0)]`, `[(1000,0),(1999,0)]` and `[(2000,0),(UINT32_MAX,0)]`. Having an open bound for the last range is important because the number of pages (`relpages`) in a table in the `pg_class` table is merely an estimate. For a given page range (P_MIN, P_MAX), our query from above is thus extended to look like this:

```sql
COPY (
   SELECT 
     * 
   FROM lineitem 
   WHERE 
     ctid BETWEEN '(P_MIN,0)'::tid AND '(P_MAX,0)'::tid
   ) TO STDOUT (FORMAT binary);
```
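
The way a table is split into scan tasks can be sketched as follows. The code reproduces the page ranges from the 2500-page example and fills them into the query template above. It is an illustration only: the helpers `tid_ranges` and `scan_query` are hypothetical, and the table name is inserted without quoting for brevity.

```python
UINT32_MAX = 2**32 - 1
PAGES_PER_TASK = 1000  # from the text: each scan task currently reads 1000 pages


def tid_ranges(estimated_pages, pages_per_task=PAGES_PER_TASK):
    """Return (p_min, p_max) page bounds; the last range is left open-ended."""
    num_tasks = max(1, -(-estimated_pages // pages_per_task))  # ceiling division
    ranges = []
    for task in range(num_tasks):
        p_min = task * pages_per_task
        is_last = task == num_tasks - 1
        p_max = UINT32_MAX if is_last else p_min + pages_per_task - 1
        ranges.append((p_min, p_max))
    return ranges


def scan_query(table, p_min, p_max):
    """Fill one page range into the COPY query template."""
    return (
        f"COPY (SELECT * FROM {table} "
        f"WHERE ctid BETWEEN '({p_min},0)'::tid AND '({p_max},0)'::tid) "
        "TO STDOUT (FORMAT binary);"
    )


for p_min, p_max in tid_ranges(2500):
    print(scan_query("lineitem", p_min, p_max))
```

The last task always ends at `UINT32_MAX`, so rows on pages beyond the estimated page count are still picked up.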

This way, we can efficiently scan the table in parallel while not relying on the schema in any way. Because page size is fixed in Postgres, this also has the added bonus of equalizing the effort to read each subset of pages, independent of the number of columns in each row.

"But wait!", you will say, "according to the documentation, the tuple ID is not stable and may be changed by operations such as `VACUUM FULL`. How can you use it for synchronizing parallel scans?" This is true, and could be problematic, but we found a solution:

##### Transactional Synchronization

Of course a transactional database such as Postgres is expected to run transactions while we run our table scans for analytical purposes. Therefore we need to address concurrent changes to the table we are scanning in parallel. We solve this by first creating a new read-only transaction in DuckDB's bind phase, where query planning happens. We leave this transaction running until we are completely done reading the table. We use yet another little-known Postgres feature, `pg_export_snapshot()`, which allows us to get the current transaction context in one connection, and then import it into our parallel read connections using `SET TRANSACTION SNAPSHOT ...`. This way, all connections related to one single table scan will see the table state exactly as it appeared at the very beginning of our scan throughout the potentially lengthy read process.
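
The snapshot hand-off can be sketched as the SQL statements each connection would issue. `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT` are the Postgres features named above. The exact `BEGIN` options used here are an assumption (Postgres requires `REPEATABLE READ` or `SERIALIZABLE` isolation to import a snapshot), and the helper names and the sample snapshot identifier are hypothetical. The sketch only builds the statements and does not connect to a database.

```python
BEGIN_READ_ONLY = "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY"


def coordinator_statements():
    """First connection: open the transaction and export its snapshot."""
    return [BEGIN_READ_ONLY, "SELECT pg_export_snapshot()"]


def worker_statements(snapshot_id):
    """Parallel read connections: adopt the exported snapshot before scanning."""
    if "'" in snapshot_id:
        raise ValueError("unexpected quote in snapshot identifier")
    return [BEGIN_READ_ONLY, f"SET TRANSACTION SNAPSHOT '{snapshot_id}'"]


# The identifier would be the value returned by pg_export_snapshot().
for statement in coordinator_statements() + worker_statements("00000003-0000001B-1"):
    print(statement + ";")
```

The exporting transaction has to stay open while the workers import the snapshot, which matches leaving the first transaction running until the scan is complete.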

##### Projection and Selection Push-Down

DuckDB's query optimizer moves selections (filters on rows) and projections (removal of unused columns) as low as possible in the query plan (push down), and even instructs the lowermost scan operators to perform those operations if they support them. For the Postgres scanner, we have implemented both push down variants. Projections are rather straightforward – we can immediately instruct Postgres to only retrieve the columns the query is using. This of course also reduces the number of bytes that need to be transferred, which speeds up queries. For selections, we construct a SQL filter expression from the pushed down filters. For example, if we run a query like `SELECT l_returnflag, l_linestatus FROM lineitem WHERE l_shipdate < '1998-09-02'` through the Postgres scanner, it would run the following queries:

```sql
COPY (
  SELECT 
    "l_returnflag",
    "l_linestatus" 
  FROM "public"."lineitem" 
  WHERE 
    ctid BETWEEN '(0,0)'::tid AND '(1000,0)'::tid AND 
    ("l_shipdate" < '1998-09-02' AND "l_shipdate" IS NOT NULL)
  ) TO STDOUT (FORMAT BINARY);
-- and so on
```

As you can see, the projection and selection push-down has expanded the queries run against Postgres accordingly. Using the selection push-down is optional. There may be cases where running a filter in Postgres is actually slower than transferring the data and running the filter in DuckDB, for example when filters are not very selective (many rows match).
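To make the query construction concrete, here is a rough Python sketch of how a per-task query could be assembled from a projection list and pushed-down filters. The helper names are hypothetical and the filters are passed as already-rendered SQL strings, which is a simplification; the extension itself is not written in Python.

```python
def quote_ident(name):
    """Double-quote a SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_scan_query(schema, table, columns, page_min, page_max, filters=()):
    """Assemble the COPY statement for one scan task."""
    # Projection push-down: only request the columns the query uses.
    select_list = ", ".join(quote_ident(c) for c in columns) if columns else "*"
    # The TID range is always present; pushed-down filters are optional.
    predicates = [f"ctid BETWEEN '({page_min},0)'::tid AND '({page_max},0)'::tid"]
    if filters:
        predicates.append("(" + " AND ".join(filters) + ")")
    where_clause = " AND ".join(predicates)
    return (
        f"COPY (SELECT {select_list} "
        f"FROM {quote_ident(schema)}.{quote_ident(table)} "
        f"WHERE {where_clause}) TO STDOUT (FORMAT BINARY);"
    )


print(
    build_scan_query(
        "public",
        "lineitem",
        ["l_returnflag", "l_linestatus"],
        0,
        1000,
        filters=['"l_shipdate" < \'1998-09-02\'', '"l_shipdate" IS NOT NULL'],
    )
)
```

Leaving `filters` empty corresponds to disabling the selection push-down, in which case DuckDB applies the filter itself after the transfer.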

#### Performance

To investigate the performance of the Postgres Scanner, we ran the well-known TPC-H benchmark in three configurations: on DuckDB using its internal storage format, on Postgres using its internal format, and with DuckDB reading from Postgres through the new Postgres Scanner. We used DuckDB 0.5.1 and Postgres 14.5, and all experiments were run on a MacBook Pro with an M1 Max CPU. The experiment script [is available](https://gist.github.com/hannes/d2f0914a8e0ed0fb235040b9981c58a7). We ran "scale factor" 1 of TPC-H, creating a dataset of roughly 1 GB with ca. 6 M rows in the biggest table, `lineitem`. Each of the 22 TPC-H benchmark queries was run 5 times, and we report the median run time in seconds. The time breakdown is given in the following table.

| query | duckdb | duckdb/postgres | postgres |
| :---- | -----: | --------------: | -------: |
| 1     |   0.03 |            0.74 |     1.12 |
| 2     |   0.01 |            0.20 |     0.18 |
| 3     |   0.02 |            0.55 |     0.21 |
| 4     |   0.03 |            0.52 |     0.11 |
| 5     |   0.02 |            0.70 |     0.13 |
| 6     |   0.01 |            0.24 |     0.21 |
| 7     |   0.04 |            0.56 |     0.20 |
| 8     |   0.02 |            0.74 |     0.18 |
| 9     |   0.05 |            1.34 |     0.61 |
| 10    |   0.04 |            0.41 |     0.35 |
| 11    |   0.01 |            0.15 |     0.07 |
| 12    |   0.01 |            0.27 |     0.36 |
| 13    |   0.04 |            0.18 |     0.32 |
| 14    |   0.01 |            0.19 |     0.21 |
| 15    |   0.03 |            0.36 |     0.46 |
| 16    |   0.03 |            0.09 |     0.12 |
| 17    |   0.05 |            0.75 |  > 60.00 |
| 18    |   0.08 |            0.97 |     1.05 |
| 19    |   0.03 |            0.32 |     0.31 |
| 20    |   0.05 |            0.37 |  > 60.00 |
| 21    |   0.09 |            1.53 |     0.35 |
| 22    |   0.03 |            0.15 |     0.15 |

Stock Postgres is not able to finish queries 17 and 20 within a one-minute timeout because of correlated subqueries containing a query on the lineitem table. For the other queries, we can see that DuckDB with the Postgres Scanner not only finished all queries, it also was faster than stock Postgres on roughly half of them, which is astonishing given that DuckDB has to read its input data from Postgres through the client/server protocol as described above. Of course, stock DuckDB is still 10× faster with its own storage, but as discussed at the very beginning of this post this requires the data to be imported there first. 

#### Other Use Cases

The Postgres Scanner can also be used to combine live Postgres data with pre-cached data in creative ways. This is especially effective when dealing with an append only table, but could also be used if a modified date column is present. Consider the following SQL template:

```sql
INSERT INTO my_table_duckdb_cache
SELECT * FROM postgres_scan('dbname=myshinydb', 'public', 'my_table') 
WHERE incrementing_id_column > (SELECT max(incrementing_id_column) FROM my_table_duckdb_cache);

SELECT * FROM my_table_duckdb_cache;
```

This provides faster query performance with fully up to date query results, at the cost of data duplication. It also avoids complex data replication technologies.
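The append-only refresh pattern can be tried out end to end with a small Python sketch. To keep it self-contained, Python's built-in `sqlite3` module stands in for both the Postgres source and the DuckDB cache, and `source_table` is a hypothetical name, so this only illustrates the SQL logic. One detail is an addition on our part: `max()` over an empty cache returns `NULL`, so the comparison matches no rows on the very first run; the sketch wraps it in `coalesce(..., -1)`, assuming IDs are non-negative.

```python
import sqlite3

# One in-memory database plays both roles in this sketch.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE source_table (incrementing_id_column INTEGER, payload TEXT);
    CREATE TABLE my_table_duckdb_cache (incrementing_id_column INTEGER, payload TEXT);
    """
)

REFRESH_SQL = """
INSERT INTO my_table_duckdb_cache
SELECT * FROM source_table
WHERE incrementing_id_column >
      (SELECT coalesce(max(incrementing_id_column), -1) FROM my_table_duckdb_cache)
"""


def refresh_cache():
    """Copy only rows newer than the cache; return the cache's row count."""
    con.execute(REFRESH_SQL)
    con.commit()
    return con.execute("SELECT count(*) FROM my_table_duckdb_cache").fetchone()[0]


con.executemany("INSERT INTO source_table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
print(refresh_cache())  # first refresh copies everything

con.executemany("INSERT INTO source_table VALUES (?, ?)", [(4, "d"), (5, "e")])
print(refresh_cache())  # second refresh copies only the two new rows
```

Running the refresh again without new source rows leaves the cache unchanged, which is what makes the pattern safe to schedule repeatedly.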

DuckDB has built-in support for writing query results to Parquet files. The Postgres scanner therefore provides a rather simple way to write Postgres tables to Parquet files, and it can even write directly to S3 if desired. For example:

```sql
COPY (SELECT * FROM postgres_scan('dbname=myshinydb', 'public', 'lineitem')) TO 'lineitem.parquet' (FORMAT parquet);
```

#### Conclusion

DuckDB's new Postgres Scanner extension can read PostgreSQL's tables while PostgreSQL is running and compute the answers to complex OLAP SQL queries often faster than PostgreSQL itself can without the need to duplicate data. The Postgres Scanner is currently in preview and we are curious to hear what you think.
If you find any issues with the Postgres Scanner, please [report them](https://github.com/duckdb/duckdb-postgres/issues).

## Modern Data Stack in a Box with DuckDB

**Publication date:** 2022-10-12

**Author:** Guest post by Jacob Matson

**TL;DR:** A fast, free, and open-source Modern Data Stack (MDS) can now be fully deployed on your laptop or to a single machine using the combination of DuckDB, [Meltano](https://meltano.com/), [dbt](https://www.getdbt.com/), and [Apache Superset](https://superset.apache.org/).


![](../images/blog/mds_in_a_box/rubber_duck_on_a_box.jpg)


This post is a collaboration with [Jacob Matson](https://github.com/matsonj) and cross-posted on [dataduel.co](https://www.dataduel.co/modern-data-stack-in-a-box-with-duckdb/).

#### Summary

There is a large volume of literature, e.g., [1](https://www.startdataengineering.com/post/scale-data-pipelines/) and [2](https://towardsdatascience.com/scaling-data-products-delivery-using-domain-oriented-data-pipelines-869ca9461892), about scaling data pipelines. “Use Kafka! Build a lake house! Don't build a lake house, use Snowflake! Don't use Snowflake, use XYZ!” However, with advances in hardware and the rapid maturation of data software, there is a simpler approach. This article will light up the path to highly performant single node analytics with an MDS-in-a-box open source stack: Meltano, DuckDB, dbt, & Apache Superset on Windows using Windows Subsystem for Linux (WSL). There are many options within the MDS, so if you are using another stack to build an MDS-in-a-box, please share it with the community on the DuckDB [Twitter](https://twitter.com/duckdb?s=20&t=yBKUNLGHVZGEj1jL-P_PsQ), [GitHub](https://github.com/duckdb/duckdb/discussions), or [Discord](https://discord.com/invite/tcvwpjfnZx), or the [dbt slack](https://www.getdbt.com/community/join-the-community/)! Or just stop by for a friendly debate about our choice of tools!

#### Motivation

What is the Modern Data Stack, and why use it? The MDS can mean many things (see [examples](https://www.moderndatastack.xyz/stacks) and a [historical perspective](https://www.getdbt.com/blog/future-of-the-modern-data-stack/)), but fundamentally it is a return to using SQL for data transformations by combining multiple best-in-class software tools to form a stack. A typical stack would include (at least!) a tool to extract data from sources and load it into a data warehouse, dbt to transform and analyze that data in the warehouse, and a business intelligence tool. The MDS leverages the accessibility of SQL in combination with software development best practices like git to enable analysts to scale their impact across their companies.

Why build a bundled Modern Data Stack on a single machine, rather than on multiple machines and on a data warehouse? There are many advantages!
* Simplify for higher developer productivity
* Reduce costs by removing the data warehouse
* Deploy with ease either locally, on-premise, in the cloud, or all 3
* Eliminate software expenses with a fully free and open-source stack
* Maintain high performance with modern software like DuckDB and increasingly powerful single-node compute instances
* Achieve self-sufficiency by completing an end-to-end proof of concept on your laptop
* Enable development best practices by integrating with GitHub
* Enhance security by (optionally) running entirely locally or on-premise

If you contribute to an open-source community or provide a product within the Modern Data Stack, there is an additional benefit!
* Increase adoption of your tool by providing a free and self-contained example stack
    * [Dagster's example project](https://github.com/dagster-io/dagster/blob/master/examples/project_fully_featured/README.md) uses DuckDB for this already!
    * Reach out on the DuckDB [Twitter](https://twitter.com/duckdb?s=20&t=yBKUNLGHVZGEj1jL-P_PsQ), [GitHub](https://github.com/duckdb/duckdb/discussions), or [Discord](https://discord.com/invite/tcvwpjfnZx), or the [dbt slack](https://www.getdbt.com/community/join-the-community/) to share an example using your tool with the community!

#### Trade-offs

One key component of the MDS is the unlimited scalability of compute. How does that align with the MDS-in-a-box approach? Today, cloud computing instances can vertically scale significantly more than in the past (for example, [224 cores and 24 TB of RAM on AWS](https://aws.amazon.com/ec2/instance-types/high-memory/)!). Laptops are more powerful than ever. Now that new OLAP tools like DuckDB can take better advantage of that compute, horizontal scaling is no longer necessary for many analyses! Also, this MDS-in-a-box can be duplicated with ease to as many boxes as needed if partitioned by data subject area. So, while infinite compute is sacrificed, significant scale is still easily achievable.

Due to this tradeoff, this approach is more of an “Open Source Analytics Stack in a box” than a traditional MDS. It sacrifices infinite scale for significant simplification and the other benefits above.

#### Choosing a Problem

Given that the NBA season is starting soon, a Monte Carlo-type simulation of the season is both topical and well-suited for analytical SQL. This is a particularly great scenario to test the limits of DuckDB because it only requires simple inputs and easily scales out to massive numbers of records. The entire project is available on [GitHub](https://github.com/matsonj/nba-monte-carlo).

#### Building the Environment

The detailed steps to build the project can be found in the repo, but the high-level steps will be repeated here. As a note, Windows Subsystem for Linux (WSL) was chosen to support Apache Superset, but the other components of this stack can run directly on any operating system. Thankfully, using Linux on Windows has become very straightforward.

1. Install Ubuntu 20.04 on WSL.
1. Upgrade your packages (`sudo apt update`).
1. Install Python.
1. Clone the git repo.
1. Run `make build` and then `make run` in the terminal.
1. Create a super admin user for Superset in the terminal, then log in and configure the database.
1. Run test queries in Superset to check your work.

#### Meltano as a Wrapper for Pipeline Plugins

In this example, [Meltano](https://meltano.com/) pulls together multiple bits and pieces to allow the pipeline to be run with a single statement. The first part is the tap (extractor) which is '[tap-spreadsheets-anywhere](https://hub.meltano.com/extractors/tap-spreadsheets-anywhere/)'. This tap allows us to get flat data files from various sources. It should be noted that DuckDB can consume directly from flat files (locally and over the network), or SQLite and PostgreSQL databases. However, this tap was chosen to provide a clear example of getting static data into your database that can easily be configured in the meltano.yml file. Meltano also becomes more beneficial as the complexity of your data sources increases.

```yaml
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
# data sources are configured inside of this extractor
```

The next bit is the target (loader), '[target-duckdb](https://github.com/jwills/target-duckdb)'. This target can take data from any Meltano tap and load it into DuckDB. Part of the beauty of this approach is that you don't have to mess with all the extra complexity that comes with a typical database. DuckDB can be dropped in and is ready to go with zero configuration or ongoing maintenance. Furthermore, because the components and the data are co-located, networking is not a consideration and further reduces complexity.

```yaml
  loaders:
  - name: target-duckdb
    variant: jwills
    pip_url: target-duckdb~=0.4
    config:
      filepath: /tmp/mdsbox.db
      default_target_schema: main
```

Next is the transformer: '[dbt-duckdb](https://github.com/jwills/dbt-duckdb)'. dbt enables transformations using a combination of SQL and Jinja templating for approachable SQL-based analytics engineering. The dbt adapter for DuckDB now supports parallel execution across threads, which makes the MDS-in-a-box run even faster. Since the bulk of the work is happening inside of dbt, this portion will be described in detail later in the post.

```yaml
  transformers:
  - name: dbt-duckdb
    variant: jwills
    pip_url: dbt-core~=1.2.0 dbt-duckdb~=1.2.0
    config:
      path: /tmp/mdsbox.db
```

Lastly, [Apache Superset](https://superset.apache.org/) is included as a [Meltano utility](https://hub.meltano.com/utilities/superset/) to enable some data querying and visualization. Superset leverages DuckDB's SQLAlchemy driver, [duckdb_engine](https://github.com/Mause/duckdb_engine), so it can query DuckDB directly as well. 

```yaml
  utilities:
  - name: superset
    variant: apache
    pip_url: apache-superset==1.5.0 markupsafe==2.0.1 duckdb-engine==0.6.4
```

With Superset, the engine needs to be configured to open DuckDB in “read-only” mode. Otherwise, only one query can run at a time (simultaneous queries will cause locks). This also prevents refreshing the Superset dashboard while the pipeline is running. In this case, the pipeline runs in under 8 seconds!

#### Wrangling the Data

The NBA schedule was downloaded from basketball-reference.com, and the Draft Kings win totals from Sept 27th were used for win totals. The schedule and win totals make up the entirety of the data required as inputs for this project. Once converted into CSV format, they were uploaded to the GitHub project, and the meltano.yml file was updated to reference the file locations.

#### Loading Sources

Once the data is on the web inside of GitHub, Meltano can pull a copy down into DuckDB. With the command `meltano run tap-spreadsheets-anywhere target-duckdb`, the data is loaded into DuckDB, and ready for transformation inside of dbt.

#### Building dbt Models

After the sources are loaded, the data is transformed with dbt. First, the source models are created as well as the scenario generator. Then the random numbers for that simulation run are generated – it should be noted that the random numbers are recorded as a table, not a view, in order to allow subsequent re-runs of the downstream models with the graph operators for troubleshooting purposes (i.e., `dbt run -s random_num_gen+`). Once the underlying data is laid out, the simulation begins, first by simulating the regular season, then the play-in games, and lastly the playoffs. Since each round of games has a dependency on the previous round, parallelization is limited in this model, which is reflected in the [dbt DAG](https://matsonj.github.io/nba-monte-carlo/#!/overview/nba_monte_carlo?g_v=1), in this case conveniently hosted on GitHub Pages.

There are a few more design choices worth calling out:
1. Simulation tables and summary tables were split into separate models for ease of use / transparency. So each round of the simulation has a sim model and an end model – this allows visibility into the correct parameters (conference, team, elo rating) to be passed into each subsequent round.
1. To prevent overly deep queries, 'reg_season_end' and 'playoff_sim_r1' have been materialized as tables. While it is slightly slower on build, the performance gains when querying summary tables (i.e., 'season_summary') are more than worth the slowdown. However, it should be noted that even for only 10k sims, the database takes up about 150 MB in disk space. Running at 100k simulations easily expands it to a few GB.
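The reason for materializing the random numbers can be illustrated outside of SQL. The Python sketch below generates the draws once and reuses them, so re-running the downstream simulation gives identical results. Everything in it is hypothetical: the teams, ratings, and schedule are made up, and the win probability uses the standard Elo expectation formula as an assumption, since the post does not spell out how the project computes it.

```python
import random


def elo_win_probability(elo_home, elo_visitor):
    """Standard Elo expectation; an assumption, not taken from the project."""
    return 1.0 / (1.0 + 10 ** ((elo_visitor - elo_home) / 400.0))


def generate_draws(n_scenarios, n_games, seed):
    """Materialize one uniform draw per (scenario, game), like a table."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_games)] for _ in range(n_scenarios)]


def simulate(schedule, ratings, draws):
    """Return one dict of wins per team for each scenario."""
    results = []
    for scenario_draws in draws:
        wins = {team: 0 for team in ratings}
        for (home, visitor), draw in zip(schedule, scenario_draws):
            p_home = elo_win_probability(ratings[home], ratings[visitor])
            winner = home if draw < p_home else visitor
            wins[winner] += 1
        results.append(wins)
    return results


ratings = {"A": 1600, "B": 1500, "C": 1400}      # made-up teams and ratings
schedule = [("A", "B"), ("B", "C"), ("C", "A")]  # made-up (home, visitor) games
draws = generate_draws(n_scenarios=1000, n_games=len(schedule), seed=7)

# Re-running the downstream step against the stored draws is reproducible.
first_run = simulate(schedule, ratings, draws)
second_run = simulate(schedule, ratings, draws)
print(first_run == second_run)
```

If the draws were regenerated on every run, as a view over a random function would do, the two runs would generally differ, which makes troubleshooting downstream models much harder.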

#### Connecting Superset

Once the dbt models are built, the data visualization can begin. An admin user must be created in Superset in order to log in. The instructions for connecting the database can be found in the GitHub project, as well as a note on how to connect it in 'read only mode'.

There are 2 models designed for analysis, although any number of them can be used. 'season_summary' contains various summary statistics for the season, and 'reg_season_sim' contains all simulated game results. This second data set produces an interesting histogram chart. In order to build data visualizations in Superset, the dataset must be defined first, then the chart built, and lastly, the chart assigned to a dashboard.

Below is an example Superset dashboard containing several charts based on this data. Superset is able to clearly summarize the data as well as display the level of variability within the Monte Carlo simulation. The duckdb_engine queries can be refreshed quickly when new simulations are run.

![](../images/blog/mds_in_a_box/mds_in_a_box_superset_1.png)


![](../images/blog/mds_in_a_box/mds_in_a_box_superset_2.png)


#### Conclusion

The ecosystem around DuckDB has grown such that it integrates well with the Modern Data Stack. The MDS-in-a-box is a viable approach for smaller data projects, and would work especially well for read-heavy analytics. There were a few other learnings from this experiment. Superset dashboards are easy to construct, but they are not scriptable and must be built in the GUI (the paid hosted version, Preset, does support exporting as YAML). Also, while you can do Monte Carlo analysis in SQL, it may be easier to do in another language. However, this shows how far you can stretch the capabilities of SQL!

#### Next Steps

There are additional directions to take this project. One next step could be to Dockerize this workflow for even easier deployments. If you want to put together a Docker example, please reach out! Another adjustment to the approach could be to land the final outputs in Parquet files, and to read them with in-memory DuckDB connections. Those files could even be landed in an S3-compatible object store (and still read by DuckDB), although that adds complexity compared with the in-a-box approach! Additional MDS components could also be integrated for data quality monitoring, lineage tracking, etc. 

Josh Wills is also in the process of making [an interesting enhancement to dbt-duckdb](https://github.com/jwills/dbt-duckdb/pull/22)! Using the [sqlglot](https://github.com/tobymao/sqlglot) library, dbt-duckdb would be able to automatically transpile dbt models written using the SQL dialect of other databases (including Snowflake and BigQuery) to DuckDB. Imagine if you could test out your queries locally before pushing to production... Join the DuckDB channel of the [dbt slack](https://www.getdbt.com/community/join-the-community/) to discuss the possibilities!

Please reach out if you use this or another approach to build an MDS-in-a-box! Also, if you are interested in writing a guest post for the DuckDB blog, please reach out on [Discord](https://discord.com/invite/tcvwpjfnZx)!

## Lightweight Compression in DuckDB

**Publication date:** 2022-10-28

**Author:** Mark Raasveldt

**TL;DR:** DuckDB supports efficient lightweight compression that is automatically used to keep data size down without incurring high costs for compression and decompression.

![](../images/compression/matroshka-duck.png)


When working with large amounts of data, compression is critical for reducing storage size and egress costs. Compression algorithms typically reduce data set size by **75-95%**, depending on how compressible the data is. Compression not only reduces the storage footprint of a data set, but also often **improves performance** as less data has to be read from disk or over a network connection.

Column store formats, such as DuckDB's native file format or [Parquet](https://duckdb.org/2021/06/25/querying-parquet), benefit especially from compression. That is because data within an individual column is generally very similar, which can be exploited effectively by compression algorithms. Storing data in row-wise format results in interleaving of data of different columns, leading to lower compression rates.

DuckDB added support for compression [at the end of last year](https://github.com/duckdb/duckdb/pull/2099). As shown in the table below, the compression ratio of DuckDB has continuously improved since then and is still actively being improved. In this blog post, we discuss how compression in DuckDB works, and the design choices and various trade-offs that we have made while implementing compression for DuckDB's storage format.

| Version                |    Taxi | On&nbsp;Time | `lineitem` | Notes          | Date           |
| :--------------------- | ------: | -----------: | ---------: | :------------- | :------------- |
| DuckDB v0.2.8          | 15.3 GB |      1.73 GB |    0.85 GB | Uncompressed   | July 2021      |
| DuckDB v0.2.9          | 11.2 GB |      1.25 GB |    0.79 GB | RLE + Constant | September 2021 |
| DuckDB v0.3.2          | 10.8 GB |      0.98 GB |    0.56 GB | Bitpacking     | February 2022  |
| DuckDB v0.3.3          |  6.9 GB |      0.23 GB |    0.32 GB | Dictionary     | April 2022     |
| DuckDB v0.5.0          |  6.6 GB |      0.21 GB |    0.29 GB | FOR            | September 2022 |
| DuckDB dev             |  4.8 GB |      0.21 GB |    0.17 GB | FSST + Chimp   | `now()`        |
| CSV                    | 17.0 GB |      1.11 GB |    0.72 GB |                |                |
| Parquet (Uncompressed) |  4.5 GB |      0.12 GB |    0.31 GB |                |                |
| Parquet (Snappy)       |  3.2 GB |      0.11 GB |    0.18 GB |                |                |
| Parquet (Zstd)         |  2.6 GB |      0.08 GB |    0.15 GB |                |                |

#### Compression Intro

At their core, compression algorithms try to find patterns in a data set in order to store it more cleverly. The **compressibility** of a data set is therefore dependent on whether or not such patterns can be found, and whether they exist in the first place. Data that follows a fixed pattern can be compressed significantly. Data that does not have any patterns, such as random noise, cannot be compressed. Formally, the compressibility of a dataset is known as its [entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)).

As an example of this concept, let us consider the following two data sets.

![](../images/compression/exampledata.png)


The constant data set can be compressed by simply storing the value of the pattern and how many times the pattern repeats (e.g., `1x8`). The random noise, on the other hand, has no pattern, and is therefore not compressible.
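This difference is easy to observe with any general-purpose compressor. The following Python sketch uses the standard library's `zlib` module (not anything DuckDB-specific) on a constant byte sequence and on pseudo-random bytes of the same length; the sizes and the seed are arbitrary choices.

```python
import random
import zlib

N = 100_000

# A constant data set: the same byte repeated N times.
constant = bytes([1]) * N

# Pseudo-random noise of the same length (seeded so the run is repeatable).
rng = random.Random(42)
noise = bytes(rng.getrandbits(8) for _ in range(N))

constant_size = len(zlib.compress(constant))
noise_size = len(zlib.compress(noise))

print(f"constant: {N} bytes -> {constant_size} bytes")
print(f"noise:    {N} bytes -> {noise_size} bytes")
```

The constant input shrinks to a tiny fraction of its size, while the noise stays at roughly its original size, since there is no pattern for the compressor to exploit.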

#### General-Purpose Compression Algorithms

The compression algorithms that most people are familiar with are _general-purpose compression algorithms_, such as Zip, Gzip or Zstd. General-purpose compression algorithms work by finding patterns in bits. They are therefore agnostic to data types, and can be used on any stream of bits. They can be used to compress files, but they can also be applied to arbitrary data sent over a socket connection.

General-purpose compression is flexible and very easy to set up. There are a number of high quality libraries available (such as Zstd, Snappy or LZ4) that provide compression, and they can be applied to any data set stored in any manner.

The downside of general-purpose compression is that (de)compression is generally expensive. While this does not matter if we are reading and writing from a hard disk or over a slow internet connection, the speed of (de)compression can become a bottleneck when data is stored in RAM.

Another downside is that these libraries operate as a _black box_. They operate on streams of bits, and do not reveal information of their internal state to the user. While that is not a problem if you are only looking to decrease the size of your data, it prevents the system from taking advantage of the patterns found by the compression algorithm during execution.

Finally, general-purpose compression algorithms work better when compressing large chunks of data. As illustrated in the table below, compression ratios suffer significantly when compressing small amounts of data. To achieve a good compression ratio, blocks of at least **256 kB** must be used.

| Compression | 1 kB | 4 kB | 16 kB | 64 kB | 256 kB | 1 MB |
| ----------- | ---: | ---: | ----: | ----: | -----: | ---: |
| zstd        | 1.72 |  2.1 |  2.21 |  2.41 |   2.54 | 2.73 |
| lz4         | 1.29 |  1.5 |  1.52 |  1.58 |   1.62 | 1.64 |
| gzip        |  1.7 | 2.13 |  2.28 |  2.49 |   2.62 | 2.67 |

This is relevant because the block size is the minimum amount of data that must be decompressed when reading a single row from disk. Worse, as DuckDB compresses data on a per-column basis, the block size would be the minimum amount of data that must be decompressed per column. With a block size of 256 kB, fetching a single row could require decompressing multiple megabytes of data. This can cause queries that fetch a low number of rows, such as `SELECT * FROM tbl LIMIT 5` or `SELECT * FROM tbl WHERE id = 42` to incur significant costs, despite appearing to be very cheap on the surface.
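The effect of block size can be reproduced on a small scale with Python's `zlib` module. The sketch below compresses the same synthetic CSV-like data in independent blocks of different sizes and reports the overall compression ratio. The data generator is made up for illustration, so the ratios will not match the table above, but small blocks should still compress noticeably worse than large ones.

```python
import random
import zlib


def make_csv_rows(n_rows, seed=0):
    """Generate synthetic 'id,status,amount' lines (made-up data)."""
    rng = random.Random(seed)
    statuses = ["shipped", "pending", "returned", "cancelled"]
    lines = [f"{i},{rng.choice(statuses)},{rng.randint(0, 9999)}\n" for i in range(n_rows)]
    return "".join(lines).encode("ascii")


def compression_ratio(data, block_size):
    """Compress each block independently and return original / compressed size."""
    compressed = 0
    for start in range(0, len(data), block_size):
        compressed += len(zlib.compress(data[start:start + block_size]))
    return len(data) / compressed


data = make_csv_rows(50_000)
for block_size in (1024, 4096, 16384, 65536, 262144):
    ratio = compression_ratio(data, block_size)
    print(f"{block_size // 1024:>4} kB blocks: ratio {ratio:.2f}")
```

Each block has to be decompressed as a whole to read any value inside it, which is exactly the trade-off described above: larger blocks compress better but make single-row lookups more expensive.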

#### Lightweight Compression Algorithms

Another option for achieving compression is to use specialized lightweight compression algorithms. These algorithms also operate by finding patterns in data. However, unlike general-purpose compression, they do not attempt to find generic patterns in bitstreams. Instead, they operate by finding **specific patterns** in data sets.

By detecting specific patterns, specialized compression algorithms can be significantly more lightweight, providing much faster compression and decompression. In addition, they can be effective on much smaller data sizes. This allows us to decompress a few rows at a time, rather than requiring large blocks of data to be decompressed at once. These specialized compression algorithms can also offer efficient support for random seeks, making data access through an index significantly faster.

Lightweight compression algorithms also provide us with more fine-grained control over the compression process. This is especially relevant for us as DuckDB's file format uses fixed-size blocks in order to avoid fragmentation for workloads involving deletes and updates. The fine-grained control allows us to fill these blocks more effectively, and avoid having to guess how much compressed data will fit into a buffer.

On the flip side, these algorithms are ineffective if the specific patterns they are designed for do not occur in the data. As a result, individually, these lightweight compression algorithms are no replacement for general-purpose algorithms. Instead, multiple specialized algorithms must be combined in order to capture many different common patterns in data sets.

#### Compression Framework

Because of the advantages described above, DuckDB uses only specialized lightweight compression algorithms. As each of these algorithms works optimally on different patterns in the data, DuckDB's compression framework must first decide which algorithm to use to store the data of each column.

DuckDB's storage splits tables into _Row Groups_. These are groups of `120K` rows, stored in columnar chunks called _Column Segments_. This storage layout is similar to [Parquet](https://duckdb.org/2021/06/25/querying-parquet) – but with an important difference: columns are split into fixed-size blocks. This design decision was made because DuckDB's storage format supports in-place ACID modifications, including deleting and updating rows, and adding and dropping columns. By partitioning data into fixed-size blocks, the blocks can be easily reused after they are no longer required, and fragmentation is avoided.

![](../images/compression/storageformat.png)


The compression framework operates within the context of the individual _Column Segments_. It operates in two phases. First, the data in the column segment is _analyzed_. In this phase, we scan the data in the segment and find out the best compression algorithm for that particular segment. After that, the _compression_ is performed, and the compressed data is written to the blocks on disk.

While this approach requires two passes over the data within a segment, this does not incur a significant cost, as the amount of data stored in one segment is generally small enough to fit in the CPU caches. A sampling approach for the analyze step could also be considered, but in general we value choosing the best compression algorithm and reducing file size over a minor increase in compression speed.
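The two-phase idea can be sketched in Python as follows. This is a toy model rather than DuckDB's implementation: it knows only three candidate methods, estimates sizes under the assumption of 4-byte values, and all function names are hypothetical. It also assumes a non-empty segment.

```python
from itertools import groupby

VALUE_BYTES = 4  # assumed size of one value, used for the estimates


def analyze_constant(values):
    """Applicable only if every value is identical; stores a single value."""
    return VALUE_BYTES if len(set(values)) == 1 else None


def analyze_rle(values):
    """One (value, count) pair per run of equal values."""
    runs = sum(1 for _ in groupby(values))
    return runs * 2 * VALUE_BYTES


def analyze_uncompressed(values):
    return len(values) * VALUE_BYTES


def compress_constant(values):
    return values[0]


def compress_rle(values):
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def compress_uncompressed(values):
    return list(values)


CANDIDATES = {
    "constant": (analyze_constant, compress_constant),
    "rle": (analyze_rle, compress_rle),
    "uncompressed": (analyze_uncompressed, compress_uncompressed),
}


def compress_segment(values):
    """Phase 1: estimate every applicable method. Phase 2: apply the best one."""
    estimates = {}
    for name, (analyze, _) in CANDIDATES.items():
        size = analyze(values)
        if size is not None:  # None means "not applicable to this segment"
            estimates[name] = size
    best = min(estimates, key=estimates.get)
    return best, CANDIDATES[best][1](values)


print(compress_segment([7] * 100)[0])
print(compress_segment([1] * 50 + [2] * 50)[0])
print(compress_segment(list(range(100)))[0])
```

The analyze functions only compute a size estimate, so the data is read twice, once per phase, mirroring the two passes described above.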

#### Compression Algorithms

DuckDB implements several lightweight compression algorithms, and we are in the process of adding more to the system. We will go over a few of these compression algorithms and how they work in the following sections.

##### Constant Encoding

Constant encoding is the most straightforward compression algorithm in DuckDB. Constant encoding is used when every single value in a column segment is the same value. In that case, we store only that single value. This encoding is visualized below.

![](../images/compression/constant.png)


When applicable, this encoding technique leads to tremendous space savings. While it might seem like this technique is rarely applicable – in practice it occurs relatively frequently. Columns might be filled with `NULL` values, or have values that rarely change (e.g., a `year` column in a stream of sensor data). Because of this compression algorithm, such columns take up almost no space in DuckDB.

##### Run-Length Encoding (RLE)

[Run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding) (RLE) is a compression algorithm that takes advantage of repeated values in a dataset. Rather than storing individual values, the data set is decomposed into a sequence of (value, count) pairs, where the count represents how often the value is repeated. This encoding is visualized below.

![](../images/compression/rle.png)


RLE is powerful when there are many repeating values in the data. This might occur when data is sorted or partitioned on a particular attribute. It is also useful for columns that have many missing (`NULL`) values.
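A minimal Python sketch of run-length encoding is shown below. It is meant to illustrate the idea rather than DuckDB's on-disk layout, and it uses `None` as a stand-in for `NULL`.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


column = ["a", "a", "a", None, None, "b"]
encoded = rle_encode(column)
print(encoded)
print(rle_decode(encoded) == column)
```

Only consecutive repeats are merged, which is why sorting or partitioning the data on the column makes RLE far more effective.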

##### Bit Packing

Bit Packing is a compression technique that takes advantage of the fact that integral values rarely span the full range of their data type. For example, four-byte integer values can store values from negative two billion to positive two billion. Frequently the full range of this data type is not used, and instead only small numbers are stored. Bit packing takes advantage of this by removing all of the unnecessary leading zeros when storing values. An example (in decimal) is provided below.

![](../images/compression/bitpacking.png)


For bit packing compression, we keep track of the maximum value for every `1024` values. The maximum value determines the bit packing width, which is the number of bits necessary to store that value. For example, when storing a set of values with a maximum value of `32`, the bit packing width is `6` bits, down from the `32` bits per value that would be required to store uncompressed four-byte integers.
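The width computation can be sketched as follows, assuming non-negative integers and using Python's `int.bit_length()` (for instance, 31 fits in 5 bits while 32 needs 6). The helper names are hypothetical, and the sketch only computes widths rather than actually packing the bits.

```python
def bit_width(values):
    """Number of bits needed to store the largest (non-negative) value."""
    return max(values).bit_length()


def group_widths(values, group_size=1024):
    """Compute one bit packing width per group of `group_size` values."""
    return [
        bit_width(values[start:start + group_size])
        for start in range(0, len(values), group_size)
    ]


# Two groups of 1024 values: small numbers first, then larger ones.
values = list(range(1, 33)) * 32 + [100_000] * 1024
widths = group_widths(values)
print(widths)  # [6, 17]

packed_bits = sum(width * 1024 for width in widths)
plain_bits = len(values) * 32
print(packed_bits, plain_bits)  # 23552 65536
```

Choosing the width per group rather than per column means a handful of large values only affects the group they fall in.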

Bit packing is very powerful in practice. It is also convenient for users: thanks to this technique, there are no storage size differences between the various integer types. A `BIGINT` column will be stored in the exact same amount of space as an `INTEGER` column holding the same values. That relieves the user from having to worry about which integer type to choose.

##### Frame of Reference

Frame of Reference encoding is an extension of bit packing, where we also include a frame. The frame is the minimum value found in the set of values. The values are stored as the offset from this frame. An example of this is given below.

![](../images/compression/for.png)


While this might not seem particularly useful at first glance, it is very powerful when storing dates and timestamps. That is because dates and timestamps are stored as Unix timestamps in DuckDB, i.e., the offset since `1970-01-01` in either days (for dates) or microseconds (for timestamps). When we have a set of date or timestamp values, the absolute numbers might be very high, but the numbers are all very close together. By applying a frame before bit packing, we can often improve our compression ratio tremendously.
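The effect on dates can be illustrated with a small Python sketch. The function `for_encode` is a hypothetical helper that returns the frame, the offsets, and the bit width needed for the offsets; it is not DuckDB's implementation.

```python
from datetime import date


def for_encode(values):
    """Return (frame, offsets, width): the minimum, the offsets from it, and bits per offset."""
    frame = min(values)
    offsets = [value - frame for value in values]
    width = max(offsets).bit_length()
    return frame, offsets, width


epoch = date(1970, 1, 1)
dates = [date(2022, 10, 26), date(2022, 10, 28), date(2022, 10, 31)]
days = [(d - epoch).days for d in dates]  # days since 1970-01-01

frame, offsets, width = for_encode(days)
print(offsets)                  # [0, 2, 5]
print(max(days).bit_length())   # 15 bits per value without a frame
print(width)                    # 3 bits per value with a frame
```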

##### Dictionary Encoding

Dictionary encoding works by extracting common values into a separate dictionary, and then replacing the original values with references to said dictionary. An example is provided below.

![](../images/compression/dictionary.png)


Dictionary encoding is particularly efficient when storing text columns with many duplicate entries. The much larger text values can be replaced by small numbers, which can in turn be efficiently bit packed together.
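A minimal Python sketch of the idea, with a hypothetical `dictionary_encode` helper: each distinct string is stored once, and the column itself becomes a list of small integer codes that could then be bit packed.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct values in first-seen order, plus one code per row."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


column = ["Amsterdam", "Rotterdam", "Amsterdam", "Utrecht", "Amsterdam"]
dictionary, codes = dictionary_encode(column)
print(dictionary)  # ['Amsterdam', 'Rotterdam', 'Utrecht']
print(codes)       # [0, 1, 0, 2, 0]

# Decoding is a simple lookup per code.
print([dictionary[code] for code in codes] == column)  # True
```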

##### FSST

[Fast Static Symbol Table](https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf) (FSST) compression is an extension to dictionary compression that not only extracts repetitions of entire strings, but also extracts repetitions _within_ strings. This is effective when storing strings that are themselves unique, but have a lot of repetition within the strings, such as URLs or e-mail addresses. An image illustrating how this works is shown below.

![](../images/compression/fsst.png)


For those interested in learning more, watch the talk by [Peter Boncz here](https://www.youtube.com/watch?v=uJ1KO_UMrQk).
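To give a feel for the mechanism, here is a toy Python sketch of symbol-table compression. It assumes a hand-picked symbol table, whereas real FSST learns its table from a sample of the data; the function names and the example table are hypothetical. Each symbol is replaced by a one-byte code, and bytes not covered by any symbol are written as an escape byte followed by the literal.

```python
ESCAPE = 255  # marks "the next byte is a literal, not a code"


def compress(text, symbols):
    """Greedily replace the longest matching symbol at each position with its 1-byte code."""
    data = text.encode("utf-8")
    # Try longer symbols first so the greedy match picks the longest one.
    order = sorted(range(len(symbols)), key=lambda i: -len(symbols[i]))
    out = bytearray()
    pos = 0
    while pos < len(data):
        for code in order:
            if data.startswith(symbols[code], pos):
                out.append(code)
                pos += len(symbols[code])
                break
        else:
            # No symbol matched: emit the escape byte plus the raw byte.
            out += bytes([ESCAPE, data[pos]])
            pos += 1
    return bytes(out)


def decompress(data, symbols):
    """Expand codes back into symbols and pass escaped literals through."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        if data[pos] == ESCAPE:
            out.append(data[pos + 1])
            pos += 2
        else:
            out += symbols[data[pos]]
            pos += 1
    return out.decode("utf-8")


symbols = [b"https://", b"www.", b".org", b"duck"]  # illustrative table (fewer than 255 entries)
url = "https://www.duckdb.org"
packed = compress(url, symbols)
print(len(url), len(packed))              # 22 8
print(decompress(packed, symbols) == url)  # True
```

Because the table is static, any single string can be decompressed on its own, which is what keeps random look-ups cheap.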

##### Chimp & Patas

[Chimp](https://www.vldb.org/pvldb/vol15/p3058-liakos.pdf) is a very new compression algorithm that is designed to compress floating point values. It is based on [Gorilla compression](https://www.vldb.org/pvldb/vol8/p1816-teller.pdf). The core idea behind Gorilla and Chimp is that consecutive floating point values, when XOR'd together, tend to produce values with many leading and trailing zeros. These algorithms then work on finding an efficient way of storing the leading and trailing zeros.

After implementing Chimp, we have been inspired and worked on implementing Patas, which uses many of the same ideas but optimizes further for higher decompression speed. Expect a future blog post explaining these in more detail soon!
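The XOR observation behind Gorilla and Chimp can be checked with a short Python sketch. It only counts leading and trailing zero bits of the XOR of two doubles; it does not implement the actual encodings of either algorithm, and the function names are hypothetical.

```python
import struct


def double_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def xor_summary(previous, current):
    """Return (leading_zeros, trailing_zeros, meaningful_bits) of the XOR of two doubles."""
    x = double_bits(previous) ^ double_bits(current)
    if x == 0:
        return (64, 0, 0)  # identical values: nothing to store
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1
    return (leading, trailing, 64 - leading - trailing)


print(xor_summary(15.5, 15.5))  # (64, 0, 0)
print(xor_summary(15.5, 14.5))  # (14, 49, 1)
```

In the second case only a single bit differs between the two values, so a scheme that records the zero counts and the meaningful bits needs far fewer than 64 bits.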

#### Inspecting Compression

`PRAGMA storage_info` can be used to inspect the storage layout of tables and columns, including which compression algorithm DuckDB has chosen for specific columns of a table.

```sql
SELECT * EXCLUDE (column_path, segment_id, start, stats, persistent, block_id, block_offset, has_updates)
FROM pragma_storage_info('taxi')
USING SAMPLE 10 ROWS
ORDER BY row_group_id;
```

| row_group_id | column_name        | column_id | segment_type | count | compression  |
| -----------: | :----------------- | --------: | :----------- | ----: | :----------- |
|            4 | extra              |        13 | FLOAT        | 65536 | Chimp        |
|           20 | tip_amount         |        15 | FLOAT        | 65536 | Chimp        |
|           26 | pickup_latitude    |         6 | VALIDITY     | 65536 | Constant     |
|           46 | tolls_amount       |        16 | FLOAT        | 65536 | RLE          |
|           73 | store_and_fwd_flag |         8 | VALIDITY     | 65536 | Uncompressed |
|           96 | total_amount       |        17 | VALIDITY     | 65536 | Constant     |
|          111 | total_amount       |        17 | VALIDITY     | 65536 | Constant     |
|          141 | pickup_at          |         1 | TIMESTAMP    | 52224 | BitPacking   |
|          201 | pickup_longitude   |         5 | VALIDITY     | 65536 | Constant     |
|          209 | passenger_count    |         3 | TINYINT      | 65536 | BitPacking   |

#### Conclusion & Future Goals

Compression has been tremendously successful in DuckDB, and we have made great strides in reducing the storage requirements of the system. We are still actively working on extending compression within DuckDB, and are looking to improve the compression ratio of the system even further, both by improving our existing techniques and implementing several others. Our goal is to achieve compression on par with Parquet with Snappy, while using only lightweight specialized compression techniques that are very fast to operate on.

## Announcing DuckDB 0.6.0

**Publication date:** 2022-11-14

**Author:** Mark Raasveldt

![](../images/blog/white-headed-duck.jpg)


The DuckDB team is happy to announce the latest DuckDB version (0.6.0) has been released. This release of DuckDB is named "Oxyura" after the [White-headed duck (Oxyura leucocephala)](https://en.wikipedia.org/wiki/White-headed_duck) which is an endangered species native to Eurasia.

To install the new version, please visit the [installation guide](https://duckdb.org/install/index.html). Note that the release is still being rolled out, so not all artifacts may be published yet. The full release notes can be found on [GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.6.0).

#### What's in 0.6.0

The new release contains many improvements to the storage system, general performance improvements, memory management improvements and new features. Below is a summary of the most impactful changes, together with the linked PRs that implement the features.

#### Storage Improvements

As we are working towards stabilizing the storage format and moving towards version 1.0, we have been actively working on improving our storage format, including many [compression improvements](https://duckdb.org/2022/10/28/lightweight-compression).

**Optimistic writing to disk.** In previous DuckDB versions, the data of a single transaction was first loaded into memory, and would only be written to disk on a commit. While this works fine when data is loaded in batches that fit in memory, it does not work well when loading a lot of data in a single transaction, such as when ingesting one very large file into the system.

This version introduces [optimistic writing to disk](https://github.com/duckdb/duckdb/pull/4996). When loading large data sets in a single transaction, data is compressed and streamed to the database file, even before the `COMMIT` has occurred. When the transaction is committed, the data will already have been written to disk, and no further writing has to happen. On a rollback, any optimistically written data is reclaimed by the system.

**Parallel data loading**. In addition to optimistically writing data to disk, this release includes support for parallel data loading into individual tables. This greatly improves performance of data loading on machines that have multiple cores (i.e., all modern machines).

Below is a benchmark comparing loading time of 150 million rows of the Taxi dataset from a Parquet file on an M1 Max with 10 cores:

| Version | Load time |
| ------- | --------: |
| v0.5.1  |    91.4 s |
| v0.6.0  |    17.2 s |

DuckDB supports two modes – the [`order-preserving`](https://github.com/duckdb/duckdb/pull/5082) and the [`non-order-preserving`](https://github.com/duckdb/duckdb/pull/5033) parallel data load.

The order-preserving load preserves the insertion order so that e.g., the first line in your CSV file is the first line in the DuckDB table. The non-order-preserving load does not offer such guarantees – and instead might re-order the data on load. By default the order-preserving load is used, which involves some extra book-keeping. The preservation of insertion order can be disabled using the `SET preserve_insertion_order = false` statement.
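The difference between the two modes can be sketched in Python, with threads standing in for DuckDB's workers. This is an analogy only, not how DuckDB implements loading; `parse_chunk` and the chunk size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_chunk(chunk):
    """Stand-in for parsing one chunk of an input file."""
    return [int(line) for line in chunk]


lines = [str(i) for i in range(20)]
chunks = [lines[i:i + 5] for i in range(0, len(lines), 5)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Order-preserving: results are stitched together in chunk order,
    # which requires remembering where each chunk belongs.
    ordered = [row for rows in pool.map(parse_chunk, chunks) for row in rows]

    # Non-order-preserving: chunks are appended as soon as they finish.
    futures = [pool.submit(parse_chunk, chunk) for chunk in chunks]
    unordered = [row for future in as_completed(futures) for row in future.result()]

print(ordered == list(range(20)))            # True
print(sorted(unordered) == list(range(20)))  # True (same rows, order not guaranteed)
```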

#### Compression Improvements

**FSST**. The [Fast Static Symbol Table](https://github.com/duckdb/duckdb/pull/4366) compression algorithm is introduced in this version. This state-of-the-art compression algorithm compresses data *inside* strings using a dictionary, while maintaining support for efficient scans and random look-ups. This greatly increases the compression ratio of strings that have many unique values but with common elements, such as e-mail addresses or URLs.

The compression ratio improvements of the TPC-H SF1 dataset are shown below:

| Compression       |   Size |
| ----------------- | -----: |
| Uncompressed      | 761 MB |
| Dictionary        | 510 MB |
| FSST + Dictionary | 251 MB |

**Chimp**. The [Chimp compression algorithm](https://github.com/duckdb/duckdb/pull/4878) is included, which is the state-of-the-art in lightweight floating point compression. Chimp is an improved version of Gorilla that achieves both a better compression ratio and faster decompression speed.

**Patas**. [Patas](https://github.com/duckdb/duckdb/pull/5044) is a novel floating point compression method that iterates upon the Chimp algorithm by optimizing for a single case in the Chimp algorithm. While Patas generally has a slightly lower compression ratio than Chimp, it has significantly faster decompression speed, almost matching uncompressed data in read speed.

The compressed size of a dataset containing temperatures of cities stored as double (8-byte floating point numbers) is shown below:

| Compression  |    Size |
| ------------ | ------: |
| Uncompressed | 25.4 MB |
| Chimp        |  9.7 MB |
| Patas        | 10.2 MB |

#### Performance Improvements

DuckDB aims to have very high performance for a wide variety of workloads. As such, we are always working to improve performance for various workloads. This release is no different.

**Parallel CSV Loading (Experimental)**. In this release we are launching [a new experimental parallel CSV reader](https://github.com/duckdb/duckdb/pull/5194). This greatly improves the ingestion speed of large CSV files into the system. We have done our best to make the parallel CSV reader robust, but CSV parsing is a minefield given the wide variety of files out there, so we have marked the reader as experimental for now.

The parallel CSV reader can be enabled by setting the `experimental_parallel_csv` flag to true. We aim to make the parallel CSV reader the default reader in future DuckDB versions.

```sql
SET experimental_parallel_csv = true;
```

Below is the load time of a 720 MB CSV file containing the `lineitem` table from the `TPC-H` benchmark:

| Variant         | Load time |
| --------------- | --------: |
| Single-threaded |     3.5 s |
| Parallel        |     0.6 s |

**Parallel CREATE INDEX & Index Memory Management Improvements**. Index creation is also sped up significantly in this release, as [the `CREATE INDEX` statement can now be executed fully in parallel](https://github.com/duckdb/duckdb/pull/4655). In addition, the number of memory allocations done by the ART is greatly reduced through [inlining of small structures](https://github.com/duckdb/duckdb/pull/5292) which both reduces memory size and further improves performance.

The timings of creating an index on a single column with 16 million values are shown below.

| Version | Create index time |
| ------- | ----------------: |
| v0.5.1  |            5.92 s |
| v0.6.0  |            1.38 s |

**Parallel count(DISTINCT)**. Queries containing `DISTINCT` aggregates, most commonly used for exact distinct count computation (e.g., `count(DISTINCT col)`), previously had to be executed in single-threaded mode. Starting with v0.6.0, [DuckDB can execute these queries in parallel](https://github.com/duckdb/duckdb/pull/5146), leading to large speed-ups.

#### SQL Syntax Improvements

SQL is the primary way of interfacing with DuckDB – and DuckDB [tries to have an easy to use SQL dialect](https://duckdb.org/2022/05/04/friendlier-sql). This release contains further improvements to the SQL dialect.

**UNION Type**. This release introduces the [UNION type](https://github.com/duckdb/duckdb/pull/4966), which allows sum types to be stored and queried in DuckDB. For example:

```sql
CREATE TABLE messages (u UNION(num INTEGER, error VARCHAR));
INSERT INTO messages VALUES (42);
INSERT INTO messages VALUES ('oh my globs');
SELECT * FROM messages;
```

```text
┌─────────────┐
│      u      │
├─────────────┤
│ 42          │
│ oh my globs │
└─────────────┘
```

Sum types are strongly typed – but they allow a single value in a table to be represented as one of various types. The [union page](#docs:lts:sql:data_types:union) in the documentation contains more information on how to use this new composite type.

**FROM-first**. Starting with this release, DuckDB supports starting queries with the [`FROM` clause](https://github.com/duckdb/duckdb/pull/5076) instead of the `SELECT` clause. In fact, the `SELECT` clause is fully optional now, and defaults to `SELECT *`. That means the following queries are now valid in DuckDB:

```sql
-- SELECT clause is optional, SELECT * is implied (if not included)
FROM tbl;

-- first 5 rows of the table
FROM tbl LIMIT 5;

-- SELECT can be used after the FROM
FROM tbl SELECT l_orderkey;

-- insert all data from tbl1 into tbl2
INSERT INTO tbl2 FROM tbl1;
```

**COLUMNS Expression**. This release adds support for [the `COLUMNS` expression](https://github.com/duckdb/duckdb/pull/5120), inspired by [the ClickHouse syntax](https://clickhouse.com/docs/en/sql-reference/statements/select/#columns-expression). The `COLUMNS` expression allows you to execute expressions or functions on multiple columns without having to duplicate the full expression.

```sql
CREATE TABLE obs (id INTEGER, val1 INTEGER, val2 INTEGER);
INSERT INTO obs VALUES (1, 10, 100), (2, 20, NULL), (3, NULL, 300);
SELECT min(COLUMNS(*)), count(*) FROM obs;
```

```text
┌─────────────┬───────────────┬───────────────┬──────────────┐
│ min(obs.id) │ min(obs.val1) │ min(obs.val2) │ count_star() │
├─────────────┼───────────────┼───────────────┼──────────────┤
│ 1           │ 10            │ 100           │ 3            │
└─────────────┴───────────────┴───────────────┴──────────────┘
```

The `COLUMNS` expression supports all star expressions, including [the `EXCLUDE` and `REPLACE` syntax](#docs:lts:sql:query_syntax:select). In addition, the `COLUMNS` expression can take a regular expression as parameter:

```sql
SELECT COLUMNS('val[0-9]+') FROM obs;
```

```text
┌──────┬──────┐
│ val1 │ val2 │
├──────┼──────┤
│ 10   │ 100  │
│ 20   │ NULL │
│ NULL │ 300  │
└──────┴──────┘
```

**List comprehension support**. List comprehension is an elegant and powerful way of defining operations on lists. DuckDB now also supports [list comprehension](https://github.com/duckdb/duckdb/pull/4926) as part of its SQL dialect. For example, the query below now works:

```sql
SELECT [x + 1 for x in [1, 2, 3]] AS l;
```

```text
┌───────────┐
│     l     │
├───────────┤
│ [2, 3, 4] │
└───────────┘
```

Nested types and structures are very efficiently implemented in DuckDB, and are now also more elegant to work with.

#### Memory Management Improvements

When working with large data sets, memory management is always a potential pain point. By using a streaming execution engine and buffer manager, DuckDB supports many operations on larger than memory data sets. DuckDB also aims to support queries where *intermediate* results do not fit into memory by using disk-spilling techniques, and has support for an [efficient out-of-core sort](https://duckdb.org/2021/08/27/external-sorting), [out-of-core window functions](https://duckdb.org/2021/10/13/windowing) and [an out-of-core hash join](https://github.com/duckdb/duckdb/pull/4189).

This release further improves on that by greatly optimizing the [out-of-core hash join](https://github.com/duckdb/duckdb/pull/4970), resulting in a much more graceful degradation in performance as the data exceeds the memory limit.

| Memory limit (GB) | Old time (s) | New time (s) |
| ----------------: | -----------: | -----------: |
|                10 |         1.97 |         1.96 |
|                 9 |         1.97 |         1.97 |
|                 8 |         2.23 |         2.22 |
|                 7 |         2.23 |         2.44 |
|                 6 |         2.27 |         2.39 |
|                 5 |         2.27 |         2.32 |
|                 4 |         2.81 |         2.45 |
|                 3 |         5.60 |         3.20 |
|                 2 |         7.69 |         3.28 |
|                 1 |        17.73 |         4.35 |

**jemalloc**. In addition, this release bundles the [jemalloc allocator](https://github.com/duckdb/duckdb/pull/4971) with the Linux version of DuckDB by default, which fixes an outstanding issue where the standard `GLIBC` allocator would not return blocks to the operating system, unnecessarily leading to out-of-memory errors on the Linux version. Note that this problem does not occur on macOS or Windows, and as such we continue using the standard allocators there (at least for now).

#### Shell Improvements

DuckDB has a command-line interface that is adapted from SQLite's command-line interface, and therefore supports an extremely similar interface to SQLite. All of the tables in this blog post have been generated using `.mode markdown` in the CLI.

The DuckDB shell also offers several improvements over the SQLite shell, such as syntax highlighting, and this release includes a few new goodies.

**DuckBox Rendering**. This release includes a [new `.mode duckbox` rendering](https://github.com/duckdb/duckdb/pull/5140) that is used by default. This box rendering adapts to the size of the shell, and leaves out columns and rows to provide a better overview of a result. It very quickly renders large result sets by leaving out rows in the middle. That way, typing `SELECT * FROM tbl` in the shell no longer blows it up. In fact, this can now be used to quickly get a good feel of a dataset instead.

The number of rows that are rendered can be changed by using the `.maxrows X` setting, and you can switch back to the old rendering using the `.mode box` command.

```sql
SELECT * FROM '~/Data/nyctaxi/nyc-taxi/2014/04/data.parquet';
```

```text
┌───────────┬─────────────────────┬─────────────────────┬───┬────────────┬──────────────┬──────────────┐
│ vendor_id │      pickup_at      │     dropoff_at      │ … │ tip_amount │ tolls_amount │ total_amount │
│  varchar  │      timestamp      │      timestamp      │   │   float    │    float     │    float     │
├───────────┼─────────────────────┼─────────────────────┼───┼────────────┼──────────────┼──────────────┤
│ CMT       │ 2014-04-08 08:59:39 │ 2014-04-08 09:28:57 │ … │        3.7 │          0.0 │         22.2 │
│ CMT       │ 2014-04-08 14:59:22 │ 2014-04-08 15:04:52 │ … │        1.3 │          0.0 │          7.8 │
│ CMT       │ 2014-04-08 08:45:28 │ 2014-04-08 08:50:41 │ … │        1.2 │          0.0 │          7.2 │
│ CMT       │ 2014-04-08 08:00:20 │ 2014-04-08 08:11:31 │ … │        1.7 │          0.0 │         10.2 │
│ CMT       │ 2014-04-08 08:38:36 │ 2014-04-08 08:44:37 │ … │        1.2 │          0.0 │          7.2 │
│ CMT       │ 2014-04-08 07:52:53 │ 2014-04-08 07:59:12 │ … │        1.3 │          0.0 │          7.8 │
│ CMT       │ 2014-04-08 16:08:16 │ 2014-04-08 16:12:38 │ … │        1.4 │          0.0 │          8.4 │
│ CMT       │ 2014-04-08 12:04:09 │ 2014-04-08 12:14:30 │ … │        1.7 │          0.0 │         10.2 │
│ CMT       │ 2014-04-08 16:18:38 │ 2014-04-08 16:37:04 │ … │        2.5 │          0.0 │         17.5 │
│ CMT       │ 2014-04-08 15:28:00 │ 2014-04-08 15:34:44 │ … │        1.4 │          0.0 │          8.4 │
│  ·        │          ·          │          ·          │ · │         ·  │           ·  │           ·  │
│  ·        │          ·          │          ·          │ · │         ·  │           ·  │           ·  │
│  ·        │          ·          │          ·          │ · │         ·  │           ·  │           ·  │
│ CMT       │ 2014-04-25 00:09:34 │ 2014-04-25 00:14:52 │ … │        2.5 │          0.0 │         10.0 │
│ CMT       │ 2014-04-25 01:59:39 │ 2014-04-25 02:16:07 │ … │        3.5 │          0.0 │         21.0 │
│ CMT       │ 2014-04-24 23:02:08 │ 2014-04-24 23:47:10 │ … │        8.8 │          0.0 │         52.8 │
│ CMT       │ 2014-04-25 01:27:11 │ 2014-04-25 01:56:53 │ … │        4.6 │          0.0 │         27.6 │
│ CMT       │ 2014-04-25 00:15:46 │ 2014-04-25 00:25:37 │ … │        1.0 │          0.0 │         11.5 │
│ CMT       │ 2014-04-25 00:17:53 │ 2014-04-25 00:22:52 │ … │        1.3 │          0.0 │          7.8 │
│ CMT       │ 2014-04-25 03:13:19 │ 2014-04-25 03:21:50 │ … │        2.1 │          0.0 │         12.6 │
│ CMT       │ 2014-04-24 23:53:03 │ 2014-04-25 00:16:01 │ … │       2.85 │          0.0 │        31.35 │
│ CMT       │ 2014-04-25 00:26:08 │ 2014-04-25 00:31:25 │ … │        1.4 │          0.0 │          8.4 │
│ CMT       │ 2014-04-24 23:21:39 │ 2014-04-24 23:33:57 │ … │        1.0 │          0.0 │         11.5 │
├───────────┴─────────────────────┴─────────────────────┴───┴────────────┴──────────────┴──────────────┤
│ 14618759 rows (20 shown)                                                        18 columns (6 shown) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
```


**Context-Aware Auto-Complete**. The shell now also ships with [context-aware auto-complete](https://github.com/duckdb/duckdb/pull/4921). Auto-complete is triggered by pressing the tab character. The shell auto-completes four different groups: (1) keywords, (2) table names + table functions, (3) column names + scalar functions, and (4) file names. The shell looks at the position in the SQL statement to determine which of these auto-completions to trigger. For example:

```sql
S -> SELECT

SELECT s -> student_id

SELECT student_id F -> FROM


SELECT student_id FROM g -> grades

SELECT student_id FROM 'd -> data/

SELECT student_id FROM 'data/ -> data/grades.csv
```

**Progress Bars**. DuckDB has [supported progress bars in queries for a while now](https://github.com/duckdb/duckdb/pull/1432), but they have always been opt-in. In this release we have [prettied up the progress bar](https://github.com/duckdb/duckdb/pull/5187) and enabled it by default in the shell. The progress bar will pop up when a query is run that takes more than 2 seconds, and display an estimated time-to-completion for the query.

```sql
COPY lineitem TO 'lineitem-big.parquet';
```

```text
   32% ▕███████████████████▏                                        ▏ 
```

In the future we aim to enable the progress bar by default in other clients. For now, this can be done manually by running the following SQL queries:

```sql
PRAGMA enable_progress_bar;
PRAGMA enable_print_progress_bar;
```

## Announcing DuckDB 0.7.0

**Publication date:** 2023-02-13

**Author:** Mark Raasveldt

![](../images/blog/labrador_duck.png)


The DuckDB team is happy to announce the latest DuckDB version (0.7.0) has been released. This release of DuckDB is named "Labradorius" after the [Labrador Duck (Camptorhynchus labradorius)](https://en.wikipedia.org/wiki/Labrador_duck) that was native to North America.

To install the new version, please visit the [installation guide](https://duckdb.org/install/index.html). The full release notes can be found on [GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.7.0).

#### What's in 0.7.0

The new release contains many improvements to the JSON support, new SQL features, improvements to data ingestion and export, and other new features. Below is a summary of the most impactful changes, together with the linked PRs that implement the features.

#### Data Ingestion/Export Improvements

**JSON Ingestion.** This version introduces the [`read_json` and `read_json_auto`](https://github.com/duckdb/duckdb/pull/5992) functions. These can be used to ingest JSON files into a tabular format. Similar to `read_csv`, the `read_json` function requires a schema to be specified, while `read_json_auto` automatically infers the schema of the JSON from the file using sampling. Both [new-line delimited JSON](https://github.com/ndjson/ndjson-spec) and regular JSON are supported.

```sql
FROM 'data/json/with_list.json';
```

| id  | name                             |
| --- | -------------------------------- |
| 1   | [O, Brother,, Where, Art, Thou?] |
| 2   | [Home, for, the, Holidays]       |
| 3   | [The, Firm]                      |
| 4   | [Broadcast, News]                |
| 5   | [Raising, Arizona]               |

**Partitioned Parquet/CSV Export.** DuckDB has been able to ingest [Hive-partitioned Parquet and CSV files](#docs:lts:core_extensions:httpfs:overview::hive-partitioning) for a while. After this release [DuckDB will also be able to _write_ Hive-partitioned data](https://github.com/duckdb/duckdb/pull/5964) using the `PARTITION_BY` clause. These files can be exported locally or remotely to S3 compatible storage. Here is a local example:

```sql
COPY orders TO 'orders' (FORMAT parquet, PARTITION_BY (year, month));
```

This will cause the Parquet files to be written in the following directory structure:

```text
orders
├── year=2021
│    ├── month=1
│    │   ├── file1.parquet
│    │   └── file2.parquet
│    └── month=2
│        └── file3.parquet
└── year=2022
     ├── month=11
     │   ├── file4.parquet
     │   └── file5.parquet
     └── month=12
         └── file6.parquet
```
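The mapping from rows to `column=value` directories can be sketched in Python. This only illustrates how the layout follows from the partition columns; the helper `partition_rows` and the sample rows are hypothetical, and no files are written.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_rows(rows, root, partition_by):
    """Group rows by the Hive-style directory their partition values map to."""
    groups = defaultdict(list)
    for row in rows:
        directory = PurePosixPath(root)
        for column in partition_by:
            directory /= f"{column}={row[column]}"
        groups[str(directory)].append(row)
    return dict(groups)


rows = [
    {"year": 2021, "month": 1, "total": 10},
    {"year": 2021, "month": 2, "total": 20},
    {"year": 2022, "month": 11, "total": 30},
    {"year": 2021, "month": 1, "total": 40},
]
layout = partition_rows(rows, "orders", ["year", "month"])
for directory in sorted(layout):
    print(directory, len(layout[directory]))
# orders/year=2021/month=1 2
# orders/year=2021/month=2 1
# orders/year=2022/month=11 1
```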

**Parallel Parquet/CSV Writing.** Parquet and CSV writing are sped up tremendously this release with the [parallel Parquet and CSV writer support](https://github.com/duckdb/duckdb/pull/5756).

| Format  |   Old | New (8T) |
| ------- | ----: | -------: |
| CSV     | 2.6 s |    0.4 s |
| Parquet | 7.5 s |    1.3 s |

Note that parallel writing is currently limited to non-insertion-order-preserving writes, which can be enabled by setting the `preserve_insertion_order` setting to false. In a future release we aim to alleviate this restriction and also parallelize insertion-order-preserving writes.

#### Multi-Database Support

**Attach Functionality.** This release adds support for [attaching multiple databases](https://github.com/duckdb/duckdb/pull/5764) to the same DuckDB instance. This easily allows data to be transferred between separate DuckDB database files, and also allows data from separate database files to be combined together in individual queries. Remote DuckDB instances (stored on a network accessible location like GitHub, for example) may also be attached.

```sql
ATTACH 'new_db.db';
CREATE TABLE new_db.tbl (i INTEGER);
INSERT INTO new_db.tbl SELECT * FROM range(1000);
DETACH new_db;
```

See the [documentation for more information](#docs:lts:sql:statements:attach).

**SQLite Storage Back-End.** In addition to adding support for attaching DuckDB databases – this release also adds support for [_pluggable database engines_](https://github.com/duckdb/duckdb/pull/6066). This allows extensions to define their own database and catalog engines that can be attached to the system. Once attached, an engine can support both reads and writes. The [SQLite extension](https://github.com/duckdb/duckdb-sqlite) makes use of this to add native read/write support for SQLite database files to DuckDB.

```sql
ATTACH 'sqlite_file.db' AS sqlite (TYPE sqlite);
CREATE TABLE sqlite.tbl (i INTEGER);
INSERT INTO sqlite.tbl VALUES (1), (2), (3);
SELECT * FROM sqlite.tbl;
```

Using this, SQLite database files can be attached, queried and modified as if they are native DuckDB database files. This allows data to be quickly transferred between SQLite and DuckDB – and allows you to use DuckDB's rich SQL dialect to query data stored in SQLite tables.

#### New SQL Features

**Upsert Support.** [Upsert support](https://github.com/duckdb/duckdb/pull/5866) is added with this release using the `ON CONFLICT` clause, as well as the SQLite-compatible `INSERT OR REPLACE`/`INSERT OR IGNORE` syntax.

```sql
CREATE TABLE movies (id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO movies VALUES (1, 'A New Hope');
FROM movies;
```

| id  | name       |
| --- | ---------- |
| 1   | A New Hope |

```sql
INSERT OR REPLACE INTO movies VALUES (1, 'The Phantom Menace');
FROM movies;
```

| id  | name               |
| --- | ------------------ |
| 1   | The Phantom Menace |

See the [documentation for more information](#docs:lts:sql:statements:insert::on-conflict-clause).
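Because this syntax is borrowed from SQLite, the difference between `INSERT OR IGNORE` and `INSERT OR REPLACE` can be tried out with Python's built-in `sqlite3` module. Note that this sketch runs against SQLite rather than DuckDB; it is only meant to illustrate the semantics of the two variants.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, name VARCHAR)")
con.execute("INSERT INTO movies VALUES (1, 'A New Hope')")

# OR IGNORE: the conflicting row is skipped, the existing row is kept.
con.execute("INSERT OR IGNORE INTO movies VALUES (1, 'The Phantom Menace')")
after_ignore = con.execute("SELECT name FROM movies WHERE id = 1").fetchone()[0]
print(after_ignore)  # A New Hope

# OR REPLACE: the existing row is replaced by the new one.
con.execute("INSERT OR REPLACE INTO movies VALUES (1, 'The Phantom Menace')")
after_replace = con.execute("SELECT name FROM movies WHERE id = 1").fetchone()[0]
print(after_replace)  # The Phantom Menace

row_count = con.execute("SELECT count(*) FROM movies").fetchone()[0]
print(row_count)  # 1
con.close()
```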

**Lateral Joins.** Support for [lateral joins](https://github.com/duckdb/duckdb/pull/5393) is added in this release. Lateral joins are a more flexible variant of correlated subqueries that make working with nested data easier, as they allow [easier unnesting](https://github.com/duckdb/duckdb/pull/5485) of nested data.

**Positional Joins.** While SQL formally models unordered sets, in practice the order of datasets does frequently have a meaning. DuckDB offers guarantees around maintaining the order of rows when loading data into tables or when exporting data back out to a file – as well as when executing queries such as `LIMIT` without a corresponding `ORDER BY` clause.

To improve support for this use case – this release [introduces the `POSITIONAL JOIN`](https://github.com/duckdb/duckdb/pull/5867). Rather than joining on the values of rows – this new join type joins rows based on their position in the table.

```sql
CREATE TABLE t1 AS FROM (VALUES (1), (2), (3)) t(i);
CREATE TABLE t2 AS FROM (VALUES (4), (5), (6)) t(k);
SELECT * FROM t1 POSITIONAL JOIN t2;
```

| i   | k   |
| --- | --- |
| 1   | 4   |
| 2   | 5   |
| 3   | 6   |
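Conceptually, a positional join resembles zipping two sequences together. The Python sketch below mirrors the example above. It uses `zip_longest` on the assumption that a shorter input is padded with `NULL` (shown as `None`); the example in this post only covers inputs of equal length, so treat that padding behavior as an assumption.

```python
from itertools import zip_longest

t1 = [1, 2, 3]
t2 = [4, 5, 6]

# Pair up rows by position rather than by value.
joined = list(zip_longest(t1, t2))
print(joined)  # [(1, 4), (2, 5), (3, 6)]

# With inputs of different length, the missing side is filled with None.
print(list(zip_longest([1, 2, 3], [4])))  # [(1, 4), (2, None), (3, None)]
```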

#### Python API Improvements

**Query Building.** This release introduces easier incremental query building using the Python API by allowing relations to be queried. This allows you to decompose long SQL queries into multiple smaller SQL queries, and allows you to easily inspect query intermediates.

```python
>>> import duckdb
>>> lineitem = duckdb.sql('FROM lineitem.parquet')
>>> lineitem.limit(3).show()
```

```text
┌────────────┬───────────┬───────────┬───┬───────────────────┬────────────┬──────────────────────┐
│ l_orderkey │ l_partkey │ l_suppkey │ … │  l_shipinstruct   │ l_shipmode │      l_comment       │
│   int32    │   int32   │   int32   │   │      varchar      │  varchar   │       varchar        │
├────────────┼───────────┼───────────┼───┼───────────────────┼────────────┼──────────────────────┤
│          1 │    155190 │      7706 │ … │ DELIVER IN PERSON │ TRUCK      │ egular courts abov…  │
│          1 │     67310 │      7311 │ … │ TAKE BACK RETURN  │ MAIL       │ ly final dependenc…  │
│          1 │     63700 │      3701 │ … │ TAKE BACK RETURN  │ REG AIR    │ riously. regular, …  │
├────────────┴───────────┴───────────┴───┴───────────────────┴────────────┴──────────────────────┤
│ 3 rows                                                                    16 columns (6 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
```

```python
>>> lineitem_filtered = duckdb.sql('FROM lineitem WHERE l_orderkey>5000')
>>> lineitem_filtered.limit(3).show()
```

```text
┌────────────┬───────────┬───────────┬───┬────────────────┬────────────┬──────────────────────┐
│ l_orderkey │ l_partkey │ l_suppkey │ … │ l_shipinstruct │ l_shipmode │      l_comment       │
│   int32    │   int32   │   int32   │   │    varchar     │  varchar   │       varchar        │
├────────────┼───────────┼───────────┼───┼────────────────┼────────────┼──────────────────────┤
│       5024 │    165411 │       444 │ … │ NONE           │ AIR        │  to the expre        │
│       5024 │     57578 │        84 │ … │ COLLECT COD    │ REG AIR    │ osits hinder caref…  │
│       5024 │    111009 │      3521 │ … │ NONE           │ MAIL       │ zle carefully saut…  │
├────────────┴───────────┴───────────┴───┴────────────────┴────────────┴──────────────────────┤
│ 3 rows                                                                 16 columns (6 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
```

```python
>>> duckdb.sql('SELECT min(l_orderkey), max(l_orderkey) FROM lineitem_filtered').show()
```

```text
┌─────────────────┬─────────────────┐
│ min(l_orderkey) │ max(l_orderkey) │
│      int32      │      int32      │
├─────────────────┼─────────────────┤
│            5024 │         6000000 │
└─────────────────┴─────────────────┘
```

Note that everything is lazily evaluated. The Parquet file is not read from disk until the final query is executed – and queries are optimized in their entirety. Executing the decomposed query will be just as fast as executing the long SQL query all at once.

**Python Ingestion APIs.** This release adds several [familiar data ingestion and export APIs](https://github.com/duckdb/duckdb/pull/6015) that follow standard conventions used by other libraries. These functions emit relations as well – which can be directly queried again.

```python
>>> lineitem = duckdb.read_csv('lineitem.csv')
>>> lineitem.limit(3).show()
```

```text
┌────────────┬───────────┬───────────┬───┬───────────────────┬────────────┬──────────────────────┐
│ l_orderkey │ l_partkey │ l_suppkey │ … │  l_shipinstruct   │ l_shipmode │      l_comment       │
│   int32    │   int32   │   int32   │   │      varchar      │  varchar   │       varchar        │
├────────────┼───────────┼───────────┼───┼───────────────────┼────────────┼──────────────────────┤
│          1 │    155190 │      7706 │ … │ DELIVER IN PERSON │ TRUCK      │ egular courts abov…  │
│          1 │     67310 │      7311 │ … │ TAKE BACK RETURN  │ MAIL       │ ly final dependenc…  │
│          1 │     63700 │      3701 │ … │ TAKE BACK RETURN  │ REG AIR    │ riously. regular, …  │
├────────────┴───────────┴───────────┴───┴───────────────────┴────────────┴──────────────────────┤
│ 3 rows                                                                    16 columns (6 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
```

```python
>>> duckdb.sql('SELECT min(l_orderkey) FROM lineitem').show()
```

```text
┌─────────────────┐
│ min(l_orderkey) │
│      int32      │
├─────────────────┤
│               1 │
└─────────────────┘
```

**Polars Integration.** This release adds support for tight integration with the [Polars DataFrame library](https://github.com/pola-rs/polars), similar to our integration with Pandas DataFrames. Results can be converted to Polars DataFrames using the `.pl()` function.

```python
import duckdb
duckdb.sql('SELECT 42').pl()
```

```text
shape: (1, 1)
┌─────┐
│ 42  │
│ --- │
│ i32 │
╞═════╡
│ 42  │
└─────┘
```

In addition, Polars DataFrames can be directly queried using the SQL interface.

```python
import duckdb
import polars as pl
df = pl.DataFrame({'a': 42})
duckdb.sql('SELECT * FROM df').pl()
```

```text
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 42  │
└─────┘
```

**fsspec Filesystem Support.** This release adds support for the [fsspec filesystem API](https://github.com/duckdb/duckdb/pull/5829). [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) allows users to define their own filesystem that they can pass to DuckDB. DuckDB will then use this filesystem to read and write data. This enables support for storage back-ends that may not be natively supported by DuckDB yet, such as FTP.

```python
import duckdb
from fsspec import filesystem

duckdb.register_filesystem(filesystem('gcs'))

data = duckdb.query("SELECT * FROM read_csv_auto('gcs:///bucket/file.csv')").fetchall()
```

Have a look at the [guide](#docs:lts:guides:python:filesystems) for more information.

#### Storage Improvements

**Delta Compression.** Compression of numeric values in the storage is improved using the new [delta and delta-constant compression](https://github.com/duckdb/duckdb/pull/5491). This compression method is particularly effective when compressing values that are equally spaced out, for example, sequences of numbers (`1, 2, 3, ...`) or timestamps with a fixed interval between them (`12:00:01, 12:00:02, 12:00:03, ...`).
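
To see why equally spaced values compress so well, here is a minimal Python sketch of the general idea behind delta encoding: keep the first value and store only the differences between neighbors. The function names are hypothetical, and DuckDB's actual storage format is considerably more involved than this.

```python
def delta_encode(values):
    """Return (first_value, deltas) for a non-empty list of integers."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas

def delta_decode(first, deltas):
    """Rebuild the original values by accumulating the deltas."""
    values = [first]
    for delta in deltas:
        values.append(values[-1] + delta)
    return values

# 12:00:01, 12:00:02, ... expressed as seconds since midnight
timestamps = [43201, 43202, 43203, 43204]
first, deltas = delta_encode(timestamps)
print(first, deltas)  # 43201 [1, 1, 1]

# When every delta is identical, the first value, one delta,
# and the number of values are enough to describe the column.
is_constant = len(set(deltas)) <= 1
```

The deltas are small numbers that need far fewer bits than the original values, and in the constant case they collapse into a single number.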

#### Final Thoughts

The full release notes can be [found on GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.7.0). We would like to thank all of the contributors for their hard work on improving DuckDB.

## JupySQL Plotting with DuckDB

**Publication date:** 2023-02-24

**Author:** Guest post by Eduardo Blancas

**TL;DR:** [JupySQL](https://github.com/ploomber/jupysql) provides a seamless SQL experience in Jupyter and uses DuckDB to visualize larger-than-memory datasets in matplotlib.

#### Introduction

Data visualization is essential for every data practitioner since it allows us to find patterns that otherwise would be hard to see. The typical approach for plotting tabular datasets involves Pandas and Matplotlib. However, this technique quickly falls short as our data grows, given that Pandas introduces a significant memory overhead, making it challenging even to plot a medium-sized dataset.

In this blog post, we'll use [JupySQL](https://github.com/ploomber/jupysql) and DuckDB to efficiently plot *larger-than-memory* datasets on our laptops. JupySQL is a fork of ipython-sql, which adds SQL cells to Jupyter, that is being actively maintained and enhanced by the team at Ploomber.

Combining JupySQL with DuckDB enables a powerful and user-friendly local SQL processing experience, especially when combined with JupySQL's new plotting capabilities. There is no need to get beefy (and expensive!) EC2 machines or configure complex distributed frameworks! Get started with JupySQL and DuckDB with our [Jupyter Notebook guide](#docs:lts:guides:python:jupyter), or go directly to an example [Colab notebook](https://colab.research.google.com/drive/1eOA2FYHqEfZWLYssbUxdIpSL3PFxWVjk?usp=sharing)!

*We want JupySQL to offer the best SQL experience in Jupyter, so if you have any feedback, please open an issue on [GitHub!](https://github.com/ploomber/jupysql/issues/new)*

#### The Problem

One significant limitation when using `pandas` and `matplotlib` for data visualization is that we need to load all our data into memory, making it difficult to plot *larger-than-memory* datasets. Furthermore, given the overhead that `pandas` introduces, we might be unable to visualize some smaller datasets that we might think "fit" into memory.

Let's load a sample `.parquet` dataset using pandas to show the memory overhead:

```python
from urllib.request import urlretrieve

_ = urlretrieve("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet", 
                "yellow_tripdata_2022-01.parquet")
```

The downloaded `.parquet` file takes 36 MB of disk space:

```python
ls -lh *.parquet
```



```text
-rw-r--r--  1 eduardo  staff    36M Jan 18 14:45 yellow_tripdata_2022-01.parquet
```



Now let's load the `.parquet` as a data frame and see how much memory it takes:

```python
import pandas as pd

df = pd.read_parquet("yellow_tripdata_2022-01.parquet")

df_mb = df.memory_usage().sum() / (1024 ** 2)
print(f"Data frame takes {df_mb:.0f} MB")
```



```text
Data frame takes 357 MB
```



As you can see, we're using almost 10× as much memory as the file size. Given this overhead, we must be much more conservative about what *larger-than-memory* means, as "medium" files might not fit into memory once loaded. But this is just the beginning of our memory problems.

When plotting data, we often need to preprocess it before it's suitable for visualization. However, if we're not careful, these preprocessing steps will copy our data, dramatically increasing memory. Let's show a practical example.

Our sample dataset contains an observation for each NYC yellow cab trip in January 2022. Let's create a boxplot for the trip's distance:

```python
import matplotlib.pyplot as plt

plt.boxplot(df.trip_distance)
_ = plt.title("Trip distance")
```



![8-0](../images/blog/jupysql/serialized/8-0.png)


Wow! It looks like some New Yorkers really like taxi rides! Let's put the taxi fans aside to improve the visualization and compute the 99th percentile to use it as the cutoff value:

```python
cutoff = df.trip_distance.quantile(q=0.99)
cutoff
```



```text
19.7
```



Now, we need to filter out observations larger than the cutoff value; but before we do it, let's create a utility function to capture memory usage:

```python
import psutil

def memory_used():
    """Returns memory used in MB"""
    mem = psutil.Process().memory_full_info().uss / (1024 ** 2)
    print(f"Memory used: {mem:.0f} MB")

memory_used()
```



```text
Memory used: 941 MB
```



Let's now filter out the observations:

```python
df_ = df[df.trip_distance < cutoff]
```

Plot the boxplot:

```python
plt.boxplot(df_.trip_distance)
_ = plt.title("Trip distance (top 1% observations removed)")
```



![16-0](../images/blog/jupysql/serialized/16-0.png)


We now see more reasonable numbers with the top 1% outliers removed. There are a few trips over 10 miles (perhaps some uptown New Yorkers going to Brooklyn for some [delicious pizza?](https://en.wikipedia.org/wiki/Juliana%27s_Pizza))

How much memory are we using now?

```python
memory_used()
```



```text
Memory used: 1321 MB
```



380 MB more! Loading a 36 MB Parquet file turned into >700 MB in memory after loading and applying one preprocessing step!

So, in reality, when we use `pandas`, what fits in memory is much smaller than we think, and even with a laptop equipped with 16 GB of RAM, we'll be extremely limited in the size of the datasets we can process. Of course, we could save a lot of memory by exclusively loading the column we plotted and deleting unneeded data copies; however, let's face it, *this never happens in practice*. When exploring data, we rarely know ahead of time which columns we'll need; furthermore, our time is better spent analyzing and visualizing the data than manually deleting data copies.

When facing this challenge, we might consider using a distributed framework; however, this adds so much complexity to the process, and it only partially solves the problem since we'd need to write code to compute the statistics in a distributed fashion. Alternatively, we might consider getting a larger machine, a relatively straightforward (but expensive!) approach if we can access cloud resources. However, this still requires us to move our data, set up a new environment, etc. Fortunately for us, there's DuckDB!

#### DuckDB: A Highly Scalable Backend for Statistical Visualizations

When using functions such as [`hist`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) (histogram) or [`boxplot`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html), `matplotlib` performs two steps:

1. Compute summary statistics
2. Plot data

For example, `boxplot` calls another function called [`boxplot_stats`](https://matplotlib.org/stable/api/cbook_api.html#matplotlib.cbook.boxplot_stats) that returns the statistics required to draw the plot. To create a boxplot, we need several summary statistics, such as the 25th percentile, 50th percentile, and 75th percentile. The following diagram shows a boxplot along with the labels for each part:

![](../images/blog/jupysql/boxplot-labels.png)


The bottleneck in the `pandas` + `matplotlib` approach is the `boxplot_stats` function since it requires a `numpy.array` or `pandas.Series` as an input, forcing us to load all our data into memory. However, we can implement a new version of `boxplot_stats` that pushes the data aggregation step to another analytical engine.

We chose DuckDB because it's extremely powerful and easy to use. There is no need to spin up a server or manage complex configurations: install it with `pip install`, point it to your data files, and that's it; you can start aggregating millions and millions of data points in no time!

You can see the full [implementation here](https://github.com/ploomber/jupysql/blob/6c081823e0d5f9e55a07ec617f5b6188af7a1e58/src/sql/plot.py#L125); essentially, we translated matplotlib's `boxplot_stats` from Python into SQL. For example, the following query will compute the three percentiles we need: 25th, 50th (median), and 75th:

```python
%load_ext sql
%sql duckdb://
```

```sql
%%sql
-- We calculate the percentiles all at once and 
-- then convert from list format into separate columns
-- (Improving performance by reducing duplicate work)
WITH stats AS (
  SELECT
    percentile_disc([0.25, 0.50, 0.75]) WITHIN GROUP 
      (ORDER BY "trip_distance") AS percentiles
  FROM 'yellow_tripdata_2022-01.parquet'
)
SELECT
  percentiles[1] AS q1,
  percentiles[2] AS median,
  percentiles[3] AS q3
FROM stats;
```



<table>
    <tr>
        <th>q1</th>
        <th>median</th>
        <th>q3</th>
    </tr>
    <tr>
        <td>1.04</td>
        <td>1.74</td>
        <td>3.13</td>
    </tr>
</table>
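
For intuition, here is what a discrete percentile does, sketched in plain Python: it returns an actual value from the data (no interpolation), namely the first one at which the cumulative share of rows reaches the requested fraction. This follows the usual SQL definition of `percentile_disc`; the exact index rule DuckDB uses may differ in edge cases, and the sample distances are made up.

```python
import math

def percentile_disc(values, fraction):
    """Smallest value whose cumulative share of rows is >= fraction."""
    ordered = sorted(values)
    # Index of the first row at which the cumulative share reaches `fraction`
    index = max(math.ceil(fraction * len(ordered)) - 1, 0)
    return ordered[index]

distances = [0.5, 1.0, 1.7, 2.4, 3.1, 4.8, 9.9, 30.2]  # hypothetical trip distances
q1, median, q3 = (percentile_disc(distances, f) for f in (0.25, 0.50, 0.75))
print(q1, median, q3)  # 1.0 2.4 4.8
```

Note that the extreme value `30.2` has no influence on the three results, which is why a boxplot stays readable even with heavy outliers.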


Once we compute all the statistics, we call the [`bxp`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.bxp.html) function, which draws the boxplot from the input statistics.

This process is already implemented in JupySQL, and you can create a boxplot with the `%sqlplot boxplot` command. Let's see how. But first, let's check how much memory we're using, so we can compare it to the `pandas` version:

```python
memory_used()
```


```text
Memory used: 1351 MB
```



Let's create the boxplot:

```python
%sqlplot boxplot --table yellow_tripdata_2022-01.parquet --column trip_distance
```



![26-1](../images/blog/jupysql/serialized/26-1.png)


Again, we see all these outliers. Let's compute the cutoff value:

```sql
%%sql
SELECT percentile_disc(0.99) WITHIN GROUP (ORDER BY trip_distance)
FROM 'yellow_tripdata_2022-01.parquet'
```



<table>
    <tr>
        <th>quantile_disc(0.99 ORDER BY trip_distance)</th>
    </tr>
    <tr>
        <td>19.7</td>
    </tr>
</table>


Let's define a query that filters out the top 1% of observations. The `--save` option stores this SQL expression under the name `no-outliers`, and `--no-execute` tells JupySQL not to run it yet.

```sql
%%sql --save no-outliers --no-execute
SELECT *
FROM 'yellow_tripdata_2022-01.parquet'
WHERE trip_distance < 19.7;
```





We can now use `no-outliers` in our `%sqlplot boxplot` command:

```python
%sqlplot boxplot --table no-outliers --column trip_distance --with no-outliers
```



![32-1](../images/blog/jupysql/serialized/32-1.png)


```python
memory_used()
```



```text
Memory used: 1375 MB
```



Memory usage remained pretty much the same (roughly 24 MB difference, mostly due to the newly imported modules). Since we're relying on DuckDB for the data aggregation step, the SQL engine takes care of loading, aggregating, and freeing up memory as soon as we're done; this is much more efficient than loading all our data at the same time and keeping unwanted data copies!


#### Using DuckDB to Compute Histogram Statistics

We can extend our recipe to other statistical visualizations, such as histograms.

A histogram allows us to visualize the distribution of a dataset, enabling us to find patterns such as modality, outliers, range of values, etc. Like with the boxplot, when using `pandas` + `matplotlib`, creating a histogram involves loading all our data at once into memory; then, `matplotlib` aggregates and plots it.

In our case, we'll push the aggregation to DuckDB, which will compute the bin positions (X-axis) and heights (Y-axis), then we'll pass this to matplotlib's `bar` function to create the histogram.

The [implementation](https://github.com/ploomber/jupysql/blob/6c081823e0d5f9e55a07ec617f5b6188af7a1e58/src/sql/plot.py#L283) involves two steps.

First, given the number of bins chosen by the user (`N_BINS`), we compute the `BIN_SIZE`:

```sql
%%sql
SELECT (max(trip_distance) - min(trip_distance)) / N_BINS
FROM 'yellow_tripdata_2022-01.parquet';
```

Then, using the `BIN_SIZE`, we find the number of observations that fall into each bin:

```sql
%%sql
SELECT
    floor("trip_distance" / BIN_SIZE) * BIN_SIZE,
    count(*) AS count
FROM 'yellow_tripdata_2022-01.parquet'
GROUP BY 1
ORDER BY 1;
```


The intuition for the second query is as follows: given that we have `N_BINS`, the `floor("trip_distance" / BIN_SIZE)` portion will assign each observation to its corresponding bin (1, 2, ..., `N_BINS`), then, we multiply by the bin size to get the value on the `X` axis, while the count represents the value on the `Y` axis. Once we have that, we call the [`bar`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html) plotting function.
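
The same two steps can be sketched in plain Python to make the arithmetic concrete. This is an illustration of the binning logic only, with made-up sample values and a hypothetical helper name; JupySQL runs the equivalent computation inside DuckDB.

```python
import math
from collections import Counter

def histogram_bins(values, n_bins):
    """Return sorted (bin_position, count) pairs, mirroring the two SQL queries."""
    # Step 1: bin size from the value range and the requested number of bins
    bin_size = (max(values) - min(values)) / n_bins
    # Step 2: floor(value / bin_size) * bin_size is the left edge of the value's bin
    counts = Counter(math.floor(v / bin_size) * bin_size for v in values)
    return sorted(counts.items())

distances = [0.0, 0.5, 1.2, 1.9, 2.0, 3.7, 4.0]  # hypothetical trip distances
bins = histogram_bins(distances, n_bins=4)
for position, count in bins:
    print(position, count)
```

With this sample the bin size is `1.0`, and the positions start at `0.0` because the smallest value is zero. The maximum value (`4.0`) lands at a position of its own, since `floor(max / BIN_SIZE)` equals `N_BINS` when the minimum is zero.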

All these steps are implemented in the `%sqlplot histogram` command:

```python
%sqlplot histogram --table no-outliers --column trip_distance --with no-outliers
```



![37-1](../images/blog/jupysql/serialized/37-1.png)


#### Final Thoughts

This blog post demonstrated a powerful approach for plotting large datasets using JupySQL and DuckDB. If you need to visualize large datasets, DuckDB offers unmatched simplicity and flexibility!

At [Ploomber](https://ploomber.io/), we're working on building a full-fledged SQL client for Jupyter! Exciting features like automated dataset profiling, autocompletion, and more are coming! So [keep an eye](https://twitter.com/ploomber) on [updates!](https://www.linkedin.com/company/ploomber) If there are features you think we should add to offer the best SQL experience in Jupyter, please [open an issue!](https://github.com/ploomber/jupysql/issues/new)

JupySQL is an actively maintained fork of `ipython-sql`, and it keeps full compatibility with it. If you want to learn more, check out the [GitHub repository](https://github.com/ploomber/jupysql) and the [documentation.](https://jupysql.readthedocs.io/en/latest)

#### Try It Out

To try it yourself, check out this [Colab notebook](https://colab.research.google.com/drive/1FpNKAZ_fCNtjStd2aA15aaBZYRELa9Wp?usp=sharing), or here's a snippet you can paste into Jupyter:

```python
from urllib.request import urlretrieve

urlretrieve("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet", 
            "yellow_tripdata_2022-01.parquet")

%pip install jupysql duckdb-engine --quiet
%load_ext sql
%sql duckdb://
%sqlplot boxplot --table yellow_tripdata_2022-01.parquet --column trip_distance
```



Note: the commands that begin with `%` or `%%` will only work in Jupyter/IPython. If you want to try this in a regular Python session, check out the [Python API](https://jupysql.readthedocs.io/en/latest/api/python.html#sql-plot).

## Shredding Deeply Nested JSON, One Vector at a Time

**Publication date:** 2023-03-03

**Author:** Laurens Kuiper

**TL;DR:** We recently improved DuckDB's JSON extension so JSON files can be directly queried as if they were tables.

> We updated this blog post in December 2024 to reflect the changes in DuckDB's JSON syntax.

![](../images/blog/jason-duck.jpg)


DuckDB has a [JSON extension](#docs:lts:data:json:overview) that can be installed and loaded through SQL:

```sql
INSTALL 'json';
LOAD 'json';
```

The JSON extension supports various functions to create, read, and manipulate JSON strings.
These functions are similar to the JSON functionality provided by other databases such as [PostgreSQL](https://www.postgresql.org/docs/current/functions-json.html) and [MySQL](https://dev.mysql.com/doc/refman/8.0/en/json.html).
DuckDB uses [yyjson](https://github.com/ibireme/yyjson) internally to parse JSON, a high-performance JSON library written in ANSI C. Many thanks to the yyjson authors and contributors!

Besides these functions, DuckDB is now able to read JSON directly!
This is done by automatically detecting the types and column names, then converting the values within the JSON to DuckDB's vectors.
The automated schema detection dramatically simplifies working with JSON data and subsequent queries on DuckDB's vectors are significantly faster!

#### Reading JSON Automatically with DuckDB

Since [version 0.7.0](https://duckdb.org/2023/02/13/announcing-duckdb-070), DuckDB supports JSON table functions.
To demonstrate these, we will read [`todos.json`](https://duckdb.org/data/json/todos.json), a [fake TODO list](https://jsonplaceholder.typicode.com/todos) containing 200 fake TODO items (only the first two items are shown):

```json
[
  {
    "userId": 1,
    "id": 1,
    "title": "delectus aut autem",
    "completed": false
  },
  {
    "userId": 1,
    "id": 2,
    "title": "quis ut nam facilis et officia qui",
    "completed": false
  },
  ...
]
```

Each TODO item is an entry in the JSON array, but in DuckDB, we'd like a table where each entry is a row.
This is as easy as:

```sql
SELECT * FROM 'todos.json' LIMIT 5;
```

| userId | id  | title                                                           | completed |
| ------ | --- | --------------------------------------------------------------- | --------- |
| 1      | 1   | delectus aut autem                                              | false     |
| 1      | 2   | quis ut nam facilis et officia qui                              | false     |
| 1      | 3   | fugiat veniam minus                                             | false     |
| 1      | 4   | et porro tempora                                                | true      |
| 1      | 5   | laboriosam mollitia et enim quasi adipisci quia provident illum | false     |

Finding out which user completed the most TODO items is as simple as:

```sql
SELECT userId, sum(completed::INTEGER) total_completed
FROM 'todos.json'
GROUP BY userId
ORDER BY total_completed DESC
LIMIT 1;
```

| userId | total_completed |
| ------ | --------------- |
| 5      | 12              |

Under the hood, DuckDB recognizes the `.json` file extension in `'todos.json'`, and calls `read_json('todos.json')` instead.
This function is similar to our `read_csv` function, which [automatically infers column names and types for CSV files](#docs:lts:data:csv:auto_detection).

Like our other table functions, `read_json` supports reading multiple files by passing a list, e.g., `read_json(['file1.json', 'file2.json'])`, but also globbing, e.g., `read_json('file*.json')`.
DuckDB will read multiple files in parallel.

#### Newline Delimited JSON

Not all JSON adheres to the format used in `todos.json`, which is an array of 'records'.
Newline-delimited JSON, or [NDJSON](https://github.com/ndjson/ndjson-spec), stores each row on a new line.
DuckDB also supports reading (and writing!) this format.
First, let's write our TODO list as NDJSON:

```sql
COPY (SELECT * FROM 'todos.json') TO 'todos2.json';
```

Again, DuckDB recognizes the `.json` suffix in the output file and automatically infers that we mean to use `(FORMAT json)`.
The created file looks like this (only the first two records are shown):

```json
{"userId":1,"id":1,"title":"delectus aut autem","completed":false}
{"userId":1,"id":2,"title":"quis ut nam facilis et officia qui","completed":false}
...
```
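
The relationship between the two layouts can be sketched in a few lines of Python using only the standard `json` module: the array of records is unpacked, and each record is written on its own line. The helper name `to_ndjson` is made up, and this is only an illustration of the format, not of how DuckDB's `COPY` works internally.

```python
import json

# The first two records of the array-style file shown earlier
todos = json.loads("""[
  {"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false},
  {"userId": 1, "id": 2, "title": "quis ut nam facilis et officia qui", "completed": false}
]""")

def to_ndjson(records):
    """Serialize each record compactly on its own line."""
    return "\n".join(json.dumps(record, separators=(",", ":")) for record in records)

ndjson = to_ndjson(todos)
print(ndjson)

# Reading it back: every line is a complete JSON document by itself
round_trip = [json.loads(line) for line in ndjson.splitlines()]
```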

DuckDB can read this file in exactly the same way as the original one:

```sql
SELECT * FROM 'todos2.json';
```

If your JSON file is newline-delimited, DuckDB can parallelize reading.
This can be specified by calling `read_ndjson` or passing the `records = true` parameter to `read_json`:

```sql
SELECT * FROM read_ndjson('todos2.json');
SELECT * FROM read_json('todos2.json', records = true);
```

You can also set `records = auto` to auto-detect whether the JSON file is newline-delimited.
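
The reason newline-delimited JSON lends itself to parallel reading is that every line can be parsed without looking at any other line. The Python sketch below illustrates that independence by handing batches of lines to a thread pool. It is not DuckDB's reader, the function names are hypothetical, and in CPython threads will not actually speed up JSON parsing; the point is only that the batches need no coordination.

```python
import json
from concurrent.futures import ThreadPoolExecutor

ndjson = "\n".join([
    '{"userId":1,"id":1,"title":"delectus aut autem","completed":false}',
    '{"userId":1,"id":2,"title":"quis ut nam facilis et officia qui","completed":false}',
    '{"userId":1,"id":3,"title":"fugiat veniam minus","completed":false}',
    '{"userId":1,"id":4,"title":"et porro tempora","completed":true}',
])

def parse_chunk(lines):
    """Parse a batch of lines; no state is shared with other batches."""
    return [json.loads(line) for line in lines if line.strip()]

def parse_ndjson_parallel(text, n_workers=2):
    lines = text.splitlines()
    chunk_size = max(1, -(-len(lines) // n_workers))  # ceiling division
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parsed = list(pool.map(parse_chunk, chunks))  # map() preserves chunk order
    return [record for chunk in parsed for record in chunk]

records = parse_ndjson_parallel(ndjson)
print(len(records), records[3]["completed"])  # 4 True
```

With a single top-level array, by contrast, a reader cannot tell where a record starts without parsing everything before it.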

#### Other JSON Formats

When the `read_json` function is used directly, the format of the JSON can be specified using the `format` parameter.
This parameter defaults to `'auto'`, which tells DuckDB to infer what kind of JSON we are dealing with.
In our examples, `todos.json` has the `'array'` format, while `todos2.json` has the `'nd'` (newline-delimited) format.
This can be specified like so:

```sql
SELECT * FROM read_json('todos.json', format = 'array');
SELECT * FROM read_json('todos2.json', format = 'nd');
```

Another supported format is `unstructured`. With this format, records are not required to be a JSON object but can also be a JSON array, string, or anything supported in JSON.
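
As a rough illustration of what such input can look like, the following Python sketch reads a stream of concatenated JSON values of mixed kinds using `json.JSONDecoder.raw_decode` from the standard library. It reflects one reading of the description above, not DuckDB's parser, and the helper name and sample input are made up.

```python
import json

def iter_json_values(text):
    """Yield consecutive top-level JSON values from a string."""
    decoder = json.JSONDecoder()
    position, end = 0, len(text)
    while position < end:
        if text[position].isspace():  # raw_decode does not skip leading whitespace
            position += 1
            continue
        value, position = decoder.raw_decode(text, position)
        yield value

stream = '{"a": 1}\n[1, 2, 3] "just a string"\n42 null'
values = list(iter_json_values(stream))
print(values)
```

Here the top-level values are an object, an array, a string, a number, and `null`, and they are not required to sit on separate lines.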

#### Manual Schemas

The `read_json` function also has an `auto_detect` parameter.
This parameter tells DuckDB to infer the schema, i.e., determine the names and types of the returned columns.
These can manually be specified like so:

```sql
SELECT *
FROM read_json(
    'todos.json',
    columns = {userId: 'INTEGER', id: 'INTEGER', title: 'VARCHAR', completed: 'BOOLEAN'},
    format = 'array'
);
```

You don't have to specify all fields, just the ones you're interested in:

```sql
SELECT *
FROM read_json(
    'todos.json',
    columns = {userId: 'INTEGER', completed: 'BOOLEAN'},
    format = 'array'
);
```
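
In spirit, a manual schema is a projection plus a cast: only the listed fields are kept, and each is converted to the requested type. Here is a minimal Python sketch of that idea. The `CASTS` mapping and `apply_schema` helper are hypothetical, and the casting is deliberately naive (it assumes the values already have sensible JSON types).

```python
import json

# Hypothetical mapping from SQL type names to Python converters
CASTS = {"INTEGER": int, "VARCHAR": str, "BOOLEAN": bool}

def apply_schema(records, columns):
    """Keep only the requested fields and convert them to the requested types."""
    rows = []
    for record in records:
        row = {}
        for name, type_name in columns.items():
            value = record.get(name)  # a missing field becomes None (NULL)
            row[name] = None if value is None else CASTS[type_name](value)
        rows.append(row)
    return rows

todos = json.loads("""[
  {"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false},
  {"userId": 1, "id": 2, "title": "quis ut nam facilis et officia qui", "completed": false}
]""")

rows = apply_schema(todos, {"userId": "INTEGER", "completed": "BOOLEAN"})
print(rows)
```

Fields that are not listed (`id` and `title` here) never make it into the result, so they do not need to be converted at all.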

Now that we know how to use the new DuckDB JSON table functions, let's dive into some analytics!

#### GitHub Archive Examples

[GH Archive](https://www.gharchive.org) is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
Every hour, a GZIP compressed, newline-delimited JSON file containing all public events on GitHub is uploaded.
I downloaded a whole day (2023-02-08) of activity using `wget` and stored the 24 files (starting with [`2023-02-08-0.json.gz`](https://data.gharchive.org/2023-02-08-0.json.gz)) in a directory called `gharchive_gz`.
You can get the full day's archive as [`gharchive-2023-02-08.zip`](https://blobs.duckdb.org/data/gharchive-2023-02-08.zip).

Keep in mind that the data is compressed:

```bash
du -sh gharchive_gz
```

```text
2.3G  gharchive_gz
```

Decompressed, one day's worth of GitHub activity amounts to more than 18 GB of JSON.

```bash
gunzip -dc gharchive_gz/* | wc -c
```

```text
18396198934
```

To get a feel of what the data looks like, we run the following query:

```sql
SELECT json_group_structure(json)
FROM (
    SELECT *
    FROM read_ndjson_objects('gharchive_gz/*.json.gz')
    LIMIT 2048
);
```

Here, we use our `read_ndjson_objects` function, which reads the JSON objects in the file as raw JSON, i.e., as strings.
The query reads the first 2048 records of JSON from the JSON files in the `gharchive_gz` directory and describes the structure.
You can also directly query the JSON files from GH Archive using DuckDB's [`httpfs` extension](#docs:lts:core_extensions:httpfs:overview), but we will be querying the files multiple times, so it is better to download them in this case.

> **Tip.** In the CLI client, you can use `.mode line` to make the output easier to read.

I formatted the result using [an online JSON formatter & validator](https://jsonformatter.curiousconcept.com/):

```json
{
   "id":"VARCHAR",
   "type":"VARCHAR",
   "actor":{
      "id":"UBIGINT",
      "login":"VARCHAR",
      "display_login":"VARCHAR",
      "gravatar_id":"VARCHAR",
      "url":"VARCHAR",
      "avatar_url":"VARCHAR"
   },
   "repo":{
      "id":"UBIGINT",
      "name":"VARCHAR",
      "url":"VARCHAR"
   },
   "payload":{"..."},
   "public":"BOOLEAN",
   "created_at":"VARCHAR",
   "org":{
      "id":"UBIGINT",
      "login":"VARCHAR",
      "gravatar_id":"VARCHAR",
      "url":"VARCHAR",
      "avatar_url":"VARCHAR"
   }
}
```

I left `"payload"` out because it consists of deeply nested JSON, and its formatted structure takes up more than 1000 lines!
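
To give an idea of what "describing the structure" of many records involves, here is a small Python sketch that derives a structure per record and merges the results, so that fields appearing in only some records (such as `"org"`) still show up. This is a simplified illustration, not the algorithm behind `json_group_structure`. The sample records are made up, and the choice of type names and the fallback for conflicting types are assumptions.

```python
from functools import reduce

def structure(value):
    """Describe one JSON value as a nested structure of type names."""
    if isinstance(value, dict):
        return {key: structure(item) for key, item in value.items()}
    if isinstance(value, list):
        return [reduce(merge, map(structure, value), "NULL")]
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        return "BOOLEAN"
    if isinstance(value, int):
        return "UBIGINT" if value >= 0 else "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "NULL" if value is None else "VARCHAR"

def merge(a, b):
    """Combine two structures into one that describes both."""
    if a is None or a == "NULL":
        return b
    if b is None or b == "NULL":
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        return {key: merge(a.get(key), b.get(key)) for key in {**a, **b}}
    if isinstance(a, list) and isinstance(b, list):
        return [merge(a[0], b[0])]
    return a if a == b else "JSON"  # assumption: conflicting types fall back to a generic type

# Hypothetical records loosely modeled on the fields shown above
records = [
    {"id": "1", "type": "PushEvent", "actor": {"id": 5, "login": "a"}, "public": True},
    {"id": "2", "type": "WatchEvent", "actor": {"id": 7, "login": "b"}, "org": {"id": 9}},
]
group_structure = reduce(merge, map(structure, records))
print(group_structure)
```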

So, how many records are we dealing with exactly? Let's count them using DuckDB:

```sql
SELECT count(*) AS count
FROM 'gharchive_gz/*.json.gz';
```

|   count |
| ------: |
| 4434953 |

That's around 4.4M daily events, which amounts to roughly 185K events per hour.
This query takes around 7.3 seconds on my laptop, a 2020 MacBook Pro with an M1 chip and 16 GB of memory.
This is the time it takes to decompress the GZIP files and parse every JSON record.

To see how much time is spent decompressing GZIP in the query, I also created a `gharchive` directory containing the same data but uncompressed.
Running the same query on the uncompressed data takes around 5.4 seconds, almost 2 seconds faster.
We got faster, but we also read more than 18 GB of data from storage, as opposed to 2.3 GB when it was compressed.
So, this comparison really depends on the speed of your storage.
I prefer to keep the data compressed.

As a side note, the speed of this query really shows how fast yyjson is!

So, what kind of events are in the GitHub data?

```sql
SELECT type, count(*) AS count
FROM 'gharchive_gz/*.json.gz'
GROUP BY type
ORDER BY count DESC;
```

| type                          |   count |
| ----------------------------- | ------: |
| PushEvent                     | 2359096 |
| CreateEvent                   |  624062 |
| PullRequestEvent              |  366090 |
| IssueCommentEvent             |  238660 |
| WatchEvent                    |  231486 |
| DeleteEvent                   |  154383 |
| PullRequestReviewEvent        |  131107 |
| IssuesEvent                   |   88917 |
| PullRequestReviewCommentEvent |   79540 |
| ForkEvent                     |   64233 |
| CommitCommentEvent            |   36823 |
| ReleaseEvent                  |   23004 |
| MemberEvent                   |   14872 |
| PublicEvent                   |   14500 |
| GollumEvent                   |    8180 |

This query takes around 7.4 seconds, not much more than the `count(*)` query.
So as we can see, data analysis is very fast once everything has been decompressed and parsed.

The most common event type is the [`PushEvent`](https://docs.github.com/en/developers/webhooks-and-events/events/github-event-types#pushevent), i.e., people pushing their committed code to GitHub, which unsurprisingly accounts for more than half of all events.
The least common event type is the [`GollumEvent`](https://docs.github.com/en/developers/webhooks-and-events/events/github-event-types#gollumevent), i.e., the creation or update of a wiki page, which accounts for less than 1% of all events.
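
These proportions follow directly from the counts in the table and the total from the earlier `count(*)` query. A quick check in Python (the numbers are copied from the results above):

```python
total = 4434953  # result of the count(*) query
counts = {"PushEvent": 2359096, "GollumEvent": 8180}

shares = {event: count / total for event, count in counts.items()}
for event, share in shares.items():
    print(f"{event}: {share:.2%}")

# Average number of events per hour over the 24 hourly files
per_hour = total / 24
print(f"{per_hour:,.0f} events per hour")
```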

If we want to analyze the same data multiple times, decompressing and parsing every time is redundant and slows down the analysis.
Instead, we can create a DuckDB table like so:

```sql
CREATE TABLE events AS
    SELECT * EXCLUDE (payload)
    FROM 'gharchive_gz/*.json.gz';
```

This takes around 9 seconds if you're using an in-memory database.
If you're using an on-disk database, this takes around 13 seconds and results in a database size of 444 MB.
When using an on-disk database, DuckDB ensures the table is persistent and performs [all kinds of compression](https://duckdb.org/2022/10/28/lightweight-compression).
Note that we have temporarily ignored the `payload` field using the convenient `EXCLUDE` clause.

To get a feel of what we've read, we can ask DuckDB to describe the table:

```sql
DESCRIBE SELECT * FROM events;
```

This gives us the following:



| cid | name       | type                                                                                                           | notnull | dflt_value | pk    |
| --- | ---------- | -------------------------------------------------------------------------------------------------------------- | ------- | ---------- | ----- |
| 0   | id         | BIGINT                                                                                                         | false   |            | false |
| 1   | type       | VARCHAR                                                                                                        | false   |            | false |
| 2   | actor      | STRUCT(id UBIGINT, login VARCHAR, display_login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR) | false   |            | false |
| 3   | repo       | STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)                                                                  | false   |            | false |
| 4   | public     | BOOLEAN                                                                                                        | false   |            | false |
| 5   | created_at | TIMESTAMP                                                                                                      | false   |            | false |
| 6   | org        | STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)                        | false   |            | false |

As we can see, the `"actor"`, `"repo"` and `"org"` fields, which are JSON objects, have been converted to DuckDB structs.
The `"id"` column was a string in the original JSON but has been converted to a `BIGINT` by DuckDB's automatic type detection.
DuckDB can also detect a few different `DATE`/`TIMESTAMP` formats within JSON strings, as well as `TIME` and `UUID`.

Now that we have created the table, we can analyze it like any other DuckDB table!
Let's see how much activity there was in the [`duckdb/duckdb` GitHub repository](https://github.com/duckdb/duckdb) on this specific day:

```sql
SELECT type, count(*) AS count
FROM events
WHERE repo.name = 'duckdb/duckdb'
GROUP BY type
ORDER BY count DESC;
```

| type                          | count |
| ----------------------------- | ----: |
| PullRequestEvent              |    35 |
| IssueCommentEvent             |    30 |
| WatchEvent                    |    29 |
| PushEvent                     |    15 |
| PullRequestReviewEvent        |    14 |
| IssuesEvent                   |     9 |
| PullRequestReviewCommentEvent |     7 |
| ForkEvent                     |     3 |

That's a lot of pull request activity!
Note that this doesn't mean that 35 pull requests were opened on this day: activity within a pull request is also counted.
If we [search through the pull requests for that day](https://github.com/duckdb/duckdb/pulls?q=is%3Apr+created%3A2023-02-08+), we see that there are only 15.
This is more activity than normal because most of the DuckDB developers were busy fixing bugs for the 0.7.0 release.

Now, let's see who was the most active:

```sql
SELECT actor.login, count(*) AS count
FROM events
WHERE repo.name = 'duckdb/duckdb'
  AND type = 'PullRequestEvent'
GROUP BY actor.login
ORDER BY count DESC
LIMIT 5;
```

| login    | count |
| -------- | ----: |
| Mytherin |    19 |
| Mause    |     4 |
| carlopi  |     3 |
| Tmonster |     2 |
| lnkuiper |     2 |

As expected, Mark Raasveldt (Mytherin, co-founder of DuckDB Labs) was the most active!
My activity (lnkuiper, software engineer at DuckDB Labs) also shows up.

#### Handling Inconsistent JSON Schemas

So far, we have ignored the `"payload"` of the events.
We ignored it because the contents of this field are different based on the type of event.
We can see how they differ with the following query:

```sql
SELECT json_group_structure(payload) AS structure
FROM (
    SELECT *
    FROM read_json(
        'gharchive_gz/*.json.gz',
        columns = {
            id: 'BIGINT',
            type: 'VARCHAR',
            actor: 'STRUCT(id UBIGINT,
                          login VARCHAR,
                          display_login VARCHAR,
                          gravatar_id VARCHAR,
                          url VARCHAR,
                          avatar_url VARCHAR)',
            repo: 'STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)',
            payload: 'JSON',
            public: 'BOOLEAN',
            created_at: 'TIMESTAMP',
            org: 'STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)'
        },
        records = true
    )
    WHERE type = 'WatchEvent'
    LIMIT 2048
);
```



| structure            |
| -------------------- |
| {"action":"VARCHAR"} |

The `"payload"` field is simple for events of type `WatchEvent`.
However, if we change the type to `PullRequestEvent`, we get a JSON structure of more than 500 lines when formatted with a JSON formatter.
We don't want to look through all those fields, so we cannot use our automatic schema detection, which will try to get them all.
Instead, we can manually supply the structure of the fields we're interested in.
DuckDB will skip reading the other fields.
Another approach is to store the `"payload"` field as DuckDB's JSON data type and parse it at query time (see the example later in this post!).
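To make the idea of a "group structure" more concrete, here is a minimal Python sketch that infers a merged structure from several JSON documents. It is only an illustration and not how DuckDB implements `json_group_structure`: the helper names (`structure`, `merge`, `group_structure`), the widening rules and the sample payloads are all assumptions made for this example.

```python
import json

def structure(value):
    """Map a parsed JSON value to a simplified type description."""
    if isinstance(value, dict):
        return {key: structure(val) for key, val in value.items()}
    if isinstance(value, list):
        merged = None
        for item in value:
            merged = merge(merged, structure(item))
        return [merged if merged is not None else "NULL"]
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "UBIGINT" if value >= 0 else "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if value is None:
        return "NULL"
    return "VARCHAR"

def merge(left, right):
    """Combine two structures into one that covers both (simplified rules)."""
    if left is None:
        return right
    if right is None:
        return left
    if left == "NULL":
        return right
    if right == "NULL":
        return left
    if isinstance(left, dict) and isinstance(right, dict):
        keys = list(left) + [k for k in right if k not in left]
        return {k: merge(left.get(k), right.get(k)) for k in keys}
    if isinstance(left, list) and isinstance(right, list):
        return [merge(left[0], right[0])]
    # Assumption for this sketch: conflicting types fall back to "JSON"
    return left if left == right else "JSON"

def group_structure(json_lines):
    """Merge the structures of all documents, one JSON document per string."""
    merged = None
    for line in json_lines:
        merged = merge(merged, structure(json.loads(line)))
    return merged

# Hypothetical payloads shaped like the WatchEvent result above
payloads = ['{"action": "started"}', '{"action": "started"}']
print(json.dumps(group_structure(payloads), separators=(",", ":")))
# {"action":"VARCHAR"}
```

The more keys and nesting levels the documents contain, the larger the merged structure becomes, which is why the result for `PullRequestEvent` payloads is so long.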

I stripped down the JSON structure for the `"payload"` of events with the type `PullRequestEvent` to the things I'm actually interested in:

```json
{
   "action":"VARCHAR",
   "number":"UBIGINT",
   "pull_request":{
      "url":"VARCHAR",
      "id":"UBIGINT",
      "title":"VARCHAR",
      "user":{
         "login":"VARCHAR",
         "id":"UBIGINT",
      },
      "body":"VARCHAR",
      "created_at":"TIMESTAMP",
      "updated_at":"TIMESTAMP",
      "assignee":{
         "login":"VARCHAR",
         "id":"UBIGINT",
      },
      "assignees":[
         {
            "login":"VARCHAR",
            "id":"UBIGINT",
         }
      ],
  }
}
```

This is technically not valid JSON because there are trailing commas.
However, we try to [allow trailing commas wherever possible](https://duckdb.org/2022/05/04/friendlier-sql#trailing-commas) in DuckDB, including JSON!
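For comparison, a strict JSON parser rejects trailing commas. The following Python sketch shows this with the standard `json` module and a deliberately naive workaround. The helper names and the sample text are made up for this example, and the regular expression would also alter string values that happen to contain `,}` or `,]`.

```python
import json
import re

# Hypothetical snippet with a trailing comma, like the structure above
text = '{"login": "VARCHAR", "id": "UBIGINT",}'

def parses_strictly(s):
    """Return True if Python's strict JSON parser accepts the text."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

def strip_trailing_commas(s):
    """Naively drop commas that directly precede a closing brace or bracket."""
    return re.sub(r",\s*([}\]])", r"\1", s)

print(parses_strictly(text))                         # False
print(parses_strictly(strip_trailing_commas(text)))  # True
```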

We can now plug this into the `columns` parameter of `read_json`, but we need to convert it to a DuckDB type first.
I'm lazy, so I prefer to let DuckDB do this for me:

```sql
SELECT typeof(
    json_transform('{}', '{
        "action":"VARCHAR",
        "number":"UBIGINT",
        "pull_request":{
          "url":"VARCHAR",
          "id":"UBIGINT",
          "title":"VARCHAR",
          "user":{
              "login":"VARCHAR",
              "id":"UBIGINT",
          },
          "body":"VARCHAR",
          "created_at":"TIMESTAMP",
          "updated_at":"TIMESTAMP",
          "assignee":{
              "login":"VARCHAR",
              "id":"UBIGINT",
          },
          "assignees":[
              {
                "login":"VARCHAR",
                "id":"UBIGINT",
              }
          ],
        }
    }')
);
```

This gives us back a DuckDB type that we can plug into our function!
Note that because we are not auto-detecting the schema, we have to supply `timestampformat` to be able to parse the timestamps correctly.
The key `"user"` must be surrounded by quotes because it is a reserved keyword in SQL:

```sql
CREATE OR REPLACE TABLE pr_events AS
    SELECT *
    FROM read_json(
        'gharchive_gz/*.json.gz',
        columns = {
            id: 'BIGINT',
            type: 'VARCHAR',
            actor: 'STRUCT(id UBIGINT,
                          login VARCHAR,
                          display_login VARCHAR,
                          gravatar_id VARCHAR,
                          url VARCHAR,
                          avatar_url VARCHAR)',
            repo: 'STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)',
            payload: 'STRUCT(
                        action VARCHAR,
                        number UBIGINT,
                        pull_request STRUCT(
                          url VARCHAR,
                          id UBIGINT,
                          title VARCHAR,
                          "user" STRUCT(
                            login VARCHAR,
                            id UBIGINT
                          ),
                          body VARCHAR,
                          created_at TIMESTAMP,
                          updated_at TIMESTAMP,
                          assignee STRUCT(login VARCHAR, id UBIGINT),
                          assignees STRUCT(login VARCHAR, id UBIGINT)[]
                        )
                      )',
            public: 'BOOLEAN',
            created_at: 'TIMESTAMP',
            org: 'STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)'
        },
        format = 'newline_delimited',
        records = true,
        timestampformat = '%Y-%m-%dT%H:%M:%SZ'
    )
    WHERE type = 'PullRequestEvent';
```

This query completes in around 36 seconds with an on-disk database (resulting size is 478 MB) and 9 seconds with an in-memory database.
If you don't care about preserving insertion order, you can speed the query up with this setting:

```sql
SET preserve_insertion_order = false;
```

With this setting, the query completes in around 27 seconds with an on-disk database and 8.5 seconds with an in-memory database.
The difference between the on-disk and in-memory case is quite substantial here because DuckDB has to compress and persist much more data.
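The `timestampformat` string passed to `read_json` above uses `strftime`-style specifiers. As a quick sanity check outside of DuckDB, Python's `datetime.strptime` reads the specifiers used here (`%Y`, `%m`, `%d`, `%H`, `%M`, `%S`) the same way, so a format string can be tried against a sample value. The sample timestamp below is made up for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Hypothetical value in the shape of the "created_at" strings
sample = "2023-02-08T15:04:05Z"
parsed = datetime.strptime(sample, TIMESTAMP_FORMAT)
print(parsed)  # 2023-02-08 15:04:05

def matches_format(value, fmt=TIMESTAMP_FORMAT):
    """Return True if the value can be parsed with the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

# A value without the literal "T" and "Z" does not match the format
print(matches_format("2023-02-08 15:04:05"))  # False
```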

Now we can analyze pull request events! Let's see what the maximum number of assignees is:

```sql
SELECT max(length(payload.pull_request.assignees)) AS max_assignees
FROM pr_events;
```

| max_assignees |
| ------------: |
|            10 |

That's a lot of people reviewing a single pull request!

We can check who was assigned the most:

```sql
WITH assignees AS (
    SELECT payload.pull_request.assignee.login AS assignee
    FROM pr_events
    UNION ALL
    SELECT unnest(payload.pull_request.assignees).login AS assignee
    FROM pr_events
)
SELECT assignee, count(*) AS count
FROM assignees
WHERE assignee IS NOT NULL
GROUP BY assignee
ORDER BY count DESC
LIMIT 5;
```

| assignee        | count |
| --------------- | ----: |
| poad            |   494 |
| vinayakkulkarni |   268 |
| tmtmtmtm        |   198 |
| fisker          |    98 |
| icemac          |    84 |

That's a lot of assignments – although I suspect there are duplicates in here.
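One way such duplicates could arise is sketched below in Python. It assumes that a login can appear both in `assignee` and in `assignees` of the same event, which has not been verified against the data here. The events and logins are hand-written placeholders.

```python
from collections import Counter

# Hypothetical events; the logins are placeholders
events = [
    {"assignee": "alice", "assignees": ["alice", "bob"]},
    {"assignee": None, "assignees": []},
    {"assignee": "bob", "assignees": ["bob"]},
]

def count_union_all(events):
    """Mimic the UNION ALL query: count assignee and assignees separately."""
    counts = Counter()
    for event in events:
        if event["assignee"] is not None:
            counts[event["assignee"]] += 1
        counts.update(event["assignees"])
    return counts

def count_deduplicated(events):
    """Count each login at most once per event."""
    counts = Counter()
    for event in events:
        logins = set(event["assignees"])
        if event["assignee"] is not None:
            logins.add(event["assignee"])
        counts.update(logins)
    return counts

print(count_union_all(events))     # alice: 2, bob: 3
print(count_deduplicated(events))  # alice: 1, bob: 2
```

Under this assumption, the `UNION ALL` approach counts a login twice for every event in which it appears in both fields.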

#### Storing as JSON to Parse at Query Time

Specifying the JSON schema of the `"payload"` field was helpful because it allowed us to directly analyze what is there, and subsequent queries are much faster.
Still, it can also be quite cumbersome if the schema is complex.
If you don't want to specify the schema of a field, you can set the type as `'JSON'`:

```sql
CREATE OR REPLACE TABLE pr_events AS
    SELECT *
    FROM read_json(
        'gharchive_gz/*.json.gz',
        columns = {
            id: 'BIGINT',
            type: 'VARCHAR',
            actor: 'STRUCT(id UBIGINT,
                          login VARCHAR,
                          display_login VARCHAR,
                          gravatar_id VARCHAR,
                          url VARCHAR,
                          avatar_url VARCHAR)',
            repo: 'STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)',
            payload: 'JSON',
            public: 'BOOLEAN',
            created_at: 'TIMESTAMP',
            org: 'STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)'
        },
        format = 'newline_delimited',
        records = true,
        timestampformat = '%Y-%m-%dT%H:%M:%SZ'
    )
    WHERE type = 'PullRequestEvent';
```

This will load the `"payload"` field as a JSON string, and we can use DuckDB's JSON functions to analyze it when querying.
For example:

```sql
SELECT DISTINCT payload->>'action' AS action, count(*) AS count
FROM pr_events
GROUP BY action
ORDER BY count DESC;
```

The `->>` arrow is shorthand for our `json_extract_string` function.
Creating the entire `"payload"` field as a column with type `JSON` is not the most efficient way to get just the `"action"` field, but this example is just to show the flexibility of `read_json`.
The query results in the following table:

| action   |  count |
| -------- | -----: |
| opened   | 189096 |
| closed   | 174914 |
| reopened |   2080 |

As we can see, only a few pull requests have been reopened.
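To illustrate what parsing at query time means, here is a small Python sketch that keeps each payload as a JSON string and extracts `"action"` only when counting. It is an analogy, not DuckDB's implementation: `extract_string` is a hypothetical helper that loosely mimics `->>`, and the payloads are hand-written.

```python
import json
from collections import Counter

# Hypothetical payloads stored as raw JSON text, one per event
rows = [
    '{"action": "opened", "number": 1}',
    '{"action": "closed", "number": 1}',
    '{"action": "opened", "number": 2}',
]

def extract_string(payload, key):
    """Parse the JSON text and return the value for key as a string."""
    value = json.loads(payload).get(key)
    if value is None:
        return None
    return value if isinstance(value, str) else json.dumps(value)

# Every row is parsed again each time the query runs
counts = Counter(extract_string(payload, "action") for payload in rows)
for action, count in counts.most_common():
    print(action, count)
# opened 2
# closed 1
```

The repeated parsing is the price of this flexibility: no schema needs to be specified up front, but every query that touches the field pays for parsing it.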

#### Conclusion

DuckDB tries to be an easy-to-use tool that can read all kinds of data formats.
In the 0.7.0 release, we have added support for reading JSON.
JSON comes in many formats and all kinds of schemas.
DuckDB's rich support for nested types (`LIST`, `STRUCT`) allows it to fully “shred” the JSON to a columnar format for more efficient analysis.

We are excited to hear what you think about our new JSON functionality.
If you have any questions or suggestions, please reach out to us on [Discord](https://discord.com/invite/tcvwpjfnZx) or [GitHub](https://github.com/duckdb/duckdb)!

## The Return of the H2O.ai Database-like Ops Benchmark

**Publication date:** 2023-04-14

**Author:** Tom Ebergen

**TL;DR:** We've resurrected the H2O.ai database-like ops benchmark with up-to-date libraries and plan to keep re-running it.

[Skip directly to the results](#results)

> We published a new blog post on the H2O.ai benchmark in November 2023 and improved the benchmark setup for reproducibility.
> For details, see the new post: ["Updates to the H2O.ai db-benchmark!"](https://duckdb.org/2023/11/03/db-benchmark-update)

The H2O.ai [Database-like Ops Benchmark](https://h2oai.github.io/db-benchmark/) is a well-known benchmark in the data analytics and R community. The benchmark measures the groupby and join performance of various analytical tools like data.table, Polars, dplyr, ClickHouse, DuckDB and more. Since July 2nd 2021, the benchmark has been dormant, with no result updates or maintenance. Many of the analytical systems measured in the benchmark have since undergone substantial improvements, leaving many of the maintainers curious as to where their analytical tool ranks on the benchmark.

DuckDB has decided to give the H2O.ai benchmark new life and maintain it for the foreseeable future. One reason the DuckDB project has decided to maintain the benchmark is that DuckDB has had 10 new minor releases since the most recent published results on July 2nd, 2021. After managing to run parts of the benchmark on an r3-8xlarge AWS box, DuckDB ranked as a top performer on the benchmark. Additionally, the DuckDB project wants to demonstrate its commitment to performance by consistently comparing DuckDB with other analytical systems. While DuckDB delivers differentiated ease of use, raw performance and scalability are critically important for solving tough problems fast. Plus, just like many of our fellow data folks, we have a need for speed. Therefore, the decision was made to fork the benchmark, modernize underlying dependencies and run the benchmark on the latest versions of the included systems. You can find the repository on [GitHub](https://github.com/duckdblabs/db-benchmark).

The results of the new benchmark are very interesting, but first a quick summary of the benchmark and what updates took place.

#### The H2O.ai Database-like Ops Benchmark

There are 5 basic grouping tests and 5 advanced grouping tests. The 10 grouping queries all focus on a combination of the following:

- Low cardinality (a few big groups)
- High cardinality (lots of very small groups)
- Grouping integer types
- Grouping string types

Each query is run only twice with both results being reported. This way we can see the performance of a cold run and any effects data caching may have. The idea is to avoid reporting any potential "best" results on a hot system. Data analysts only need to run a query once to get their answer. No one drives to the store a second time to get another litre of milk faster.

The time reported is the sum of the time it takes to run all 5 queries twice.
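As a worked example of this reporting rule, the Python sketch below sums the first and second run of each of the 5 queries. The timings are invented for illustration and are not benchmark results.

```python
# Hypothetical timings in seconds: (first run, second run) per query
timings = {
    "q1": (1.5, 0.5),
    "q2": (2.0, 1.0),
    "q3": (3.25, 2.75),
    "q4": (0.5, 0.25),
    "q5": (4.0, 3.5),
}

# Both runs count towards the reported time, so a slow cold run
# cannot be hidden behind a fast second run
reported = sum(first + second for first, second in timings.values())
print(reported)  # 19.25
```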

More information about the specific queries can be found below.

##### The Data and Queries

The queries have not changed since the benchmark went dormant. The data is generated in a rather simple manner. Inspecting the datagen files you can see that the columns are generated with small, medium, and large groups of char and int values. Similar generation logic applies to the join data generation.

| Query               | SQL                                                                                                                                                                                | Objective                                                                 |
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| groupby #1          | `SELECT id1, sum(v1) AS v1 FROM tbl GROUP BY id1`                                                                                                                                  | Sum over large cardinality groups, grouped by varchar                     |
| groupby #2          | `SELECT id1, id2, sum(v1) AS v1 FROM tbl GROUP BY id1, id2`                                                                                                                        | Sum over medium cardinality groups, grouped by varchars                   |
| groupby #3          | `SELECT id3, sum(v1) AS v1, mean(v3) AS v3 FROM tbl GROUP BY id3`                                                                                                                  | Sum and mean over many small cardinality groups, grouped by varchar       |
| groupby #4          | `SELECT id4, mean(v1) AS v1, mean(v2) AS v2, mean(v3) AS v3 FROM tbl GROUP BY id4`                                                                                                 | Mean over many large cardinality groups, grouped by integer               |
| groupby #5          | `SELECT id6, sum(v1) AS v1, sum(v2) AS v2, sum(v3) AS v3 FROM tbl GROUP BY id6`                                                                                                    | Sum over many small groups, grouped by integer                            |
| advanced groupby #1 | `SELECT id4, id5, quantile_cont(v3, 0.5) AS median_v3, stddev(v3) AS sd_v3 FROM tbl GROUP BY id4, id5`                                                                             | `quantile_cont` over medium cardinality group, grouped by integers        |
| advanced groupby #2 | `SELECT id3, max(v1)-min(v2) AS range_v1_v2 FROM tbl GROUP BY id3`                                                                                                                 | Range selection over small cardinality groups, grouped by varchar         |
| advanced groupby #3 | `SELECT id6, v3 AS largest2_v3 FROM (SELECT id6, v3, row_number() OVER (PARTITION BY id6 ORDER BY v3 DESC) AS order_v3 FROM x WHERE v3 IS NOT NULL) sub_query WHERE order_v3 <= 2` | Advanced group by query                                                   |
| advanced groupby #4 | `SELECT id2, id4, pow(corr(v1, v2), 2) AS r2 FROM tbl GROUP BY id2, id4`                                                                                                           | Arithmetic over medium sized groups, grouped by varchar, integer.         |
| advanced groupby #5 | `SELECT id1, id2, id3, id4, id5, id6, sum(v3) AS v3, count(*) AS count FROM tbl GROUP BY id1, id2, id3, id4, id5, id6`                                                             | Many small groups, the number of groups is the cardinality of the dataset |
| join #1             | `SELECT x.*, small.id4 AS small_id4, v2 FROM x JOIN small USING (id1)`                                                                                                             | Joining a large table (x) with a small-sized table on integer type        |
| join #2             | `SELECT x.*, medium.id1 AS medium_id1, medium.id4 AS medium_id4, medium.id5 AS medium_id5, v2 FROM x JOIN medium USING (id2)`                                                      | Joining a large table (x) with a medium-sized table on integer type       |
| join #3             | `SELECT x.*, medium.id1 AS medium_id1, medium.id4 AS medium_id4, medium.id5 AS medium_id5, v2 FROM x LEFT JOIN medium USING (id2)`                                                 | Left join a large table (x) with a medium-sized table on integer type     |
| join #4             | `SELECT x.*, medium.id1 AS medium_id1, medium.id2 AS medium_id2, medium.id4 AS medium_id4, v2 FROM x JOIN medium USING (id5)`                                                      | Join a large table (x) with a medium table on varchar type                |
| join #5             | `SELECT x.*, big.id1 AS big_id1, big.id2 AS big_id2, big.id4 AS big_id4, big.id5 AS big_id5, big.id6 AS big_id6, v2 FROM x JOIN big USING (id3)`                                   | Join a large table (x) with a large table on integer type.                |

You can find more information about the queries in the [Efficiency of Data Processing](https://jangorecki.gitlab.io/r-talks/2019-12-26_Mumbai_Efficiency-in-data-processing/Efficiency-in-data-processing.pdf) slides.
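The objective of advanced groupby #3 is described only as an "advanced group by query". Reading its SQL, it keeps the two largest non-`NULL` values of `v3` for each `id6`. The Python sketch below expresses the same logic on a handful of made-up rows. It is meant to explain the query, not to reproduce how any of the benchmarked systems execute it.

```python
import heapq
from collections import defaultdict

# Hypothetical (id6, v3) rows; None stands in for NULL
rows = [(1, 5.0), (1, 9.0), (1, None), (1, 7.0), (2, 3.0)]

def largest2_v3(rows):
    """Return the two largest non-NULL v3 values per id6."""
    groups = defaultdict(list)
    for id6, v3 in rows:
        if v3 is not None:  # WHERE v3 IS NOT NULL
            groups[id6].append(v3)

    result = []
    for id6, values in groups.items():
        # row_number() OVER (PARTITION BY id6 ORDER BY v3 DESC) <= 2
        for v3 in heapq.nlargest(2, values):
            result.append((id6, v3))
    return result

print(largest2_v3(rows))  # [(1, 9.0), (1, 7.0), (2, 3.0)]
```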

##### Modifications to the Benchmark & Hardware

No modifications have been made to the queries or the data generation. Some scripts required minor modifications so that the current version of the library could be run. The hardware used is slightly different as the exact AWS offering the benchmark previously used is no longer available. Base libraries have been updated as well. GPU libraries were not tested.

The benchmark now runs on an AWS [m4.10xlarge](https://aws.amazon.com/ec2/instance-types/) instance with the following hardware and base software:

- CPU model: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40 GHz
- CPU cores: 40
- RAM model: Unknown
- Memory: 160 GB
- NO GPU specifications
- R upgraded, 4.0.0 -> 4.2.2
- Python upgraded 3.\[6\|7\] -> 3.10

##### Changes Made to Install Scripts of Other Systems

Pandas, Polars, Dask, and ClickHouse required changes to their setup/install scripts. The changes were relatively minor, consisting mostly of syntax updates and data ingestion updates. The data ingestion changes did not affect the reported timing results.

#### Results



You can also look at the [results](https://duckdblabs.github.io/db-benchmark/). DuckDB's timings have improved significantly since v0.2.7 (released over two years ago). Major contributors to our increased performance are [parallel grouped aggregation](https://duckdb.org/2022/03/07/aggregate-hashtable), merged in March 2022, and [parallel result set materialization](https://github.com/duckdb/duckdb/pull/3700). In addition, DuckDB now supports [enum types](https://duckdb.org/2021/11/26/duck-enum), which make DuckDB's `GROUP BY` aggregation even faster. [Improvements to the out-of-core hash join](https://github.com/duckdb/duckdb/pull/4970) were merged as well, further improving the performance of our joins.

#### Questions about Certain Results?

Some solutions may report internal errors for some queries. Feel free to investigate the errors by using the [`repro.sh` scripts](https://github.com/duckdblabs/db-benchmark/blob/main/_setup_utils/repro.sh) and file a GitHub issue to resolve any confusion. In addition, there are many areas in the code where certain query results are automatically nullified. If you believe that is the case for a query for your system or if you have any other questions, you can create a GitHub issue to discuss.

#### Maintenance Plan

DuckDB will continue to maintain this benchmark for the foreseeable future. The process for re-running the benchmarks with updated library versions must still be decided.

Do you have any other questions? Would you like to have your system added to the benchmark? Please feel free to read the README in the [repository](https://github.com/duckdblabs/db-benchmark), and if you still have questions, you can reach out to me at [tom@duckdblabs.com](mailto:tom@duckdblabs.com) or on our [Discord](https://discord.duckdb.org/)!

## Introducing DuckDB for Swift

**Publication date:** 2023-04-21

**Author:** Tristan Celder

**TL;DR:** DuckDB now has a native Swift API. DuckDB on mobile here we go!

Today we’re excited to announce the [DuckDB API for Swift](https://github.com/duckdb/duckdb-swift). It enables developers on Swift platforms to harness the full power of DuckDB using a native Swift interface with support for great Swift features such as strong typing and concurrency. The API is available not only on Apple platforms, but on Linux too, opening up new opportunities for the growing Swift on Server ecosystem.

#### What’s Included

DuckDB is designed to be fast, reliable and easy to use, and it’s this philosophy that also guided the creation of our new Swift API.

This initial release supports many of the great features of DuckDB right out of the box, including:

- Queries via DuckDB’s enhanced SQL dialect: In addition to basic SQL, DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (Swift arrays and structs), and more.
- Import and export of JSON, CSV, and Parquet files: Beyond its built-in and super-efficient native file format, DuckDB supports reading in, and exporting out to, JSON, CSV, and Parquet files.
- Strongly typed result sets: DuckDB’s strongly typed result sets are a natural fit for Swift. It’s simple to cast DuckDB columns to their native Swift equivalents, ready for presentation using SwiftUI or as part of an existing TabularData workflow.
- Swift concurrency support: by virtue of their `Sendable` conformance, many of DuckDB’s core underlying types can be safely passed across concurrency contexts, easing the process of designing parallel processing workflows and ensuring responsive UIs.

#### Usage

To demonstrate just how well DuckDB works together with Swift, we’ve created an example project that uses raw data from [NASA’s Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu) loaded directly into DuckDB.

You’ll see how to:

- Instantiate a DuckDB in-memory Database and Connection
- Populate a DuckDB table with the contents of a remote CSV
- Query a DuckDB database and prepare the results for presentation

Finally, we’ll present our analysis with the help of Apple’s [TabularData Framework](https://developer.apple.com/documentation/tabulardata) and [Swift Charts](https://developer.apple.com/documentation/charts).

##### Instantiating DuckDB

DuckDB supports both file-based and in-memory databases. In this example, as we don’t intend to persist the results of our Exoplanet analysis to disk, we’ll opt for an in-memory Database.

```swift
let database = try Database(store: .inMemory)
```

However, we can’t issue queries just yet. Much like other RDBMSs, queries must be issued through a _database connection_. DuckDB supports multiple connections per database. This can be useful to support parallel processing, for example. In our project, we’ll need just the one connection that we’ll eventually access asynchronously.

```swift
let connection = try database.connect()
```

Finally, we’ll create an app-specific type that we’ll use to house our database and connection and through which we’ll eventually define our app-specific queries.

```swift
import DuckDB

final class ExoplanetStore {

    let database: Database
    let connection: Connection

    init(database: Database, connection: Connection) {
        self.database = database
        self.connection = connection
    }
}
```

##### Populating DuckDB with a Remote CSV File

One problem with our current `ExoplanetStore` type is that it doesn’t yet contain any data to query. To fix that, we’ll load it with the data of every Exoplanet discovered to date from [NASA’s Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu).

There are hundreds of configuration options for this incredible resource, but today we want each exoplanet’s name and its discovery year packaged as a CSV. [Checking the docs](https://exoplanetarchive.ipac.caltech.edu/docs/API_PS_columns.html) gives us the following endpoint:

```text
https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv
```

Once we have our CSV downloaded locally, we can use the following SQL command to load it as a new table within our DuckDB in-memory database. DuckDB’s `read_csv_auto` command automatically infers our table schema and the data is immediately available for analysis.

```sql
CREATE TABLE exoplanets AS
    SELECT * FROM read_csv_auto('downloaded_exoplanets.csv'); 
```

Let’s package this up as a new asynchronous factory method on our `ExoplanetStore` type:

```swift
import DuckDB
import Foundation

final class ExoplanetStore {

    // Factory method to create and prepare a new ExoplanetStore
    static func create() async throws -> ExoplanetStore {

        // Create our database and connection as described above
        let database = try Database(store: .inMemory)
        let connection = try database.connect()

        // Download the CSV from the exoplanet archive
        let (csvFileURL, _) = try await URLSession.shared.download(
            from: URL(string: "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv")!)

        // Issue our first query to DuckDB
        try connection.execute("""
            CREATE TABLE exoplanets AS (
                SELECT * FROM read_csv_auto('\(csvFileURL.path)')
            );
            """)

        // Create our pre-populated ExoplanetStore instance
        return ExoplanetStore(
            database: database,
            connection: connection
        )
    }

    // Let's make the initializer we defined previously 
    // private. This prevents anyone accidentally instantiating
    // the store without having pre-loaded our Exoplanet CSV
    // into the database
    private init(database: Database, connection: Connection) {
        // ...
    }
}
```

##### Querying the Database

Now that the database is populated with data, it’s ready to be analyzed. Let’s create a query which we can use to plot a chart of the number of exoplanets discovered by year.

```sql
SELECT disc_year, count(disc_year) AS Count
FROM exoplanets
GROUP BY disc_year
ORDER BY disc_year;
```

Issuing the query to DuckDB from within Swift is simple. We’ll again make use of an async function from which to issue our query. This means the caller won’t be blocked while the query is executing. We’ll then cast the result columns to Swift native types using DuckDB’s `ResultSet` `cast(to:)` family of methods, before finally wrapping them up in a `DataFrame` from the TabularData framework ready for presentation in the UI.

```swift
// ...

import TabularData

extension ExoplanetStore {

    // Retrieves the number of exoplanets discovered by year  
    func groupedByDiscoveryYear() async throws -> DataFrame {

        // Issue the query we described above
        let result = try connection.query("""
            SELECT disc_year, count(disc_year) AS Count
            FROM exoplanets
            GROUP BY disc_year
            ORDER BY disc_year
            """)

        // Cast our DuckDB columns to their native Swift
        // equivalent types
        let discoveryYearColumn = result[0].cast(to: Int.self)
        let countColumn = result[1].cast(to: Int.self)

        // Use our DuckDB columns to instantiate TabularData
        // columns and populate a TabularData DataFrame
        return DataFrame(columns: [
            TabularData.Column(discoveryYearColumn)
                .eraseToAnyColumn(),
            TabularData.Column(countColumn)
                .eraseToAnyColumn(),
        ])
    }
}
```

##### Visualizing the Results

In just a few lines of code, our database has been created, populated and analyzed – all that’s left to do now is present the results.

![](../images/blog/iphone-simulator-screen-shot.png)

And I have a feeling that we’re just getting started…

For the complete example project – including the SwiftUI views and Chart definitions used to create the screenshot above – clone [the DuckDB Swift repo](https://github.com/duckdb/duckdb-swift) and open up the runnable app project located in `Examples/SwiftUI/ExoplanetExplorer.xcodeproj`. 

We encourage you to modify the code, explore the Exoplanet Archive and DuckDB, and make some discoveries of your own – interplanetary or otherwise!

#### Conclusion

In this article we’ve introduced the brand new Swift API for DuckDB and demonstrated how quickly you can get up and running analyzing data.

With DuckDB’s incredible performance and analysis capabilities and Swift’s vibrant ecosystem and platform support, there’s never been a better time to begin exploring analytical datasets in Swift.

We can’t wait to see what you do with it. Feel free to reach out on our [Discord](https://discord.duckdb.org) if you have any questions!

----

The Swift API for DuckDB is packaged using Swift Package Manager and lives in a new top-level repository available at [https://github.com/duckdb/duckdb-swift](https://github.com/duckdb/duckdb-swift).

## PostGEESE? Introducing The DuckDB Spatial Extension

**Publication date:** 2023-04-28

**Author:** Max Gabrielsson

**TL;DR:** DuckDB now has an official [Spatial extension](https://github.com/duckdb/duckdb-spatial) to enable geospatial processing.

Geospatial data has become increasingly important and prevalent in modern-day applications and data engineering workflows, with use-cases ranging from location-based services to environmental monitoring.

While there are many great and specialized tools for working with geospatial data, integrating geospatial capabilities directly into DuckDB has multiple advantages. For one, you get to operate, transform and join your geospatial data alongside your regular, unstructured or time-series data using DuckDB's rich type system and extensions like `JSON` and `ICU`. Secondly, spatial queries involving geometric predicates and relations translate surprisingly well to SQL, which is all about expressing relations after all! Not to mention all the other benefits provided by DuckDB such as transactional semantics, high-performance multi-threaded vectorized execution and larger-than-memory data processing.

Therefore, we're very excited to announce that DuckDB now has a [Spatial extension](https://github.com/duckdb/duckdb-spatial) packed with features easily installable from the DuckDB CLI and other DuckDB clients. Simply execute:

```sql
INSTALL spatial;
LOAD spatial;
```

And you're good to go!

*No, we're not calling it GeoDuck either, [that's just gross](https://en.wikipedia.org/wiki/Geoduck).*

#### What's in It?

The core of the extension is a `GEOMETRY` type based on the "Simple Features" geometry model and accompanying functions such as `ST_Area` and `ST_Intersects`. It also provides methods for reading and writing geospatial data formats and converting between coordinate reference systems (details later in the post!). While we're not ready to commit to full compliance with the OGC Simple Feature Access and SQL/MM Standards yet, if you've worked with geospatial functionality in other database systems such as [PostGIS](https://postgis.net) or [SpatiaLite](https://www.gaia-gis.it/fossil/libspatialite/index), you should feel right at home.

Most of the implemented functions are based on the trifecta of foundational geospatial libraries, [GEOS](https://libgeos.org), [GDAL](https://gdal.org/) and [PROJ](https://proj.org/), which provide algorithms, format conversions and coordinate reference system transformations respectively. In particular, we leverage GDAL to provide a set of table and copy functions that enable import and export of tables from and to 50+ different geospatial data formats (so far!), including the most common ones such as Shapefiles, GeoJSON, GeoPackage, KML, GML, WKT, WKB, etc.

Check for yourself by running:

<details markdown='1'>
<summary markdown='span'>
`SELECT * FROM st_drivers();`
</summary>

| short_name     | long_name                                           | can_create | can_copy | can_open | help_url                                                             |
| -------------- | --------------------------------------------------- | ---------- | -------- | -------- | -------------------------------------------------------------------- |
| ESRI Shapefile | ESRI Shapefile                                      | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/shapefile.html>     |
| MapInfo File   | MapInfo File                                        | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/mitab.html>         |
| UK .NTF        | UK .NTF                                             | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/ntf.html>           |
| LVBAG          | Kadaster LV BAG Extract 2.0                         | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/lvbag.html>         |
| S57            | IHO S-57 (ENC)                                      | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/s57.html>           |
| DGN            | Microstation DGN                                    | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/dgn.html>           |
| OGR_VRT        | VRT - Virtual Datasource                            | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/vrt.html>           |
| Memory         | Memory                                              | true       | false    | true     |                                                                      |
| CSV            | Comma Separated Value (.csv)                        | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/csv.html>           |
| GML            | Geography Markup Language (GML)                     | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/gml.html>           |
| GPX            | GPX                                                 | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/gpx.html>           |
| KML            | Keyhole Markup Language (KML)                       | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/kml.html>           |
| GeoJSON        | GeoJSON                                             | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/geojson.html>       |
| GeoJSONSeq     | GeoJSON Sequence                                    | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/geojsonseq.html>    |
| ESRIJSON       | ESRIJSON                                            | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/esrijson.html>      |
| TopoJSON       | TopoJSON                                            | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/topojson.html>      |
| OGR_GMT        | GMT ASCII Vectors (.gmt)                            | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/gmt.html>           |
| GPKG           | GeoPackage                                          | true       | true     | true     | <https://gdal.org/en/release-3.10/drivers/vector/gpkg.html>          |
| SQLite         | SQLite / Spatialite                                 | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/sqlite.html>        |
| WAsP           | WAsP .map format                                    | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/wasp.html>          |
| OpenFileGDB    | ESRI FileGDB                                        | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/openfilegdb.html>   |
| DXF            | AutoCAD DXF                                         | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/dxf.html>           |
| CAD            | AutoCAD Driver                                      | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/cad.html>           |
| FlatGeobuf     | FlatGeobuf                                          | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/flatgeobuf.html>    |
| Geoconcept     | Geoconcept                                          | true       | false    | true     |                                                                      |
| GeoRSS         | GeoRSS                                              | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/georss.html>        |
| VFK            | Czech Cadastral Exchange Data Format                | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/vfk.html>           |
| PGDUMP         | PostgreSQL SQL dump                                 | true       | false    | false    | <https://gdal.org/en/release-3.10/drivers/vector/pgdump.html>        |
| OSM            | OpenStreetMap XML and PBF                           | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/osm.html>           |
| GPSBabel       | GPSBabel                                            | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/gpsbabel.html>      |
| WFS            | OGC WFS (Web Feature Service)                       | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/wfs.html>           |
| OAPIF          | OGC API - Features                                  | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/oapif.html>         |
| EDIGEO         | French EDIGEO exchange format                       | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/edigeo.html>        |
| SVG            | Scalable Vector Graphics                            | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/svg.html>           |
| ODS            | Open Document/ LibreOffice / OpenOffice Spreadsheet | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/ods.html>           |
| XLSX           | MS Office Open XML spreadsheet                      | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/xlsx.html>          |
| Elasticsearch  | Elastic Search                                      | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/elasticsearch.html> |
| Carto          | Carto                                               | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/carto.html>         |
| AmigoCloud     | AmigoCloud                                          | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/amigocloud.html>    |
| SXF            | Storage and eXchange Format                         | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/sxf.html>           |
| Selafin        | Selafin                                             | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/selafin.html>       |
| JML            | OpenJUMP JML                                        | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/jml.html>           |
| PLSCENES       | Planet Labs Scenes API                              | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/plscenes.html>      |
| CSW            | OGC CSW (Catalog  Service for the Web)              | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/csw.html>           |
| VDV            | VDV-451/VDV-452/INTREST Data Format                 | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/vdv.html>           |
| MVT            | Mapbox Vector Tiles                                 | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/mvt.html>           |
| NGW            | NextGIS Web                                         | true       | true     | true     | <https://gdal.org/en/release-3.10/drivers/vector/ngw.html>           |
| MapML          | MapML                                               | true       | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/mapml.html>         |
| TIGER          | U.S. Census TIGER/Line                              | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/tiger.html>         |
| AVCBin         | Arc/Info Binary Coverage                            | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/avcbin.html>        |
| AVCE00         | Arc/Info E00 (ASCII) Coverage                       | false      | false    | true     | <https://gdal.org/en/release-3.10/drivers/vector/avce00.html>        |

</details>

Initially we have prioritized providing a breadth of capabilities by wrapping existing libraries. We're planning to implement more of the core functions and algorithms natively in the future to enable faster performance and more efficient memory management.

As an initial step in this direction, we provide a set of non-standard specialized columnar DuckDB native geometry types such as `POINT_2D`, `BOX_2D`, etc. that should provide better compression and faster execution in exchange for some flexibility, but work on these is still very much experimental.

#### Example Usage

The following demonstrates how you can use the spatial extension to read and export multiple geospatial data formats, transform geometries between different coordinate reference systems and work with spatial property and predicate functions. While this example may be slightly contrived, we want to showcase the power of the currently available features.
You can find the datasets used in this example in the [spatial extension repository](https://github.com/duckdb/duckdb-spatial/tree/main/test/data/nyc_taxi).

Let's import the NYC taxi ride data provided in Parquet format as well as the accompanying taxi zone data from a shapefile, using the `ST_Read` table function provided by the spatial extension. These taxi zones break NYC into polygons that represent regions, for example, Newark Airport. We then create a table for the rides and a table for the zones. Note that `ST_Read` produces a table with a `geom` column that contains the geometry data as the `GEOMETRY` type, so the query below can select it directly without any conversion.

> This may all seem a bit much if you are not familiar with the geospatial ecosystem, but rest assured this is all you really need to get started. In short:
> – [Shapefile](https://en.wikipedia.org/wiki/Shapefile) (.shp, .shx, .dbf) is a common format for storing geometry vector data and auxiliary metadata such as indexes and attributes.
> – [WKB (Well Known Binary)](https://libgeos.org/specifications/wkb/), while not really a file format in itself, is a common binary encoding of vector geometry data, used in e.g., GeoParquet. Comes in multiple flavors, but we're only concerned with "standard" WKB for now.
> – `GEOMETRY` is a DuckDB type that represents a [Simple Features](https://en.wikipedia.org/wiki/Simple_Features) geometry object, which is based on a set of standards modeling vector geometry data as points, linestrings, polygons or collections of such. This is the core data type used by the spatial extension, and what most of the provided functions take and return.

```sql
INSTALL spatial;
LOAD spatial;

CREATE TABLE rides AS
    SELECT * 
    FROM 'yellow_tripdata_2010-01-limit1mil.parquet';

-- Load the NYC taxi zone data from a shapefile using the gdal-based ST_Read function
CREATE TABLE zones AS
    SELECT zone, LocationId, borough, geom 
    FROM ST_Read('taxi_zones/taxi_zones.shx');
```
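
To make the WKB description above a little more concrete, here is a minimal Python sketch (standard library only, independent of DuckDB) that encodes and decodes a single 2D point in "standard" WKB: one byte-order byte, a 4-byte geometry type code (1 for a point) and two 8-byte doubles. The helper names `encode_wkb_point` and `decode_wkb_point` are made up for this illustration and are not part of the spatial extension.

```python
import struct

WKB_POINT = 1  # geometry type code for a 2D point in standard WKB


def encode_wkb_point(x, y):
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # "<" = little-endian, B = byte-order flag, I = uint32 type, d = float64
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def decode_wkb_point(blob):
    """Decode a WKB point, honoring the byte-order flag in the first byte."""
    prefix = "<" if blob[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    geom_type, x, y = struct.unpack(prefix + "Idd", blob[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a point (type 1), got type {geom_type}")
    return x, y


blob = encode_wkb_point(-73.871057, 40.773522)
print(len(blob), decode_wkb_point(blob))
```

A point encoded this way takes 21 bytes. More complex geometries such as polygons follow the same style, nesting counts and coordinate sequences after the type code.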

Let's compare the trip distance to the linear distance between the pickup and dropoff points to figure out how efficient the taxi drivers are (or how dirty the data is, since some diffs seem to be negative). We transform the coordinates from "WGS84" (given by the identifier EPSG:4326), also commonly known as simply latitude/longitude, to the "NAD83 / New York Long Island ftUS" (identified as ESRI:102718) coordinate reference system, which is a projection with minimal distortion around New York. We then calculate the distance using the `ST_Distance` function. In this case we get the distance in feet since we've converted the coordinates to NAD83, but we can easily convert it to miles (5280 ft/mile), which is the unit used in the rides dataset, so we can compare them correctly.

<details markdown='1'>
<summary markdown='span'>
Wait, what's all this about coordinate reference systems and projections?
</summary>
> The earth is not flat, but sometimes it is useful to pretend it is for the sake of simplicity by "projecting" the coordinates onto a flat surface. The "parameters" of a projection – e.g., where the "origin" is located, what unit coordinates are in, or how the earth's shape is approximated – are encapsulated by a "Spatial Reference System" or "Coordinate Reference System" (CRS) which is usually referenced by a shorthand identifier composed of an authority and a code, e.g., "EPSG:4326" or "ESRI:102718". Projections are always lossy, so it's important to use a CRS that is well suited for the "area of interest" your data is in. The spatial extension uses the [PROJ](https://proj.org/) library to handle coordinate reference systems and projections.

</details>

Trips with a distance shorter than the aerial distance are likely to be erroneous, so we use this query to filter out some bad data. The query below takes advantage of DuckDB's ability to refer to column aliases defined within the same select statement. This is a small example of how DuckDB's rich SQL dialect can simplify geospatial analysis.

```sql
CREATE TABLE cleaned_rides AS
    SELECT 
        ST_Point(pickup_latitude, pickup_longitude) AS pickup_point,
        ST_Point(dropoff_latitude, dropoff_longitude) AS dropoff_point,
        dropoff_datetime::TIMESTAMP - pickup_datetime::TIMESTAMP AS time,
        trip_distance,
        ST_Distance(
            ST_Transform(pickup_point, 'EPSG:4326', 'ESRI:102718'), 
            ST_Transform(dropoff_point, 'EPSG:4326', 'ESRI:102718')) / 5280 
            AS aerial_distance, 
        trip_distance - aerial_distance AS diff 
    FROM rides 
    WHERE diff > 0
    ORDER BY diff DESC;
```
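
The arithmetic behind `aerial_distance` and `diff` is simple enough to sketch outside the database. The following Python snippet assumes the two points have already been projected to a CRS measured in feet (the coordinates are invented for illustration), computes the straight-line distance, converts it to miles and applies the same `diff > 0` filter. The function names and the dictionary layout are hypothetical.

```python
import math

FEET_PER_MILE = 5280


def aerial_distance_miles(pickup_ft, dropoff_ft):
    """Straight-line (planar) distance between two projected points, in miles."""
    return math.dist(pickup_ft, dropoff_ft) / FEET_PER_MILE


def clean_rides(rides):
    """Keep rides whose recorded trip distance exceeds the aerial distance."""
    cleaned = []
    for ride in rides:
        aerial = aerial_distance_miles(ride["pickup_ft"], ride["dropoff_ft"])
        diff = ride["trip_distance"] - aerial
        if diff > 0:
            cleaned.append({**ride, "aerial_distance": aerial, "diff": diff})
    # Mirror the ORDER BY diff DESC clause of the SQL query
    return sorted(cleaned, key=lambda r: r["diff"], reverse=True)


# Made-up rides: both cover 3 miles east and 4 miles north in projected feet
rides = [
    {"trip_distance": 6.2, "pickup_ft": (0, 0), "dropoff_ft": (3 * 5280, 4 * 5280)},
    {"trip_distance": 4.0, "pickup_ft": (0, 0), "dropoff_ft": (3 * 5280, 4 * 5280)},
]
print(clean_rides(rides))
```

The second ride reports a shorter trip than the straight line between its endpoints, so it is dropped, just as the `WHERE diff > 0` clause would drop it.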

<details markdown='1'>
<summary markdown='span'>
`SELECT * FROM rides LIMIT 10;`
</summary>

| vendor_id | pickup_datetime     | dropoff_datetime    | passenger_count | trip_distance      | pickup_longitude   | pickup_latitude | rate_code | store_and_fwd_flag | dropoff_longitude  | dropoff_latitude | payment_type | fare_amount | surcharge | mta_tax | tip_amount | tolls_amount | total_amount |
| --------- | ------------------- | ------------------- | --------------- | ------------------ | ------------------ | --------------- | --------- | ------------------ | ------------------ | ---------------- | ------------ | ----------- | --------- | ------- | ---------- | ------------ | ------------ |
| VTS       | 2010-01-01 00:00:17 | 2010-01-01 00:00:17 | 3               | 0.0                | -73.87105699999998 | 40.773522       | 1         |                    | -73.871048         | 40.773545        | CAS          | 45.0        | 0.0       | 0.5     | 0.0        | 0.0          | 45.5         |
| VTS       | 2010-01-01 00:00:20 | 2010-01-01 00:00:20 | 1               | 0.05               | -73.97512999999998 | 40.789973       | 1         |                    | -73.97498799999998 | 40.790598        | CAS          | 2.5         | 0.5       | 0.5     | 0.0        | 0.0          | 3.5          |
| CMT       | 2010-01-01 00:00:23 | 2010-01-01 00:00:25 | 1               | 0.0                | -73.999431         | 40.71216        | 1         | 0                  | -73.99915799999998 | 40.712421        | No           | 2.5         | 0.5       | 0.5     | 0.0        | 0.0          | 3.5          |
| CMT       | 2010-01-01 00:00:33 | 2010-01-01 00:00:55 | 1               | 0.0                | -73.97721699999998 | 40.749633       | 1         | 0                  | -73.97732899999998 | 40.749629        | Cas          | 2.5         | 0.5       | 0.5     | 0.0        | 0.0          | 3.5          |
| VTS       | 2010-01-01 00:01:00 | 2010-01-01 00:01:00 | 1               | 0.0                | -73.942313         | 40.784332       | 1         |                    | -73.942313         | 40.784332        | Cre          | 10.0        | 0.0       | 0.5     | 2.0        | 0.0          | 12.5         |
| VTS       | 2010-01-01 00:01:06 | 2010-01-01 00:01:06 | 2               | 0.38               | -73.97463          | 40.756687       | 1         |                    | -73.979872         | 40.759143        | CAS          | 3.7         | 0.5       | 0.5     | 0.0        | 0.0          | 4.7          |
| VTS       | 2010-01-01 00:01:07 | 2010-01-01 00:01:07 | 2               | 0.23               | -73.987358         | 40.718475       | 1         |                    | -73.98518          | 40.720468        | CAS          | 2.9         | 0.5       | 0.5     | 0.0        | 0.0          | 3.9          |
| CMT       | 2010-01-01 00:00:02 | 2010-01-01 00:01:08 | 1               | 0.1                | -73.992807         | 40.741418       | 1         | 0                  | -73.995799         | 40.742596        | No           | 2.9         | 0.5       | 0.5     | 0.0        | 0.0          | 3.9          |
| VTS       | 2010-01-01 00:01:23 | 2010-01-01 00:01:23 | 1               | 0.6099999999999999 | -73.98003799999998 | 40.74306        | 1         |                    | -73.974862         | 40.750387        | CAS          | 3.7         | 0.5       | 0.5     | 0.0        | 0.0          | 4.7          |
| VTS       | 2010-01-01 00:01:34 | 2010-01-01 00:01:34 | 1               | 0.02               | -73.954122         | 40.801173       | 1         |                    | -73.95431499999998 | 40.800897        | CAS          | 45.0        | 0.0       | 0.5     | 0.0        | 0.0          | 45.5         |

</details>

<details markdown='1'>
<summary markdown='span'>
`SELECT * FROM zones LIMIT 10;`
</summary>

| zone                    | LocationID | borough       | geom               |
| ----------------------- | ---------- | ------------- | ------------------ |
| Newark Airport          | 1          | EWR           | POLYGON (...)      |
| Jamaica Bay             | 2          | Queens        | MULTIPOLYGON (...) |
| Allerton/Pelham Gardens | 3          | Bronx         | POLYGON (...)      |
| Alphabet City           | 4          | Manhattan     | POLYGON (...)      |
| Arden Heights           | 5          | Staten Island | POLYGON (...)      |
| Arrochar/Fort Wadsworth | 6          | Staten Island | POLYGON (...)      |
| Astoria                 | 7          | Queens        | POLYGON (...)      |
| Astoria Park            | 8          | Queens        | POLYGON (...)      |
| Auburndale              | 9          | Queens        | POLYGON (...)      |
| Baisley Park            | 10         | Queens        | POLYGON (...)      |

</details>

> It should be noted that this is not entirely accurate since the `ST_Distance` function we use does not take into account the curvature of the earth. However, we'll accept it as a good enough approximation for our purposes. Spherical and geodesic distance calculations are on the roadmap!
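
For a sense of what a spherical distance calculation involves, here is a minimal Python sketch of the haversine formula. It is purely illustrative: it is not what `ST_Distance` computes and not a preview of any planned implementation. It treats the earth as a perfect sphere with a mean radius of roughly 3958.8 miles, and the function name `haversine_miles` is made up for this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean radius; assumes a perfect sphere


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Pickup and dropoff of one ride from the sample output above
print(haversine_miles(40.756687, -73.97463, 40.759143, -73.979872))
```

Unlike the projected approach, this works directly on latitude/longitude values and needs no CRS transformation, at the cost of ignoring the earth's ellipsoidal shape.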

Now let's join the taxi rides with the taxi zones to get the start and end zone for each ride. We use the `ST_Within` function as our join condition to check if a pickup or dropoff point is within a taxi zone polygon. Again we need to transform the coordinates from WGS84 to NAD83 since the taxi zone data also uses that projection. Spatial joins like these are the bread and butter of geospatial data processing, but we don't currently have any optimizations in place (such as spatial indexes) to speed up these queries, which is why we only use a subset of the data for the following step.

```sql
-- Since we don't have spatial indexes yet, use a smaller dataset for the join.
DELETE FROM cleaned_rides WHERE rowid > 5000;

CREATE TABLE joined AS 
    SELECT 
        pickup_point,
        dropoff_point,
        start_zone.zone AS start_zone,
        end_zone.zone AS end_zone, 
        trip_distance,
        time,
    FROM cleaned_rides 
    JOIN zones AS start_zone 
      ON ST_Within(ST_Transform(pickup_point, 'EPSG:4326', 'ESRI:102718'), start_zone.geom) 
    JOIN zones AS end_zone 
      ON ST_Within(ST_Transform(dropoff_point, 'EPSG:4326', 'ESRI:102718'), end_zone.geom);
```
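
To see why such a join is expensive without an index, consider this rough Python sketch. It uses a textbook ray-casting test as a stand-in for a point-in-polygon predicate and compares every point against every zone. This is an assumption-laden illustration, not how the extension implements `ST_Within`: it ignores holes, multipolygons and points lying exactly on a boundary, and the zone shapes and names are invented.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def nested_loop_join(points, zones):
    """Pair each point with every zone containing it: O(points * zones) tests."""
    return [
        (point_id, zone_name)
        for point_id, (x, y) in points.items()
        for zone_name, ring in zones.items()
        if point_in_polygon(x, y, ring)
    ]


# Two invented square "zones" and three points
zones = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = {"p1": (5, 5), "p2": (15, 5), "p3": (25, 5)}
print(nested_loop_join(points, zones))
```

Every additional ride multiplies the number of polygon tests by the number of zones, and the SQL query above does this twice (once for pickups, once for dropoffs), which is why trimming the input pays off.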

<details markdown='1'>
<summary markdown='span'>
`SELECT * FROM joined USING SAMPLE 10 ROWS;`
</summary>

| pickup_point                         | dropoff_point                        | start_zone               | end_zone                      | trip_distance | time     |
| ------------------------------------ | ------------------------------------ | ------------------------ | ----------------------------- | ------------- | -------- |
| POINT (40.722223 -73.98385299999998) | POINT (40.715507 -73.992438)         | East Village             | Lower East Side               | 10.3          | 00:19:16 |
| POINT (40.648687 -73.783522)         | POINT (40.649567 -74.005812)         | JFK Airport              | Sunset Park West              | 23.57         | 00:28:00 |
| POINT (40.761603 -73.96661299999998) | POINT (40.760232 -73.96344499999998) | Upper East Side South    | Sutton Place/Turtle Bay North | 17.6          | 00:27:05 |
| POINT (40.697212 -73.937495)         | POINT (40.652377 -73.93983299999998) | Stuyvesant Heights       | East Flatbush/Farragut        | 13.55         | 00:24:00 |
| POINT (40.721462 -73.993583)         | POINT (40.774205 -73.90441699999998) | Lower East Side          | Steinway                      | 28.75         | 01:03:00 |
| POINT (40.716955 -74.004328)         | POINT (40.754688 -73.991612)         | TriBeCa/Civic Center     | Garment District              | 18.4          | 00:46:12 |
| POINT (40.740052 -73.994918)         | POINT (40.75439 -73.98587499999998)  | Flatiron                 | Garment District              | 24.2          | 00:35:25 |
| POINT (40.763017 -73.95949199999998) | POINT (40.763615 -73.959182)         | Lenox Hill East          | Lenox Hill West               | 18.4          | 00:33:46 |
| POINT (40.865663 -73.927458)         | POINT (40.86537 -73.927352)          | Washington Heights North | Washington Heights North      | 10.47         | 00:27:00 |
| POINT (40.738408 -73.980345)         | POINT (40.696038 -73.955493)         | Gramercy                 | Bedford                       | 16.4          | 00:21:47 |

</details>

We can export the joined table to a `GeoJSONSeq` file using the GDAL copy function, passing in a GDAL layer creation option. Since `GeoJSON` only supports a single `GEOMETRY` per record, we use the `ST_MakeLine` function to combine the pickup and dropoff points into a single line geometry. The default coordinate reference system for `GeoJSON` is WGS84, but the coordinate pairs are expected to be in longitude/latitude, so we need to flip the geometry using the `ST_FlipCoordinates` function.

```sql
COPY (
    SELECT 
        ST_MakeLine(pickup_point, dropoff_point)
            .ST_FlipCoordinates()
            .ST_AsWKB()
            AS wkb_geometry,
        start_zone,
        end_zone,
        time::VARCHAR AS trip_time 
    FROM joined) 
TO 'joined.geojsonseq' 
WITH (
    FORMAT gdal,
    DRIVER 'GeoJSONSeq',
    LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES'
);
```
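
The coordinate flip is easy to get wrong, so here is a small Python sketch of the same idea using only the standard library. It assumes points are stored as (latitude, longitude) pairs, as in the `joined` table, and emits one GeoJSON feature per line with [longitude, latitude] coordinates. The function names are hypothetical, and the sketch ignores the GDAL layer creation option used above.

```python
import json


def to_geojson_line(pickup, dropoff, properties):
    """Build a GeoJSON LineString feature from two (lat, lon) points."""
    # GeoJSON expects [longitude, latitude], so swap each pair
    coordinates = [[lon, lat] for lat, lon in (pickup, dropoff)]
    return {
        "type": "Feature",
        "properties": properties,
        "geometry": {"type": "LineString", "coordinates": coordinates},
    }


def to_geojsonseq(features):
    """Serialize features as newline-delimited JSON, one feature per line."""
    return "\n".join(json.dumps(feature) for feature in features)


feature = to_geojson_line(
    (40.722223, -73.983853),
    (40.715507, -73.992438),
    {"start_zone": "East Village", "end_zone": "Lower East Side"},
)
print(to_geojsonseq([feature]))
```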

<details markdown='1'>
<summary markdown='span'>
`head -n 10 joined.geojsonseq`
</summary>

```json
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:52:00" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.789923, 40.643515 ], [ -73.97608, 40.680395 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:35:00" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.776445, 40.645422 ], [ -73.98427, 40.670782 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:45:42" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.776878, 40.645065 ], [ -73.992153, 40.662571 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:36:00" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.788028, 40.641508 ], [ -73.97584, 40.670927 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:47:58" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.781855, 40.644749 ], [ -73.980129, 40.663663 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:32:10" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.787494, 40.641559 ], [ -73.974694, 40.673479 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:36:59" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.790138, 40.643342 ], [ -73.982721, 40.662379 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:32:00" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.786952, 40.641248 ], [ -73.97421, 40.676237 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:33:21" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.783892, 40.648514 ], [ -73.979283, 40.669721 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:35:45" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.776643, 40.645272 ], [ -73.978873, 40.66723 ] ] } }
```

</details>

And there we have it! We pulled tabular data from Parquet, combined it with geospatial data in a shapefile, cleaned and analyzed that combined data, and output it to a human-readable geospatial format. The full set of currently supported functions and their implementation status can be found over at the docs in [this table](#docs:lts:core_extensions:spatial:overview::spatial-scalar-functions).

#### What's Next?

While it's probably going to take a while for us to catch up to the full set of functions provided by e.g., PostGIS, we believe that DuckDB's vectorized execution model and columnar storage format will enable a whole new class of optimizations for geospatial processing that we've just begun exploring. Improving the performance of spatial joins and predicates is therefore high on our list of priorities.

There are also some limitations with our `GEOMETRY` type that we would eventually like to tackle, such as the fact that we don't support additional Z and M dimensions or the full range of geometry sub-types that are mandated by the OGC standard, like curves or polyhedral surfaces.

We're also interested in supporting spherical and ellipsoidal calculations in the near future, perhaps in the form of a dedicated `GEOGRAPHY` type. 

Wasm builds are also just around the corner!

Please take a look at the [GitHub repository](https://github.com/duckdb/duckdb-spatial) for the full roadmap and to see what we're currently working on. If you would like to help build this capability, please reach out on GitHub!

#### Conclusion

The DuckDB Spatial extension is another step towards making DuckDB a Swiss Army knife for data engineering and analytics. This extension provides a flexible and familiar `GEOMETRY` type, reprojectable between thousands of coordinate reference systems, coupled with the capability to export and import geospatial data between more than 50 different data sources, all embedded into a single extension with minimal runtime dependencies. This enables DuckDB to fit seamlessly into your existing GIS workflows regardless of which geospatial data formats or projections you're working with.

We are excited to hear what you make of the DuckDB spatial extension. It's still early days but we hope to have a lot more to share in the future as we continue making progress! If you have any questions, suggestions, ideas or issues, please don't hesitate to reach out to us on Discord or GitHub!

## 10 000 Stars on GitHub

**Publication date:** 2023-05-12

**Authors:** Mark Raasveldt, Hannes Mühleisen

Today, DuckDB reached 10 000 stars on [GitHub](https://github.com/duckdb/duckdb). We would like to pause for a second to express our gratitude to [everyone who contributed](https://github.com/duckdb/duckdb/graphs/contributors) to DuckDB and of course all its users. When we started working on DuckDB back in 2018, we would have never dreamt of getting this kind of adoption in such a short time.

From those brave souls who were early adopters of DuckDB back in 2019 to the many today, we are happy you're part of our community. Thank you for your feedback, feature requests and for your enthusiasm in adopting new features and integrations. Thank you for helping each other on our [Discord server](http://discord.duckdb.org/) or in [GitHub Discussions](https://github.com/duckdb/duckdb/discussions). Thank you for spreading the word, too.

We also would like to extend special thanks to the [DuckDB foundation supporters](https://duckdb.org/foundation/index.html), who through their generous donations keep DuckDB independent.

For us, the maintainers of DuckDB, the past few years have also been quite eventful: We spun off from the [research group where DuckDB originated](https://www.cwi.nl/en/groups/database-architectures/) to a [successful company](https://duckdblabs.com/) with close to 20 employees and many excellent partnerships.

We are very much looking forward to what the future will hold for DuckDB. Things are looking bright!

![](../images/blog/wilbur-the-duck.jpg)


## Announcing DuckDB 0.8.0

**Publication date:** 2023-05-17

**Authors:** Mark Raasveldt, Hannes Mühleisen

![](../images/blog/mottled_duck.jpg)


The DuckDB team is happy to announce the latest DuckDB release (0.8.0). This release is named “Fulvigula” after the [Mottled Duck](https://en.wikipedia.org/wiki/Mottled_duck) (Anas fulvigula) native to the Gulf of Mexico.

To install the new version, please visit the [installation guide](https://duckdb.org/install/index.html). The full release notes can be found on [GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.8.0).

#### What's New in 0.8.0

There have been too many changes to discuss them each in detail, but we would like to highlight several particularly exciting features!

* New pivot and unpivot statements
* Improvements to parallel data import/export
* Time series joins
* Recursive globbing
* Lazy-loading of storage metadata for faster startup times
* User-defined functions for Python
* Arrow Database Connectivity (ADBC) support
* New Swift integration

Below is a summary of those new features with examples, starting with two breaking changes in our SQL dialect that are designed to produce more intuitive results by default.

#### Breaking SQL Changes

This release includes two breaking changes to the SQL dialect: The [division operator uses floating point division by default](https://github.com/duckdb/duckdb/pull/7082), and the [default null sort order is changed from `NULLS FIRST` to `NULLS LAST`](https://github.com/duckdb/duckdb/pull/7174). While DuckDB is still in Beta, we recognize that many DuckDB queries are already used in production. So, the old behavior can be restored using the following settings:

```sql
SET integer_division = true;
SET default_null_order = 'NULLS_FIRST';
```

[**Division Operator**](https://github.com/duckdb/duckdb/pull/7082). The division operator `/` will now always perform a floating point division even with integer parameters. The new operator `//` retains the old semantics and can be used to perform integer division. This makes DuckDB's division operator less error prone for beginners, and consistent with the division operator in Python 3 and other systems in the OLAP space like Spark, Snowflake and BigQuery.

```sql
SELECT 42 / 5, 42 // 5;
```

| (42 / 5) | (42 // 5) |
| -------: | --------: |
|      8.4 |         8 |
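
Since the new behavior is described as consistent with Python 3, the same pair of operators can be checked in plain Python. This is only an analogy for the `SELECT` above, not DuckDB code, and it covers positive operands only (Python's `//` rounds towards negative infinity, which we do not claim for other systems):

```python
# Python 3 distinguishes true division (/) from floor division (//),
# mirroring the split between `/` and `//` described above.
true_division = 42 / 5    # floating point result
floor_division = 42 // 5  # integer result

print(true_division, floor_division)
```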

[**Default Null Sort Order**](https://github.com/duckdb/duckdb/pull/7174). The default null sort order is changed from `NULLS FIRST` to `NULLS LAST`. The reason for this change is that `NULLS LAST` sort-order is more intuitive when combined with `LIMIT`. With `NULLS FIRST`, Top-N queries always return the `NULL` values first. With `NULLS LAST`, the actual Top-N values are returned instead.

```sql
CREATE TABLE bigdata (col INTEGER);
INSERT INTO bigdata VALUES (NULL), (42), (NULL), (43);
FROM bigdata ORDER BY col DESC LIMIT 3;
```

| v0.7.1 | v0.8.0 |
| -----: | -----: |
|   NULL |     43 |
|   NULL |     42 |
|     43 |   NULL |
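
To make the difference concrete, here is a minimal plain-Python sketch of the two null orderings applied to the same four values, with `None` standing in for `NULL`. The helper `order_desc` is a hypothetical name used for illustration only; it models the semantics, not DuckDB's sort implementation:

```python
values = [None, 42, None, 43]

def order_desc(rows, nulls_first):
    """Sort descending, placing None (NULL) entries first or last."""
    nulls = [v for v in rows if v is None]
    non_nulls = sorted((v for v in rows if v is not None), reverse=True)
    return nulls + non_nulls if nulls_first else non_nulls + nulls

old_top3 = order_desc(values, nulls_first=True)[:3]   # v0.7.1 default
new_top3 = order_desc(values, nulls_first=False)[:3]  # v0.8.0 default
print(old_top3, new_top3)
```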

#### New SQL Features

[**Pivot and Unpivot**](https://github.com/duckdb/duckdb/pull/6387). There are many shapes and sizes of data, and we do not always have control over the process in which data is generated. While SQL is well-suited for reshaping datasets, turning columns into rows or rows into columns is tedious in vanilla SQL. With this release, DuckDB introduces the `PIVOT` and `UNPIVOT` statements that allow reshaping data sets so that rows are turned into columns or vice versa. A key advantage of DuckDB's syntax is that the column names to pivot or unpivot can be automatically deduced. Here is a short example:

```sql
CREATE TABLE sales (year INTEGER, amount INTEGER);
INSERT INTO sales VALUES (2021, 42), (2022, 100), (2021, 42);
PIVOT sales ON year USING sum(amount);
```

| 2021 | 2022 |
| ---: | ---: |
|   84 |  100 |

The [documentation contains more examples](#docs:lts:sql:statements:pivot).
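
For readers who think in terms of dictionaries, the pivot above can be sketched in plain Python. This is only a model of the result, assuming the same three rows as the `sales` table; it says nothing about how DuckDB executes `PIVOT`:

```python
from collections import defaultdict

# Same rows as the `sales` table above: (year, amount)
sales = [(2021, 42), (2022, 100), (2021, 42)]

# Each distinct value of the pivot column becomes an output column;
# the USING aggregate (here: sum) is applied per distinct value.
pivoted = defaultdict(int)
for year, amount in sales:
    pivoted[year] += amount

print(dict(sorted(pivoted.items())))
```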

[**ASOF Joins for Time Series**](https://github.com/duckdb/duckdb/pull/6719). When joining time series data with background fact tables, the timestamps often do not exactly match. In this case it is often desirable to join rows so that the timestamp is joined with the *nearest timestamp*. The ASOF join can be used for this purpose – it performs a fuzzy join to find the closest join partner for each row instead of requiring an exact match.

```sql
CREATE TABLE a (ts TIMESTAMP);
CREATE TABLE b (ts TIMESTAMP);
INSERT INTO a VALUES (TIMESTAMP '2023-05-15 10:31:00'), (TIMESTAMP '2023-05-15 11:31:00');
INSERT INTO b VALUES (TIMESTAMP '2023-05-15 10:30:00'), (TIMESTAMP '2023-05-15 11:30:00');

FROM a ASOF JOIN b ON a.ts >= b.ts;
```

| a.ts                | b.ts                |
| ------------------- | ------------------- |
| 2023-05-15 10:31:00 | 2023-05-15 10:30:00 |
| 2023-05-15 11:31:00 | 2023-05-15 11:30:00 |

Please [refer to the documentation](#docs:lts:guides:sql_features:asof_join) for a more in-depth explanation.
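
The matching rule for the condition `a.ts >= b.ts` can be sketched in plain Python: for each row of `a`, pick the latest row of `b` that is not after it. The function `asof_join` below is a hypothetical helper for illustration, using a binary search over a sorted list rather than whatever algorithm DuckDB uses internally:

```python
from bisect import bisect_right
from datetime import datetime

a = [datetime(2023, 5, 15, 10, 31), datetime(2023, 5, 15, 11, 31)]
b = sorted([datetime(2023, 5, 15, 10, 30), datetime(2023, 5, 15, 11, 30)])

def asof_join(left, right_sorted):
    """Pair each left timestamp with the latest right timestamp <= it."""
    pairs = []
    for ts in left:
        idx = bisect_right(right_sorted, ts)  # number of right values <= ts
        if idx > 0:  # inner-join behavior: skip rows without a match
            pairs.append((ts, right_sorted[idx - 1]))
    return pairs

result = asof_join(a, b)
for a_ts, b_ts in result:
    print(a_ts, b_ts)
```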

#### Data Integration Improvements

[**Default Parallel CSV Reader**](https://github.com/duckdb/duckdb/pull/6977). In this release, the parallel CSV reader has been vastly improved and is now the default CSV reader. We would like to thank everyone who has tried out the experimental reader for their valuable feedback and reports. The `experimental_parallel_csv` flag has been deprecated and is no longer required. The parallel CSV reader enables much more efficient reading of large CSV files.



```sql
CREATE TABLE lineitem AS FROM lineitem.csv;
```

| v0.7.1 | v0.8.0 |
| -----: | -----: |
|  4.1 s |  1.2 s |

**Parallel [Parquet](https://github.com/duckdb/duckdb/pull/7375), [CSV and JSON Writing](https://github.com/duckdb/duckdb/pull/7368)**. This release includes support for parallel *order-preserving* writing of Parquet, CSV and JSON files. As a result, writing to these file formats is parallel by default, even without disabling insertion order preservation, and is greatly sped up.

```sql
COPY lineitem TO 'lineitem.csv';
COPY lineitem TO 'lineitem.parquet';
COPY lineitem TO 'lineitem.json';
```

| Format  | v0.7.1 | v0.8.0 |
| ------- | -----: | -----: |
| CSV     |  3.9 s |  0.6 s |
| Parquet |  8.1 s |  1.2 s |
| JSON    |  4.4 s |  1.1 s |

[**Recursive File Globbing using `**`**](https://github.com/duckdb/duckdb/pull/6627). This release adds support for recursive globbing where an arbitrary number of subdirectories can be matched using the `**` operator (double-star).

```sql
FROM 'data/glob/crawl/stackoverflow/**/*.csv';
```

[The documentation has been updated](#docs:lts:data:multiple_files:overview) with various examples of this syntax.
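
If the double-star pattern is unfamiliar, Python's standard `glob` module offers a comparable `**` pattern that can be tried without a database. The sketch below builds a throwaway directory tree with made-up file names; it demonstrates Python's matching rules only, and we do not claim they are identical to DuckDB's in every edge case:

```python
import glob
import os
import tempfile

# Build a small throwaway directory tree (hypothetical file names).
root = tempfile.mkdtemp()
for rel in ["a.csv", "2022/b.csv", "2022/q1/c.csv", "2022/q1/notes.txt"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

# With recursive=True, `**` matches any number of nested directories.
pattern = os.path.join(root, "**", "*.csv")
matches = sorted(
    os.path.relpath(p, root).replace(os.sep, "/")
    for p in glob.glob(pattern, recursive=True)
)
print(matches)
```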

#### Storage Improvements

[**Lazy-Loading Table Metadata**](https://github.com/duckdb/duckdb/pull/6715). DuckDB’s internal storage format stores metadata for every row group in a table, such as min-max indexes and where in the file every row group is stored. In the past, DuckDB would load this metadata immediately once the database was opened. However, once the data gets very big, the metadata can also get quite large, leading to a noticeable delay on database startup. In this release, we have optimized the metadata handling of DuckDB to only read table metadata as it's being accessed. As a result, startup is near-instantaneous even for large databases, and metadata is only loaded for columns that are actually used in queries. The benchmarks below are for a database file containing a single large TPC-H `lineitem` table (120× SF1) with ~770 million rows and 16 columns:

| Query                   | v0.6.1 | v0.7.1 | v0.8.0 | Parquet |
| ----------------------- | -----: | -----: | -----: | ------: |
| `SELECT 42`             | 1.60 s | 0.31 s | 0.02 s |       - |
| `FROM lineitem LIMIT 1` | 1.62 s | 0.32 s | 0.03 s |  0.27 s |

#### Clients

[**User-Defined Scalar Functions for Python**](https://github.com/duckdb/duckdb/pull/7171). Arbitrary Python functions can now be registered as scalar functions within SQL queries. This will only work when using DuckDB from Python, because it uses the actual Python runtime that DuckDB is running within. While plain Python values can be passed to the function, there is also a vectorized variant that uses PyArrow under the hood for higher efficiency and better parallelism.

```python
import duckdb

from duckdb.typing import *
from faker import Faker

def random_date():
     fake = Faker()
     return fake.date_between()

duckdb.create_function('random_date', random_date, [], DATE)
res = duckdb.sql('SELECT random_date()').fetchall()
print(res)
# [(datetime.date(2019, 5, 15),)]
```

See the [documentation](#docs:lts:clients:python:function) for more information.

[**Arrow Database Connectivity Support (ADBC)**](https://github.com/duckdb/duckdb/pull/7086). ADBC is a database API standard for database access libraries that uses Apache Arrow to transfer query result sets and to ingest data. Using Arrow for this is particularly beneficial for columnar data management systems which traditionally suffered a performance hit by emulating row-based APIs such as JDBC/ODBC. From this release, DuckDB natively supports ADBC. We’re happy to be one of the first systems to offer native support, and DuckDB’s in-process design fits nicely with ADBC.

[**Swift Integration**](https://duckdb.org/2023/04/21/swift). DuckDB has gained another official language integration: Swift. Swift is a language developed by Apple that is most notably used to create apps for Apple devices, but is also increasingly used for server-side development. The DuckDB Swift API allows developers on all Swift platforms to harness DuckDB using a native Swift interface with support for Swift features like strong typing and concurrency.

#### Final Thoughts

The full release notes can be [found on GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.8.0). We would like to thank all of the contributors for their hard work on improving DuckDB.

## Correlated Subqueries in SQL

**Publication date:** 2023-05-26

**Author:** Mark Raasveldt

Subqueries in SQL are a powerful abstraction that allows simple queries to be used as composable building blocks. They allow you to break down complex problems into smaller parts, and subsequently make it easier to write, understand and maintain large and complex queries.

DuckDB uses a state-of-the-art subquery decorrelation optimizer that allows subqueries to be executed very efficiently. As a result, users can freely use subqueries to create expressive queries without having to worry about manually rewriting subqueries into joins. For more information, skip to the [Performance](#::performance) section.

#### Types of Subqueries

SQL subqueries exist in two main forms: subqueries as *expressions* and subqueries as *tables*. Subqueries that are used as expressions can be used in the `SELECT` or `WHERE` clauses. Subqueries that are used as tables can be used in the `FROM` clause. In this blog post we will focus on subqueries used as *expressions*. A future blog post will discuss subqueries as *tables*.

Subqueries as expressions exist in three forms.

* Scalar subqueries
* `EXISTS`
* `IN`/`ANY`/`ALL`

All of the subqueries can be either *correlated* or *uncorrelated*. An uncorrelated subquery is a query that is independent from the outer query. A correlated subquery is a subquery that contains expressions from the outer query. Correlated subqueries can be seen as *parameterized subqueries*.

##### Uncorrelated Scalar Subqueries

Uncorrelated scalar subqueries can only return *a single value*. That constant value is then substituted and used in the query. As an example of why this is useful – imagine that we want to select all of the shortest flights in our dataset. We could run the following query to obtain the shortest flight distance:

```sql
SELECT min(distance)
FROM ontime;
```

| min(distance) |
| ------------: |
|          31.0 |

We could manually take this distance and use it in the `WHERE` clause to obtain all flights on this route.

```sql
SELECT uniquecarrier, origincityname, destcityname, flightdate
FROM ontime
WHERE distance = 31.0;
```

| uniquecarrier | origincityname | destcityname   | flightdate |
| ------------- | -------------- | -------------- | ---------: |
| AS            | Petersburg, AK | Wrangell, AK   | 2017-01-15 |
| AS            | Wrangell, AK   | Petersburg, AK | 2017-01-15 |
| AS            | Petersburg, AK | Wrangell, AK   | 2017-01-16 |

However – this requires us to hardcode the constant inside the query. By using the first query as a *subquery* we can compute the minimum distance as part of the query.

```sql
SELECT uniquecarrier, origincityname, destcityname, flightdate
FROM ontime
WHERE distance = (
     SELECT min(distance)
     FROM ontime
);
```


##### Correlated Scalar Subqueries

While uncorrelated subqueries are powerful, they come with a hard restriction: only a *single value* can be returned. Often, what we want to do is *parameterize* the query, so that we can return different values per row.

For example, suppose that we want to find all of the shortest flights *for each carrier*. We can find the shortest flight for a *specific carrier* using the following parameterized query:

```sql
PREPARE min_distance_per_carrier AS
SELECT min(distance)
FROM ontime
WHERE uniquecarrier = ?;
```

We can execute this prepared statement to obtain the minimum distance for a specific carrier.

```sql
EXECUTE min_distance_per_carrier('UA');
```

| min(distance) |
| ------------: |
|          67.0 |

If we want to use this parameterized query as a subquery, we need to use a *correlated subquery*. Correlated subqueries allow us to use parameterized queries as scalar subqueries by referencing columns from *the outer query*. We can obtain the set of shortest flights per carrier using the following query:

```sql
SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime AS ontime_outer
WHERE distance = (
     SELECT min(distance)
     FROM ontime
     WHERE uniquecarrier = ontime_outer.uniquecarrier
);
```

| uniquecarrier | origincityname      | destcityname    | flightdate | distance |
| ------------- | ------------------- | --------------- | ---------- | -------: |
| AS            | Wrangell, AK        | Petersburg, AK  | 2017-01-01 |     31.0 |
| NK            | Fort Lauderdale, FL | Orlando, FL     | 2017-01-01 |    177.0 |
| VX            | Las Vegas, NV       | Los Angeles, CA | 2017-01-01 |    236.0 |

Notice how the column from the *outer* relation (`ontime_outer`) is used *inside* the query. This is what turns the subquery into a *correlated subquery*. The column from the outer relation (`ontime_outer.uniquecarrier`) is a *parameter* for the subquery. Logically the subquery is executed once for every row that is present in `ontime`, where the value for the column at that row is substituted as a parameter.
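
The "once per row" logic can be sketched in plain Python, where the subquery becomes a function and the outer column becomes its argument. The rows below are made up for illustration (only the carrier codes and city names echo the results above), and the function name is our own choice:

```python
# Made-up sample rows: (carrier, origin, dest, distance)
ontime = [
    ("AS", "Wrangell, AK", "Petersburg, AK", 31.0),
    ("AS", "Seattle, WA", "Portland, OR", 129.0),
    ("NK", "Fort Lauderdale, FL", "Orlando, FL", 177.0),
    ("NK", "Fort Lauderdale, FL", "Atlanta, GA", 581.0),
]

def min_distance_per_carrier(carrier):
    """The subquery: min(distance) for one carrier (the parameter)."""
    return min(d for c, _, _, d in ontime if c == carrier)

# Outer query: the subquery is (logically) evaluated once per outer row.
shortest = [row for row in ontime if row[3] == min_distance_per_carrier(row[0])]
print(shortest)
```

This describes the logical semantics only; how the query is physically executed is a separate matter.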

To make it clearer that the correlated subquery is in essence a *parameterized query*, we can create a scalar macro that contains the query using DuckDB's [macros](#docs:lts:sql:statements:create_macro).

```sql
CREATE MACRO min_distance_per_carrier(param) AS (
     SELECT min(distance)
     FROM ontime
     WHERE uniquecarrier = param
);
```

We can then use the macro in our original query as if it is a function.

```sql
SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime AS ontime_outer
WHERE distance = min_distance_per_carrier(ontime_outer.uniquecarrier);
```

This gives us the same result as placing the correlated subquery inside of the query, but is cleaner as we can decompose the query into multiple segments more effectively.

##### EXISTS

`EXISTS` can be used to check if a given subquery has any results. This is powerful when used as a correlated subquery. For example, we can use `EXISTS` if we want to obtain the *last flight that has been flown on each route*.

We can obtain a list of all flights on a given route past a certain date using the following query:

```sql
PREPARE flights_after_date AS
SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime
WHERE origin = ? AND dest = ? AND flightdate > ?;
```

```sql
EXECUTE flights_after_date('LAX', 'JFK', DATE '2017-05-01');
```

| uniquecarrier | origincityname  | destcityname | flightdate | distance |
| ------------- | --------------- | ------------ | ---------- | -------: |
| AA            | Los Angeles, CA | New York, NY | 2017-08-01 |   2475.0 |
| AA            | Los Angeles, CA | New York, NY | 2017-08-02 |   2475.0 |
| AA            | Los Angeles, CA | New York, NY | 2017-08-03 |   2475.0 |

Now in order to obtain the *last flight on a route*, we need to find flights *for which no later flight exists*.

```sql
SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime AS ontime_outer
WHERE NOT EXISTS (
     SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
     FROM ontime
     WHERE origin = ontime_outer.origin
       AND dest = ontime_outer.dest
       AND flightdate > ontime_outer.flightdate
);
```

| uniquecarrier | origincityname        | destcityname          | flightdate | distance |
| ------------- | --------------------- | --------------------- | ---------- | -------: |
| AA            | Daytona Beach, FL     | Charlotte, NC         | 2017-02-27 |    416.0 |
| EV            | Abilene, TX           | Dallas/Fort Worth, TX | 2017-02-15 |    158.0 |
| EV            | Dallas/Fort Worth, TX | Durango, CO           | 2017-02-13 |    674.0 |
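
The `NOT EXISTS` pattern maps naturally onto Python's `any`. The following sketch uses a handful of made-up rows and a hypothetical helper `later_flight_exists`; it is meant to illustrate the logic, not the way DuckDB evaluates the query:

```python
from datetime import date

# Made-up sample rows: (origin, dest, flightdate)
flights = [
    ("LAX", "JFK", date(2017, 8, 1)),
    ("LAX", "JFK", date(2017, 8, 3)),
    ("ABI", "DFW", date(2017, 2, 15)),
    ("ABI", "DFW", date(2017, 2, 1)),
]

def later_flight_exists(outer):
    """EXISTS subquery: is there any later flight on the same route?"""
    origin, dest, flightdate = outer
    return any(o == origin and d == dest and f > flightdate for o, d, f in flights)

# Keep only the flights for which no later flight on the route exists.
last_flights = [row for row in flights if not later_flight_exists(row)]
print(last_flights)
```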

##### IN / ANY / ALL

`IN` can be used to check if a *given value* exists within the result returned by the subquery. For example, we can obtain a list of all carriers that have performed more than `250 000` flights in the dataset using the following query:

```sql
SELECT uniquecarrier
FROM ontime
GROUP BY uniquecarrier
HAVING count(*) > 250000;
```

We can then use an `IN` clause to obtain all flights performed by those carriers.

```sql
SELECT *
FROM ontime
WHERE uniquecarrier IN (
     SELECT uniquecarrier
     FROM ontime
     GROUP BY uniquecarrier
     HAVING count(*) > 250000
);
```

A correlated subquery can be useful here if we do not want to count the total number of flights performed by each carrier, but instead the number of flights *for the given route*. We can select all flights performed by carriers that have performed *more than 1000 flights on a given route* using the following query.

```sql
SELECT *
FROM ontime AS ontime_outer
WHERE uniquecarrier IN (
     SELECT uniquecarrier
     FROM ontime
     WHERE ontime.origin = ontime_outer.origin
       AND ontime.dest = ontime_outer.dest
     GROUP BY uniquecarrier
     HAVING count(*) > 1000
);
```

`ANY` and `ALL` are generalizations of `IN`. `IN` checks if the value is present in the set returned by the subquery. This is equivalent to `= ANY(...)`. The `ANY` and `ALL` operators can also be combined with other comparison operators (such as `>`, `<`, `<>`). The above query can be rewritten to `ANY` in the following form.

```sql
SELECT *
FROM ontime AS ontime_outer
WHERE uniquecarrier = ANY (
     SELECT uniquecarrier
     FROM ontime
     WHERE ontime.origin = ontime_outer.origin
       AND ontime.dest = ontime_outer.dest
     GROUP BY uniquecarrier
     HAVING count(*) > 1000
);
```
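
As a rough mental model, `IN`, `ANY` and `ALL` correspond to Python's `in`, `any` and `all`. The values below are made up, and the sketch deliberately ignores SQL's three-valued logic for `NULL`, so treat it as an approximation:

```python
subquery_result = [10, 20, 30]  # made-up values returned by a subquery
x = 20

is_in = x in subquery_result                   # x IN (...)
eq_any = any(x == v for v in subquery_result)  # x = ANY (...)
gt_any = any(x > v for v in subquery_result)   # x > ANY (...)
gt_all = all(x > v for v in subquery_result)   # x > ALL (...)
print(is_in, eq_any, gt_any, gt_all)
```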

##### Performance

Whereas uncorrelated subqueries are logically executed *once*, correlated subqueries are logically executed *once per row*. As such, it is natural to think that correlated subqueries are very expensive and should be avoided for performance reasons.

While that is true in many SQL systems, it is not the case in DuckDB. In DuckDB, subqueries are **always** *decorrelated*. DuckDB uses a state-of-the-art subquery decorrelation algorithm as described in the [Unnesting Arbitrary Queries](https://cs.emis.de/LNI/Proceedings/Proceedings241/383.pdf) paper. This allows all subqueries to be decorrelated and executed as a single, much more efficient, query.

In DuckDB, correlation does not imply performance degradation.

If we look at the query plan for the correlated scalar subquery using `EXPLAIN`, we can see that the query has been transformed into a hash aggregate followed by a hash join. This allows the query to be executed very efficiently.

```sql
EXPLAIN SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime AS ontime_outer
WHERE distance = (
     SELECT min(distance)
     FROM ontime
     WHERE uniquecarrier = ontime_outer.uniquecarrier
);
```

```text
┌───────────────────────────┐
│         HASH_JOIN         │ 
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │ 
│      uniquecarrier =      │ 
│       uniquecarrier       ├──────────────┐
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           ontime          ││       uniquecarrier       │
└───────────────────────────┘│       min(distance)       │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         SEQ_SCAN          │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │           ontime          │
                             └───────────────────────────┘
```
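
The shape of this plan can be mimicked in a few lines of plain Python: one pass builds the per-carrier minimum (the hash aggregate), and a second pass probes it (the hash join). The rows are made up, and the sketch is a simplified model of the plan rather than DuckDB's actual operators:

```python
# Made-up sample rows: (carrier, distance)
ontime = [("AS", 31.0), ("AS", 129.0), ("NK", 177.0), ("NK", 581.0), ("AS", 31.0)]

# Naive evaluation: re-run the subquery for every outer row (quadratic work).
naive = [
    (c, d) for c, d in ontime
    if d == min(d2 for c2, d2 in ontime if c2 == c)
]

# Decorrelated evaluation, loosely mirroring the plan above:
# 1) HASH_GROUP_BY: one pass computing min(distance) per carrier.
min_by_carrier = {}
for c, d in ontime:
    min_by_carrier[c] = min(d, min_by_carrier.get(c, d))

# 2) HASH_JOIN: probe the aggregate by carrier, then compare distances.
decorrelated = [(c, d) for c, d in ontime if d == min_by_carrier[c]]

print(naive == decorrelated, decorrelated)
```

Both strategies return the same rows, but the second touches each row a constant number of times instead of rescanning the table per row.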

We can see the drastic performance difference that subquery decorrelation has when we compare the run-time of this query in DuckDB with the run-time in Postgres and SQLite. When running the above query on the [`ontime` dataset](https://www.transtats.bts.gov/Homepage.asp) for `2017` with roughly 4 million rows, we get the following performance results:

| DuckDB |  Postgres |    SQLite |
| -----: | --------: | --------: |
| 0.06 s | >48 hours | >48 hours |

As Postgres and SQLite do not decorrelate the subquery, the query is not just *logically*, but *actually* executed once for every row. As a result, the subquery is executed *4 million times* in those systems, which takes an immense amount of time.

In this case, it is possible to manually decorrelate the query and generate the following SQL:

```sql
SELECT ontime.uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime
JOIN (
     SELECT uniquecarrier, min(distance) AS min_distance
     FROM ontime
     GROUP BY uniquecarrier
  ) AS subquery 
  ON ontime.uniquecarrier = subquery.uniquecarrier
 AND distance = min_distance;
```

By performing the decorrelation manually, the performance of SQLite and Postgres improves significantly. However, both systems remain over 30× slower than DuckDB.

| DuckDB | Postgres | SQLite |
| -----: | -------: | -----: |
| 0.06 s |   1.98 s | 2.81 s |
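
To check that the manual rewrite really is equivalent, both forms can be run side by side on a toy table. The sketch below uses Python's built-in `sqlite3` module purely so that it is self-contained; the four rows are made up, the column list is trimmed to two columns, and nothing here reproduces the timings above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ontime (uniquecarrier TEXT, distance REAL)")
con.executemany(
    "INSERT INTO ontime VALUES (?, ?)",
    [("AS", 31.0), ("AS", 129.0), ("NK", 177.0), ("NK", 581.0)],
)

# Correlated form: the subquery references the outer row.
correlated = con.execute("""
    SELECT uniquecarrier, distance
    FROM ontime AS ontime_outer
    WHERE distance = (
        SELECT min(distance) FROM ontime
        WHERE uniquecarrier = ontime_outer.uniquecarrier
    )
    ORDER BY uniquecarrier
""").fetchall()

# Manually decorrelated form: aggregate once, then join.
manual_join = con.execute("""
    SELECT ontime.uniquecarrier, distance
    FROM ontime
    JOIN (
        SELECT uniquecarrier, min(distance) AS min_distance
        FROM ontime
        GROUP BY uniquecarrier
    ) AS subquery
      ON ontime.uniquecarrier = subquery.uniquecarrier
     AND distance = min_distance
    ORDER BY ontime.uniquecarrier
""").fetchall()

print(correlated == manual_join, correlated)
```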

Note that while it is possible to manually decorrelate certain subqueries by rewriting the SQL, it is not always possible to do so. As described in the [Unnesting Arbitrary Queries paper](https://cs.emis.de/LNI/Proceedings/Proceedings241/383.pdf), special join types that are not present in SQL are necessary to decorrelate arbitrary queries.

In DuckDB, these special join types will be automatically generated by the system to decorrelate all subqueries. In fact, DuckDB does not have support for executing subqueries that are not decorrelated. All subqueries will be decorrelated before DuckDB executes them.

##### Conclusion

Subqueries are a very powerful tool that allows you to take arbitrary queries and convert them into ad-hoc functions. When used in combination with DuckDB's powerful subquery decorrelation, they can be executed extremely efficiently, making previously intractable queries not only possible, but fast.

## From Waddle to Flying: Quickly Expanding DuckDB's Functionality with Scalar Python UDFs

**Publication date:** 2023-07-07

**Authors:** Pedro Holanda, Thijs Bruineman, Phillip Cloud

**TL;DR:** DuckDB now supports vectorized Scalar Python User Defined Functions (UDFs). By implementing Python UDFs, users can easily expand the functionality of DuckDB while taking advantage of DuckDB's fast execution model, SQL and data safety.

![](../images/blog/bird-dance.gif)


User Defined Functions (UDFs) enable users to extend the functionality of a Database Management System (DBMS) to perform domain-specific tasks that are not implemented as built-in functions. For instance, users who frequently need to export private data can benefit from an anonymization function that masks the local part of an email while preserving the domain. Ideally, this function would be executed directly in the DBMS. This approach offers several advantages:

1) **Performance.** The function could be executed using the same execution model (e.g., streaming results, beyond-memory/out-of-core execution) of the DBMS, and without any unnecessary transformations.

2) **Ease of Use.** UDFs can be seamlessly integrated into SQL queries, allowing users to leverage the power of SQL to call the functions. This eliminates the need for passing data through a separate database connector and executing external code. The functions can be utilized in various SQL contexts (e.g., subqueries, join conditions).

3) **Safety.** The sensitive data never leaves the DBMS process.
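
As a concrete picture of the anonymization function mentioned above, here is a minimal sketch in plain Python. The name `anonymize_email` and the masking scheme (one asterisk per character of the local part) are our own assumptions for illustration, and the snippet deliberately leaves out registering the function with any database:

```python
def anonymize_email(email):
    """Mask the local part of an email address, keeping the domain."""
    if email is None or "@" not in email:
        return None  # treat missing or malformed input as NULL
    local, domain = email.rsplit("@", 1)
    return "*" * len(local) + "@" + domain

print(anonymize_email("jane.doe@example.com"))
```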

There are two main reasons users often refrain from implementing UDFs:

1) **Security.** Since UDFs are custom code created by users and executed within the DBMS process, there is a potential risk of crashing the server. However, when it comes to DuckDB, an embedded database, this concern is mitigated as each analyst runs their own DuckDB process separately. Therefore, the impact on server stability is not a significant worry.

2) **Difficulty of implementation.** High-performance UDFs are typically only supported in low-level languages, while UDFs in higher-level languages like Python incur significant performance costs. Consequently, many users cannot quickly implement their UDFs without investing a significant amount of time in learning a low-level language and understanding the internal details of the DBMS.

DuckDB followed a similar approach. As a DBMS tailored for analytical tasks, performance is a key consideration, leading to the implementation of its core in C++. Consequently, the initial focus of extensibility efforts [was centered around C++](https://www.youtube.com/watch?v=UKo_LQyLTko&ab_channel=DuckDBLabs). However, this duck is not limited to just waddling; it can also fly. So we are delighted to announce the [recent addition](https://github.com/duckdb/duckdb/pull/7171) of Scalar Python UDFs to DuckDB.

DuckDB provides support for two distinct types of Python UDFs, differing in the Python object used for communication between [DuckDB's native data types](#docs:lts:sql:data_types:overview) and the Python process. These communication layers include support for [Python built-in types](#docs:lts:sql:data_types:overview) and [PyArrow Tables](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html).

The two approaches exhibit two key differences:

1) **Zero-Copy.** PyArrow Tables leverage our [zero-copy integration with Arrow](https://duckdb.org/2021/12/03/duck-arrow), enabling efficient translation of data types to Python-Land with zero-copy cost.

2) **Vectorization.** PyArrow Table functions operate on a chunk level, processing chunks of data containing up to 2048 rows. This approach maximizes cache locality and leverages vectorization. On the other hand, the built-in types UDF implementation operates on a per-row basis.
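
The difference in call granularity can be illustrated with plain Python lists standing in for columns. This is a deliberately simplified model: the helper names are hypothetical, lists replace PyArrow arrays, and zero-copy transfer is not modeled at all. It only shows how many times the Python function is invoked in each style:

```python
CHUNK_SIZE = 2048  # upper bound on rows per chunk, as stated above

def per_row(func, column):
    """Built-in-type style: one Python call per row."""
    return [func(value) for value in column]

def per_chunk(func, column, chunk_size=CHUNK_SIZE):
    """Vectorized style: one Python call per chunk of rows."""
    out = []
    calls = 0
    for start in range(0, len(column), chunk_size):
        out.extend(func(column[start:start + chunk_size]))
        calls += 1
    return out, calls

column = list(range(5000))
row_result = per_row(lambda v: v + 1, column)  # 5000 separate calls
chunk_result, n_calls = per_chunk(lambda c: [v + 1 for v in c], column)
print(row_result == chunk_result, n_calls)
```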

This blog post aims to demonstrate how you can extend DuckDB using Python UDFs, with a particular emphasis on PyArrow-powered UDFs. In our quick-tour section, we will provide examples using the PyArrow UDF types. For those interested in benchmarks, you can jump ahead to the [benchmark section below](#::benchmarks). If you want to see a detailed description of the Python UDF API, please refer to our [documentation](#docs:lts:clients:python:function).

#### Python UDFs

This section depicts several practical examples of using Python UDFs. Each example uses a different type of Python UDF.

##### Quick-Tour

To demonstrate the usage of Python UDFs in DuckDB, let's consider the following example. We have a dictionary called `world_cup_titles` that maps countries to the number of World Cups they have won. We want to create a Python UDF that takes a country name as input, searches for the corresponding value in the dictionary, and returns the number of World Cups won by that country. If the country is not found in the dictionary, the UDF will return `NULL`.

Here's an example implementation:

```python
import duckdb
from duckdb.typing import *

con = duckdb.connect()

# Dictionary that maps countries and world cups they won
world_cup_titles = {
    "Brazil": 5,
    "Germany": 4,
    "Italy": 4,
    "Argentina": 2,
    "Uruguay": 2,
    "France": 2,
    "England": 1,
    "Spain": 1
}

# Function that will be registered as a UDF; it simply does a lookup in the Python dictionary
def world_cups(x):
     return world_cup_titles.get(x)

# We register the function
con.create_function("wc_titles", world_cups, [VARCHAR], INTEGER)
```

That's it, the function is then registered and ready to be called through SQL.

```python
# Let's create an example countries table with the countries we are interested in using
con.execute("CREATE TABLE countries (country VARCHAR)")
con.execute("INSERT INTO countries VALUES ('Brazil'), ('Germany'), ('Italy'), ('Argentina'), ('Uruguay'), ('France'), ('England'), ('Spain'), ('Netherlands')")
# We can simply call the function through SQL; countries that never won a World Cup (here, the Netherlands) get NULL
con.sql("SELECT country, wc_titles(country) AS world_cups FROM countries").fetchall()
# [('Brazil', 5), ('Germany', 4), ('Italy', 4), ('Argentina', 2), ('Uruguay', 2), ('France', 2), ('England', 1), ('Spain', 1), ('Netherlands', None)]
```

##### Generating Fake Data with Faker (Built-In Type UDF)

Here is an example that uses the [Faker library](https://faker.readthedocs.io/en/master/) to build a scalar function in DuckDB that returns randomly generated dates. The function, named `random_date`, does not require any inputs and outputs a `DATE` column. Since Faker utilizes built-in Python types, the function directly returns them.
One important thing to notice is that a function that is not deterministic based on its input must be marked as having `side_effects`.

```python
import duckdb

# By importing duckdb.typing we can specify DuckDB Types directly without using strings
from duckdb.typing import *

from faker import Faker

# Our Python UDF generates a random date every time it's called
def random_date():
     fake = Faker()
     return fake.date_between()
```

We then have to register the Python function in DuckDB using `create_function`. Since our function doesn't require any inputs, we can pass an empty list as the `argument_type_list`. As the function returns a date, we specify `DATE` from `duckdb.typing` as the `return_type`. Note that since our `random_date()` function returns a built-in Python type (`datetime.date`), we don't need to specify the UDF type.

```python
# To illustrate the effect of `side_effects`, let's first register the function without marking it.
duckdb.create_function('random_date', random_date, [], DATE)

# After registration, we can use the function directly via SQL
# Notice that without side_effects=True, it's not guaranteed that the function will be re-evaluated.
res = duckdb.sql('SELECT random_date() FROM range(3)').fetchall()
# [(datetime.date(2003, 8, 3),), (datetime.date(2003, 8, 3),), (datetime.date(2003, 8, 3),)]

# Now let's re-register the function with side_effects=True.
duckdb.remove_function('random_date')
duckdb.create_function('random_date', random_date, [], DATE, side_effects=True)
res = duckdb.sql('SELECT random_date() FROM range(3)').fetchall()
# [(datetime.date(2020, 11, 29),), (datetime.date(2009, 5, 18),), (datetime.date(2018, 5, 24),)]
```

##### Swap String Case (PyArrow Type UDF)

One issue with using built-in types is that you don't benefit from zero-copy, vectorization and cache locality. Using PyArrow as a UDF type should be favored to leverage these optimizations.

To demonstrate a PyArrow function, let's consider a simple example where we want to transform lowercase characters to uppercase and uppercase characters to lowercase. Fortunately, PyArrow already has a function for this in the compute engine, and it's as simple as calling `pc.utf8_swapcase(x)`.

```python
import duckdb

# By importing duckdb.typing we can specify DuckDB Types directly without using strings
from duckdb.typing import *

import pyarrow as pa
import pyarrow.compute as pc

def swap_case(x):
     # Swap the case of the 'column' using utf8_swapcase and return the result
     return pc.utf8_swapcase(x)

con = duckdb.connect()
# To register the function, we must define its type to be 'arrow'
con.create_function('swap_case', swap_case, [VARCHAR], VARCHAR, type='arrow')

res = con.sql("SELECT swap_case('PEDRO HOLANDA')").fetchall()
# [('pedro holanda',)]
```


##### Predicting Taxi Fare Costs (Ibis + PyArrow UDF)

Python UDFs offer significant power as they enable users to leverage the extensive Python ecosystem and tools, including libraries like [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) that efficiently implement machine learning operations.

Additionally, the [Ibis project](https://ibis-project.org/) offers a DataFrame API with great DuckDB integration and supports both of DuckDB's native Python and PyArrow UDFs.

In this example, we demonstrate the usage of a pre-built PyTorch model to estimate taxi fare costs based on the traveled distance. You can find a complete example [in this blog post by the Ibis team](https://ibis-project.org/blog/rendered/torch/).

```python
import torch
import pyarrow as pa
import ibis
import ibis.expr.datatypes as dt

from ibis.expr.operations import udf


# The code to generate the model is not specified in this snippet, please refer to the provided link for more information
model = ...

# Function that uses the model and a traveled distance input tensor to predict values, please refer to the provided link for more information
def predict_linear_regression(model, tensor: torch.Tensor) -> torch.Tensor:
    ...


# Indicate to ibis that this is a scalar user-defined function whose input format is pyarrow
@udf.scalar.pyarrow
def predict_fare(x: dt.float64) -> dt.float32:
    # `x` is a pyarrow.ChunkedArray; the `dt.float64` annotation indicates the element type of the ChunkedArray.

    # Transform the data from PyArrow to the required torch tensor format and dimension.
    tensor = torch.from_numpy(x.to_numpy()[:, None]).float()

    # Call the actual prediction function, which also returns a torch tensor.
    predicted = predict_linear_regression(model, tensor).ravel()
    return pa.array(predicted.numpy())


# Execute a query on the NYC Taxi Parquet file to showcase our model's predictions, the actual fare amount, and the distance.
expr = (
    ibis.read_parquet('yellow_tripdata_2016-02.parquet')
    .mutate(
        "fare_amount",
        "trip_distance",
        predicted_fare=lambda t: predict_fare(t.trip_distance),
    )
)
df = expr.execute()
```

By utilizing Python UDFs in DuckDB with Ibis, you can seamlessly incorporate machine learning models and perform predictions directly within your Ibis code and SQL queries. The example demonstrates how to predict taxi fare costs based on distance using a PyTorch model, showcasing the integration of machine learning capabilities within DuckDB's SQL environment driven by Ibis.

#### Benchmarks

In this section, we perform simple benchmark comparisons to demonstrate the performance differences between two different types of Python UDFs. The benchmarks measure execution time and peak memory consumption. Each benchmark is executed 5 times, and the median value is reported. The benchmarks were conducted on an Apple M1 Mac with 16 GB of RAM.


##### Built-In Python vs. PyArrow

To benchmark these UDF types, we create UDFs that take an integral column as input, add one to each value, and return the result. The code used for this benchmark section can be found in a [GitHub Gist](https://gist.github.com/pdet/ebd201475581756c29e4533a8fa4106e). 

```python
import pyarrow.compute as pc
import duckdb
import pyarrow as pa

# Built-In UDF
def add_built_in_type(x):
    return x + 1

# Arrow UDF
def add_arrow_type(x):
    return pc.add(x, 1)

con = duckdb.connect()

# Registration
con.create_function('add_built_in_type', add_built_in_type, ['BIGINT'], 'BIGINT', type='native')
con.create_function('add_arrow_type', add_arrow_type, ['BIGINT'], 'BIGINT', type='arrow')

# Integer View with 10,000,000 elements.
con.sql("""
     SELECT i
     FROM range(10000000) tbl(i);
""").to_view("numbers")

# Calls for both UDFs
native_res = con.sql("SELECT sum(add_built_in_type(i)) FROM numbers").fetchall()
arrow_res = con.sql("SELECT sum(add_arrow_type(i)) FROM numbers").fetchall()
```


| Name     | Time (s) |
| -------- | -------: |
| Built-In |     5.37 |
| PyArrow  |     0.35 |

We can observe a performance difference of more than one order of magnitude between the two UDFs. The difference in performance is primarily due to three factors:

1) In Python, object construction and general use is rather slow. This is due to several reasons, including automatic memory management, interpretation, and dynamic typing.
2) The PyArrow UDF does not require any data copying.
3) The PyArrow UDF is executed in a vectorized fashion, processing chunks of data instead of individual rows.
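
The third factor can be illustrated without DuckDB at all. The sketch below is plain Python and only counts how often a function is entered when a column is processed row by row versus chunk by chunk. The helper names and the chunk size of 2048 are assumptions made for this illustration; it does not reproduce how DuckDB actually dispatches UDF calls, and it says nothing about zero-copy.

```python
# Count how many times each style of "UDF" is invoked.
calls = {"per_row": 0, "per_chunk": 0}

def add_one_row(x):
    # Called once for every single value, like a built-in type UDF.
    calls["per_row"] += 1
    return x + 1

def add_one_chunk(values):
    # Called once for a whole chunk of values, like a vectorized UDF.
    calls["per_chunk"] += 1
    return [v + 1 for v in values]

def run_per_row(column):
    return [add_one_row(v) for v in column]

def run_per_chunk(column, chunk_size=2048):  # chunk size is an assumption
    out = []
    for start in range(0, len(column), chunk_size):
        out.extend(add_one_chunk(column[start:start + chunk_size]))
    return out

column = list(range(10_000))
same_result = run_per_row(column) == run_per_chunk(column)
print(same_result, calls)
# True {'per_row': 10000, 'per_chunk': 5}
```

Both variants produce the same values, but the chunked one crosses the function-call boundary 5 times instead of 10,000 times, which is where much of the per-row interpreter overhead comes from.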


##### Python UDFs vs. External Functions

Here we compare the usage of a Python UDF with an external function. In this case, we have a function that calculates the sum of the lengths of all strings in a column. You can find the code used for this benchmark section in a [GitHub Gist](https://gist.github.com/pdet/2907290725539d390df7981e799ed593).

```python
import duckdb
import pyarrow as pa

# Function used in UDF
def string_length_arrow(x):
     tuples = len(x)
     values = [len(i.as_py()) if i.as_py() is not None else 0 for i in x]
     array = pa.array(values, type=pa.int32(), size=tuples)
     return array


# Same Function but external to the database
def exec_external(con):
     arrow_table = con.sql("SELECT i FROM strings tbl(i)").arrow()
     arrow_column = arrow_table['i']
     tuples = len(arrow_column)
     values = [len(i.as_py()) if i.as_py() is not None else 0 for i in arrow_column]
     array = pa.array(values, type=pa.int32(), size=tuples)
     arrow_tbl = pa.Table.from_arrays([array], names=['i'])
     return con.sql("SELECT sum(i) FROM arrow_tbl").fetchall()


con = duckdb.connect()
con.create_function('strlen_arrow', string_length_arrow, ['VARCHAR'], int, type='arrow')

con.sql("""
     SELECT
          CASE WHEN i != 0 AND i % 42 = 0
          THEN
               NULL
          ELSE
               repeat(chr((65 + (i % 26))::INTEGER), (4 + (i % 12))) END
          FROM range(10000000) tbl(i);
""").to_view("strings")

con.sql("SELECT sum(strlen_arrow(i)) FROM strings tbl(i)").fetchall()

exec_external(con)
```


| Name     | Time (s) | Peak memory consumption (MB) |
| -------- | -------: | ---------------------------: |
| External |     5.65 |                      584.032 |
| UDF      |     5.63 |                      112.848 |


Here we can see that there is no significant regression in performance when utilizing UDFs. However, you still have the benefits of safer execution and the utilization of SQL. In our example, we can also notice that the external function materializes the entire query, resulting in a 5× higher peak memory consumption compared to the UDF approach.
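
To get an intuition for why materializing a full result costs more peak memory than consuming it in pieces, here is a small plain-Python sketch using the standard `tracemalloc` module. It is only an analogy: the generator, the chunk size of 2048, and the helper names are assumptions for illustration, and the absolute numbers have nothing to do with the benchmark above.

```python
import tracemalloc

def make_strings(n):
    # Lazily produce n short strings, standing in for a query result.
    for i in range(n):
        yield chr(65 + i % 26) * (4 + i % 12)

def total_length_materialized(n):
    rows = list(make_strings(n))  # the whole result is held in memory at once
    return sum(len(s) for s in rows)

def total_length_streamed(n, chunk_size=2048):  # chunk size is an assumption
    total, chunk = 0, []
    for s in make_strings(n):
        chunk.append(s)
        if len(chunk) == chunk_size:
            total += sum(len(x) for x in chunk)
            chunk.clear()  # only one chunk is alive at any time
    return total + sum(len(x) for x in chunk)

def peak_bytes(fn, *args):
    # Run fn and report its result together with the peak traced allocation.
    tracemalloc.start()
    result = fn(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

n = 200_000
sum_materialized, peak_materialized = peak_bytes(total_length_materialized, n)
sum_streamed, peak_streamed = peak_bytes(total_length_streamed, n)
print(sum_materialized == sum_streamed, peak_materialized > peak_streamed)
```

Both functions compute the same total, but the materialized variant has to keep every string alive until the sum is finished.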

#### Conclusions and Further Development

Scalar Python UDFs are now supported in DuckDB, marking a significant milestone in extending the functionality of the database. This enhancement empowers users to perform complex computations using a high-level language. Additionally, Python UDFs can leverage DuckDB's zero-copy integration with Arrow, eliminating data transfer costs and ensuring efficient query execution.

While the introduction of Python UDFs is a major step forward, our work in this area is ongoing. Our roadmap includes the following focus areas:

1. **Aggregate/Table-Producing UDFs**: Currently, users can create Scalar UDFs, but we are actively working on supporting Aggregation Functions (which perform calculations on a set of values and return a single result) and Table-Producing Functions (which return tables without limitations on the number of columns and rows).

2. **Types**: Scalar Python UDFs currently support most DuckDB types, with the exception of ENUM types and BIT types. We are working towards expanding the type support to ensure comprehensive functionality.

If you encounter any problems using our Python UDFs, please open an issue in [DuckDB's issue tracker](https://github.com/duckdb/duckdb-python/issues).

## DuckDB ADBC – Zero-Copy Data Transfer via Arrow Database Connectivity

**Publication date:** 2023-08-04

**Author:** Pedro Holanda

**TL;DR:** DuckDB has added support for [Arrow Database Connectivity (ADBC)](https://arrow.apache.org/adbc/0.5.1/index.html), an API standard that enables efficient data ingestion and retrieval from database systems, similar to the [Open Database Connectivity (ODBC)](https://learn.microsoft.com/en-us/sql/odbc/microsoft-open-database-connectivity-odbc?view=sql-server-ver16) interface. However, unlike ODBC, ADBC specifically caters to the columnar storage model, facilitating fast data transfers between a columnar database and an external application.

![](../images/blog/adbc/duck-arrow.jpg)


Database interface standards allow developers to write application code that is independent of the underlying database management system (DBMS) being used. DuckDB has supported two standards that have gained popularity in the past few decades: [the core interface of ODBC](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/interface-conformance-levels?view=sql-server-ver16) and [Java Database Connectivity (JDBC)](https://en.wikipedia.org/wiki/Java_Database_Connectivity). Both interfaces are designed to fully support database connectivity and management, with JDBC catering to the Java environment. With these APIs, developers can query a DBMS in a system-agnostic way, retrieve query results, run prepared statements, and manage connections.

These interfaces were designed in the early 90s when row-wise database systems reigned supreme. As a result, they were primarily intended for transferring data in a row-wise format. However, in the mid-2000s, columnar database systems started gaining a lot of traction due to their drastic performance advantages for data analysis (you can find me giving a brief illustration of this difference [at EuroPython](https://youtu.be/egN4TwVyJss?t=643)). Because they predate this shift, these APIs offer no support for transferring data in a columnar format (or, in the case of ODBC, [some support](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/column-wise-binding?view=sql-server-ver16) with a lot of added complexity). In practice, when analytical, column-wise systems like DuckDB make use of these APIs, [converting the data between these representation formats becomes a major bottleneck](https://hannes.muehleisen.org/publications/p852-muehleisen.pdf).

The figure below depicts how a developer can use these APIs to query a DuckDB database. For example, developers can submit SQL queries via the API, which then uses a DuckDB driver to internally call the proper functions. A query result is then produced in DuckDB's internal columnar representation, and the driver takes care of transforming it to the JDBC or ODBC row-wise result format. This transformation has significant costs for rearranging and copying the data, quickly becoming a major bottleneck.

![](../images/blog/adbc/duck-odbc-jdbc-light.png)



To overcome this transformation cost, ADBC has been proposed, with a generic API to support database operations while using the [Apache Arrow memory format](https://arrow.apache.org/) to send data in and out of the DBMS. DuckDB now supports the [ADBC specification](https://arrow.apache.org/adbc/0.5.1/cpp/api/adbc.html). Due to DuckDB's [zero-copy integration with the Arrow format](https://duckdb.org/2021/12/03/duck-arrow), using ADBC as an interface is rather efficient, since there is only a small *constant* cost to transform DuckDB query results to the Arrow format.

The figure below depicts the query execution flow when using ADBC. Note that the main difference from ODBC/JDBC is that the result does not need to be transformed to a row-wise format.

![](../images/blog/adbc/duck-adbc-light.png)
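
The difference between the two flows can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions: Python lists stand in for column buffers, and the function names are hypothetical, not part of any driver. It shows that a row-wise hand-off has to visit and regroup every value, while a columnar hand-off can pass the existing buffers along unchanged.

```python
# A tiny columnar result: one list per column.
columns = {
    "name": ["Tenacious D", "Backstreet Boys", "Wu Tang Clan"],
    "albums": [4, 10, 7],
}

def to_rows(cols):
    # Row-wise hand-off: every value is visited and placed into a new tuple,
    # so the cost grows with the number of rows times the number of columns.
    return list(zip(*cols.values()))

def to_columnar(cols):
    # Columnar hand-off: the existing column buffers are passed by reference,
    # so the cost does not depend on the number of rows.
    return cols

rows = to_rows(columns)
handed_over = to_columnar(columns)

print(rows)
# [('Tenacious D', 4), ('Backstreet Boys', 10), ('Wu Tang Clan', 7)]
print(handed_over["albums"] is columns["albums"])
# True
```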



#### Quick Tour

For our quick tour, we will illustrate an example of round-tripping data using DuckDB-ADBC via Python. Please note that DuckDB-ADBC can also be used with other programming languages. Specifically, you can find C++ DuckDB-ADBC examples and tests in the [DuckDB GitHub repository](https://github.com/duckdb/duckdb/blob/main/test/api/adbc/test_adbc.cpp).
For convenience, you can also find a ready-to-run version of this tour in a [Colab notebook](https://colab.research.google.com/drive/11CEI62jRMHG5GtK0t_h6xSn6ne8W7dvS?usp=sharing).
If you would like to see a more detailed explanation of the DuckDB-ADBC API or view a C++ example, please refer to our [documentation page](#docs:lts:clients:adbc).

##### Setup

For this example, you need a dynamic library from the latest bleeding-edge version of DuckDB, as well as pyarrow and the [adbc-driver-manager](https://github.com/apache/arrow-adbc/tree/main/python/adbc_driver_manager). The ADBC driver manager is a Python package developed by [Voltron Data](https://voltrondata.com/). The driver manager is compliant with [DB-API 2.0](https://peps.python.org/pep-0249/). It wraps ADBC, making its usage more straightforward. For details on ADBC drivers, see the documentation of the [ADBC Driver Manager](https://arrow.apache.org/adbc/0.5.1/python/api/adbc_driver_manager.html).

> While DuckDB is already DB-API compliant in Python, what sets ADBC apart is that you do not need a DuckDB module installed and loaded. Additionally, unlike the DB-API, it does not use a row-wise format for data transfer.

```bash
pip install pyarrow
pip install adbc-driver-manager
```

##### Insert Data

First, we need to import the libraries used in this tour: PyArrow and the DB-API interface from the ADBC Driver Manager.

```python
import pyarrow
from adbc_driver_manager import dbapi
```

Next, we can create a connection via ADBC with DuckDB. This connection simply requires the path to DuckDB's driver and the entrypoint function name. DuckDB's entrypoint is `duckdb_adbc_init`.
By default, connections are established with an in-memory database. However, if desired, you have the option to specify the `path` variable and connect to a local DuckDB instance, allowing you to store the data on disk.
Note that these are the only variables in ADBC that are not DBMS agnostic; instead, they are set by the user, often through a configuration file.

```python
con = dbapi.connect(driver="path/to/duckdb.lib", entrypoint="duckdb_adbc_init", db_kwargs={"path": "test.db"})
```

To insert the data, we can simply call the `adbc_ingest` function with a cursor from our connection. It requires the name of the table we want to perform the ingestion to and the Arrow Python object we want to ingest. This function also has two modes: `append`, where data is appended to an existing table, and `create`, where the table does not exist yet and will be created with the input data. By default, it's set to create, so we don't need to define it here.

```python
table = pyarrow.table(
    [
        ["Tenacious D", "Backstreet Boys", "Wu Tang Clan"],
        [4, 10, 7],
    ],
    names=["Name", "Albums"],
)

with con.cursor() as cursor:
     cursor.adbc_ingest("Bands", table)
```

After calling `adbc_ingest`, the table is created in the DuckDB connection and the data is fully inserted.

##### Read Data

To read data from DuckDB, you simply use the `execute` function with a SQL query and then fetch the cursor's result in the desired Arrow format, such as a PyArrow Table in this example.

```python
with con.cursor() as cursor:
     cursor.execute("SELECT * FROM Bands")
     arrow_table = cursor.fetch_arrow_table()
```

#### Benchmark ADBC vs ODBC

In our benchmark section, we aim to evaluate the differences in data reading from DuckDB via ADBC and ODBC. This benchmark was executed on an Apple M1 Max with 32 GB of RAM and involves outputting and inserting the `lineitem` table of TPC-H SF1. You can find the repository with the code used to run this benchmark on [GitHub](https://github.com/pdet/connector_benchmark).

| Name | Time (s) |
| ---- | -------: |
| ODBC |   28.149 |
| ADBC |    0.724 |

ADBC is roughly 39× faster than ODBC in this benchmark (28.149 s vs. 0.724 s). This significant contrast results from the extra allocations and copies that exist in ODBC.
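
Dividing the two timings from the table above gives the speedup factor. A quick check in Python, using only the numbers reported in the table:

```python
# Timings in seconds, copied from the benchmark table above.
odbc_seconds = 28.149
adbc_seconds = 0.724

speedup = odbc_seconds / adbc_seconds
print(f"{speedup:.1f}x")
# 38.9x
```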

#### Conclusions

DuckDB now supports the ADBC standard for database connection. ADBC is particularly efficient when combined with DuckDB, thanks to its use of the Arrow zero-copy integration.

ADBC is particularly interesting because it can drastically reduce the cost of interactions between analytic systems compared to ODBC. For example, if software that already supports ODBC, e.g., [MS-Excel](https://www.microsoft.com/en-us/microsoft-365/excel), were to implement ADBC, integrations with columnar systems like DuckDB could benefit from this significant difference in performance.

DuckDB-ADBC is currently supported via the C Interface and through the Python ADBC Driver Manager. We will add more extensive tutorials for other languages to our [documentation webpage](#docs:lts:index). Please feel free to let us know your preferred language for interacting with DuckDB via ADBC!

If you encounter any problems using ADBC, please open an issue in [DuckDB's issue tracker](https://github.com/duckdb/duckdb/issues).

## Even Friendlier SQL with DuckDB

**Publication date:** 2023-08-23

**Author:** Alex Monahan

**TL;DR:** DuckDB continues to push the boundaries of SQL syntax to both simplify queries and make more advanced analyses possible. Highlights include dynamic column selection, queries that start with the FROM clause, function chaining, and list comprehensions. We boldly go where no SQL engine has gone before! For more details, see the documentation for [friendly SQL features](/docs/guides/sql_features/friendly_sql).

![](../images/blog/ai_generated_star_trek_rubber_duck.png)


Who says that SQL should stay frozen in time, chained to a 1999 version of the specification? As a comparison, do folks remember what JavaScript felt like before Promises? Those didn’t launch until 2012! It’s clear that innovation at the programming syntax layer can have a profoundly positive impact on an entire language ecosystem.

We believe there are many valid reasons for innovation in the SQL language, among them opportunities to simplify basic queries and also to make more dynamic analyses possible. Many of these features arose from community suggestions! Please let us know your SQL pain points on [Discord](https://discord.duckdb.org/) or [GitHub](https://github.com/duckdb/duckdb/discussions) and join us as we change what it feels like to write SQL!

If you have not had a chance to read the first installment in this series, please take a quick look at the prior blog post, [“Friendlier SQL with DuckDB”](https://duckdb.org/2022/05/04/friendlier-sql).

#### The Future Is Now

The first few enhancements in this list were included in the “Ideas for the Future” section of the prior post.

##### Reusable Column Aliases

When working with incrementally calculated expressions in a `SELECT` statement, traditional SQL dialects force you to either write out the full expression for each column or create a common table expression (CTE) around each step of the calculation. Now, any column alias can be reused by subsequent columns within the same `SELECT` statement. Not only that, but these aliases can be used in the `WHERE` and `ORDER BY` clauses as well.

###### Old Way 1: Repeat Yourself

```sql
SELECT 
    'These are the voyages of the starship Enterprise...' AS intro,
    instr('These are the voyages of the starship Enterprise...', 'starship')
        AS starship_loc,
    substr('These are the voyages of the starship Enterprise...',
    instr('These are the voyages of the starship Enterprise...', 'starship')
        + len('starship') + 1) AS trimmed_intro;
```

###### Old Way 2: All the CTEs

```sql
WITH intro_cte AS (
    SELECT
        'These are the voyages of the starship Enterprise...' AS intro
), starship_loc_cte AS (
    SELECT
        intro,
        instr(intro, 'starship') AS starship_loc
    FROM intro_cte
)
SELECT
    intro,
    starship_loc,
    substr(intro, starship_loc + len('starship') + 1) AS trimmed_intro
FROM starship_loc_cte;
```

###### New Way

```sql
SELECT 
     'These are the voyages of the starship Enterprise...' AS intro,
     instr(intro, 'starship') AS starship_loc,
     substr(intro, starship_loc + len('starship') + 1) AS trimmed_intro;
```


| intro                                               | starship_loc | trimmed_intro |
| :-------------------------------------------------- | :----------- | :------------ |
| These are the voyages of the starship Enterprise... | 30           | Enterprise... |
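
To double-check the values in the result above, the same arithmetic can be replayed in plain Python. This is only a sketch of the calculation, not of how DuckDB evaluates the query; the one thing to keep in mind is that SQL's `instr` and `substr` count positions from 1, while Python strings are indexed from 0.

```python
intro = "These are the voyages of the starship Enterprise..."

# instr(intro, 'starship'): 1-based position of the first match.
starship_loc = intro.find("starship") + 1

# substr(intro, starship_loc + len('starship') + 1): skip the word and the
# space after it, then take the rest of the string (1-based start position).
start = starship_loc + len("starship") + 1
trimmed_intro = intro[start - 1:]

print(starship_loc, repr(trimmed_intro))
# 30 'Enterprise...'
```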

##### Dynamic Column Selection

Databases typically prefer strictness in column definitions and flexibility in the number of rows. This can help by enforcing data types and recording column level metadata. However, in data science workflows and elsewhere, it is very common to dynamically generate columns (for example during feature engineering).

No longer do you need to know all of your column names up front! DuckDB can select and even modify columns based on regular expression pattern matching, `EXCLUDE` or `REPLACE` modifiers, and even lambda functions (see the [section on lambda functions below](#::list-lambda-functions) for details!).

Let’s take a look at some facts gathered about the first season of Star Trek. Using DuckDB’s [`httpfs` extension](#docs:lts:core_extensions:httpfs:overview), we can query a CSV dataset directly from GitHub. It has several columns so let’s `DESCRIBE` it.

```sql
INSTALL httpfs;
LOAD httpfs;

CREATE TABLE trek_facts AS
    SELECT *
    FROM 'https://blobs.duckdb.org/data/Star_Trek-Season_1.csv';

DESCRIBE trek_facts;
```


| column_name                             | column_type | null | key  | default | extra |
| :-------------------------------------- | :---------- | :--- | :--- | :------ | :---- |
| season_num                              | BIGINT      | YES  | NULL | NULL    | NULL  |
| episode_num                             | BIGINT      | YES  | NULL | NULL    | NULL  |
| aired_date                              | DATE        | YES  | NULL | NULL    | NULL  |
| cnt_kirk_hookups                        | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_downed_redshirts                    | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_aliens_almost_took_over_planet     | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_aliens_almost_took_over_enterprise | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_vulcan_nerve_pinch                  | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_warp_speed_orders                   | BIGINT      | YES  | NULL | NULL    | NULL  |
| highest_warp_speed_issued               | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_hand_phasers_fired                 | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_ship_phasers_fired                 | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_ship_photon_torpedoes_fired        | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_transporter_pax                     | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_damn_it_jim_quote                   | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_im_givin_her_all_shes_got_quote     | BIGINT      | YES  | NULL | NULL    | NULL  |
| cnt_highly_illogical_quote              | BIGINT      | YES  | NULL | NULL    | NULL  |
| bool_enterprise_saved_the_day           | BIGINT      | YES  | NULL | NULL    | NULL  |

###### `COLUMNS()` with Regular Expressions

The `COLUMNS` expression can accept a string parameter that is a regular expression and will return all column names that match the pattern. How did warp change over the first season? Let’s examine any column name that contains the word `warp`.

```sql
SELECT
    episode_num,
    COLUMNS('.*warp.*')
FROM trek_facts;
```


| episode_num | cnt_warp_speed_orders | highest_warp_speed_issued |
| :---------- | :-------------------- | :------------------------ |
| 0           | 1                     | 1                         |
| 1           | 0                     | 0                         |
| 2           | 1                     | 1                         |
| 3           | 1                     | 0                         |
| ...         | ...                   | ...                       |
| 27          | 1                     | 1                         |
| 28          | 0                     | 0                         |
| 29          | 2                     | 8                         |
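
As a rough mental model of what `COLUMNS('.*warp.*')` selects, the plain-Python sketch below applies the same regular expression to the column names from the `DESCRIBE` output above. It uses Python's `re` module rather than DuckDB's regex engine, which is an assumption of this sketch; for this particular pattern, partial and full matching pick the same names.

```python
import re

# Column names as listed by DESCRIBE trek_facts.
column_names = [
    "season_num", "episode_num", "aired_date", "cnt_kirk_hookups",
    "cnt_downed_redshirts", "bool_aliens_almost_took_over_planet",
    "bool_aliens_almost_took_over_enterprise", "cnt_vulcan_nerve_pinch",
    "cnt_warp_speed_orders", "highest_warp_speed_issued",
    "bool_hand_phasers_fired", "bool_ship_phasers_fired",
    "bool_ship_photon_torpedoes_fired", "cnt_transporter_pax",
    "cnt_damn_it_jim_quote", "cnt_im_givin_her_all_shes_got_quote",
    "cnt_highly_illogical_quote", "bool_enterprise_saved_the_day",
]

pattern = re.compile(".*warp.*")

# Keep the names that match, preserving the table's column order.
matched = [name for name in column_names if pattern.search(name)]

print(matched)
# ['cnt_warp_speed_orders', 'highest_warp_speed_issued']
```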

The `COLUMNS` expression can also be wrapped by other functions to apply those functions to each selected column. Let’s simplify the above query to look at the maximum values across all episodes:

```sql
SELECT
    max(COLUMNS('.*warp.*'))
FROM trek_facts;
```


| max(trek_facts.cnt_warp_speed_orders) | max(trek_facts.highest_warp_speed_issued) |
| :------------------------------------ | :---------------------------------------- |
| 5                                     | 8                                         |

We can also create a `WHERE` clause that applies across multiple columns. All columns must match the filter criteria, which is equivalent to combining them with `AND`. Which episodes had at least 2 warp speed orders and at least a warp speed level of 2?

```sql
SELECT
    episode_num,
    COLUMNS('.*warp.*')
FROM trek_facts
WHERE
    COLUMNS('.*warp.*') >= 2;
    -- cnt_warp_speed_orders >= 2 
    -- AND 
    -- highest_warp_speed_issued >= 2
```


| episode_num | cnt_warp_speed_orders | highest_warp_speed_issued |
| :---------- | :-------------------- | :------------------------ |
| 14          | 3                     | 7                         |
| 17          | 2                     | 7                         |
| 18          | 2                     | 8                         |
| 29          | 2                     | 8                         |
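
The `AND` semantics of a `COLUMNS` expression in a `WHERE` clause can be mimicked in plain Python with `all()`. The sketch below uses only the rows printed in this post (the elided rows are left out), so it is an illustration of the filter logic rather than a replay of the full dataset.

```python
# (episode_num, cnt_warp_speed_orders, highest_warp_speed_issued),
# limited to the rows shown in the tables of this post.
rows = [
    (0, 1, 1), (1, 0, 0), (2, 1, 1), (3, 1, 0),
    (14, 3, 7), (17, 2, 7), (18, 2, 8),
    (27, 1, 1), (28, 0, 0), (29, 2, 8),
]

# COLUMNS('.*warp.*') >= 2 expands to one comparison per matched column,
# and a row passes only if every comparison holds.
kept = [row for row in rows if all(value >= 2 for value in row[1:])]

print([row[0] for row in kept])
# [14, 17, 18, 29]
```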

###### `COLUMNS()` with `EXCLUDE` and `REPLACE`

Individual columns can also be either excluded or replaced prior to applying calculations on them. For example, since our dataset only includes season 1, we do not need to find the `max` of that column. It would be highly illogical. 

```sql
SELECT
    max(COLUMNS(* EXCLUDE season_num))
FROM trek_facts;
```


| max(trek_facts.<br>episode_num) | max(trek_facts.<br>aired_date) | max(trek_facts.<br>cnt_kirk_hookups) | ...  | max(trek_facts.<br>bool_enterprise_saved_the_day) |
| :------------------------------ | :----------------------------- | :----------------------------------- | :--- | :------------------------------------------------ |
| 29                              | 1967-04-13                     | 2                                    | ...  | 1                                                 |

The `REPLACE` syntax is also useful when applied to a dynamic set of columns. In this example, we want to convert the dates into timestamps prior to finding the maximum value in each column. Previously this would have required an entire subquery or CTE to pre-process just that single column!

```sql
SELECT
    max(COLUMNS(* REPLACE aired_date::timestamp AS aired_date))
FROM trek_facts;
```


| max(trek_facts.<br>season_num) | max(trek_facts.<br>episode_num) | max(aired_date := <br>CAST(aired_date AS TIMESTAMP)) | ...  | max(trek_facts.<br>bool_enterprise_saved_the_day) |
| :----------------------------- | :------------------------------ | :--------------------------------------------------- | :--- | :------------------------------------------------ |
| 1                              | 29                              | 1967-04-13 00:00:00                                  | ...  | 1                                                 |

###### `COLUMNS()` with Lambda Functions

The most flexible way to query a dynamic set of columns is through a [lambda function](#docs:lts:sql:functions:nested::lambda-functions). This allows for any matching criteria to be applied to the names of the columns, not just regular expressions. See more details about lambda functions below. 

For example, if using the `LIKE` syntax is more comfortable, we can select columns matching a `LIKE` pattern rather than with a regular expression.

```sql
SELECT
    episode_num,
    COLUMNS(lambda col: col LIKE '%warp%')
FROM trek_facts
WHERE
    COLUMNS(lambda col: col LIKE '%warp%') >= 2;
```


| episode_num | cnt_warp_speed_orders | highest_warp_speed_issued |
| :---------- | :-------------------- | :------------------------ |
| 14          | 3                     | 7                         |
| 17          | 2                     | 7                         |
| 18          | 2                     | 8                         |
| 29          | 2                     | 8                         |
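
Because the lambda body is an ordinary boolean expression over the column name, criteria other than `LIKE` work too. The following is a minimal sketch, assuming a hypothetical `warp_log` table that is not part of the `trek_facts` dataset. It uses `starts_with` to pick columns with a literal `cnt_` prefix, which avoids the fact that `_` is a single-character wildcard in `LIKE` patterns.

```sql
-- Hypothetical table, for illustration only
CREATE OR REPLACE TABLE warp_log (
    episode_num INTEGER,
    cnt_warp_speed_orders INTEGER,
    highest_warp_speed_issued INTEGER,
    cnt_kirk_hookups INTEGER
);
INSERT INTO warp_log VALUES (1, 1, 5, 0), (2, 3, 7, 1);

-- Keep only the columns whose name starts with the literal prefix 'cnt_'
SELECT
    COLUMNS(lambda col: starts_with(col, 'cnt_'))
FROM warp_log;
```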

##### Automatic JSON to Nested Types Conversion

The first installment in the series mentioned JSON dot notation references as future work. However, the team has gone even further! Instead of referring to JSON-typed columns using dot notation, JSON can now be [automatically parsed](https://duckdb.org/2023/03/03/json) into DuckDB’s native types for significantly faster performance, compression, as well as that friendly dot notation!

First, install and load the `httpfs` and `json` extensions if they don't come bundled with the client you are using. Then query a remote JSON file directly as if it were a table!

```sql
INSTALL httpfs;
LOAD httpfs;
INSTALL json;
LOAD json;

SELECT 
     starfleet[10].model AS starship 
FROM 'https://raw.githubusercontent.com/vlad-saling/star-trek-ipsum/master/src/content/content.json';
```

| starship                                                                                                                                                      |
| :------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| USS Farragut - NCC-1647 - Ship on which James Kirk served as a phaser station operator. Attacked by the Dikironium Cloud Creature, killing half the crew. ad. |

Now for some new SQL capabilities beyond the ideas from the prior post!

#### `FROM` First in `SELECT` Statements

When building a query, the first thing you need to know is where your data is coming `FROM`. Well then why is that the second clause in a `SELECT` statement?? No longer! DuckDB is building SQL as it should have always been – putting the `FROM` clause first! This addresses one of the longest standing complaints about SQL, and the DuckDB team implemented it in 2 days. 

```sql
FROM my_table SELECT my_column;
```

Not only that, the `SELECT` clause can be omitted entirely and DuckDB will assume all columns should be `SELECT`ed. Taking a look at a table is now as simple as:

```sql
FROM my_table;
-- SELECT * FROM my_table
```

Other statements like `COPY` are simplified as well.

```sql
COPY (FROM trek_facts) TO 'phaser_filled_facts.parquet';
```

This has an additional benefit beyond saving keystrokes and staying in a development flow state: autocomplete will have much more context when you begin to choose columns to query. Give the AI a helping hand!

Note that this syntax is completely optional, so your `SELECT * FROM` keyboard shortcuts are safe, even if they are obsolete... 🙂
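
The `FROM`-first form combines with the remaining clauses of a query in their usual order. Here is a small sketch, assuming a hypothetical `away_missions` table made up for this illustration:

```sql
-- Hypothetical table, for illustration only
CREATE OR REPLACE TABLE away_missions (planet VARCHAR, crew_lost INTEGER);
INSERT INTO away_missions VALUES
    ('Talos IV', 0),
    ('Cestus III', 2),
    ('Gamma Trianguli VI', 4);

-- FROM comes first; SELECT, WHERE and ORDER BY follow as usual
FROM away_missions
SELECT planet, crew_lost
WHERE crew_lost > 0
ORDER BY crew_lost DESC;
```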

#### Function Chaining

Many SQL blogs advise the use of CTEs instead of subqueries. Among other benefits, they are much more readable. Operations are compartmentalized into discrete chunks and they can be read in order top to bottom instead of forcing the reader to work their way inside out.

DuckDB enables the same interpretability improvement for every scalar function! Use the dot operator to chain functions together, just like in Python. The prior expression in the chain is used as the first argument to the subsequent function.

```sql
SELECT 
     ('Make it so')
          .upper()
          .string_split(' ')
          .list_aggr('string_agg','.')
          .concat('.') AS im_not_messing_around_number_one;
```


| im_not_messing_around_number_one |
| :------------------------------- |
| MAKE.IT.SO.                      |

Now compare that with the old way...

```sql
SELECT 
     concat(
          list_aggr(
               string_split(
                    upper('Make it stop'),
               ' '),
          'string_agg','.'),
     '.') AS oof;
```


| oof           |
| :------------ |
| MAKE.IT.STOP. |

#### Union by Name

DuckDB aims to blend the best of databases and dataframes. This new syntax is inspired by the [concat function in Pandas](https://pandas.pydata.org/docs/reference/api/pandas.concat.html). Rather than vertically stacking tables based on column position, columns are matched by name and stacked accordingly. Simply replace `UNION` with `UNION BY NAME` or `UNION ALL` with `UNION ALL BY NAME`.

For example, we had to add some new alien species proverbs in The Next Generation:

```sql
CREATE TABLE proverbs AS
     SELECT 
          'Revenge is a dish best served cold' AS klingon_proverb 
     UNION ALL BY NAME 
     SELECT 
          'You will be assimilated' AS borg_proverb,
          'If winning is not important, why keep score?' AS klingon_proverb;

FROM proverbs;
```


| klingon_proverb                              | borg_proverb            |
| :------------------------------------------- | :---------------------- |
| Revenge is a dish best served cold           | NULL                    |
| If winning is not important, why keep score? | You will be assimilated |

This approach has additional benefits. As seen above, not only can tables with different column orders be combined, but so can tables with different numbers of columns entirely. This is helpful as schemas migrate, and is particularly useful for DuckDB’s [multi-file reading capabilities](#docs:lts:data:multiple_files:combining_schemas::union-by-name).
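
For the multi-file case, a rough sketch could look like the following. The file names are hypothetical and are written to the current working directory first so that the example is self-contained; the `union_by_name` parameter is the one described on the linked page.

```sql
-- Write two Parquet files with different schemas (hypothetical file names)
COPY (SELECT 'Revenge is a dish best served cold' AS klingon_proverb)
    TO 'proverbs_tos.parquet' (FORMAT parquet);
COPY (SELECT 'You will be assimilated' AS borg_proverb,
             'If winning is not important, why keep score?' AS klingon_proverb)
    TO 'proverbs_tng.parquet' (FORMAT parquet);

-- Columns are matched by name across files; a column missing from a file is NULL
SELECT *
FROM read_parquet(['proverbs_tos.parquet', 'proverbs_tng.parquet'],
                  union_by_name = true);
```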

#### Insert by Name

Another common situation where column order is strict in SQL is when inserting data into a table. Either the columns must match the order exactly, or all of the column names must be repeated in two locations within the query.

Instead, add the keywords `BY NAME` after the table name when inserting. Any subset of the columns in the table in any order can be inserted.

```sql
INSERT INTO proverbs BY NAME 
     SELECT 'Resistance is futile' AS borg_proverb;

SELECT * FROM proverbs;
```


| klingon_proverb                              | borg_proverb            |
| :------------------------------------------- | :---------------------- |
| Revenge is a dish best served cold           | NULL                    |
| If winning is not important, why keep score? | You will be assimilated |
| NULL                                         | Resistance is futile    |

#### Dynamic `PIVOT` and `UNPIVOT`

Historically, databases are not well-suited for pivoting operations. However, DuckDB’s `PIVOT` and `UNPIVOT` clauses can create or stack dynamic column names for a truly flexible pivoting capability! In addition to that flexibility, DuckDB also provides both the SQL standard syntax and a friendlier shorthand. 

For example, let’s take a look at some procurement forecast data just as the Earth-Romulan war was beginning:

```sql
CREATE TABLE purchases (item VARCHAR, year INTEGER, count INTEGER);

INSERT INTO purchases
    VALUES ('phasers', 2155, 1035),
           ('phasers', 2156, 25039),
           ('phasers', 2157, 95000),
           ('photon torpedoes', 2155, 255),
           ('photon torpedoes', 2156, 17899),
           ('photon torpedoes', 2157, 87492);

FROM purchases;
```


| item             | year | count |
| :--------------- | :--- | :---- |
| phasers          | 2155 | 1035  |
| phasers          | 2156 | 25039 |
| phasers          | 2157 | 95000 |
| photon torpedoes | 2155 | 255   |
| photon torpedoes | 2156 | 17899 |
| photon torpedoes | 2157 | 87492 |

It is easier to compare our phaser needs to our photon torpedo needs if each year’s data is visually close together. Let’s pivot this into a friendlier format! Each year should receive its own column (but each year shouldn’t need to be specified in the query!), we want to sum up the total `count`, and we still want to keep a separate group (row) for each `item`. 

```sql
CREATE TABLE pivoted_purchases AS
     PIVOT purchases 
          ON year 
          USING sum(count) 
          GROUP BY item;

FROM pivoted_purchases;
```


| item             | 2155 | 2156  | 2157  |
| :--------------- | :--- | :---- | :---- |
| phasers          | 1035 | 25039 | 95000 |
| photon torpedoes | 255  | 17899 | 87492 |

Looks like photon torpedoes went on sale...
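
For comparison, here is a sketch of the same kind of pivot in the SQL standard form, using a hypothetical, smaller `purchases_demo` table. Note the trade-off: in this form the values to pivot are listed explicitly in the `IN` clause, so the query has to be edited when a new year arrives.

```sql
-- Hypothetical stand-in for the purchases table
CREATE OR REPLACE TABLE purchases_demo (item VARCHAR, year INTEGER, count INTEGER);
INSERT INTO purchases_demo VALUES
    ('phasers', 2155, 1035),
    ('phasers', 2156, 25039),
    ('photon torpedoes', 2155, 255),
    ('photon torpedoes', 2156, 17899);

-- SQL standard syntax: aggregate, FOR column IN (values), then the row groups
SELECT *
FROM purchases_demo
PIVOT (
    sum(count)
    FOR year IN (2155, 2156)
    GROUP BY item
);
```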

Now imagine the reverse situation. Scotty in engineering has been visually analyzing and manually constructing his purchases forecast. He prefers things pivoted so it’s easier to read. Now you need to fit it back into the database! This war may go on for a bit, so you may need to do this again next year. Let’s write an `UNPIVOT` query to return to the original format that can handle any year.

The `COLUMNS` expression will use all columns except `item`. After stacking, the column containing the column names from `pivoted_purchases` should be renamed to `year`, and the values within those columns represent the `count`. The result is the same dataset as the original.

```sql
UNPIVOT pivoted_purchases
     ON COLUMNS(* EXCLUDE item)
     INTO
          NAME year
          VALUE count;
```


| item             | year | count |
| :--------------- | :--- | :---- |
| phasers          | 2155 | 1035  |
| phasers          | 2156 | 25039 |
| phasers          | 2157 | 95000 |
| photon torpedoes | 2155 | 255   |
| photon torpedoes | 2156 | 17899 |
| photon torpedoes | 2157 | 87492 |
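
One detail to keep in mind when loading the result back into a table: the `NAME` column is built from column names, so it holds strings. A minimal sketch, assuming a hypothetical `pivoted_demo` table, wraps the `UNPIVOT` in a CTE and casts `year` back to an integer:

```sql
-- Hypothetical pivoted table, for illustration only
CREATE OR REPLACE TABLE pivoted_demo (item VARCHAR, "2155" INTEGER, "2156" INTEGER);
INSERT INTO pivoted_demo VALUES
    ('phasers', 1035, 25039),
    ('photon torpedoes', 255, 17899);

WITH unpivoted AS (
    UNPIVOT pivoted_demo
        ON COLUMNS(* EXCLUDE item)
        INTO
            NAME year
            VALUE count
)
-- The year values come from column names, so cast them back to INTEGER
SELECT item, CAST(year AS INTEGER) AS year, count
FROM unpivoted
ORDER BY item, year;
```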

More examples are included as a part of our [DuckDB 0.8.0 announcement post](https://duckdb.org/2023/05/17/announcing-duckdb-080#new-sql-features), and the [`PIVOT`](#docs:lts:sql:statements:pivot) and [`UNPIVOT`](#docs:lts:sql:statements:unpivot) documentation pages highlight more complex queries.

Stay tuned for a future post to cover what is happening behind the scenes!

#### List Lambda Functions

List lambdas allow for operations to be applied to each item in a list. These do not need to be pre-defined – they are created on the fly within the query.

In this example, a lambda function is used in combination with the `list_transform` function to shorten each official ship name.

```sql
SELECT 
     (['Enterprise NCC-1701', 'Voyager NCC-74656', 'Discovery NCC-1031'])
          .list_transform(lambda x: x.string_split(' ')[1]) AS short_name;
```


| short_name                       |
| :------------------------------- |
| [Enterprise, Voyager, Discovery] |

Lambdas can also be used to filter down the items in a list. The lambda returns a boolean for each item, and the `list_filter` function keeps only the items for which it is true. The `contains` function is using the [function chaining](#::function-chaining) described earlier.

```sql
SELECT 
     (['Enterprise NCC-1701', 'Voyager NCC-74656', 'Discovery NCC-1031'])
          .list_filter(lambda x: x.contains('1701')) AS the_original;
```


| the_original          |
| :-------------------- |
| [Enterprise NCC-1701] |

#### List Comprehensions

What if there was a simple syntax to both modify and filter a list? DuckDB takes inspiration from Python’s approach to list comprehensions to dramatically simplify the above examples. List comprehensions are syntactic sugar – these queries are rewritten into lambda expressions behind the scenes!

Within brackets, first specify the transformation that is desired, then indicate which list should be iterated over, and finally include the filter criteria. 

```sql
SELECT 
     [x.string_split(' ')[1] 
     FOR x IN ['Enterprise NCC-1701', 'Voyager NCC-74656', 'Discovery NCC-1031'] 
     IF x.contains('1701')] AS ready_to_boldly_go;
```


| ready_to_boldly_go |
| :----------------- |
| [Enterprise]       |
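
To make the connection to lambdas concrete, the comprehension above can be written by hand as a `list_filter` followed by a `list_transform`. This is an equivalent formulation for illustration, not necessarily the exact rewrite DuckDB performs internally.

```sql
SELECT
     (['Enterprise NCC-1701', 'Voyager NCC-74656', 'Discovery NCC-1031'])
          -- the IF part of the comprehension
          .list_filter(lambda x: x.contains('1701'))
          -- the expression before FOR
          .list_transform(lambda x: x.string_split(' ')[1]) AS ready_to_boldly_go;
```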

#### Exploding Struct.*

A struct in DuckDB is a set of key/value pairs. Behind the scenes, a struct is stored with a separate column for each key. As a result, it is computationally easy to explode a struct into separate columns, and now it is syntactically simple as well! This is another example of allowing SQL to handle dynamic column names.

```sql
WITH damage_report AS (
     SELECT {'gold_casualties':5, 'blue_casualties':15, 'red_casualties': 10000} AS casualties
) 
FROM damage_report
SELECT 
     casualties.*;
```


| gold_casualties | blue_casualties | red_casualties |
| :-------------- | :-------------- | :------------- |
| 5               | 15              | 10000          |

#### Automatic Struct Creation

DuckDB exposes an easy way to convert any table into a single-column struct. Instead of `SELECT`ing column names, `SELECT` the table name itself.

```sql
WITH officers AS (
     SELECT 'Captain' AS rank, 'Jean-Luc Picard' AS name 
     UNION ALL 
     SELECT 'Lieutenant Commander', 'Data'
) 
FROM officers 
SELECT officers;
```


| officers                                     |
| :------------------------------------------- |
| {'rank': Captain, 'name': Jean-Luc Picard}   |
| {'rank': Lieutenant Commander, 'name': Data} |

#### Union Data Type

DuckDB utilizes strong typing to provide high performance and enforce data quality. However, DuckDB is also as forgiving as possible using approaches like implicit casting to avoid always having to cast between data types. 

Another way DuckDB enables flexibility is the new `UNION` data type. A `UNION` data type allows for a single column to contain multiple types of values. This can be thought of as an “opt-in” to SQLite’s flexible data typing rules (the opposite direction of SQLite’s recently announced [strict tables](https://www.sqlite.org/stricttables.html)).

By default DuckDB will seek the common denominator of data types when combining tables together. The below query results in a `VARCHAR` column:

```sql
SELECT 'The Motion Picture' AS movie UNION ALL 
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 'First Contact';
```


| movie              |
| :----------------- |
| The Motion Picture |
| First Contact      |
| 6                  |
| 5                  |
| 4                  |
| 3                  |
| 2                  |

However, if a `UNION` type is used, each individual row retains its original data type. A `UNION` is defined using key-value pairs with the key as a name and the value as the data type. This also allows the specific data types to be pulled out as individual columns:

```sql
CREATE TABLE movies (
     movie UNION(num INTEGER, name VARCHAR)
);
INSERT INTO movies VALUES
     ('The Motion Picture'), (2), (3), (4), (5), (6), ('First Contact');

FROM movies 
SELECT 
     movie,
     union_tag(movie) AS type,
     movie.name,
     movie.num;
```


| movie              | type | name               | num  |
| :----------------- | :--- | :----------------- | :--- |
| The Motion Picture | name | The Motion Picture |      |
| 2                  | num  |                    | 2    |
| 3                  | num  |                    | 3    |
| 4                  | num  |                    | 4    |
| 5                  | num  |                    | 5    |
| 6                  | num  |                    | 6    |
| First Contact      | name | First Contact      |      |

#### Additional Friendly Features

Several other friendly features are worth mentioning and some are powerful enough to warrant their own blog posts. 

DuckDB takes a nod from the [`describe` function in Pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html) and implements a `SUMMARIZE` keyword that will calculate a variety of statistics about each column in a dataset for a quick, high-level overview. Simply prepend `SUMMARIZE` to any table or `SELECT` statement. 
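
As a quick sketch, assuming a hypothetical `ship_speeds` table, both forms look like this. The result contains one row per column with statistics such as the minimum, the maximum, and the row count.

```sql
-- Hypothetical table, for illustration only
CREATE OR REPLACE TABLE ship_speeds (ship VARCHAR, max_warp DOUBLE);
INSERT INTO ship_speeds VALUES
    ('Enterprise', 9.0),
    ('Voyager', 9.975),
    ('Defiant', 9.5);

-- Summarize a whole table
SUMMARIZE ship_speeds;

-- Summarize the result of a query
SUMMARIZE SELECT max_warp FROM ship_speeds WHERE max_warp > 9.2;
```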

Have a look at the [correlated subqueries post](https://duckdb.org/2023/05/26/correlated-subqueries-in-sql) to see how to use subqueries that refer to each others’ columns. DuckDB’s advanced optimizer improves correlated subquery performance by orders of magnitude, allowing for queries to be expressed as naturally as possible. What was once an anti-pattern for performance reasons can now be used freely!

DuckDB has added more ways to `JOIN` tables together that make expressing common calculations much easier. Some like `LATERAL`, `ASOF`, `SEMI`, and `ANTI` joins are present in other systems, but have high-performance implementations in DuckDB. DuckDB also adds a new `POSITIONAL` join that combines by the row numbers in each table to match the commonly used Pandas capability of joining on row number indexes. See the [`JOIN` documentation](#docs:lts:sql:query_syntax:from) for details, and look out for a blog post describing DuckDB’s state of the art `ASOF` joins!
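
A minimal sketch of the `POSITIONAL` join, using two hypothetical tables created just for this example: the first row of one table is paired with the first row of the other, the second with the second, and so on, with no join condition.

```sql
-- Hypothetical tables, for illustration only
CREATE OR REPLACE TABLE officer_names (name VARCHAR);
CREATE OR REPLACE TABLE officer_ranks (rank VARCHAR);
INSERT INTO officer_names VALUES ('Jean-Luc Picard'), ('William Riker'), ('Data');
INSERT INTO officer_ranks VALUES ('Captain'), ('Commander'), ('Lieutenant Commander');

-- Rows are matched by their position in each table
SELECT name, rank
FROM officer_names
POSITIONAL JOIN officer_ranks;
```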

#### Summary and Future Work

DuckDB aims to be the easiest database to use. Fundamental architectural decisions to be in-process, have zero dependencies, and have strong typing contribute to this goal, but the friendliness of its SQL dialect has a strong impact as well. By extending the industry-standard PostgreSQL dialect, DuckDB aims to provide the simplest way to express the data transformations you need. These changes range from altering the ancient clause order of the `SELECT` statement to begin with `FROM`, allowing a fundamentally new way to use functions with chaining, to advanced nested data type calculations like list comprehensions. Each of these features is available in the 0.8.1 release.

Future work for friendlier SQL includes:
* Lambda functions with more than 1 argument, like `list_zip`
* Underscores as digit separators (Ex: `1_000_000` instead of `1000000`)
* Extension user experience, including autoloading
* Improvements to file globbing
* Your suggestions!

Please let us know what areas of SQL can be improved! We welcome your feedback on [Discord](https://discord.duckdb.org/) or [GitHub](https://github.com/duckdb/duckdb/discussions).

Live long and prosper! 🖖

## DuckDB's AsOf Joins: Fuzzy Temporal Lookups

**Publication date:** 2023-09-15

**Author:** Richard Wesley

**TL;DR:** DuckDB supports AsOf Joins – a way to match nearby values. They are especially useful for searching event tables for temporal analytics.

Do you have time series data that you want to join,
but the timestamps don't quite match?
Or do you want to look up a value that changes over time
using the times in another table?
And did you end up writing convoluted (and slow) inequality joins to get your results?
Then this post is for you!

#### What Is an AsOf Join?

Time series data is not always perfectly aligned.
Clocks may be slightly off, or there may be a delay between cause and effect.
This can make connecting two sets of ordered data challenging.
AsOf Joins are a tool for solving this and other similar problems.

One of the problems that AsOf Joins are used to solve is
finding the value of a varying property at a specific point in time.
This use case is so common that it is where the name came from:
_Give me the value of the property **as of this time.**_

More generally, however, AsOf joins embody some common temporal analytic semantics,
which can be cumbersome and slow to implement in standard SQL.

##### Portfolio Example

Let's start with a concrete example.
Suppose we have a table of stock [`prices`](https://duckdb.org/data/prices.csv) with timestamps:


| ticker | when                | price |
| :----- | :------------------ | ----: |
| APPL   | 2001-01-01 00:00:00 |     1 |
| APPL   | 2001-01-01 00:01:00 |     2 |
| APPL   | 2001-01-01 00:02:00 |     3 |
| MSFT   | 2001-01-01 00:00:00 |     1 |
| MSFT   | 2001-01-01 00:01:00 |     2 |
| MSFT   | 2001-01-01 00:02:00 |     3 |
| GOOG   | 2001-01-01 00:00:00 |     1 |
| GOOG   | 2001-01-01 00:01:00 |     2 |
| GOOG   | 2001-01-01 00:02:00 |     3 |

We have another table containing portfolio [`holdings`](https://duckdb.org/data/holdings.csv) at various points in time:


| ticker | when                | shares |
| :----- | :------------------ | -----: |
| APPL   | 2000-12-31 23:59:30 |   5.16 |
| APPL   | 2001-01-01 00:00:30 |   2.94 |
| APPL   | 2001-01-01 00:01:30 |  24.13 |
| GOOG   | 2000-12-31 23:59:30 |   9.33 |
| GOOG   | 2001-01-01 00:00:30 |  23.45 |
| GOOG   | 2001-01-01 00:01:30 |  10.58 |
| DATA   | 2000-12-31 23:59:30 |   6.65 |
| DATA   | 2001-01-01 00:00:30 |  17.95 |
| DATA   | 2001-01-01 00:01:30 |  18.37 |

We can compute the value of each holding at that point in time by using an AsOf Join
to find the most recent price at or before the holding's timestamp:

```sql
SELECT h.ticker, h.when, price * shares AS value
FROM holdings h ASOF JOIN prices p
  ON h.ticker = p.ticker
 AND h.when >= p.when;
```

This attaches the value of the holding at that time to each row:


| ticker | when                | value |
| :----- | :------------------ | ----: |
| APPL   | 2001-01-01 00:00:30 |  2.94 |
| APPL   | 2001-01-01 00:01:30 | 48.26 |
| GOOG   | 2001-01-01 00:00:30 | 23.45 |
| GOOG   | 2001-01-01 00:01:30 | 21.16 |

It essentially executes a function defined by looking up nearby values in the `prices` table.
Note also that missing `ticker` values do not have a match and don't appear in the output.
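
To try this without downloading the CSV files, here is a self-contained sketch. The table names `prices_demo` and `holdings_demo` are hypothetical and contain only a few of the rows shown above. The extra `price_time` column is added here to show which price row each holding was matched to.

```sql
CREATE OR REPLACE TABLE prices_demo (ticker VARCHAR, "when" TIMESTAMP, price INTEGER);
INSERT INTO prices_demo VALUES
    ('APPL', '2001-01-01 00:00:00', 1),
    ('APPL', '2001-01-01 00:01:00', 2),
    ('APPL', '2001-01-01 00:02:00', 3);

CREATE OR REPLACE TABLE holdings_demo (ticker VARCHAR, "when" TIMESTAMP, shares DOUBLE);
INSERT INTO holdings_demo VALUES
    ('APPL', '2000-12-31 23:59:30', 5.16),   -- before the first price: no match
    ('APPL', '2001-01-01 00:00:30', 2.94),
    ('APPL', '2001-01-01 00:01:30', 24.13),
    ('DATA', '2001-01-01 00:00:30', 17.95);  -- ticker has no prices: no match

SELECT h.ticker, h."when", p."when" AS price_time, price * shares AS value
FROM holdings_demo h ASOF JOIN prices_demo p
  ON h.ticker = p.ticker
 AND h."when" >= p."when"
ORDER BY h."when";
```

Only the two `APPL` holdings that fall at or after the first price remain in the result.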

##### Outer AsOf Joins

Because AsOf produces at most one match from the right hand side,
the left side table will not grow as a result of the join,
but it could shrink if there are missing times on the right.
To handle this situation, you can use an *outer* AsOf Join:

```sql
SELECT h.ticker, h.when, price * shares AS value
FROM holdings h ASOF LEFT JOIN prices p
  ON h.ticker = p.ticker
 AND h.when >= p.when
ORDER BY ALL;
```

As you might expect, this will produce `NULL` prices and values instead of dropping left side rows
when there is no ticker or the time is before the prices begin.


| ticker | when                | value |
| :----- | :------------------ | ----: |
| APPL   | 2000-12-31 23:59:30 |       |
| APPL   | 2001-01-01 00:00:30 |  2.94 |
| APPL   | 2001-01-01 00:01:30 | 48.26 |
| DATA   | 2000-12-31 23:59:30 |       |
| DATA   | 2001-01-01 00:00:30 |       |
| DATA   | 2001-01-01 00:01:30 |       |
| GOOG   | 2000-12-31 23:59:30 |       |
| GOOG   | 2001-01-01 00:00:30 | 23.45 |
| GOOG   | 2001-01-01 00:01:30 | 21.16 |

##### Windowing Alternative

Standard SQL can implement this kind of join,
but you need to use a window function and an inequality join.
These can both be fairly expensive operations, but the query would look like this:

```sql
WITH state AS (
    SELECT
        ticker,
        price,
        "when",
        lead("when", 1, 'infinity')
            OVER (PARTITION BY ticker ORDER BY "when") AS end
    FROM prices
)
SELECT h.ticker, h.when, price * shares AS value
FROM holdings h
INNER JOIN state s
  ON h.ticker = s.ticker
 AND h.when >= s.when
 AND h.when < s.end;
```

The default value of `infinity` is used to make sure there is an end value for the last row that can be compared.
Here is what the `state` CTE looks like for our example:


| ticker | price | when                | end                 |
| :----- | ----: | :------------------ | :------------------ |
| APPL   |     1 | 2001-01-01 00:00:00 | 2001-01-01 00:01:00 |
| APPL   |     2 | 2001-01-01 00:01:00 | 2001-01-01 00:02:00 |
| APPL   |     3 | 2001-01-01 00:02:00 | infinity            |
| GOOG   |     1 | 2001-01-01 00:00:00 | 2001-01-01 00:01:00 |
| GOOG   |     2 | 2001-01-01 00:01:00 | 2001-01-01 00:02:00 |
| GOOG   |     3 | 2001-01-01 00:02:00 | infinity            |
| MSFT   |     1 | 2001-01-01 00:00:00 | 2001-01-01 00:01:00 |
| MSFT   |     2 | 2001-01-01 00:01:00 | 2001-01-01 00:02:00 |
| MSFT   |     3 | 2001-01-01 00:02:00 | infinity            |

In the case where there is no equality condition, the planner would have to use an inequality join,
which can be very expensive.
And even in the equality condition case, 
the resulting hash join may end up with long chains of identical `ticker` keys that will all match and need pruning.

#### Why AsOf?

If SQL can compute AsOf joins already, why do we need a new join type?
There are two big reasons: expressibility and performance.
The windowing alternative is more verbose and harder to understand than the AsOf syntax,
so making it easier to say what you are doing helps others (or even you!) understand what is happening.

The syntax also makes it easier for DuckDB to understand what you want and produce your results faster.
The window and inequality join version loses the valuable information that the intervals do not overlap.
It also prevents the query optimizer from moving the join 
because SQL insists that windowing happens *after* joins.
By treating the operation *as a join* with *known data constraints*,
DuckDB can move the join for performance and use a tailored join algorithm.
The algorithm we use is to sort the right side table and then do a kind of merge join with the left side values.
But unlike a standard merge join, 
AsOf can stop searching when it finds the first match because there is at most one match.

##### State Tables

You may be wondering why the Common Table Expression in the `WITH` clause was called *state*.
This is because the `prices` table is really an example of what in temporal analytics is called an *event table*.
The rows of an event table contain timestamps and what happened at that time (i.e., events).
The events in the `prices` table are changes to the price of a stock.
Another common example of an event table is a structured log file:
Each row of the log records when something "happened" – usually a change to a part of the system.

Event tables are difficult to work with because each fact only has the start time.
In order to know whether the fact is still true (or true at a specific time) you need the end time as well.
A table with both the start and end time is called a *state table*.
Converting event tables to state tables is a common temporal data preparation task,
and the windowing CTE above shows how to do it in general using SQL.

##### Sentinel Values

One limitation of the windowing approach is that
the ordering type needs to have a sentinel value that can be used if it does not support `infinity`,
either an unused value or `NULL`.

Both of these choices are potentially problematic.
In the first case, it may not be easy to determine an upper sentinel value 
(suppose the ordering was a string column?)
In the second case, you would need to write the condition as 
`h.when < s.end OR s.end IS NULL`
and using an `OR` like this in a join condition makes comparisons slow and hard to optimize.
Moreover, if the ordering column is already using `NULL` to indicate missing values,
this option is not available.

For most state tables, there are suitable choices (e.g., large dates) 
but one of the advantages of AsOf is that it can avoid having to design a state table 
if it is not needed for the analytic task.
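
The `NULL` variant can be sketched as follows, assuming two hypothetical tables with an integer ordering column (integers have no `infinity`). The first query builds a state table with `NULL` as the end of the last state and needs the `OR` in the join condition; the second expresses the same lookup as an AsOf join with no sentinel at all.

```sql
-- Hypothetical tables, for illustration only
CREATE OR REPLACE TABLE events_demo (t INTEGER, v VARCHAR);
INSERT INTO events_demo VALUES (1, 'a'), (2, 'b'), (3, 'c');

CREATE OR REPLACE TABLE probes_demo (t INTEGER);
INSERT INTO probes_demo VALUES (0), (1), (2), (5);

-- Windowing version: lead() without a default leaves NULL as the last end value
WITH state AS (
    SELECT t AS begin_t, lead(t) OVER (ORDER BY t) AS end_t, v
    FROM events_demo
)
SELECT p.t, s.v
FROM probes_demo p
INNER JOIN state s
  ON p.t >= s.begin_t
 AND (p.t < s.end_t OR s.end_t IS NULL)
ORDER BY p.t;

-- AsOf version: same matches, no state table and no sentinel
SELECT p.t, e.v
FROM probes_demo p ASOF JOIN events_demo e
  ON p.t >= e.t
ORDER BY p.t;
```

In both queries the probe at `0` is dropped because it precedes the first event.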

##### Event Table Variants

So far we have been using a standard type of event table 
where the timestamps are assumed to be the start of the state transitions.
But AsOf can now use any inequality, which allows it to handle other types of event tables.

To explore this, let's use two very simple tables with no equality conditions.
The build side will just have four integer "timestamps" with alphabetic values:


| Time | Value |
| ---: | ----: |
|    1 |     a |
|    2 |     b |
|    3 |     c |
|    4 |     d |

The probe table will just be the time values plus the midpoints,
and we can make a table showing what value each probe time matches
for greater than or equal to:


| Probe | >=  |
| ----: | --- |
|   0.5 |     |
|   1.0 | a   |
|   1.5 | a   |
|   2.0 | b   |
|   2.5 | b   |
|   3.0 | c   |
|   3.5 | c   |
|   4.0 | d   |
|   4.5 | d   |

This shows us that a probe value matches `Tn` when it lies in the half-open interval `[Tn, Tn+1)`.

Now let's see what happens if we use strictly greater than as the inequality:


| Probe | >   |
| ----: | --- |
|   0.5 |     |
|   1.0 |     |
|   1.5 | a   |
|   2.0 | a   |
|   2.5 | b   |
|   3.0 | b   |
|   3.5 | c   |
|   4.0 | c   |
|   4.5 | d   |

Now we can see that a probe value matches `Tn` when it lies in the half-open interval `(Tn, Tn+1]`.
The only difference is that the interval is closed at the end instead of the beginning.
This means that for this inequality type, the time is not part of the interval.

What if the inequality goes in the other direction, say less than or equal to?


| Probe | <=  |
| ----: | --- |
|   0.5 | a   |
|   1.0 | a   |
|   1.5 | b   |
|   2.0 | b   |
|   2.5 | c   |
|   3.0 | c   |
|   3.5 | d   |
|   4.0 | d   |
|   4.5 |     |

Again, we have half-open intervals, but this time we are matching the _previous_ interval `(Tn-1, Tn]`.
One way to interpret this is that the times in the build table are the _end_ of the interval,
instead of the beginning.
Also, unlike greater than or equal to,
the interval is closed at the end instead of the beginning.
Adding this to what we found for strictly greater than,
we can interpret this as meaning that the lookup times are part of the interval
when non-strict inequalities are used.

We can check this by looking at the last inequality, strictly less than:


| Probe | <   |
| ----: | --- |
|   0.5 | a   |
|   1.0 | b   |
|   1.5 | b   |
|   2.0 | c   |
|   2.5 | c   |
|   3.0 | d   |
|   3.5 | d   |
|   4.0 |     |
|   4.5 |     |

In this case the matching intervals are `[Tn-1, Tn)`.
This is a strict inequality, so the table time is not in the interval,
and it is a less than, so the time is the end of the interval.

To sum up, here is the full list:


| Inequality | Interval   |
| ---------- | ---------- |
| >          | (Tn, Tn+1] |
| >=         | [Tn, Tn+1) |
| <=         | (Tn-1, Tn] |
| <          | [Tn-1, Tn) |

We now have two natural interpretations of what the inequalities mean:

* The greater (resp. less) than inequalities mean the time is the beginning (resp. end) of the interval.
* The strict (resp. non-strict) inequalities mean the time is excluded from (resp. included in) the interval. 

So if we know whether the time marks the start or the end of the event,
and whether the time is included or excluded, we can choose the appropriate AsOf inequality.
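
The lookup tables in this section can be reproduced with a short sketch like the one below. The table and column names (`build_demo`, `probe_demo`, `ts`, `val`) are hypothetical, and an outer AsOf join is used so that probes without a match show up as `NULL`.

```sql
-- Build side: four "timestamps" with alphabetic values
CREATE OR REPLACE TABLE build_demo (ts DOUBLE, val VARCHAR);
INSERT INTO build_demo VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');

-- Probe side: 0.5, 1.0, ..., 4.5
CREATE OR REPLACE TABLE probe_demo AS
    SELECT CAST(i + 1 AS DOUBLE) / 2 AS ts
    FROM range(9) r(i);

-- Time is the start of the interval and is included: [Tn, Tn+1)
SELECT p.ts AS probe, b.val
FROM probe_demo p ASOF LEFT JOIN build_demo b ON p.ts >= b.ts
ORDER BY p.ts;

-- Time is the start of the interval and is excluded: (Tn, Tn+1]
SELECT p.ts AS probe, b.val
FROM probe_demo p ASOF LEFT JOIN build_demo b ON p.ts > b.ts
ORDER BY p.ts;

-- Time is the end of the interval and is included: (Tn-1, Tn]
SELECT p.ts AS probe, b.val
FROM probe_demo p ASOF LEFT JOIN build_demo b ON p.ts <= b.ts
ORDER BY p.ts;

-- Time is the end of the interval and is excluded: [Tn-1, Tn)
SELECT p.ts AS probe, b.val
FROM probe_demo p ASOF LEFT JOIN build_demo b ON p.ts < b.ts
ORDER BY p.ts;
```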

##### Usage

So far we have been explicit about specifying the conditions for AsOf,
but SQL also has a simplified join condition syntax
for the common case where the column names are the same in both tables.
This syntax uses the `USING` keyword to list the fields that should be compared for equality.
AsOf also supports this syntax, but with two restrictions:

* The last field is the inequality
* The inequality is `>=` (the most common case)

Our first query can then be written as:

```sql
SELECT ticker, h.when, price * shares AS value
FROM holdings h
ASOF JOIN prices p
    USING(ticker, "when");
```

Be aware that if you don't explicitly list the columns in the `SELECT`,
the ordering field value will be the probe value, not the build value.
For a natural join, this is not an issue because all the conditions are equalities,
but for AsOf, one side has to be chosen.
Since AsOf can be viewed as a lookup function,
it is more natural to return the "function arguments" than the function internals.

##### Under the Hood

What an AsOf Join is really doing is allowing you to treat an event table as a state table for join operations.
By knowing the semantics of the join, it can avoid creating a full state table
and be more efficient than a general inequality join.

Let's start by looking at how the windowing version works.
Remember that we used this query to convert the event table to a state table:

```sql
WITH state AS (
    SELECT
        ticker,
        price,
        "when",
        lead("when", 1, 'infinity')
            OVER (PARTITION BY ticker ORDER BY "when") AS end
    FROM prices
)
SELECT * FROM state;
```

The state table CTE is created by hash partitioning the table on `ticker`,
sorting on `when` and then computing another column that is just `when` shifted down by one.
The join is then implemented with a hash join on `ticker` and two comparisons on `when`.

If there was no `ticker` column (e.g., the prices were for a single item)
then the join would be implemented using our inequality join operator,
which would materialise and sort both sides because it doesn't know that the ranges are disjoint.

The AsOf operator uses all three operator pipeline APIs to consolidate and collect rows.
During the `sink` phase, AsOf hash partitions and sorts the right hand side to make a temporary state table.
(In fact it uses the same code as Window,
but without unnecessarily materialising the end column.)
During the `operator` phase, it filters out (or returns) rows that cannot match
because of `NULL` values in the predicate expressions,
and then hash partitions and sorts the remaining rows into a cache.
Finally, during the `source` phase, it matches hash partitions
and then merge joins the sorted values within each hash partition.
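
The three phases can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the operator's actual code: rows are plain tuples, `None` stands in for `NULL`, only the `>=` inequality is handled, and unmatched probe rows are dropped (inner join semantics).

```python
from collections import defaultdict


def asof_join(probe, build):
    """For each probe (key, t), find the build value with the largest build t <= t.

    probe is a list of (key, t); build is a list of (key, t, value).
    """
    # "sink": hash partition the right-hand side and sort each partition on t.
    build_parts = defaultdict(list)
    for key, t, value in build:
        if key is not None and t is not None:
            build_parts[key].append((t, value))
    for rows in build_parts.values():
        rows.sort(key=lambda r: r[0])

    # "operator": filter out probe rows with NULLs, partition and sort the rest.
    probe_parts = defaultdict(list)
    for key, t in probe:
        if key is not None and t is not None:
            probe_parts[key].append(t)
    for ts in probe_parts.values():
        ts.sort()

    # "source": match hash partitions, then merge the two sorted runs.
    result = []
    for key, ts in probe_parts.items():
        rows = build_parts.get(key, [])
        i = 0  # number of build rows with t <= the current probe t
        for t in ts:
            while i < len(rows) and rows[i][0] <= t:
                i += 1
            if i > 0:
                result.append((key, t, rows[i - 1][1]))
    return result
```

Because both sides are sorted within a partition, the cursor `i` only ever moves forward, so each partition is matched in a single pass.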

#### Benchmarks

Because AsOf joins can be implemented in various ways using standard SQL queries,
benchmarking is really about comparing the various alternatives.

One alternative is a debugging `PRAGMA` for AsOf called `debug_asof_iejoin`, 
which implements the join using Window and IEJoin.
This allows us to easily toggle between the implementations and compare runtimes.

Other alternatives combine equi-joins and window functions.
The equi-join is used to implement the equality matching conditions,
and the window is used to select the closest inequality.
We will now look at two different windowing techniques and compare their performance.
If you wish to skip this section,
the bottom line is that while they are sometimes a bit faster,
the AsOf join has the most consistent behavior of all the algorithms.

##### Window as State Table

The first benchmark compares a hash join with a state table.
It probes a 5M row table of values
built from 100K timestamps and 50 partitioning keys
using a self-join where only 50% of the keys are present
and the timestamps have been shifted to be halfway between the originals:

```sql
CREATE OR REPLACE TABLE build AS (
    SELECT k, '2001-01-01 00:00:00'::TIMESTAMP + INTERVAL (v) MINUTE AS t, v
    FROM range(0, 100_000) vals(v), range(0, 50) keys(k)
);

CREATE OR REPLACE TABLE probe AS (
    SELECT k * 2 AS k, t - INTERVAL (30) SECOND AS t
    FROM build
);
```

The `build` table looks like this:

| k   | t                   | v   |
| --- | ------------------- | --- |
| 0   | 2001-01-01 00:00:00 | 0   |
| 0   | 2001-01-01 00:01:00 | 1   |
| 0   | 2001-01-01 00:02:00 | 2   |
| 0   | 2001-01-01 00:03:00 | 3   |
| ... | ...                 | ... |

and the probe table looks like this (with only even values for k):

| k   | t                   |
| --- | ------------------- |
| 0   | 2000-12-31 23:59:30 |
| 0   | 2001-01-01 00:00:30 |
| 0   | 2001-01-01 00:01:30 |
| 0   | 2001-01-01 00:02:30 |
| 0   | 2001-01-01 00:03:30 |
| ... | ...                 |

The benchmark just does the join and sums up the `v` column:

```sql
SELECT sum(v)
FROM probe
ASOF JOIN build USING(k, t);
```

The debugging `PRAGMA` does not allow us to use a hash join,
but we can create the state table in a CTE again and use an inner join:

```sql
-- Hash Join implementation
WITH state AS (
    SELECT k, 
      t AS begin, 
      v, 
      lead(t, 1, 'infinity'::TIMESTAMP) OVER (PARTITION BY k ORDER BY t) AS end
    FROM build
)
SELECT sum(v)
FROM probe p
INNER JOIN state s 
        ON p.t >= s.begin
       AND p.t < s.end
       AND p.k = s.k;
```

This works because the planner assumes that equality conditions are more selective
than inequalities and generates a hash join with a filter.

Running the benchmark, we get results like this:


| Algorithm  | Median of 5 runs |
| :--------- | ---------------: |
| AsOf       |          0.425 s |
| IEJoin     |          3.522 s |
| State Join |        192.460 s |

The runtime improvement of AsOf over IEJoin here is about 8× (3.522 s vs. 0.425 s).
The horrible performance of the Hash Join (the State Join row) is caused by the long (100K) bucket chains in the hash table.
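
A back-of-the-envelope count shows why those chains hurt. The sketch below assumes the hash table is keyed on `k` alone, so every probe row with a matching key has to test the two `t` comparisons against every row in that key's chain. The function name is made up for this illustration.

```python
def hash_join_filter_evaluations(num_keys, rows_per_key):
    """Count residual filter evaluations for the shape of the first benchmark."""
    # build has keys 0..num_keys-1, each with rows_per_key rows (one chain per key).
    chain_length = {k: rows_per_key for k in range(num_keys)}
    # probe has keys 0, 2, 4, ..., again with rows_per_key rows each.
    probe_rows = {k * 2: rows_per_key for k in range(num_keys)}
    # Every probe row walks the whole chain for its key (if the key exists).
    return sum(n * chain_length.get(k, 0) for k, n in probe_rows.items())


print(hash_join_filter_evaluations(50, 100_000))  # 250000000000
```

Only 25 of the 50 probe keys find a chain, but each of their 100K rows scans 100K entries, which gives 2.5 × 10^11 filter evaluations for a result of about 2.5M rows.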

The second benchmark tests the case where the probe side is about 10× smaller than the build side:

```sql
CREATE OR REPLACE TABLE probe AS
    SELECT k, 
      '2021-01-01T00:00:00'::TIMESTAMP +
          INTERVAL (random() * 60 * 60 * 24 * 365) SECOND AS t,
    FROM range(0, 100_000) tbl(k);

CREATE OR REPLACE TABLE build AS
    SELECT r % 100_000 AS k, 
      '2021-01-01T00:00:00'::TIMESTAMP +
          INTERVAL (random() * 60 * 60 * 24 * 365) SECOND AS t,
      (random() * 100_000)::INTEGER AS v
    FROM range(0, 1_000_000) tbl(r);

SELECT sum(v)
FROM probe p
ASOF JOIN build b
  ON p.k = b.k
 AND p.t >= b.t;

-- Hash Join Version
WITH state AS (
  SELECT k, 
    t AS begin, 
    v, 
    lead(t, 1, 'infinity'::TIMESTAMP)
        OVER (PARTITION BY k ORDER BY t) AS end
  FROM build
)
SELECT sum(v)
FROM probe p
INNER JOIN state s
        ON p.t >= s.begin
       AND p.t < s.end
       AND p.k = s.k;
```


| Algorithm  | Median of 5 runs |
| :--------- | ---------------: |
| State Join |          0.065 s |
| AsOf       |          0.077 s |
| IEJoin     |         49.508 s |

Now the runtime improvement of AsOf over IEJoin is huge (~640×)
because it can leverage the partitioning to eliminate almost all of the equality mismatches.

The Hash Join implementation does much better here because 
the optimizer notices that the probe side is smaller and builds the hash table on the "probe" table.
Also, the probe values here are unique, so the hash table chains are minimal.

##### Window with Ranking

Another way to use the window operator is to:

* Join the tables on the equality predicates
* Filter to pairs where the build time is before the probe time
* Partition the result on both the equality keys _and_ the probe timestamp
* Sort the partitions on the build timestamp _descending_
* Filter out all values except rank 1 (i.e., the largest build time <= the probe time)

The query looks like:

```sql
WITH win AS (
    SELECT p.k, p.t, v,
        rank() OVER (PARTITION BY p.k, p.t ORDER BY b.t DESC) AS r
    FROM probe p INNER JOIN build b
      ON p.k = b.k
     AND p.t >= b.t
    QUALIFY r = 1
) 
SELECT k, t, v
FROM win;
```

The advantage of this windowing query is that it does not require sentinel values,
so it can work with any data type.
The disadvantage is that it creates many more partitions 
because it includes both timestamps, which requires more complex sorting.
Moreover, because it applies the window _after_ the join,
it can produce huge intermediates that can result in external sorting
and expensive out-of-memory operations.
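
The following Python sketch mirrors the ranking steps on plain tuples and also reports the size of the join intermediate, which is the part that can blow up. It is an illustration under simple assumptions (nested-loop join, in-memory lists, made-up function name), not what the engine executes.

```python
from collections import defaultdict


def rank_join(probe, build):
    """Return (rows, intermediate_size) for the rank-based AsOf emulation.

    probe is a list of (key, t); build is a list of (key, t, value).
    """
    # Join on the equality key and keep pairs where build t <= probe t.
    joined = [(pk, pt, bt, v)
              for pk, pt in probe
              for bk, bt, v in build
              if pk == bk and bt <= pt]

    # Partition on both the equality key and the probe timestamp.
    parts = defaultdict(list)
    for pk, pt, bt, v in joined:
        parts[(pk, pt)].append((bt, v))

    rows = []
    for (pk, pt), candidates in parts.items():
        # Sort on the build timestamp, descending; rank 1 is the latest one.
        candidates.sort(key=lambda c: c[0], reverse=True)
        top = candidates[0][0]
        # rank() keeps ties, so every row sharing the top timestamp survives.
        rows.extend((pk, pt, v) for bt, v in candidates if bt == top)
    return rows, len(joined)
```

The second return value grows with the number of build rows that precede each probe row, which is why the approach degrades as the build-to-probe ratio increases.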

For this benchmark, we will be using three build tables,
and two probe tables, all containing 10K integer equality keys.
The probe tables have either 1 or 15 timestamps per key:

```sql
CREATE OR REPLACE TABLE probe15 AS
    SELECT k, t
    FROM range(10_000) cs(k), 
         range('2022-01-01'::TIMESTAMP, '2023-01-01'::TIMESTAMP, INTERVAL 26 DAY) ts(t);

CREATE OR REPLACE TABLE probe1 AS
    SELECT k, '2022-01-01'::TIMESTAMP t
    FROM range(10_000) cs(k);
```

The build tables are much larger and have approximately
10/100/1000× the number of entries as the 15 element tables:

```sql
-- 10:1
CREATE OR REPLACE TABLE build10 AS
    SELECT k, t, (random() * 1000)::DECIMAL(7, 2) AS v
    FROM range(10_000) ks(k), 
         range('2022-01-01'::TIMESTAMP, '2023-01-01'::TIMESTAMP, INTERVAL 59 HOUR) ts(t);

-- 100:1
CREATE OR REPLACE TABLE build100 AS
    SELECT k, t, (random() * 1000)::DECIMAL(7, 2) AS v
    FROM range(10_000) ks(k), 
         range('2022-01-01'::TIMESTAMP, '2023-01-01'::TIMESTAMP, INTERVAL 350 MINUTE) ts(t);

-- 1000:1
CREATE OR REPLACE TABLE build1000 AS
    SELECT k, t, (random() * 1000)::DECIMAL(7, 2) AS v
    FROM range(10_000) ks(k), 
         range('2022-01-01'::TIMESTAMP, '2023-01-01'::TIMESTAMP, INTERVAL 35 MINUTE) ts(t);
```

The AsOf join queries are:

```sql
-- AsOf/IEJoin
SELECT p.k, p.t, v
FROM probe p ASOF JOIN build b
  ON p.k = b.k
 AND p.t >= b.t
ORDER BY 1, 2;

-- Rank
WITH win AS (
    SELECT p.k, p.t, v,
        rank() OVER (PARTITION BY p.k, p.t ORDER BY b.t DESC) AS r
    FROM probe p INNER JOIN build b
      ON p.k = b.k
     AND p.t >= b.t
    QUALIFY r = 1
)
SELECT k, t, v
FROM win
ORDER BY 1, 2;
```

The results are shown here:

![](../images/asof-rank.png)


(Median of 5 except for Rank/15/1000).

* For all ratios with 15 probes, AsOf is the most performant.
* For small ratios with 15 probes, Rank beats IEJoin (both with windowing), but by 100:1 it is starting to explode.
* For single element probes, Rank is most effective, but even there, its edge over AsOf is only about 50% at scale.

This shows that AsOf could possibly be improved upon, but predicting where that happens would be tricky,
and getting it wrong would have enormous costs.

#### Future Work

DuckDB can now execute AsOf joins for all inequality types with reasonable performance.
In some cases, the performance gain is several orders of magnitude over the standard SQL versions –
even with our fast inequality join operator.

While the current AsOf operator is completely general,
there are a couple of planning optimisations that could be applied here.

* When there are selective equality conditions, it is likely that a hash join with filtering against a materialised state table would be significantly faster. If we can detect this and suitable sentinel values are available, the planner could choose to use a hash join instead of the default AsOf implementation.
* There are also use cases where the probe table is much smaller than the build table, along with equality conditions, and performing a hash join against the *probe* table could yield significant performance improvements.

Nevertheless, remember that one of the advantages of SQL is that it is a declarative language:  
You specify *what* you want and leave it up to the database to figure out *how*.
Now that we have defined the semantics of the AsOf join,
you the user can write queries saying this is *what* you want – and we are free to keep improving the *how*!

#### Happy Joining!

One of the most interesting parts of working on DuckDB is that it stretches the traditional SQL model of unordered data.
DuckDB makes it easy to query *ordered* data sets such as data frames and Parquet files,
and when you have data like that, you expect to be able to do ordered analysis!
Implementing Fast Sorting, Fast Windowing and Fast AsOf joins is how we are making this expectation a reality.

## Announcing DuckDB 0.9.0

**Publication date:** 2023-09-26

**Authors:** Mark Raasveldt, Hannes Mühleisen

![](../images/blog/yellow-billed-duck.jpg)


The DuckDB team is happy to announce the latest DuckDB release (0.9.0). This release is named Undulata after the [Yellow-billed duck](https://en.wikipedia.org/wiki/Yellow-billed_duck) native to Africa.

To install the new version, please visit the [installation guide](https://duckdb.org/install/index.html). The full release notes can be found on [GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.9.0).

#### What's New in 0.9.0

There have been too many changes to discuss them each in detail, but we would like to highlight several particularly exciting features!

* Out-of-Core Hash Aggregate
* Storage Improvements
* Index Improvements
* DuckDB-Wasm Extensions
* Extension Auto-Loading
* Improved AWS Support
* Iceberg Support
* Azure Support
* PySpark-Compatible API

Below is a summary of those new features with examples, starting with a change in our SQL dialect that is designed to produce more intuitive results by default.

#### Breaking SQL Changes

[**Struct Auto-Casting**](https://github.com/duckdb/duckdb/pull/8942). Previously the names of struct entries were ignored when determining auto-casting rules. As a result, struct field names could be silently renamed. Starting with this release, this will result in an error instead.

```sql
CREATE TABLE structs (s STRUCT(i INTEGER));
INSERT INTO structs VALUES ({'k': 42});
```

```console
Mismatch Type Error: Type STRUCT(k INTEGER) does not match with STRUCT(i INTEGER). Cannot cast STRUCTs with different names
```

Unnamed structs constructed using the `ROW` function can still be inserted into struct fields.

```sql
INSERT INTO structs VALUES (ROW(42));
```

#### Core System Improvements

**[Out-of-Core Hash Aggregates](https://github.com/duckdb/duckdb/pull/7931)** and **[Hash Aggregate Performance Improvements.](https://github.com/duckdb/duckdb/pull/8475)** When working with large data sets, memory management is always a potential pain point. By using a streaming execution engine and buffer manager, DuckDB supports many operations on larger than memory data sets. DuckDB also aims to support queries where *intermediate* results do not fit into memory by using disk-spilling techniques.

In this release, support for disk-spilling techniques is further extended through the support for out-of-core hash aggregates. Now, hash tables constructed during `GROUP BY` queries or `DISTINCT` operations that do not fit in memory due to a large number of unique groups will spill data to disk instead of throwing an out-of-memory exception. Due to the clever use of radix partitioning, performance degradation is gradual, and performance cliffs are avoided. Only the subset of the table that does not fit into memory will be spilled to disk.
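
As a rough intuition for why partitioning helps, consider the toy Python sketch below. It splits rows by the low bits of their hash and then deduplicates one partition at a time, so only one partition's hash table needs to be resident at once. The in-memory lists stand in for spilled data, and the function name and `radix_bits` parameter are assumptions for the illustration; the real operator is considerably more sophisticated.

```python
def count_distinct_partitioned(rows, radix_bits=2):
    """Count distinct rows by aggregating one radix partition at a time."""
    num_partitions = 1 << radix_bits
    mask = num_partitions - 1

    # Phase 1: route every row to a partition using the low bits of its hash.
    # Equal rows always land in the same partition.
    partitions = [[] for _ in range(num_partitions)]  # stand-ins for spilled data
    for row in rows:
        partitions[hash(row) & mask].append(row)

    # Phase 2: build a hash table for one partition at a time.
    total = 0
    for part in partitions:
        total += len(set(part))
    return total


rows = [(i % 1000, i % 7) for i in range(10_000)]
distinct = count_distinct_partitioned(rows)
```

Since duplicates never straddle partitions, the per-partition counts can simply be added up.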

The performance of our hash aggregate has also improved in general, especially when there are many groups. For example, we compute the number of unique rows in a data set with 30 million rows and 15 columns by using the following query:

```sql
SELECT count(*) FROM (SELECT DISTINCT * FROM tbl);
```

If we keep all the data in memory, the query should use around 6 GB. However, we can still complete the query if less memory is available. In the table below, we can see how the runtime is affected by lowering the memory limit:

| memory limit | v0.8.1 | v0.9.0 |
| -----------: | -----: | -----: |
|        10 GB | 8.52 s | 2.91 s |
|         9 GB | 8.52 s | 3.45 s |
|         8 GB | 8.52 s | 3.45 s |
|         7 GB | 8.52 s | 3.47 s |
|         6 GB |    OOM | 3.41 s |
|         5 GB |    OOM | 3.67 s |
|         4 GB |    OOM | 3.87 s |
|         3 GB |    OOM | 4.20 s |
|         2 GB |    OOM | 4.39 s |
|         1 GB |    OOM | 4.91 s |

**[Compressed Materialization.](https://github.com/duckdb/duckdb/pull/7644)** DuckDB's streaming execution engine has a low memory footprint, but more memory is required for operations such as grouped aggregation. The memory footprint of these operations can be reduced by compression. DuckDB already uses [many compression techniques in its storage format](https://duckdb.org/2022/10/28/lightweight-compression), but many of these techniques are too costly to use during query execution. However, certain lightweight compression techniques are so cheap that the benefit of the reduced memory footprint outweighs the cost of (de)compression.

In this release, we add support for compression of strings and integer types right before data goes into the grouped aggregation and sorting operators. By using statistics, both types are compressed to the smallest possible integer type. For example, if we have the following table:

```text
┌───────┬─────────┐
│  id   │  name   │
│ int32 │ varchar │
├───────┼─────────┤
│   300 │ alice   │
│   301 │ bob     │
│   302 │ eve     │
│   303 │ mallory │
│   304 │ trent   │
└───────┴─────────┘
```

The `id` column uses a 32-bit integer. From our statistics we know that the minimum value is 300, and the maximum value is 304. We can subtract 300 and cast to an 8-bit integer instead, reducing the width from 4 bytes down to 1.
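
In Python terms, the integer case looks roughly like this. The sketch picks the narrowest width from the min/max statistics and stores offsets from the minimum; the helper name and the returned tuple are assumptions for the illustration.

```python
def compress_ints(values):
    """Return (minimum, width_in_bytes, offsets) for a column of integers."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Pick the narrowest unsigned width that can hold every offset.
    for width in (1, 2, 4, 8):
        if span < 1 << (8 * width):
            break
    return lo, width, [v - lo for v in values]


ids = [300, 301, 302, 303, 304]
minimum, width, offsets = compress_ints(ids)
restored = [minimum + o for o in offsets]  # decompression adds the minimum back
```

For the `id` column above, the offsets are 0 through 4 and fit in a single byte.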

The `name` column uses our internal string type, which is 16 bytes wide. However, our statistics tell us that the longest string here is only 7 bytes. We can fit this into a 64-bit integer like so:

```text
alice   -> alice005
bob     -> bob00003
eve     -> eve00003
mallory -> mallory7
trent   -> trent005
```

This reduces the width from 16 bytes down to 8. To support sorting of compressed strings, we flip the bytes on little-endian machines so that our comparison operators are still correct:

```text
alice005 -> 500ecila
bob00003 -> 30000bob
eve00003 -> 30000eve
mallory7 -> 7yrollam
trent005 -> 500tnert
```

By reducing the size of query intermediates, we can prevent/reduce spilling data to disk, reducing the need for costly I/O operations, thereby improving query performance.
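
The string packing can be tried out with Python's arbitrary-size integers. The sketch below stores the string bytes, zero padding and the length in eight bytes, and reads them with the first character as the most significant byte, which is the effect the byte flip is meant to achieve. The exact layout and the function names are assumptions for the example, and strings are assumed to be short ASCII without NUL bytes.

```python
WIDTH = 8  # bytes in a 64-bit integer


def compress_string(s):
    """Pack a short ASCII string and its length into one 64-bit integer."""
    data = s.encode("ascii")
    if len(data) >= WIDTH:
        raise ValueError("string too long to compress")
    # Layout: string bytes, zero padding, then the length in the last byte.
    packed = data + b"\x00" * (WIDTH - 1 - len(data)) + bytes([len(data)])
    # Reading the first character as the most significant byte makes
    # integer order agree with string order.
    return int.from_bytes(packed, "big")


def decompress_string(n):
    """Recover the original string from its packed integer."""
    packed = n.to_bytes(WIDTH, "big")
    length = packed[-1]
    return packed[:length].decode("ascii")


names = ["trent", "mallory", "eve", "bob", "alice"]
by_integer = sorted(names, key=compress_string)
```

Sorting by the packed integers gives the same order as sorting the strings, so the sort operator can work on 8-byte values instead of 16-byte ones.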

**Window Function Performance Improvements ([#7831](https://github.com/duckdb/duckdb/pull/7831), [#7996](https://github.com/duckdb/duckdb/pull/7996), [#8050](https://github.com/duckdb/duckdb/pull/8050), [#8491](https://github.com/duckdb/duckdb/pull/8491)).** This release features many improvements to the performance of Window functions due to improved vectorization of the code, more re-use of partial aggregates and improved parallelism through work stealing of tasks. As a result, performance of [Window functions has improved significantly, particularly in scenarios where there are no or few partitions](https://github.com/duckdb/duckdb/issues/7809#issuecomment-1679387022).

```sql
SELECT
    sum(driver_pay) OVER (
        ORDER BY dropoff_datetime ASC
        RANGE BETWEEN
        INTERVAL 3 DAYS PRECEDING AND
        INTERVAL 0 DAYS FOLLOWING
    )
FROM tripdata;
```


| Version | Run time |
| ------: | -------: |
|  v0.8.0 |   33.8 s |
|  v0.9.0 |    3.8 s |

#### Storage Improvements

[**Vacuuming of Deleted Row Groups**](https://github.com/duckdb/duckdb/pull/7794). Starting with this release, when deleting data using `DELETE` statements, entire row groups that are deleted will be automatically cleaned up. Support is also added to [truncate the database file on checkpoint](https://github.com/duckdb/duckdb/pull/7824) which allows the database file to be reduced in size after data is deleted. Note that this only occurs if the deleted row groups are located at the end of the file. The system does not yet move around data in order to reduce the size of the file on disk. Instead, free blocks earlier on in the file are re-used to store later data.

**Index Storage Improvements ([#7930](https://github.com/duckdb/duckdb/pull/7930), [#8112](https://github.com/duckdb/duckdb/pull/8112), [#8437](https://github.com/duckdb/duckdb/pull/8437), [#8703](https://github.com/duckdb/duckdb/pull/8703))**. Many improvements have been made to both the in-memory footprint and the on-disk footprint of ART indexes. In particular, for indexes created to maintain `PRIMARY KEY`, `UNIQUE` or `FOREIGN KEY` constraints, the storage and in-memory footprint is drastically reduced.

```sql
CREATE TABLE integers (i INTEGER PRIMARY KEY);
INSERT INTO integers FROM range(10000000);
```


| Version |   Size |
| ------- | -----: |
| v0.8.0  | 278 MB |
| v0.9.0  |  78 MB |

In addition, due to improvements in the manner in which indexes are stored on disk they can now be written to disk incrementally instead of always requiring a full rewrite. This allows for much quicker checkpointing for tables that have indexes.

#### Extensions

[**Extension Auto-Loading**](https://github.com/duckdb/duckdb/pull/8732). Starting from this release, DuckDB supports automatic installation and loading of trusted extensions. As many workflows rely on core extensions that are not bundled, such as `httpfs`, many users found themselves having to remember to load the required extensions up front. With this change, the extensions will instead be automatically loaded (and optionally installed) when used in a query.

For example, in Python the following code snippet now works without needing to explicitly load the `httpfs` or `json` extensions.

```python
import duckdb

duckdb.sql("FROM 'https://raw.githubusercontent.com/duckdb/duckdb/main/data/json/example_n.ndjson'")
```

The set of autoloadable extensions is limited to official extensions distributed by DuckDB Labs, and can be [found here](https://github.com/duckdb/duckdb/blob/8feb03d274892db0e7757cd62c145b18dfa930ec/scripts/generate_extensions_function.py#L298). The behavior can also be disabled using the `autoinstall_known_extensions` and `autoload_known_extensions` settings, or through the more general `enable_external_access` setting. See the [configuration options](#docs:lts:configuration:overview).

[**DuckDB-Wasm Extensions**](https://github.com/duckdb/duckdb-wasm/pull/1403). This release adds support for loadable extensions to DuckDB-Wasm. Previously, any extensions that you wanted to use with the Wasm client had to be baked in. With this release, extensions can be loaded dynamically instead. When an extension is loaded, the Wasm bundle is downloaded and the functionality of the extension is enabled. Give it a try in our [Wasm shell](https://shell.duckdb.org).

```sql
LOAD inet;
SELECT '127.0.0.1'::INET;
```

[**AWS Extension**](https://github.com/duckdb/duckdb-aws). This release marks the launch of the DuckDB AWS extension. This extension contains AWS related features that rely on the AWS SDK. Currently, the extension contains one function, `LOAD_AWS_CREDENTIALS`, which uses the AWS [Credential Provider Chain](https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain) to automatically fetch and set credentials:

```sql
CALL load_aws_credentials();
SELECT * FROM 's3://some-bucket/that/requires/authentication.parquet';
```

[See the documentation for more information](#docs:lts:core_extensions:aws).

[**Experimental Iceberg Extension**](https://github.com/duckdb/duckdb-iceberg). This release marks the launch of the DuckDB Iceberg extension. This extension adds support for reading tables stored in the [Iceberg format](https://iceberg.apache.org).

```sql
SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
```

[See the documentation for more information](#docs:lts:core_extensions:iceberg:overview).

[**Experimental Azure Extension**](https://github.com/duckdb/duckdb-azure). This release marks the launch of the DuckDB Azure extension. This extension allows for DuckDB to natively read data stored on Azure, in a similar manner to how it can read data stored on S3.

```sql
SET azure_storage_connection_string = '<your_connection_string>';
SELECT * FROM 'azure://<my_container>/*.csv';
```

[See the documentation for more information](#docs:lts:core_extensions:azure).

#### Clients

[**Experimental PySpark API**](https://github.com/duckdb/duckdb/pull/8083). This release features the addition of an experimental Spark API to the Python client. The API aims to be fully compatible with the PySpark API, allowing you to use the familiar Spark API while utilizing the power of DuckDB. All statements are translated to DuckDB's internal plans using our [relational API](#docs:lts:clients:python:relational_api) and executed using DuckDB's query engine.

```python
from duckdb.experimental.spark.sql import SparkSession as session
from duckdb.experimental.spark.sql.functions import lit, col
import pandas as pd

spark = session.builder.getOrCreate()

pandas_df = pd.DataFrame({
    'age': [34, 45, 23, 56],
    'name': ['Joan', 'Peter', 'John', 'Bob']
})

df = spark.createDataFrame(pandas_df)
df = df.withColumn(
    'location', lit('Seattle')
)
res = df.select(
    col('age'),
    col('location')
).collect()

print(res)
#[
#    Row(age=34, location='Seattle'),
#    Row(age=45, location='Seattle'),
#    Row(age=23, location='Seattle'),
#    Row(age=56, location='Seattle')
#]
```

Note that the API is currently experimental and features are still missing. We are very interested in feedback. Please report any functionality that you are missing, either through Discord or on GitHub.


#### Final Thoughts

The full release notes can be [found on GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.9.0). We would like to thank all of the contributors for their hard work on improving DuckDB.

## DuckDB's CSV Sniffer: Automatic Detection of Types and Dialects

**Publication date:** 2023-10-27

**Author:** Pedro Holanda

**TL;DR:** DuckDB is primarily focused on performance, leveraging the capabilities of modern file formats. At the same time, we also pay attention to flexible, non-performance-driven formats like CSV files. To create a nice and pleasant experience when reading from CSV files, DuckDB implements a CSV sniffer that automatically detects CSV dialect options, column types, and even skips dirty data. The sniffing process allows users to efficiently explore CSV files without needing to provide any input about the file format.

![](../images/blog/csv-sniffer/ducktetive.jpg)


There are many different file formats that users can choose from when storing their data. For example, there are performance-oriented binary formats like Parquet, where data is stored in a columnar format, partitioned into row groups, and heavily compressed. However, Parquet is known for its rigidity, requiring specialized systems to read and write these files.

On the other side of the spectrum, there are files with the CSV (comma-separated values) format, which I like to refer to as the 'Woodstock of data'. CSV files offer the advantage of flexibility; they are structured as text files, allowing users to manipulate them with any text editor, and nearly any data system can read and execute queries on them.

However, this flexibility comes at a cost. Reading a CSV file is not a trivial task, as users need a significant amount of prior knowledge about the file. For instance, [DuckDB's CSV reader](#docs:lts:data:csv:overview) offers more than 25 configuration options. I've found that people tend to think I'm not working hard enough if I don't introduce at least three new options with each release. *Just kidding.* These options include specifying the delimiter, quote and escape characters, determining the number of columns in the CSV file, and identifying whether a header is present while also defining column types. This can slow down an interactive data exploration process, and make analyzing new datasets a cumbersome and less enjoyable task.

One of the raisons d'être of DuckDB is to be pleasant and easy to use, so we don't want our users to have to fiddle with CSV files and input options manually. Manual input should be reserved only for files with rather unusual choices for their CSV dialect (where a dialect comprises the combination of the delimiter, quote, escape, and newline values used to create that file) or for specifying column types.

Automatically detecting CSV options can be a daunting process. Not only are there many options to investigate, but their combinations can easily lead to a search space explosion. This is especially the case for CSV files that are not well-structured. Some might argue that CSV files have a [specification](https://datatracker.ietf.org/doc/html/rfc4180), but the truth of the matter is that the "specification" changes as soon as a single system is capable of reading a flawed file. And, oh boy, I've encountered my fair share of semi-broken CSV files that people wanted DuckDB to read in the past few months.

DuckDB implements a [multi-hypothesis CSV sniffer](https://hannes.muehleisen.org/publications/ssdbm2017-muehleisen-csvs.pdf) that automatically detects dialects, headers, date/time formats, column types, and identifies dirty rows to be skipped. Our ultimate goal is to automatically read anything resembling a CSV file, to never give up and never let you down! All of this is achieved without incurring a substantial initial cost when reading CSV files. In the bleeding edge version, the sniffer runs when reading a CSV file by default. Note that the sniffer will always prioritize any options set by the user (e.g., if the user sets `,` as the delimiter, the sniffer won't try any other options and will assume that the user input is correct).

In this blog post, I will explain how the current implementation works, discuss its performance, and provide insights into what comes next!

#### DuckDB's Automatic Detection

The process of parsing CSV files is depicted in the figure below. It currently consists of five different phases, which will be detailed in the next sections.

The CSV file used in the overview example is as follows:

```csv
Name, Height, Vegetarian, Birthday
"Pedro", 1.73, False, 30-07-92
... imagine 2048 consistent rows ...
"Mark", 1.72, N/A, 20-09-92
```

![](../images/blog/csv-sniffer/sniffer.png)


In the first phase, we perform _Dialect Detection_, where we select the dialect candidates that generate the most per-row columns in the CSV file while maintaining consistency (i.e., not exhibiting significant variations in the number of columns throughout the file). In our example, we can observe that, after this phase, the sniffer successfully detects the necessary options for the delimiter, quotes, escapes, and new line delimiters.

The second phase, referred to as _Type Detection_, involves identifying the data types for each column in our CSV file. In our example, our sniffer recognizes four column types: `VARCHAR`, `DOUBLE`, `BOOL`, and `DATE`.

The third step, known as _Header Detection_, is employed to ascertain whether our file includes a header. If a header is present, we use it to set the column names; otherwise, we generate them automatically. In our example, there is a header, and each column gets its name defined in there.

Now that our columns have names, we move on to the fourth, optional phase: _Type Replacement_. DuckDB's CSV reader provides users with the option to specify column types by name. If these types are specified, we replace the detected types with the user's specifications.

Finally, we progress to our last phase, _Type Refinement_. In this phase, we analyze additional sections of the file to validate the accuracy of the types determined during the initial type detection phase. If necessary, we refine them. In our example, we can see that the `Vegetarian` column was initially categorized as `BOOL`. However, upon further examination, it was found to contain the string `N/A`, leading to an upgrade of the column type to `VARCHAR` to accommodate all possible values.

The automatic detection is only executed on a sequential sample of the CSV file. By default, the size of the sample is 20,480 tuples (i.e., 10 DuckDB execution chunks). This can be configured via the `sample_size` option, and can be set to -1 in case the user wants to sniff the complete file. Since the same data is repeatedly read with various options, and users can scan the entire file, all CSV buffers generated during sniffing are cached and efficiently managed to ensure high performance.

Of course, running the CSV Sniffer on very large files will have a drastic impact on the overall performance (see our [benchmark section below](#::varying-sampling-size)). In these cases, the sample size should be kept at a reasonable level.

In the next subsections, I will describe each phase in detail.

##### Dialect Detection

In the _Dialect Detection_, we identify the delimiter, quotes, escapes, and new line delimiters of a CSV file.

Our delimiter search space consists of the following delimiters: `,`, `|`, `;`, `\t`. If the file has a delimiter outside the search space, it must be provided by the user (e.g., `delim='?'`). Our quote search space is `"`, `'` and `\0`, where `\0` is a string terminator indicating no quote is present; again, users can provide custom characters outside the search space (e.g., `quote='?'`). The search space of escape values depends on the value of the quote option, but in summary, they are the same as quotes with the addition of `\`, and again, they can also be provided by the user (e.g., `escape='?'`). Finally, the last detected option is the new line delimiters; they can be `\r`, `\n`, `\r\n`, and a mix of everything (trust me, I've seen a real-world CSV file that used a mix).

By default, the dialect detection runs on 24 different combinations of dialect configurations. To determine the most promising configuration, we calculate the number of columns each CSV tuple would produce under each of these configurations. The one that results in the most columns with the most consistent rows will be chosen.
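
To give a feel for the scoring, here is a deliberately naive Python sketch that only considers the delimiter. It ignores quotes and escapes, and it assumes one possible reading of the rule: prefer the candidate with the most rows agreeing on a column count, and break ties by the larger column count. The function names and the sample lines are made up for the illustration.

```python
from collections import Counter

DELIMITERS = [",", "|", ";", "\t"]


def score(lines, delim):
    """Return (consistent_rows, num_columns) for one delimiter candidate."""
    counts = Counter(len(line.split(delim)) for line in lines)
    # The dominant column count, preferring wider rows when counts tie.
    num_columns, consistent_rows = max(counts.items(),
                                       key=lambda kv: (kv[1], kv[0]))
    return consistent_rows, num_columns


def detect_delimiter(lines):
    """Pick the candidate with the most consistent rows, then the most columns."""
    return max(DELIMITERS, key=lambda d: score(lines, d))


# Hypothetical file using ';' as delimiter and ',' as decimal separator.
lines = [
    "Name;Height;Vegetarian",
    "Pedro;1,73;False",
    "Mark;1,72;N/A",
]
```

Here `;` yields three columns on all three rows, while `,` yields an inconsistent mix of one and two columns, so `;` is selected.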

The calculation of consistent rows depends on additional user-defined options. For example, with the `null_padding` option, rows with missing columns have those columns padded with `NULL` values.

If `null_padding` is set to true, CSV files with inconsistent rows will still be considered, but a preference will be given to configurations that minimize the occurrence of padded rows. If `null_padding` is set to false, the dialect detector will skip inconsistent rows at the beginning of the CSV file. As an example, consider the following CSV file.

```csv
I like my csv files to have notes to make dialect detection harder
I also like commas like this one : ,
A,B,C
1,2,3
4,5,6
```

Here the sniffer would detect that with the delimiter set to `,` the first row has one column, the second has two, but the remaining rows have 3 columns. Hence, if `null_padding` is set to false, it would still select `,` as a delimiter candidate, by assuming the top rows are dirty notes. (Believe me, CSV notes are a thing!) This results in the following table:

```csv
A,B,C
1, 2, 3
4, 5, 6
```

If `null_padding` is set to true, all lines would be accepted, resulting in the following table:

```csv
'I like my csv files to have notes to make dialect detection harder', None, None
'I also like commas like this one : ', None, None
'A', 'B', 'C'
'1', '2', '3'
'4', '5', '6'
```
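
The two behaviors can be mimicked with a small Python sketch. It is only an approximation under stated assumptions: the delimiter is already known, the expected column count is taken to be the widest row, and `read_rows` is a hypothetical helper, not a DuckDB function.

```python
def read_rows(lines, delim=",", null_padding=False):
    """Split lines into rows, either padding short rows with None or
    skipping leading rows that do not have the full column count."""
    rows = [line.split(delim) for line in lines]
    width = max(len(row) for row in rows)  # assumption: widest row wins
    if null_padding:
        # Keep every row; empty fields become None and short rows are padded.
        return [[v or None for v in row] + [None] * (width - len(row))
                for row in rows]
    # Otherwise treat leading rows with fewer columns as notes and drop them.
    start = next(i for i, row in enumerate(rows) if len(row) == width)
    return rows[start:]

lines = [
    "I like my csv files to have notes to make dialect detection harder",
    "I also like commas like this one : ,",
    "A,B,C",
    "1,2,3",
    "4,5,6",
]
for row in read_rows(lines, null_padding=True):
    print(row)
```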

If the `ignore_errors` option is set, then the configuration that yields the most columns with the least inconsistent rows will be picked.

##### Type Detection

After deciding the dialect that will be used, we detect the types of each column. Our _Type Detection_ considers the following types: `SQLNULL`, `BOOLEAN`, `BIGINT`, `DOUBLE`, `TIME`, `DATE`, `TIMESTAMP`, `VARCHAR`. These types are ordered in specificity, which means we first check if a column is a `SQLNULL`; if not, if it's a `BOOLEAN`, and so on, until it can only be a `VARCHAR`. DuckDB has more types than the ones used by default. Users can also define which types the sniffer should consider via the `auto_type_candidates` option.

At this phase, the type detection algorithm goes over the first chunk of data (i.e., 2048 tuples). This process starts on the second valid row (i.e., not a note) of the file. The first row is stored separately and not used for type detection; whether it is a header is determined later. The type detection runs a per-column, per-value casting trial process to determine the column types. It starts off with a unique, per-column array with all types to be checked. It tries to cast the value of the column to the first type in the array; if the cast fails, it removes that type from the array, attempts the cast with the next type, and continues that process until the whole chunk is finished.

At this phase, we also determine the format of `DATE` and `TIMESTAMP` columns. The following formats are considered for `DATE` columns:

* `%m-%d-%Y`
* `%m-%d-%y`
* `%d-%m-%Y`
* `%d-%m-%y`
* `%Y-%m-%d`
* `%y-%m-%d`

The following are considered for `TIMESTAMP` columns:

* `%Y-%m-%dT%H:%M:%S.%f`
* `%Y-%m-%d %H:%M:%S.%f`
* `%m-%d-%Y %I:%M:%S %p`
* `%m-%d-%y %I:%M:%S %p`
* `%d-%m-%Y %H:%M:%S`
* `%d-%m-%y %H:%M:%S`
* `%Y-%m-%d %H:%M:%S`
* `%y-%m-%d %H:%M:%S`

Columns that use formats outside this search space must have their formats defined with the `dateformat` and `timestampformat` options.
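
To get a feel for how candidate formats are narrowed down, here is a rough Python sketch that keeps only the `DATE` formats able to parse every sampled value. It relies on Python's `datetime.strptime`, whose parsing rules may differ from DuckDB's, and the helper name `matching_date_formats` is made up for this example. How DuckDB chooses among several surviving formats is not covered here.

```python
from datetime import datetime

# The DATE format search space listed above.
DATE_FORMATS = [
    "%m-%d-%Y", "%m-%d-%y",
    "%d-%m-%Y", "%d-%m-%y",
    "%Y-%m-%d", "%y-%m-%d",
]

def matching_date_formats(values, formats=DATE_FORMATS):
    """Return the formats that successfully parse every value."""
    def parses(value, fmt):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            return False
    return [fmt for fmt in formats if all(parses(v, fmt) for v in values)]

print(matching_date_formats(["2023-10-27", "2023-01-05"]))
print(matching_date_formats(["01-02-2023"]))
```

The second call illustrates why a larger sample helps: a single value such as `01-02-2023` is ambiguous between month-first and day-first formats.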

As an example, let's consider the following CSV file.

```csv
Name, Age
,
Jack Black, 54
Kyle Gass, 63.2
```

The first row [`Name`, `Age`] will be stored separately for the header detection phase. The second row [`NULL`, `NULL`] will allow us to cast the first and second columns to `SQLNULL`. Therefore, their type candidate arrays will be the same: [`SQLNULL`, `BOOLEAN`, `BIGINT`, `DOUBLE`, `TIME`, `DATE`, `TIMESTAMP`, `VARCHAR`].

In the third row [`Jack Black`, `54`], things become more interesting. With `Jack Black`, the type candidate array for the first column will exclude all types with higher specificity, as `Jack Black` can only be converted to a `VARCHAR`. The second column cannot be converted to either `SQLNULL` or `BOOLEAN`, but it will succeed as a `BIGINT`. Hence, the type candidate array for the second column will be [`BIGINT`, `DOUBLE`, `TIME`, `DATE`, `TIMESTAMP`, `VARCHAR`].

In the fourth row, we have [`Kyle Gass`, `63.2`]. For the first column, there's no problem since it's also a valid `VARCHAR`. However, for the second column, a cast to `BIGINT` will fail, but a cast to `DOUBLE` will succeed. Hence, the new array of candidate types for the second column will be [`DOUBLE`, `TIME`, `DATE`, `TIMESTAMP`, `VARCHAR`].
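
The walkthrough above can be reproduced with a minimal Python sketch of the candidate elimination. It is not DuckDB's casting logic: the cast checks are simple stand-ins (for instance, only one `TIME`, `DATE`, and `TIMESTAMP` format each), values are stripped of surrounding whitespace, and all names are assumptions for illustration.

```python
from datetime import datetime

def _parses(fmt):
    def check(value):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            return False
    return check

def _is_bigint(value):
    try:
        return -2**63 <= int(value) < 2**63
    except ValueError:
        return False

def _is_double(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

# Ordered from most to least specific, mirroring the list in the text.
CASTS = [
    ("SQLNULL", lambda v: v == ""),
    ("BOOLEAN", lambda v: v.lower() in ("true", "false")),
    ("BIGINT", _is_bigint),
    ("DOUBLE", _is_double),
    ("TIME", _parses("%H:%M:%S")),
    ("DATE", _parses("%Y-%m-%d")),
    ("TIMESTAMP", _parses("%Y-%m-%d %H:%M:%S")),
    ("VARCHAR", lambda v: True),
]

def detect_types(rows):
    """Return the remaining type candidates per column after all rows."""
    candidates = [list(CASTS) for _ in rows[0]]
    for row in rows:
        for col, raw in enumerate(row):
            value = raw.strip()
            if value == "":
                continue  # NULL fits every candidate type
            # Drop the most specific candidate until one accepts the value.
            while not candidates[col][0][1](value):
                candidates[col].pop(0)
    return [[name for name, _ in cands] for cands in candidates]

# Rows 2 to 4 of the example file (the first row is set aside).
rows = [["", ""], ["Jack Black", " 54"], ["Kyle Gass", " 63.2"]]
for column_candidates in detect_types(rows):
    print(column_candidates)
```

Because `VARCHAR` accepts every value, the inner loop always terminates with at least one candidate left.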

##### Header Detection

The _Header Detection_ phase simply obtains the first valid line of the CSV file and attempts to cast it to the candidate types in our columns. If there is a cast mismatch, we consider that row as the header; if not, we treat the first row as actual data and automatically generate a header.

In our previous example, the first row was [`Name`, `Age`], and the column candidate type arrays were [`VARCHAR`] and [`DOUBLE`, `TIME`, `DATE`, `TIMESTAMP`, `VARCHAR`]. `Name` is a string and can be converted to `VARCHAR`. `Age` is also a string, and attempting to cast it to `DOUBLE` will fail. Since the casting fails, the auto-detection algorithm considers the first row as a header, resulting in the first column being named `Name` and the second as `Age`.

If a header is not detected, column names will be automatically generated with the pattern `column${x}`, where x represents the column's position (0-based index) in the CSV file.
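
The header decision and the fallback naming can be sketched as follows. This is a simplified illustration that assumes one detected type per column and handles only the two types from the running example; `can_cast` and `detect_header` are hypothetical helpers, not part of DuckDB.

```python
def can_cast(value, type_name):
    """Tiny stand-in for a cast check; only the types used below are handled."""
    if type_name == "VARCHAR":
        return True
    if type_name == "DOUBLE":
        try:
            float(value)
            return True
        except ValueError:
            return False
    raise NotImplementedError(type_name)

def detect_header(first_row, column_types):
    """Return (has_header, column_names) for the first valid row."""
    mismatch = any(
        not can_cast(value.strip(), type_name)
        for value, type_name in zip(first_row, column_types)
    )
    if mismatch:
        # At least one cast failed, so the row is treated as the header.
        return True, [value.strip() for value in first_row]
    # No mismatch: the row is data, so generate column0, column1, ...
    return False, [f"column{i}" for i in range(len(first_row))]

print(detect_header(["Name", " Age"], ["VARCHAR", "DOUBLE"]))
print(detect_header(["Jack Black", " 54"], ["VARCHAR", "DOUBLE"]))
```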

##### Type Replacement

Now that the auto-detection algorithm has discovered the header names, if the user specifies column types, the types detected by the sniffer will be replaced with them in the _Type Replacement_ phase. For example, we can replace the `Age` type with `FLOAT` by using:

```sql
SELECT *
FROM read_csv('greatest_band_in_the_world.csv', types = {'Age': 'FLOAT'});
```

This phase is optional and will only be triggered if there are manually defined types.

##### Type Refinement

The _Type Refinement_ phase performs the same tasks as type detection; the only difference is the granularity of the data on which the casting operator works, which is adjusted for performance reasons. During type detection, we conduct cast checks on a per-column, per-value basis.

In this phase, we transition to a more efficient vectorized casting algorithm. The validation process remains the same as in type detection, with types from type candidate arrays being eliminated if a cast fails.

#### How Fast Is the Sniffing?

To analyze the impact of running DuckDB's automatic detection, we execute the sniffer on the [NYC taxi dataset](https://www.kaggle.com/datasets/elemento/nyc-yellow-taxi-trip-data/). The file consists of 19 columns, 10,906,858 tuples and is 1.72 GB in size.

The cost of sniffing the dialect, column names, and types is approximately 4% of the total cost of loading the data.


| Name     | Time (s) |
| -------- | -------- |
| Sniffing | 0.11     |
| Loading  | 2.43     |
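
As a quick sanity check of the 4% figure, the share can be recomputed from the two timings in the table. The snippet assumes that the total cost is the sum of the sniffing and loading times, which the text does not state explicitly.

```python
sniffing_s = 0.11  # from the table above
loading_s = 2.43   # from the table above

# Assumption: total cost = sniffing + loading.
share = sniffing_s / (sniffing_s + loading_s)
print(f"{share:.1%}")
```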

##### Varying Sampling Size

Sometimes, CSV files can have dialect options or more refined types that appear only later in the CSV file. In those cases, the `sample_size` option becomes an important tool for users to ensure that the sniffer examines enough data to make the correct decision. However, increasing the `sample_size` also leads to an increase in the total runtime of the sniffer because it uses more data to detect all possible dialects and types.

Below, you can see how increasing the default sample size by a multiplier (see X axis) affects the sniffer's runtime on the NYC dataset. As expected, the total time spent on sniffing increases linearly with the total sample size.

![](../images/blog/csv-sniffer/sample.png)


##### Varying Number of Columns

The other main characteristic of a CSV file that will affect the auto-detection is the number of columns the file has. Here, we test the sniffer against a varying number of `INTEGER` type columns in files with 10,906,858 tuples. The results are depicted in the figure below. We can see that from one column to two, we have a steeper increase in runtime. That's because, for single columns, we have a simplified dialect detection due to the lack of delimiters. For the other columns, as expected, we have a more linear increase in runtime, depending on the number of columns.

![](../images/blog/csv-sniffer/columns.png)


#### Conclusion & Future Work

If you have unusual CSV files and want to query, clean up, or normalize them, DuckDB is already one of the top solutions available. It is very easy to get started. To read a CSV file with the sniffer, you can simply run:

```sql
SELECT *
FROM 'path/to/csv_file.csv';
```

DuckDB's CSV auto-detection algorithm is an important tool to facilitate the exploration of CSV files. With its default options, it has a low impact on the total cost of loading and reading CSV files. Its main goal is to always be capable of reading files, doing a best-effort job even on files that are ill-defined.

We have a list of points related to the sniffer that we would like to improve in the future.

1. *Advanced Header Detection.* We currently determine if a CSV has a header by identifying a type mismatch between the first valid row and the remainder of the CSV file. However, this can generate false negatives if, for example, all the columns of a CSV are of type `VARCHAR`. We plan on enhancing our Header Detection to perform matches with commonly used names for headers.
2. *Adding Accuracy and Speed Benchmarks.* We currently implement many accuracy and regression tests; however, due to the CSV's inherent flexibility, manually creating test cases is quite daunting. The plan moving forward is to implement a whole accuracy and regression test suite using the [Pollock Benchmark](https://www.vldb.org/pvldb/vol16/p1870-vitagliano.pdf).
3. *Improved Sampling.* We currently execute the auto-detection algorithm on a sequential sample of data. However, it's very common that new settings are only introduced later in the file (e.g., quotes might be used only in the last 10% of the file). Hence, being able to execute the sniffer in distinct parts of the file can improve accuracy.
4. *Multi-Table CSV File.* Multiple tables can be present in the same CSV file, which is a common scenario when exporting spreadsheets to CSVs. Therefore, we would like to be able to identify and support these.
5. *Null-String Detection.* We currently do not have an algorithm in place to identify the representation of null strings.
6. *Decimal Precision Detection.* We also don't automatically detect decimal precision yet. This is something that we aim to tackle in the future.
7. *Parallelization.* Despite DuckDB's CSV Reader being fully parallelized, the sniffer is still limited to a single thread. Parallelizing it in a similar fashion to what is done with the CSV Reader (description coming in a future blog post) would significantly enhance sniffing performance and enable full-file sniffing.
8. *Sniffer as a stand-alone function.* Currently, users can utilize the `DESCRIBE` query to acquire information from the sniffer, but it only returns column names and types. We aim to expose the sniffing algorithm as a stand-alone function that provides the complete results from the sniffer. This will allow users to easily configure files using the exact same options without the need to rerun the sniffer.

## Updates to the H2O.ai db-benchmark!

**Publication date:** 2023-11-03

**Author:** Tom Ebergen

**TL;DR:** The H2O.ai db-benchmark has been updated with new results. In addition, the AWS EC2 instance used for benchmarking has been changed to a c6id.metal for improved repeatability and fairness across libraries. DuckDB is the fastest library for both join and group by queries at almost every data size.

[Skip directly to the results](#results)

#### The Benchmark Has Been Updated!

In April, DuckDB Labs published a [blog post reporting updated H2O.ai db-benchmark results](https://duckdb.org/2023/04/14/h2oai). Since then, the results haven't been updated. The original plan was to update the results with every DuckDB release. DuckDB 0.9.1 was recently released, and DuckDB Labs has updated the benchmark. While updating the benchmark, however, we noticed that our initial setup did not lend itself to being fair to all solutions. The machine used had network storage and could suffer from noisy neighbors. To avoid these issues, the whole benchmark was re-run on a c6id.metal machine.

#### New Benchmark Environment: c6id.metal Instance

Initially, updating the benchmark produced strange results. Even using the same library versions from the prior update, some solutions regressed and others improved. We believe this variance came from the AWS EC2 instance type we chose: `m4.10xlarge`.
The `m4.10xlarge` has 40 virtual CPUs and EBS storage. EBS storage is highly available network block storage for EC2 instances. When running compute-heavy benchmarks, a machine like the `m4.10xlarge` can suffer from the following issues:

* **Network storage** is an issue for benchmarking solutions that interact with storage frequently. For the 500 MB and 5 GB workloads, network storage was not an issue on the `m4.10xlarge` since all solutions could execute the queries in memory. For the 50 GB workload, however, network storage was an issue for the solutions that could not execute queries in memory. While the `m4.10xlarge` has dedicated EBS bandwidth, any read/write from storage is still happening over the network, which is usually slower than physically mounted storage. Solutions that frequently read and write to storage for the 50 GB queries end up doing this over the network. This network time becomes a chunk of the execution time of the query. If the network has variable performance, the query performance is then also variable.

* **Noisy neighbors** is a common issue when benchmarking on virtual CPUs. The previous machine most likely shared its compute hardware with other (neighboring) AWS EC2 instances. If these neighbors are also running compute heavy workloads, the physical CPU caches are repeatedly invalidated/flushed by the neighboring instance and the benchmark instance. When the CPU cache is shared between two workloads on two instances, both workloads require extra reads from memory for data that would already be in CPU cache on a non-virtual machine.

In order to be fair to all solutions, we decided to change the instance type to a metal instance with local storage. Metal instance types negate any noisy neighbor problems because the hardware is physical and not shared with any other AWS users/instances. Network storage problems are also fixed because solutions can read and write data to the local instance storage, which is physically mounted on the hardware.

Another benefit of the c6id.metal box is that it stresses parallel performance. There are 128 cores on the c6id.metal. Performance differences between solutions that can effectively use every core and solutions that cannot are clearly visible.

See the [updated settings](#::updated-settings) section on how settings were changed for each solution when run on the new machine.

#### Updating the Benchmark

Moving forward we will update the benchmark when PRs with new performance numbers are provided. The PR should include a description of the changes to a solution script or a version update and new entries in the `time.csv` and `logs.csv` files. These entries will be verified using a different c6id.metal instance, and if there is limited variance, the PR will be merged and the results will be updated!

##### Updated Settings

1. ClickHouse
    * Storage: Any data that gets spilled to disk also needs to be on the NVMe drive. This has been changed in the new `format_and_mount.sh` script and the `clickhouse/clickhouse-mount-config.xml` file.
2. Julia (juliadf & juliads)
    * Threads: The threads were hardcoded for juliadf/juliads to 20/40 threads. Now the maximum number of threads is used. No option was given to spill to disk, so this was not changed/researched.
3. DuckDB
    * Storage: The DuckDB database file was specified to run on the NVMe mount.
4. Spark
    * Storage: There is an option to spill to disk. I was unsure of how to modify the storage location so that it was on the NVMe drive. Open to a PR with storage location changes and improved results!

Many solutions do not spill to disk, so they did not require any modification to use the instance storage. Other solutions use `parallel::ncores()` or default to a maximum number of cores for parallelism. Solution scripts were run in their current form on [github.com/duckdblabs/db-benchmark](https://github.com/duckdblabs/db-benchmark). Please read the [Updating the Benchmark](https://github.com/duckdblabs/db-benchmark#updating-the-benchmark) section on how to re-run your solution.

##### Results

The first results you see are the 50 GB group by results. The benchmark runs every query twice per solution, and both runtimes are reported. The "first time" can be considered a cold run, and the "second time" can be considered a hot run. DuckDB and DuckDB-latest perform very well across all dataset sizes and variations.

The team at DuckDB Labs has been hard at work improving the performance of the out-of-core hash aggregates and joins. The most notable improvement is the performance of query 5 in the advanced group by queries. The cold run is almost an order of magnitude better than every other solution! DuckDB is also one of only two solutions to finish the 50 GB join query. Some solutions are experiencing timeouts on the 50 GB datasets. Solutions running the 50 GB group by queries are killed after running for 180 minutes, meaning all 10 group by queries need to finish within the 180 minutes. Solutions running the 50 GB join queries are killed after running for 360 minutes.

[Link to result page](https://DuckDBlabs.github.io/db-benchmark/)


## Extensions for DuckDB-Wasm

**Publication date:** 2023-12-18

**Author:** Carlo Piovesan

**TL;DR:** DuckDB-Wasm users can now load DuckDB extensions, allowing them to run extensions in the browser.

In this blog post, we will go over two exciting DuckDB features: the DuckDB-Wasm client and DuckDB extensions. I will discuss how these disjoint features have now been adapted to work together. These features are now available for DuckDB-Wasm users and you can try them out at [shell.duckdb.org](https://shell.duckdb.org).

#### DuckDB Extensions

DuckDB's philosophy is to have a lean core system to ensure robustness and portability.
However, a competing design goal is to be flexible and allow a wide range of functionality that is necessary to perform advanced analytics.
To accommodate this, DuckDB has an extension mechanism for installing and loading extensions during runtime.

##### Running DuckDB Extensions Locally

For DuckDB, here is a simple end-to-end example using the [command line interface](#docs:lts:clients:cli:overview):

```sql
INSTALL tpch;
LOAD tpch;
CALL dbgen(sf = 0.1);
PRAGMA tpch(7);
```

This script first installs the [TPC-H extension](#docs:lts:core_extensions:tpch) from the official extension repository, which implements the popular TPC-H benchmark. It then loads the TPC-H extension, uses it to populate the database with generated data using the `dbgen` function. Finally, it runs [TPC-H query 7](https://github.com/duckdb/duckdb/blob/v0.9.2/extension/tpch/dbgen/queries/q07.sql).

This example demonstrates a case where we install an extension to complement DuckDB with a new feature (the TPC-H data generator), which is not part of the base DuckDB executable. Instead, it is downloaded from the extension repository, then loaded and executed locally within the framework of DuckDB.

Currently, DuckDB has [several extensions](#docs:lts:core_extensions:overview). These add support for filesystems, file formats, database and network protocols. Additionally, they implement new functions such as full text search.

#### DuckDB-Wasm

In an effort spearheaded by André Kohn, [DuckDB was ported to the WebAssembly platform](https://duckdb.org/2021/10/29/duckdb-wasm) in 2021. [WebAssembly](https://webassembly.org/), also known as Wasm, is a W3C standard language developed in recent years. Think of it as a machine-independent binary format that you can execute from within the sandbox of a web browser.

Thanks to DuckDB-Wasm, anyone has access to a DuckDB instance only a browser tab away, with all computation being executed locally within your browser and no data leaving your device. DuckDB-Wasm is a library that can be used in various deployments (e.g., [notebooks that run inside your browser without a server](https://observablehq.com/@cmudig/duckdb)). In this post, we will use the Web shell, where SQL statements are entered by the user line by line, with the behavior modeled after the DuckDB [CLI shell](#docs:lts:clients:cli:overview).

#### DuckDB Extensions, in DuckDB-Wasm!

DuckDB-Wasm [now supports DuckDB extensions](#docs:lts:clients:wasm:extensions). This support comes with four new key features.
First, the DuckDB-Wasm library can be compiled with dynamic extension support.
Second, DuckDB extensions can be compiled to a single WebAssembly module.
Third, users and developers working with DuckDB-Wasm can now select the set of extensions they load.
Finally, the DuckDB-Wasm shell's features are now much closer to the native [CLI functionality](#docs:lts:clients:cli:overview).

##### Using the TPC-H Extension in DuckDB-Wasm

To demonstrate this, we will again use the [TPC-H data generation example](#::running-duckdb-extensions-locally).
To run this script in your browser, [start an online DuckDB shell that runs these commands](https://shell.duckdb.org/#queries=v0,INSTALL-tpch~,LOAD-tpch~,CALL-dbgen(sf%3D0.1)~,PRAGMA-tpch(7)~). The script will generate the TPC-H data set at scale factor 0.1, which corresponds to 100 MB in uncompressed CSV format.

Once the script is finished, you can keep executing queries, or you could even download the `customer.parquet` file (1 MB) using the following commands:

```sql
COPY customer TO 'customer.parquet';
.files download customer.parquet
```

This will first copy the `customer` table to a `customer.parquet` file in the DuckDB-Wasm file system, then download it via your browser.

In short, your DuckDB instance, which _runs entirely within your browser,_ first installed and loaded the [TPC-H extension](#docs:lts:core_extensions:tpch). It then used the extension logic to generate data and convert it to a Parquet file. Finally, you could download the Parquet file as a regular file to your local file system.

<a href="https://shell.duckdb.org/#queries=v0,INSTALL-tpch~,LOAD-tpch~,CALL-dbgen(sf%3D0.1)~,PRAGMA-tpch(7)~">
![](../images/wasm-blog-post-shell-tpch.png)
</a>

##### Using the Spatial Extension in DuckDB-Wasm

To show the possibilities unlocked by DuckDB-Wasm extensions, what about using the [spatial extension](#docs:lts:core_extensions:spatial:overview) within DuckDB-Wasm?
This extension implements geospatial types and functions that allow DuckDB to work with geospatial data and relevant workloads.

To install and load the spatial extension in DuckDB-Wasm, run:

```sql
INSTALL spatial;
LOAD spatial;
```

Using the spatial extension, the following query uses the New York taxi dataset, and calculates the area of the taxi zones for each borough:

```sql
CREATE TABLE nyc AS
    SELECT
        borough,
        st_union_agg(geom) AS full_geom,
        st_area(full_geom) AS area,
        st_centroid(full_geom) AS centroid,
        count(*) AS count
    FROM
        st_read('https://raw.githubusercontent.com/duckdb/duckdb-spatial/main/test/data/nyc_taxi/taxi_zones/taxi_zones.shp')
GROUP BY borough;

SELECT borough, area, centroid::VARCHAR, count
FROM nyc;
```

Both your local DuckDB client and the [online DuckDB shell](https://shell.duckdb.org/#queries=v0,INSTALL-spatial~,LOAD-spatial~,CREATE-TABLE-nyc-AS-SELECT-borough%2C-st_union_agg(geom)-AS-full_geom%2C-st_area(full_geom)-AS-area%2C-st_centroid(full_geom)-AS-centroid%2C-count(*)-AS-count-FROM-st_read('https%3A%2F%2Fraw.githubusercontent.com%2Fduckdb%2Fduckdb-spatial%2Fmain%2Ftest%2Fdata%2Fnyc_taxi%2Ftaxi_zones%2Ftaxi_zones.shp')-GROUP-BY-borough~,SELECT-borough%2C-area%2C-centroid%3A%3AVARCHAR%2C-count-FROM-nyc~) will perform the same analysis.

#### Under the Hood

Let's dig into how this all works.
The following figure shows an overview of DuckDB-Wasm's architecture.
Both components in the figure run within the web browser.

![](../images/wasm-blog-post-overview.png)


When you load DuckDB-Wasm in your browser, there are two components that will be set up:
(1) A main-thread wrapper library that acts as a bridge between the users or code using DuckDB-Wasm and the background component, which it drives.
(2) A DuckDB engine used to execute queries.
This component lives in a Web Worker and communicates with the main thread component via messages. This component has a JavaScript layer that handles messages and the original DuckDB C++ logic compiled down to a single WebAssembly file.

What happens when we add extensions to the mix?

![](../images/wasm-blog-post-extensions.png)


Extensions for DuckDB-Wasm are composed of a single WebAssembly module. This will encode the logic and data of the extensions, the list of functions that are going to be imported and exported, and a custom section encoding metadata that allows verification of the extension.

To make extension loading work, the DuckDB engine component blocks, fetches, and validates the external WebAssembly code, links it in, and wires together imports and exports. After that, the system is connected and continues executing as if it were a single codebase.

The central code block that makes this possible is the following:

```cpp
EM_ASM(
    {
        const xhr = new XMLHttpRequest();
        xhr.open("GET", UTF8ToString($0), false);
        xhr.responseType = "arraybuffer";
        xhr.send(null);
        var uInt8Array = xhr.response;
        // Check signatures / version compatibility left as an exercise
        WebAssembly.validate(uInt8Array);
        // Here we add the uInt8Array to Emscripten's filesystem,
        // for it to be found by dlopen
        FS.writeFile(UTF8ToString($1), new Uint8Array(uInt8Array));
    },
    filename.c_str(), basename.c_str()
);

auto lib_hdl = dlopen(basename.c_str(), RTLD_NOW | RTLD_LOCAL);
if (!lib_hdl) {
    throw IOException(
      "Extension \"%s\" could not be loaded: %s",
      filename,
      GetDLError()
    );
}
```

Here, we rely on two powerful features of [Emscripten](https://emscripten.org/), the compiler toolchain we are using to compile DuckDB to WebAssembly.

First, `EM_ASM` allows us to inline JavaScript code directly in C++ code. It means that during runtime when we get to that block of code, the WebAssembly component will go back to JavaScript land, perform a blocking `XMLHttpRequest` on a URL such as [https://extensions.duckdb.org/.../tpch.duckdb_extension.wasm](https://extensions.duckdb.org/duckdb-wasm/v0.9.2/wasm_eh/tpch.duckdb_extension.wasm),
then validate that the package that has been just fetched is actually a valid WebAssembly module.

Second, we leverage Emscripten's [`dlopen` implementation](https://emscripten.org/docs/compiling/Dynamic-Linking.html), which enables compatible WebAssembly modules to be linked together and act as a single composable codebase.

Together, these two features enable the dynamic loading of extensions, triggered via the SQL `LOAD` statement.

#### Developer Guide

We see two main groups of developers using extensions with DuckDB-Wasm.

* Developers working with DuckDB-Wasm: If you are building a website or a library that wraps DuckDB-Wasm, the new extension support means that there is now a wider range of functionality that can be exposed to your users.
* Developers working on DuckDB extensions: If you have written a DuckDB extension, or are thinking of doing so, consider porting it to DuckDB-Wasm. The [DuckDB extension template repository](https://github.com/duckdb/extension-template) contains the configuration required for compiling to DuckDB-Wasm.

#### Limitations

DuckDB-Wasm extensions have a few inherent limitations. For example, it is not possible to communicate with native executables living on your machine, which is required by some extensions, such as the [`postgres` scanner extension](#docs:lts:core_extensions:postgres).
Moreover, compilation to Wasm may not be currently supported for some libraries you are relying on, or capabilities might not be one-to-one with local executables due to additional requirements imposed on the browser, in particular around [non-secure HTTP requests](#docs:lts:clients:wasm:extensions::httpfs).

#### Conclusions

In this blog post, we explained how DuckDB-Wasm supports extensions, and demonstrated this with multiple extensions: [TPC-H](#docs:lts:core_extensions:tpch), [Parquet](#docs:lts:data:parquet:overview), and [spatial](#docs:lts:core_extensions:spatial:overview).

Thanks to the portability of DuckDB, the scripts shown in this blog post also work on your smartphone:

![](../images/wasm-blog-post-ios-shell.png)


For updates on the latest developments, follow this blog and join the Wasm channel in [our Discord](https://discord.duckdb.org). If you have an example of what's possible with extensions in DuckDB, let us know!

## Multi-Database Support in DuckDB

**Publication date:** 2024-01-26

**Author:** Mark Raasveldt

**TL;DR:** DuckDB can attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a convenient manner.

![](../images/blog/duckdb-multidb-support.png)


In modern data analysis, data must often be combined from a wide variety of different sources. Data might sit in CSV files on your machine, in Parquet files in a data lake, or in an operational database. DuckDB has strong support for moving data between many different data sources. However, this support has previously been limited to reading data and writing data to files.

DuckDB supports advanced operations on its own native storage format – such as deleting rows, updating values, or altering the schema of a table. It supports all of these operations using ACID semantics. This guarantees that your database is always left in a sane state – operations are atomic and do not partially complete.

DuckDB now has a pluggable storage and transactional layer. This flexible layer allows new storage back-ends to be created by DuckDB extensions. These storage back-ends can support all database operations in the same way that DuckDB supports them, including inserting data and even modifying schemas.

The [MySQL](#docs:lts:core_extensions:mysql), [Postgres](#docs:lts:core_extensions:postgres), and [SQLite](#docs:lts:core_extensions:sqlite) extensions implement this new pluggable storage and transactional layer, allowing DuckDB to connect to those systems and operate on them in the same way that it operates on its own native storage engine.

These extensions enable a number of useful features. For example, using these extensions you can:

* Export data from SQLite to JSON
* Read data from Parquet into Postgres
* Move data from MySQL to Postgres

... and much more.


#### Attaching Databases

The [`ATTACH` statement](#docs:lts:sql:statements:attach) can be used to attach a new database to the system. By default, a native DuckDB file will be attached. The `TYPE` parameter can be used to specify a different storage type. Alternatively, the `{type}:` prefix can be used.

For example, using the SQLite extension, we can open [a SQLite database file](https://github.com/duckdb/duckdb-sqlite/raw/main/data/db/sakila.db) and query it as we would query a DuckDB database.

```sql
ATTACH 'sakila.db' AS sakila (TYPE sqlite);
SELECT title, release_year, length FROM sakila.film LIMIT 5;
```

```text
┌──────────────────┬──────────────┬────────┐
│      title       │ release_year │ length │
│     varchar      │   varchar    │ int64  │
├──────────────────┼──────────────┼────────┤
│ ACADEMY DINOSAUR │ 2006         │     86 │
│ ACE GOLDFINGER   │ 2006         │     48 │
│ ADAPTATION HOLES │ 2006         │     50 │
│ AFFAIR PREJUDICE │ 2006         │    117 │
│ AFRICAN EGG      │ 2006         │    130 │
└──────────────────┴──────────────┴────────┘
```

The `USE` command switches the main database.

```sql
USE sakila;
SELECT first_name, last_name FROM actor LIMIT 5;
```

```text
┌────────────┬──────────────┐
│ first_name │  last_name   │
│  varchar   │   varchar    │
├────────────┼──────────────┤
│ PENELOPE   │ GUINESS      │
│ NICK       │ WAHLBERG     │
│ ED         │ CHASE        │
│ JENNIFER   │ DAVIS        │
│ JOHNNY     │ LOLLOBRIGIDA │
└────────────┴──────────────┘
```

The SQLite database can be manipulated as if it were a native DuckDB database. For example, we can create a new table, populate it with values from a Parquet file, delete a few rows from the table and alter the schema of the table.

```sql
CREATE TABLE lineitem AS FROM 'lineitem.parquet' LIMIT 1000;
DELETE FROM lineitem WHERE l_returnflag = 'N';
ALTER TABLE lineitem DROP COLUMN l_comment;
```

The `duckdb_databases` table contains a list of all attached databases and their types.

```sql
SELECT database_name, path, type FROM duckdb_databases;
```

```text
┌───────────────┬───────────┬─────────┐
│ database_name │   path    │  type   │
│    varchar    │  varchar  │ varchar │
├───────────────┼───────────┼─────────┤
│ sakila        │ sakila.db │ sqlite  │
│ memory        │ NULL      │ duckdb  │
└───────────────┴───────────┴─────────┘
```

#### Mix and Match

While attaching to different database types is useful, it becomes even more powerful when used in combination. For example, we can attach a SQLite, a MySQL, and a Postgres database at the same time.

```sql
ATTACH 'sqlite:sakila.db' AS sqlite;
ATTACH 'postgres:dbname=postgresscanner' AS postgres;
ATTACH 'mysql:user=root database=mysqlscanner' AS mysql;
```

Now we can move data between these attached databases and query them together. Let's copy the `film` table to MySQL, and the `actor` table to Postgres:

```sql
CREATE TABLE mysql.film AS FROM sqlite.film;
CREATE TABLE postgres.actor AS FROM sqlite.actor;
```

We can now join tables from these three attached databases together. Let's find all of the actors that starred in `Ace Goldfinger`.

```sql
SELECT first_name, last_name
FROM mysql.film
JOIN sqlite.film_actor ON (film.film_id = film_actor.film_id)
JOIN postgres.actor ON (actor.actor_id = film_actor.actor_id)
WHERE title = 'ACE GOLDFINGER';
```

```text
┌────────────┬───────────┐
│ first_name │ last_name │
│  varchar   │  varchar  │
├────────────┼───────────┤
│ BOB        │ FAWCETT   │
│ MINNIE     │ ZELLWEGER │
│ SEAN       │ GUINESS   │
│ CHRIS      │ DEPP      │
└────────────┴───────────┘
```

Running `EXPLAIN` on the query shows how the data from the different engines is combined into the final query result.

```text
┌───────────────────────────┐                                                          
│         PROJECTION        │                                                          
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                          
│         first_name        │                                                          
│         last_name         │                                                          
└─────────────┬─────────────┘                                                          
┌─────────────┴─────────────┐                                                          
│         HASH_JOIN         │                                                          
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                          
│           INNER           │                                                          
│     film_id = film_id     ├───────────────────────────────────────────┐              
└─────────────┬─────────────┘                                           │              
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│         HASH_JOIN         │                             │           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │                             │ (title = 'ACE GOLDFINGER')│
│    actor_id = actor_id    ├──────────────┐              │                           │
└─────────────┬─────────────┘              │              └─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│        SQLITE_SCAN        ││       POSTGRES_SCAN       ││        MYSQL_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│    sakila.db:film_actor   ││           actor           ││            film           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          film_id          ││          actor_id         ││          film_id          │
│          actor_id         ││         first_name        ││           title           │
│                           ││         last_name         ││                           │
└───────────────────────────┘└───────────────────────────┘└───────────────────────────┘
```

#### Transactions

All statements executed within DuckDB are executed within a transaction. If an explicit `BEGIN TRANSACTION` is not called, every statement will execute in its own transaction. This also applies to queries that are executed over other storage engines. These storage engines also support explicit `BEGIN`, `COMMIT` and `ROLLBACK` statements.

For example, we can begin a transaction within our attached `SQLite` database, make a change, and then roll it back. The original data will be restored.

```sql
BEGIN;
TRUNCATE film;
SELECT title, release_year, length FROM film;
```

```text
┌─────────┬──────────────┬────────┐
│  title  │ release_year │ length │
│ varchar │   varchar    │ int64  │
├─────────────────────────────────┤
│             0 rows              │
└─────────────────────────────────┘
```

```sql
ROLLBACK;
SELECT title, release_year, length FROM film LIMIT 5;
```

```text
┌──────────────────┬──────────────┬────────┐
│      title       │ release_year │ length │
│     varchar      │   varchar    │ int64  │
├──────────────────┼──────────────┼────────┤
│ ACADEMY DINOSAUR │ 2006         │     86 │
│ ACE GOLDFINGER   │ 2006         │     48 │
│ ADAPTATION HOLES │ 2006         │     50 │
│ AFFAIR PREJUDICE │ 2006         │    117 │
│ AFRICAN EGG      │ 2006         │    130 │
└──────────────────┴──────────────┴────────┘
```

##### Multi-Database Transactions

Every storage engine has its own transactions, which are stand-alone and managed by the storage engine itself. Opening a transaction in Postgres, for example, calls `BEGIN TRANSACTION` in the Postgres client. The transaction is managed by Postgres itself. Similarly, when the transaction is committed or rolled back, the storage engine handles this by itself.

Transactions are used both for **reading** and for **writing** data. For reading data, they are used to provide a consistent snapshot of the database. For writing, they are used to ensure all data in a transaction is packed together and written at the same time.

When executing a transaction that involves multiple attached databases we need to open multiple transactions: one per attached database that is used in the transaction. While this is not a problem when **reading** from the database, it becomes complicated when **writing**. In particular, when we want to `COMMIT` a transaction it is challenging to ensure that either (a) every database has successfully committed, or (b) every database has rolled back.

For that reason, it is currently not supported to **write** to multiple attached databases in a single transaction. Instead, an error is thrown when this is attempted:

```sql
BEGIN;
CREATE TABLE postgres.new_table (i INTEGER);
CREATE TABLE mysql.new_table (i INTEGER);
```

```console
Error: Attempting to write to database "mysql" in a transaction that has
already modified database "postgres" – a single transaction can only write
to a single attached database.
```
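
The single-writer rule can be pictured with a small toy model in Python. This is only a sketch of the behavior described above, not DuckDB's implementation: the class `MultiDatabaseTransaction` and its methods are hypothetical names chosen for illustration.

```python
class MultiDatabaseTransaction:
    """Toy model of one transaction spanning several attached databases."""

    def __init__(self):
        self.opened = set()    # databases with an open per-engine transaction
        self.modified = None   # the one database this transaction has written to

    def read(self, database):
        # Reading only needs a consistent snapshot, so any number of
        # attached databases may take part.
        self.opened.add(database)

    def write(self, database):
        # Refuse to write to a second database: there would be no way to
        # guarantee that both engines commit or both roll back.
        if self.modified is not None and self.modified != database:
            raise RuntimeError(
                f'Attempting to write to database "{database}" in a transaction '
                f'that has already modified database "{self.modified}"'
            )
        self.opened.add(database)
        self.modified = database


txn = MultiDatabaseTransaction()
txn.read("sqlite")       # fine: read-only participant
txn.write("postgres")    # fine: first (and only) database written to
try:
    txn.write("mysql")   # rejected, as in the error shown above
except RuntimeError as error:
    print(error)
```

In this model, reads from any number of databases are allowed, while the first write pins the transaction to a single database until it ends.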

#### Copying Data between Databases

`CREATE TABLE AS`, `INSERT INTO` and `COPY` can be used to copy data between different attached databases. The dedicated [`COPY FROM DATABASE ... TO`](#docs:lts:sql:statements:copy::copy-from-database--to) can be used to copy all data from one database to another. This includes all tables and views that are stored in the source database.

```sql
-- attach a Postgres database
ATTACH 'postgres:dbname=postgresscanner' AS postgres;
-- attach a DuckDB file
ATTACH 'database.db' AS ddb;
-- export all tables and views from the Postgres database to the DuckDB file
COPY FROM DATABASE postgres TO ddb;
```

#### Directly Opening a Database

The explicit `ATTACH` statement is not required to connect to a different database type. When instantiating a DuckDB instance a connection can be made directly to a different database type using the `{type}:` prefix. For example, to connect to a SQLite file, use `sqlite:file.db`. To connect to a Postgres instance, use `postgres:dbname=postgresscanner`. This can be done in any client, including the CLI. For instance:

**CLI:**

```bash
duckdb sqlite:file.db
```

**Python:**

```python
import duckdb
con = duckdb.connect('sqlite:file.db')
```

This is equivalent to attaching the storage engine and running `USE` afterwards.

#### Conclusion

DuckDB's pluggable storage engine architecture enables many use cases. By attaching multiple databases, data can be extracted in a transactionally safe manner for bulk ETL or ELT workloads, as well as for on-the-fly data virtualization workloads. These techniques also work well in combination, for example, by moving data in bulk on a regular cadence, while filling in the last few data points on the fly.

Pluggable storage engines also unlock new ways to handle concurrent writers in a data platform. Each separate process could write its output to a transactional database, and the results could be combined within DuckDB – all in a transactionally safe manner. Then, data analysis tasks can occur on the centralized DuckDB database for improved performance.

We look forward to hearing the many creative ways you are able to use this feature!

#### Future Work

We intend to continue enhancing the performance and capabilities of the existing extensions. In addition, all of these features can be leveraged by the community to connect to other databases.

## Announcing DuckDB 0.10.0

**Publication date:** 2024-02-13

**Authors:** Mark Raasveldt, Hannes Mühleisen

**TL;DR:** The DuckDB team is happy to announce the latest DuckDB release (0.10.0). This release is named Fusca after the [Velvet scoter](https://en.wikipedia.org/wiki/Velvet_scoter) native to Europe.

![](../images/blog/velvet-scoter-duck.jpg)


To install the new version, please visit the [installation guide](https://duckdb.org/install/index.html). The full release notes can be found [on GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.10.0).

#### What's New in 0.10.0

There have been too many changes to discuss them each in detail, but we would like to highlight several particularly exciting features!

Below is a summary of those new features with examples, starting with a change in our SQL dialect that is designed to produce more intuitive results by default.

#### Breaking SQL Changes

[**Implicit Cast to VARCHAR**](https://github.com/duckdb/duckdb/pull/10115). Previously, DuckDB would automatically allow any type to be implicitly cast to `VARCHAR` during function binding. As a result, it was possible to, e.g., compute the substring of an integer without using an explicit cast. Starting with this release, you will need to use an explicit cast here instead.

```sql
SELECT substring(42, 1, 1) AS substr;
```

```console
No function matches the given name and argument types 'substring(...)'.
You might need to add explicit type casts.
```

To use an explicit cast, run:

```sql
SELECT substring(42::VARCHAR, 1, 1) AS substr;
```

```text
┌─────────┐
│ substr  │
│ varchar │
├─────────┤
│ 4       │
└─────────┘
```

Alternatively, the `old_implicit_casting` setting can be used to revert this behavior, e.g.:

```sql
SET old_implicit_casting = true;
SELECT substring(42, 1, 1) AS substr;
```

```text
┌─────────┐
│ substr  │
│ varchar │
├─────────┤
│ 4       │
└─────────┘
```

[**Literal Typing**](https://github.com/duckdb/duckdb/pull/10194). Previously, integer and string literals behaved identically to the `INTEGER` and `VARCHAR` types. Starting with this release, `INTEGER_LITERAL` and `STRING_LITERAL` are separate types that have their own binding rules.

* `INTEGER_LITERAL` types can be implicitly converted to any integer type in which the value fits
* `STRING_LITERAL` types can be implicitly converted to **any** other type

This aligns DuckDB with Postgres, and makes operations on literals more intuitive. For example, we can compare string literals with dates – but we cannot compare `VARCHAR` values with dates.

```sql
SELECT d > '1992-01-01' AS result
FROM (VALUES (DATE '1992-01-01')) t(d);
```

```text
┌─────────┐
│ result  │
│ boolean │
├─────────┤
│ false   │
└─────────┘
```

```sql
SELECT d > '1992-01-01'::VARCHAR AS result
FROM (VALUES (DATE '1992-01-01')) t(d);
```

```console
Binder Error:
Cannot compare values of type DATE and type VARCHAR – an explicit cast is required
```
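
The `INTEGER_LITERAL` rule ("any integer type in which the value fits") boils down to a range check. The Python sketch below models only that check for the common signed and unsigned integer types; the function name `types_that_fit` is hypothetical, and how the binder picks among the candidates is not modeled here.

```python
# Value ranges of the fixed-width integer types (inclusive bounds).
INTEGER_RANGES = {
    "TINYINT": (-2**7, 2**7 - 1),
    "SMALLINT": (-2**15, 2**15 - 1),
    "INTEGER": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
    "UTINYINT": (0, 2**8 - 1),
    "USMALLINT": (0, 2**16 - 1),
    "UINTEGER": (0, 2**32 - 1),
    "UBIGINT": (0, 2**64 - 1),
}


def types_that_fit(value):
    """Return the integer types whose range contains the literal value."""
    return [
        name
        for name, (low, high) in INTEGER_RANGES.items()
        if low <= value <= high
    ]


print(types_that_fit(42))   # fits every type listed above
print(types_that_fit(300))  # too large for TINYINT and UTINYINT
print(types_that_fit(-1))   # only the signed types
```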

#### Backward Compatibility

Backward compatibility refers to the ability of a newer DuckDB version to read storage files created by an older DuckDB version. This release is the first release of DuckDB that supports backward compatibility in the storage format. DuckDB v0.10 can read and operate on files created by the previous DuckDB version – DuckDB v0.9. [This is made possible by the implementation of a new serialization framework](https://github.com/duckdb/duckdb/pull/8156).

Write with v0.9:

```bash
duckdb_092 v092.db
```

```sql
CREATE TABLE lineitem AS
FROM 'lineitem.parquet';
```

Read with v0.10:

```bash
duckdb_0100 v092.db
```

```sql
SELECT l_orderkey, l_partkey, l_comment
FROM lineitem
LIMIT 1;
```

```text
┌────────────┬───────────┬─────────────────────────┐
│ l_orderkey │ l_partkey │        l_comment        │
│   int32    │   int32   │         varchar         │
├────────────┼───────────┼─────────────────────────┤
│          1 │    155190 │ to beans x-ray carefull │
└────────────┴───────────┴─────────────────────────┘
```

For future DuckDB versions, our goal is to ensure that any DuckDB version released **after** this one can read files created by previous versions, starting from this release. We want to ensure that the file format is fully backward compatible. This allows you to keep data stored in DuckDB files around and guarantees that you will be able to read the files without having to worry about which version the file was written with or having to convert files between versions.

#### Forward Compatibility

Forward compatibility refers to the ability of an older DuckDB version to read storage files produced by a newer DuckDB version. DuckDB v0.9 is **partially** forward compatible with DuckDB v0.10. Certain files created by DuckDB v0.10 can be read by DuckDB v0.9.

Write with v0.10:

```bash
duckdb_0100 v010.db
```

```sql
CREATE TABLE lineitem AS
FROM 'lineitem.parquet';
```

Read with v0.9:

```bash
duckdb_092 v010.db
```

```sql
SELECT l_orderkey, l_partkey, l_comment
FROM lineitem
LIMIT 1;
```

```text
┌────────────┬───────────┬─────────────────────────┐
│ l_orderkey │ l_partkey │        l_comment        │
│   int32    │   int32   │         varchar         │
├────────────┼───────────┼─────────────────────────┤
│          1 │    155190 │ to beans x-ray carefull │
└────────────┴───────────┴─────────────────────────┘
```

Forward compatibility is provided on a **best effort** basis. While stability of the storage format is important – there are still many improvements and innovations that we want to make to the storage format in the future. As such, forward compatibility may be (partially) broken on occasion.

For this release, DuckDB v0.9 is able to read files created by DuckDB v0.10 provided that:

* The database file does not contain views
* The database file does not contain new types (`ARRAY`, `UHUGEINT`)
* The database file does not contain indexes (`PRIMARY KEY`, `FOREIGN KEY`, `UNIQUE`, explicit indexes)
* The database file does not contain new compression methods (`ALP`). As ALP is automatically used to compress `FLOAT` and `DOUBLE` columns – that means forward compatibility in practice often does not work for `FLOAT` and `DOUBLE` columns unless `ALP` is explicitly disabled through configuration.

We expect that as the format stabilizes and matures this will happen less frequently – and we hope to offer better guarantees in allowing DuckDB to read files written by future DuckDB versions.

#### CSV Reader Rework

**[CSV Reader Rework](https://github.com/duckdb/duckdb/pull/10209).** The CSV reader has received a major overhaul in this release. The new CSV reader uses efficient state machine transitions to speed through CSV files. This has greatly improved the performance of the CSV reader, particularly in multi-threaded scenarios. In addition, error messages reported for malformed CSV files should be clearer.

Below is a benchmark comparing the loading time of 11 million rows of the NYC Taxi dataset from a CSV file on an M1 Max with 10 cores:


| Version | Load time |
| ------- | --------: |
| v0.9.2  |     2.6 s |
| v0.10.0 |     1.2 s |

Furthermore, many optimizations have been done that make running queries over CSV files directly significantly faster as well. Below is a benchmark comparing the execution time of a `SELECT count(*)` query directly over the NYC Taxi CSV file.


| Version | Query time |
| ------- | ---------: |
| v0.9.2  |      1.8 s |
| v0.10.0 |      0.3 s |
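
To give a feel for what "state machine transitions" means for CSV parsing, here is a deliberately small Python sketch that splits a single line into fields. It is an illustration of the general technique only: the states, the function `parse_csv_line`, and its error handling are assumptions made for this example and do not reflect DuckDB's actual reader.

```python
from enum import Enum, auto


class State(Enum):
    FIELD = auto()            # in an unquoted field (or at the start of a field)
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # just saw a quote while inside a quoted field


def parse_csv_line(line, delimiter=",", quote='"'):
    """Split one CSV line into fields using explicit state transitions."""
    fields, current, state = [], [], State.FIELD
    for ch in line:
        if state is State.FIELD:
            if ch == delimiter:
                fields.append("".join(current))
                current = []
            elif ch == quote and not current:
                state = State.QUOTED          # opening quote
            else:
                current.append(ch)
        elif state is State.QUOTED:
            if ch == quote:
                state = State.QUOTE_IN_QUOTED  # closing quote or escape?
            else:
                current.append(ch)
        else:  # State.QUOTE_IN_QUOTED
            if ch == quote:                    # "" is an escaped quote
                current.append(quote)
                state = State.QUOTED
            elif ch == delimiter:              # the quote closed the field
                fields.append("".join(current))
                current = []
                state = State.FIELD
            else:
                raise ValueError("unexpected character after closing quote")
    if state is State.QUOTED:
        raise ValueError("unterminated quoted field")
    fields.append("".join(current))
    return fields


print(parse_csv_line('a,"b,c",d'))  # ['a', 'b,c', 'd']
```

Because each character triggers exactly one transition, the loop has no backtracking, which is what makes this style of parser amenable to tight, predictable inner loops.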

#### Fixed-Length Arrays

**[Fixed-Length Arrays](https://github.com/duckdb/duckdb/pull/8983).** This release introduces the fixed-length array type. Fixed-length arrays are similar to lists; however, every value must have the same fixed number of elements.

```sql
CREATE TABLE vectors (v DOUBLE[3]);
INSERT INTO vectors VALUES ([1, 2, 3]);
```

Fixed-length arrays can be operated on faster than variable-length lists as the size of each list element is known ahead of time. This release also introduces specialized functions that operate over these arrays, such as `array_cross_product`, `array_cosine_similarity`, and `array_inner_product`.

```sql
SELECT array_cross_product(v, [1, 1, 1]) AS result
FROM vectors;
```

```text
┌───────────────────┐
│      result       │
│     double[3]     │
├───────────────────┤
│ [-1.0, 2.0, -1.0] │
└───────────────────┘
```
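
The result above can be checked by hand. The short Python sketch below recomputes the cross product of `[1, 2, 3]` and `[1, 1, 1]` from its textbook definition, along with the inner product and cosine similarity of the same vectors. The helper functions are plain illustrations written for this example; they are not DuckDB's implementations.

```python
import math


def cross_product(a, b):
    """Cross product of two 3-element vectors."""
    assert len(a) == 3 and len(b) == 3, "defined for arrays of length 3"
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def inner_product(a, b):
    """Sum of element-wise products of two equally long vectors."""
    assert len(a) == len(b), "arrays must have the same fixed length"
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product divided by the product of the vector norms."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]
ones = [1.0, 1.0, 1.0]
print(cross_product(v, ones))      # [-1.0, 2.0, -1.0]
print(inner_product(v, ones))      # 6.0
print(cosine_similarity(v, ones))
```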

See the [Array Type page](#docs:lts:sql:data_types:array) in the documentation for more information.

#### Multi-Database Support

DuckDB can now attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a convenient manner, as attached databases are fully functional, appear just as regular tables, and can be updated in a safe, transactional manner. More information about multi-database support can be found in our [recent blog post](https://duckdb.org/2024/01/26/multi-database-support-in-duckdb).

```sql
ATTACH 'sqlite:sakila.db' AS sqlite;
ATTACH 'postgres:dbname=postgresscanner' AS postgres;
ATTACH 'mysql:user=root database=mysqlscanner' AS mysql;
```

#### Secret Manager

DuckDB integrates with several cloud storage systems such as S3 that require access credentials to access data. In previous versions of DuckDB, authentication information was configured through DuckDB settings, e.g., `SET s3_access_key_id = '...';`. While this worked, it had several shortcomings. For example, it was not possible to set different credentials for different S3 buckets without modifying the settings between queries. Because settings are not considered secret, it was also possible to query them using `duckdb_settings()`.

With this release, DuckDB adds a new "[Secrets Manager](https://github.com/duckdb/duckdb/pull/10042)" to manage secrets in a better way. We now have a unified user interface for secrets across all backends that use them. Secrets can be scoped, so different storage prefixes can have different secrets, allowing, for example, joining across organizations in a single query. Secrets can also be persisted, so that they do not need to be specified every time DuckDB is launched.

Secrets are typed: their type identifies which service they are for. For example, this release can manage secrets for S3, Google Cloud Storage, Cloudflare R2 and Azure Blob Storage. For each type, there are one or more "secret providers" that specify how the secret is created. Secrets can also have an optional scope, which is a file path prefix that the secret applies to. When fetching a secret for a path, the secret scopes are compared to the path, returning the matching secret for the path. In the case of multiple matching secrets, the longest prefix is chosen.

Finally, secrets can be temporary or persistent. Temporary secrets are used by default – and are stored in-memory for the life span of the DuckDB instance similar to how settings worked previously. Persistent secrets are stored in unencrypted binary format in the `~/.duckdb/stored_secrets` directory. On startup of DuckDB, persistent secrets are read from this directory and automatically loaded.

For example, to create a temporary unscoped secret to access S3, we can now use the following syntax:

```sql
CREATE SECRET (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨us-east-1⟩'
);
```

If two secrets exist for a service type, the scope can be used to decide which one should be used. For example:

```sql
CREATE SECRET secret1 (
    TYPE s3,
    KEY_ID 'my_key1',
    SECRET 'my_secret1',
    SCOPE 's3://⟨my-bucket⟩'
);

CREATE SECRET secret2 (
    TYPE s3,
    KEY_ID 'my_key2',
    SECRET 'my_secret2',
    SCOPE 's3://⟨my-other-bucket⟩'
);
```

Now, if the user queries something from `s3://⟨my-other-bucket⟩/something`, `secret2` will be chosen automatically for that request.
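
The longest-prefix rule for scopes can be sketched in a few lines of Python. This is a simplified model of the selection logic described above, not DuckDB's code: `pick_secret` is a hypothetical helper, the bucket names are placeholders, and the broad `fallback` secret is added here only to show how ties between several matching scopes are resolved.

```python
def pick_secret(path, secrets):
    """Return the name of the secret whose scope is the longest prefix of path.

    `secrets` is a list of (name, scope) pairs. Returns None if no scope matches.
    """
    matching = [(name, scope) for name, scope in secrets if path.startswith(scope)]
    if not matching:
        return None
    name, _scope = max(matching, key=lambda item: len(item[1]))
    return name


secrets = [
    ("secret1", "s3://my-bucket"),
    ("secret2", "s3://my-other-bucket"),
    ("fallback", "s3://"),  # hypothetical broad secret, not part of the example above
]

print(pick_secret("s3://my-other-bucket/something", secrets))  # secret2
print(pick_secret("s3://my-bucket/data.parquet", secrets))     # secret1
print(pick_secret("s3://elsewhere/file.csv", secrets))         # fallback
```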

Secrets can be listed using the built-in table-producing function, e.g., by using `FROM duckdb_secrets();`. Sensitive information will be redacted.

In order to persist secrets between DuckDB database instances, we can now use the `CREATE PERSISTENT SECRET` command, e.g.:

```sql
CREATE PERSISTENT SECRET my_persistent_secret (
    TYPE s3,
    KEY_ID 'my_key',
    SECRET 'my_secret'
);
```

As mentioned, this will write the secret (unencrypted, so beware) to the `~/.duckdb/stored_secrets` directory.

See the [Create Secret page](#docs:lts:sql:statements:create_secret) in the documentation for more information.

#### Temporary Memory Manager

DuckDB has support for larger-than-memory operations, which means that memory-hungry operators such as aggregations and joins can offload part of their intermediate results to temporary files on disk should there not be enough memory available.

Before, those operators started offloading to disk if their memory usage reached around 60% of the available memory (as defined by the memory limit). This worked well as long as only one such operation was running at a time. If multiple memory-intensive operations were running simultaneously, however, their combined memory usage could exceed the memory limit, causing DuckDB to throw an error.

This release introduces the so-called "[Temporary Memory Manager](https://github.com/duckdb/duckdb/pull/10147)", which manages the temporary memory of concurrent operations. It works as follows: Memory-intensive operations register themselves with the Temporary Memory Manager. Each registration is guaranteed some minimum amount of memory by the manager depending on the number of threads and the current memory limit. Then, the memory-intensive operations communicate how much memory they would currently like to use. The manager can approve this or respond with a reduced allocation. In the case of a reduced allocation, the operator will need to dynamically reduce its memory requirements, for example by switching algorithms.

For example, a hash join might adapt its operation and perform a partitioned hash join instead of a full in-memory one if not enough memory is available.
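
The register/request/grant exchange can be pictured with a toy model in Python. Everything in this sketch is an assumption made for illustration: the class `ToyTemporaryMemoryManager`, the fixed `minimum_reservation`, and the simple "grant what is left" policy are not taken from DuckDB, whose actual policy depends on the number of threads and the memory limit as described above.

```python
class ToyTemporaryMemoryManager:
    """Toy model of operators negotiating temporary memory with a manager."""

    def __init__(self, memory_limit, minimum_reservation):
        self.memory_limit = memory_limit
        self.minimum = minimum_reservation
        self.granted = {}  # operator name -> currently granted memory

    def register(self, operator):
        # Every registered operator is guaranteed at least the minimum.
        self.granted[operator] = self.minimum

    def request(self, operator, wanted):
        # Memory not currently granted to the other operators.
        others = sum(g for op, g in self.granted.items() if op != operator)
        available = self.memory_limit - others
        # Approve the request if it fits, otherwise respond with a reduced
        # allocation (but never less than the guaranteed minimum).
        grant = max(self.minimum, min(wanted, available))
        self.granted[operator] = grant
        return grant


manager = ToyTemporaryMemoryManager(memory_limit=100, minimum_reservation=10)
manager.register("join_1")
manager.register("join_2")

print(manager.request("join_1", 80))  # 80: approved in full
print(manager.request("join_2", 80))  # 20: reduced allocation
```

In this model, `join_2` receives less than it asked for and would have to adapt, for instance by switching to a partitioned algorithm that spills to disk.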

Here is an example:

```sql
PRAGMA memory_limit = '5GB';
SET temp_directory = '/tmp/duckdb_temporary_memory_manager';

CREATE TABLE tbl AS
SELECT range AS i,
       range AS j
FROM range(100_000_000);

SELECT max(i),
       max(t1.j),
       max(t2.j),
       max(t3.j),
FROM tbl AS t1
JOIN tbl AS t2 USING (i)
JOIN tbl AS t3 USING (i);
```

Note that a temporary directory has to be set here, because the operators actually need to offload data to disk to complete this query given this memory limit.

With the new version 0.10.0, this query completes in ca. 5 s on a MacBook, while it would error out on the previous version with `Error: Out of Memory Error: failed to pin block of size ...`.

#### Adaptive Lossless Floating-Point Compression (ALP)

Floating point numbers are notoriously difficult to compress efficiently, both in terms of compression ratio as well as speed of compression and decompression. In the past, DuckDB had support for the then state-of-the-art "[Chimp](https://github.com/duckdb/duckdb/pull/4878)" and the "[Patas](https://github.com/duckdb/duckdb/pull/5044)" compression methods. Turns out, those were not the last word in floating point compression. Researchers [Azim Afroozeh](https://www.cwi.nl/en/people/azim-afroozeh/), [Leonardo Kuffo](https://www.cwi.nl/en/people/leonardo-xavier-kuffo-rivero/) and (the one and only) [Peter Boncz](https://homepages.cwi.nl/~boncz/) have recently published a paper titled "[ALP: Adaptive Lossless floating-Point Compression](https://dl.acm.org/doi/pdf/10.1145/3626717)" at SIGMOD, a top-tier academic conference for data management research. In an uncommon yet highly commendable move, they have also sent a [pull request](https://github.com/duckdb/duckdb/pull/9635) to DuckDB. The new compression scheme replaces Chimp and Patas. Inside DuckDB, ALP is **2-4 times faster** than Patas (at decompression) while achieving **compression ratios twice as high** (sometimes even much more).


| Compression  |    Load |   Query |   Size |
| :----------- | ------: | ------: | -----: |
| ALP          | 0.434 s | 0.020 s | 184 MB |
| Patas        | 0.603 s | 0.080 s | 275 MB |
| Uncompressed | 0.316 s | 0.012 s | 489 MB |

As a user, you don't have to do anything to make use of the new ALP compression method: DuckDB will automatically decide during checkpointing whether using ALP is beneficial for the specific dataset.

#### CLI Improvements

The command-line client has seen a lot of work this release. In particular, multi-line editing has been made the default mode, and has seen many improvements. The query history is now also multi-line. [Syntax highlighting has improved](#docs:lts:clients:cli:syntax_highlighting) – missing brackets and unclosed quotes are highlighted as errors, and matching brackets are highlighted when the cursor moves over them. Compatibility with read-line has also been [greatly extended](#docs:lts:clients:cli:editing).

![](../images/syntax_highlighting_screenshot.png)


See the [extended CLI docs for more information](#docs:lts:clients:cli:overview).

#### Final Thoughts

These were a few highlights – but there are many more features and improvements in this release. Below are a few more highlights. The full release notes can be [found on GitHub](https://github.com/duckdb/duckdb/releases/tag/v0.10.0).

##### New Features

* [`COMMENT ON`](https://github.com/duckdb/duckdb/pull/10372)
* [`COPY FROM DATABASE`](https://github.com/duckdb/duckdb/pull/9765)
* [`UHUGEINT` type](https://github.com/duckdb/duckdb/pull/8635)
* [Window `EXCLUDE`](https://github.com/duckdb/duckdb/pull/9220) and [Window `DISTINCT`](https://github.com/duckdb/duckdb/pull/9754) support
* [Parquet encryption support](https://github.com/duckdb/duckdb/pull/9392)
* [Indexes for Lambda parameters](https://github.com/duckdb/duckdb/pull/8851)
* [`EXCEPT ALL`/`INTERSECT ALL`](https://github.com/duckdb/duckdb/pull/9636)
* [`DESCRIBE`/`SHOW`/`SUMMARIZE` as subquery](https://github.com/duckdb/duckdb/pull/10210)
* [Support recursive CTEs in correlated subqueries](https://github.com/duckdb/duckdb/pull/10357)

##### New Functions

* [`parquet_kv_metadata`](https://github.com/duckdb/duckdb/pull/9126) and [`parquet_file_metadata`](https://github.com/duckdb/duckdb/pull/9793) functions
* [`read_text`/`read_blob` table functions](https://github.com/duckdb/duckdb/pull/10376)
* [`list_reduce`](https://github.com/duckdb/duckdb/pull/9909), [`list_where`, `list_zip`, `list_select`, `list_grade_up`](https://github.com/duckdb/duckdb/pull/8907)

##### Storage Improvements

* [Vacuuming partial deletes](https://github.com/duckdb/duckdb/pull/9931)
* [Parallel checkpointing](https://github.com/duckdb/duckdb/pull/9999)
* [Checksum WAL](https://github.com/duckdb/duckdb/pull/10126)

##### Optimizations

* [Parallel streaming query result](https://github.com/duckdb/duckdb/pull/10245)
* [Struct filter pushdown](https://github.com/duckdb/duckdb/pull/10314)
* [`first(x ORDER BY y)` optimizations](https://github.com/duckdb/duckdb/pull/10347)

##### Acknowledgments

We would like to thank all of the contributors for their hard work on improving DuckDB.

## SQL Gymnastics: Bending SQL into Flexible New Shapes

**Publication date:** 2024-03-01

**Author:** Alex Monahan

**TL;DR:** Combining multiple features of DuckDB’s [friendly SQL](#docs:lts:sql:dialect:friendly_sql) allows for highly flexible queries that can be reused across tables.

![](../images/blog/duck_gymnast.jpg)


DuckDB's [especially](https://duckdb.org/2022/05/04/friendlier-sql) [friendly](https://duckdb.org/2023/08/23/even-friendlier-sql) [SQL dialect](#docs:lts:sql:dialect:friendly_sql) simplifies common query operations.
However, these features also unlock new and flexible ways to write advanced SQL! 
In this post we will combine multiple friendly features to both move closer to real-world use cases and stretch your imagination.
These queries are useful in their own right, but their component pieces are even more valuable to have in your toolbox.

What is the craziest thing you have built with SQL? 
We want to hear about it! 
Tag [DuckDB on X](https://twitter.com/duckdb) (the site formerly known as Twitter) or [LinkedIn](https://www.linkedin.com/company/duckdb/mycompany/), and join the [DuckDB Discord community](https://discord.duckdb.org/).

#### Traditional SQL Is Too Rigid to Reuse

SQL queries are typically crafted specifically for the unique tables within a database.
This limits reusability. 
For example, have you ever seen a library of high-level SQL helper functions?
SQL as a language typically is not flexible enough to build reusable functions.
Today, we are flying towards a more flexible future!

#### Dynamic Aggregates Macro

In SQL, typically the columns to `SELECT` and `GROUP BY` must be specified individually. 
However, in many business intelligence workloads, groupings and aggregate functions must be easily user-adjustable.
Imagine an interactive charting workflow – first I want to plot total company revenue over time.
Then if I see a dip in revenue in that first plot, I want to adjust the plot to group the revenue by business unit to see which section of the company caused the issue.
This typically requires templated SQL, using a language that compiles down to SQL (like [Malloy](https://www.malloydata.dev/)), or building a SQL string using another programming language.
How much can we do with just SQL?

Let's have a look at a flexible SQL-only approach and then break down how it is constructed. 

<details markdown='1'>
<summary markdown='span'>
    First we will create an example data table. `col1` is unique on each row, but the other columns are various groupings of the rows. 
</summary>

```sql
CREATE OR REPLACE TABLE example AS 
    SELECT x % 11 AS col1, x % 5 AS col2, x % 2 AS col3, 1 AS col4
    FROM range(1, 11) t(x);
FROM example;
```
</details>

| col1 | col2 | col3 | col4 |
| ---: | ---: | ---: | ---: |
|    1 |    1 |    1 |    1 |
|    2 |    2 |    0 |    1 |
|    3 |    3 |    1 |    1 |
|    4 |    4 |    0 |    1 |
|    5 |    0 |    1 |    1 |
|    6 |    1 |    0 |    1 |
|    7 |    2 |    1 |    1 |
|    8 |    3 |    0 |    1 |
|    9 |    4 |    1 |    1 |
|   10 |    0 |    0 |    1 |

##### Creating the Macro

The macro below accepts lists of columns to include or exclude, a list of columns to aggregate, and an aggregate function to apply.
All of these can be passed in as parameters from the host language that is querying the database.

```sql
-- We use a table macro (or function) for reusability
CREATE OR REPLACE MACRO dynamic_aggregates(
        included_columns,
        excluded_columns,
        aggregated_columns,
        aggregate_function
    ) AS TABLE (
    FROM example 
    SELECT 
        -- Use a COLUMNS expression to only select the columns
        -- we include or do not exclude
        COLUMNS(c -> (
            -- If we are not using an input parameter (list is empty),
            -- ignore it
            (list_contains(included_columns, c) OR
             len(included_columns) = 0)
            AND
            (NOT list_contains(excluded_columns, c) OR
             len(excluded_columns) = 0)
            )),
        -- Use the list_aggregate function to apply an aggregate
        -- function of our choice
        list_aggregate(
            -- Convert to a list (to enable the use of list_aggregate)
            list(
                -- Use a COLUMNS expression to choose which columns
                -- to aggregate
                COLUMNS(c -> list_contains(aggregated_columns, c))
            ), aggregate_function
        )
    GROUP BY ALL -- Group by all selected but non-aggregated columns
    ORDER BY ALL -- Order by each column from left to right 
);
```

###### Executing the Macro

Now we can use that macro for many different aggregation operations.
For illustrative purposes, the three queries below show different ways to achieve identical results.

Select `col3` and `col4`, and take the minimum values of `col1` and `col2`:

```sql
FROM dynamic_aggregates(
    ['col3', 'col4'], [], ['col1', 'col2'], 'min'
);
```

Select all columns except `col1` and `col2`, and take the minimum values of `col1` and `col2`:

```sql
FROM dynamic_aggregates(
    [], ['col1', 'col2'], ['col1', 'col2'], 'min'
);
```

If the same column is in both the included and excluded lists, it is excluded (exclusions win ties).
If we include `col2`, `col3`, and `col4`, but we exclude `col2`, then it is as if we only included `col3` and `col4`:

```sql
FROM dynamic_aggregates(
    ['col2', 'col3', 'col4'], ['col2'], ['col1', 'col2'], 'min'
);
```

Executing any of those three queries will return this result:

| col3 | col4 | list_aggregate(list(example.col1), 'min') | list_aggregate(list(example.col2), 'min') |
| ---: | ---: | ----------------------------------------: | ----------------------------------------: |
|    0 |    1 |                                         2 |                                         0 |
|    1 |    1 |                                         1 |                                         0 |

###### Understanding the Design

The first step of our flexible [table macro](#docs:lts:sql:statements:create_macro::table-macros) is to choose a specific table using DuckDB's [`FROM`-first syntax](https://duckdb.org/2023/08/23/even-friendlier-sql#from-first-in-select-statements).
Well, that's not very dynamic!
If we wanted to, we could work around this by creating a copy of this macro for each table we want to expose to our application.
However, we will show another approach in our next example, and completely solve the issue in a follow-up blog post with an in-development DuckDB feature.
Stay tuned!

Then we `SELECT` our grouping columns based on the list parameters that were passed in.
The [`COLUMNS` expression](#docs:lts:sql:expressions:star::columns-expression) will execute a [lambda function](#docs:lts:sql:functions:lambda) to decide which columns meet the criteria to be selected.
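
To see the column-selecting lambda on its own, here is a minimal sketch. The `readings` table and its column names are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table used only for this illustration
CREATE OR REPLACE TABLE readings AS
    SELECT 1 AS sensor_id, 20.5 AS temp_c, 68.9 AS temp_f, 0.4 AS humidity;

-- The lambda receives each column name as a string and returns true or false;
-- only the columns for which it returns true are selected
SELECT COLUMNS(c -> list_contains(['sensor_id', 'humidity'], c))
FROM readings;

-- Any boolean expression on the column name works, for example a prefix check
SELECT COLUMNS(c -> starts_with(c, 'temp_'))
FROM readings;
```

The first query should return only `sensor_id` and `humidity`, and the second only the two `temp_` columns.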

The first portion of the lambda function checks if a column name was passed in within the `included_columns` list.
However, if we choose not to use an inclusion rule (by passing in a blank `included_columns` list), we want to ignore that parameter.
If the list is blank, `len(included_columns) = 0` will evaluate to `true` and effectively disable the filtering on `included_columns`.
This is a common pattern for optional filtering that is generically useful across a variety of SQL queries.
(Shout out to my mentor and friend Paul Bloomquist for teaching me this pattern!)
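
The same optional-filter pattern can be tried outside of a `COLUMNS` expression, in an ordinary `WHERE` clause. The sketch below assumes a hypothetical `birds` table and a hypothetical scalar macro named `passes_optional_filter`; neither is part of the original example.

```sql
-- Hypothetical table and macro names, used only for this illustration
CREATE OR REPLACE TABLE birds AS
    FROM (VALUES ('duck'), ('goose'), ('swan')) t(species);

-- A scalar macro wrapping the optional-filter pattern:
-- an empty list makes the second condition true for every row
CREATE OR REPLACE MACRO passes_optional_filter(val, allowed_values) AS (
    list_contains(allowed_values, val) OR len(allowed_values) = 0
);

-- Populated list: only the listed species are returned
SELECT species
FROM birds
WHERE passes_optional_filter(species, ['duck', 'swan']);

-- Empty list: the filter is effectively switched off, so every row is returned
-- (the explicit cast just makes the element type of the empty list unambiguous)
SELECT species
FROM birds
WHERE passes_optional_filter(species, []::VARCHAR[]);
```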

We repeat that pattern for `excluded_columns` so that it will be used if populated, but ignored if left blank.
The `excluded_columns` list will also win ties, so that if a column is in both lists, it will be excluded.

Next, we apply our aggregate function to the columns we want to aggregate.
It is easiest to follow the logic of this part of the query by working from the innermost portion outward.
The `COLUMNS` expression will acquire the columns that are in our `aggregated_columns` list.
Then, we do a little bit of gymnastics (it had to happen sometime...).

If we were to apply a typical aggregation function (like `sum` or `min`), it would need to be specified statically in our macro.
To pass it in dynamically as a string (potentially all the way from the application code calling this SQL statement), we take advantage of a unique property of the [`list_aggregate` function](#docs:lts:sql:functions:nested::list-aggregates).
It accepts the name of a function (as a string) in its second parameter.
So, to use this unique property, we use the [`list` aggregate function](#docs:lts:sql:functions:aggregates::general-aggregate-functions) to transform all the values within each group into a list.
Then we use the `list_aggregate` function to apply the `aggregate_function` we passed into the macro to each list.
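
As a small sanity check of this idea, the sketch below compares a statically named `sum` with the same aggregate chosen by a string through `list` and `list_aggregate`. The column aliases are made up for the illustration.

```sql
SELECT
    x % 2 AS parity,
    -- The aggregate function is fixed in the query text
    sum(x) AS static_sum,
    -- The aggregate function is chosen by a string
    list_aggregate(list(x), 'sum') AS dynamic_sum
FROM range(1, 11) t(x)
GROUP BY ALL
ORDER BY ALL;
-- Both sum columns agree: 30 for parity 0 (2+4+6+8+10), 25 for parity 1 (1+3+5+7+9)
```

Swapping `'sum'` for `'min'` or `'max'` changes the aggregation without touching the rest of the query.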

Almost done!
Now [`GROUP BY ALL`](#docs:lts:sql:query_syntax:groupby::group-by-all) will automatically choose to group by the columns returned by the first `COLUMNS` expression.
The [`ORDER BY ALL`](#docs:lts:sql:query_syntax:orderby::order-by-all) expression will order each column in ascending order, moving from left to right.
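
For comparison, here is a short sketch of what `GROUP BY ALL` and `ORDER BY ALL` save us from writing. The aliases `bucket`, `n`, and `largest` are arbitrary names chosen for this example.

```sql
-- Implicit: group by every non-aggregated column, order by every column left to right
SELECT x % 3 AS bucket, count(*) AS n, max(x) AS largest
FROM range(1, 11) t(x)
GROUP BY ALL
ORDER BY ALL;

-- Explicit equivalent for this particular query
SELECT x % 3 AS bucket, count(*) AS n, max(x) AS largest
FROM range(1, 11) t(x)
GROUP BY x % 3
ORDER BY bucket, n, largest;
```

The explicit form has to be edited whenever the selected columns change, which is exactly what a dynamic macro cannot do.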

We made it!

> Extra credit! In the next release of DuckDB, version 0.10.1, we will be able to [apply a dynamic alias](https://github.com/duckdb/duckdb/pull/10774) to the result of a `COLUMNS` expression.
> For example, each new aggregate column could be renamed in the pattern `agg_[the original column name]`.
> This will unlock the ability to chain together these types of macros, as the naming will be predictable.

###### Takeaways

Several of the approaches used within this macro can be applied in a variety of ways in your SQL workflows.
Using a lambda function in combination with the `COLUMNS` expression can allow you to select any arbitrary list of columns.
The `OR len(my_list) = 0` trick allows list parameters to be ignored when blank.
Once you have that arbitrary set of columns, you can even apply a dynamically chosen aggregation function to those columns using `list` and `list_aggregate`.

However, we still had to specify a table at the start.
We are also limited to aggregate functions that are available to be used with `list_aggregate`.
Let's relax those two constraints!

##### Creating Version 2 of the Macro

This approach takes advantage of two key concepts:

* Macros can be used to create temporary aggregate functions
* A macro can query a [Common Table Expression (CTE) / `WITH` clause](#docs:lts:sql:query_syntax:with) that is in scope during execution

```sql
CREATE OR REPLACE MACRO dynamic_aggregates_any_cte_any_func(
    included_columns,
    excluded_columns,
    aggregated_columns
    /* No more aggregate_function */
) AS TABLE (
    FROM any_cte -- No longer a fixed table!
    SELECT 
        COLUMNS(c -> (
            (list_contains(included_columns, c) OR
            len(included_columns) = 0)
            AND 
            (NOT list_contains(excluded_columns, c) OR
            len(excluded_columns) = 0)
            )),
        -- We no longer convert to a list, 
        -- and we refer to the latest definition of any_func 
        any_func(COLUMNS(c -> list_contains(aggregated_columns, c))) 
    GROUP BY ALL 
    ORDER BY ALL 
);
```

###### Executing Version 2

When we call this macro, there is additional complexity.
We no longer execute a single statement, and our logic is no longer completely parameterizable (so some templating or SQL construction will be needed).
However, we can execute this macro against any arbitrary CTE, using any arbitrary aggregation function.
Pretty powerful and very reusable!

```sql
-- We can define or redefine any_func right before calling the macro 
CREATE OR REPLACE TEMP FUNCTION any_func(x)
    AS 100.0 * sum(x) / count(x);

-- Any table structure is valid for this CTE!
WITH any_cte AS (
    SELECT
        x % 11 AS id,
        x % 5 AS my_group,
        x % 2 AS another_group,
        1 AS one_big_group
    FROM range(1, 101) t(x)
)
FROM dynamic_aggregates_any_cte_any_func(
    ['another_group', 'one_big_group'], [], ['id', 'my_group']
);
```

| another_group | one_big_group | any_func(any_cte.id) | any_func(any_cte.my_group) |
| ------------: | ------------: | -------------------: | -------------------------: |
|             0 |             1 |                502.0 |                      200.0 |
|             1 |             1 |                490.0 |                      200.0 |

###### Understanding Version 2

Instead of querying the very boldly named `example` table, we query the possibly more generically named `any_cte`.
Note that `any_cte` has a different schema than our prior example – the columns in `any_cte` can be anything!
When the macro is created, `any_cte` doesn't even exist.
When the macro is executed, it searches for a table-like object named `any_cte` and finds the CTE defined in the statement that calls the macro.
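
Stripped of everything else, the late lookup of `any_cte` can be sketched as follows. The macro name `row_count_of_any_cte` is hypothetical, and the sketch assumes the lookup behaves as described in this post for the DuckDB version it was written against.

```sql
-- Creating the macro works even though any_cte does not exist yet
CREATE OR REPLACE MACRO row_count_of_any_cte() AS TABLE (
    SELECT count(*) AS row_count FROM any_cte
);

-- The same macro is then pointed at two unrelated CTEs
WITH any_cte AS (FROM range(5))
FROM row_count_of_any_cte();

WITH any_cte AS (SELECT 'a' AS letter UNION ALL SELECT 'b')
FROM row_count_of_any_cte();
```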

Similarly, `any_func` does not exist initially.
It only needs to be created (or recreated) at some point before the macro is executed.
Its only requirement is that it be an aggregate function that operates on a single column.

> `FUNCTION` and `MACRO` are synonyms in DuckDB and can be used interchangeably!
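
A brief sketch of both points: a scalar macro whose body contains aggregates behaves like a custom aggregate, and it can be redefined with either keyword. The aggregate definitions below are arbitrary examples, not the ones used elsewhere in this post.

```sql
-- Define any_func with the MACRO keyword: the spread between largest and smallest value
CREATE OR REPLACE TEMP MACRO any_func(x) AS max(x) - min(x);

SELECT x % 2 AS parity, any_func(x) AS result
FROM range(1, 11) t(x)
GROUP BY ALL
ORDER BY ALL;
-- Both groups give 8 (10 - 2 for even values, 9 - 1 for odd values)

-- Redefine it with the FUNCTION keyword: the number of distinct values
CREATE OR REPLACE TEMP FUNCTION any_func(x) AS count(DISTINCT x);

SELECT x % 2 AS parity, any_func(x) AS result
FROM range(1, 11) t(x)
GROUP BY ALL
ORDER BY ALL;
-- Both groups now give 5
```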

###### Takeaways from Version 2

A macro can act on any arbitrary table by using a CTE at the time it is called.
This makes our macro far more reusable – it can work on any table!
Not only that, but any custom aggregate function can be used.

Look how far we have stretched SQL – we have made a truly reusable SQL function!
The table is dynamic, the grouping columns are dynamic, the aggregated columns are dynamic, and so is the aggregate function.
Our daily gymnastics stretches have paid off.
However, stay tuned for a way to achieve similar results with a simpler approach in a future post.

#### Custom Summaries for Any Dataset

Next we have a truly production-grade example!
This query powers a portion of the MotherDuck Web UI's [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/) component.
[Hamilton Ulmer](https://www.linkedin.com/in/hamilton-ulmer-28b97817/) led the creation of this component and is the author of this query as well!
The purpose of the Column Explorer, and this query, is to get a high-level overview of the data in all columns within a dataset as quickly and easily as possible.

DuckDB has a built-in [`SUMMARIZE` keyword](#docs:lts:guides:meta:summarize) that can calculate similar metrics across an entire table.
However, for larger datasets, `SUMMARIZE` can take a couple of seconds to load.
This query provides a custom summarization capability that can be tailored to the properties of your data that you are most interested in.

Traditionally, databases required that every column be referred to explicitly, and work best when data is arranged in separate columns.
This query takes advantage of DuckDB's ability to apply functions to all columns at once, its ability to [`UNPIVOT`](#docs:lts:sql:statements:unpivot) (or stack) columns, and its [`STRUCT`](#docs:lts:sql:data_types:struct) data type to store key/value pairs.
The result is a clean, pivoted summary of all the rows and columns in a table.

Let's take a look at the entire function, then break it down piece by piece.

This [example dataset](https://huggingface.co/datasets/maharshipandya/spotify-tracks-dataset) comes from [Hugging Face](https://huggingface.co/), which hosts [DuckDB-accessible Parquet files](https://huggingface.co/blog/hub-duckdb) for many of their datasets.
First, we create a local table populated from this remote Parquet file.

##### Creation

```sql
CREATE OR REPLACE TABLE spotify_tracks AS
    FROM 'https://huggingface.co/datasets/maharshipandya/spotify-tracks-dataset/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet?download=true';
```

Then we create and execute our `custom_summarize` macro.
We use the same `any_cte` trick from above to allow this to be reused on any query result or table.

```sql
CREATE OR REPLACE MACRO custom_summarize() AS TABLE (
    WITH metrics AS (
        FROM any_cte 
        SELECT 
            {
                name: first(alias(COLUMNS(*))),
                type: first(typeof(COLUMNS(*))),
                max: max(COLUMNS(*))::VARCHAR,
                min: min(COLUMNS(*))::VARCHAR,
                approx_unique: approx_count_distinct(COLUMNS(*)),
                nulls: count(*) - count(COLUMNS(*)),
            }
    ), stacked_metrics AS (
        UNPIVOT metrics 
        ON COLUMNS(*)
    )
    SELECT value.* FROM stacked_metrics
);
```

##### Execution

The `spotify_tracks` dataset is effectively renamed to `any_cte` and then summarized.

```sql
WITH any_cte AS (FROM spotify_tracks)
FROM custom_summarize();
```

The result contains one row for every column in the raw dataset, and several columns of summary statistics.

| name             | type    | max                                                     | min                          | approx_unique | nulls |
| ---------------- | ------- | ------------------------------------------------------- | ---------------------------- | ------------: | ----: |
| Unnamed: 0       | BIGINT  | 113999                                                  | 0                            |        114089 |     0 |
| track_id         | VARCHAR | 7zz7iNGIWhmfFE7zlXkMma                                  | 0000vdREvCVMxbQTkS888c       |         89815 |     0 |
| artists          | VARCHAR | 龍藏Ryuzo                                               | !nvite                       |         31545 |     1 |
| album_name       | VARCHAR | 당신이 잠든 사이에 Pt. 4 Original Television Soundtrack | ! ! ! ! ! Whispers ! ! ! ! ! |         47093 |     1 |
| track_name       | VARCHAR | 행복하길 바래                                           | !I'll Be Back!               |         72745 |     1 |
| popularity       | BIGINT  | 100                                                     | 0                            |            99 |     0 |
| duration_ms      | BIGINT  | 5237295                                                 | 0                            |         50168 |     0 |
| explicit         | BOOLEAN | true                                                    | false                        |             2 |     0 |
| danceability     | DOUBLE  | 0.985                                                   | 0.0                          |          1180 |     0 |
| energy           | DOUBLE  | 1.0                                                     | 0.0                          |          2090 |     0 |
| key              | BIGINT  | 11                                                      | 0                            |            12 |     0 |
| loudness         | DOUBLE  | 4.532                                                   | -49.531                      |         19436 |     0 |
| mode             | BIGINT  | 1                                                       | 0                            |             2 |     0 |
| speechiness      | DOUBLE  | 0.965                                                   | 0.0                          |          1475 |     0 |
| acousticness     | DOUBLE  | 0.996                                                   | 0.0                          |          4976 |     0 |
| instrumentalness | DOUBLE  | 1.0                                                     | 0.0                          |          5302 |     0 |
| liveness         | DOUBLE  | 1.0                                                     | 0.0                          |          1717 |     0 |
| valence          | DOUBLE  | 0.995                                                   | 0.0                          |          1787 |     0 |
| tempo            | DOUBLE  | 243.372                                                 | 0.0                          |         46221 |     0 |
| time_signature   | BIGINT  | 5                                                       | 0                            |             5 |     0 |
| track_genre      | VARCHAR | world-music                                             | acoustic                     |           115 |     0 |

So how was this query constructed? 
Let's break down each CTE step by step.

##### Step by Step Breakdown

###### `metrics` CTE

First let's have a look at the `metrics` CTE and the shape of the data that is returned:

```sql
FROM any_cte 
SELECT 
    {
        name: first(alias(COLUMNS(*))),
        type: first(typeof(COLUMNS(*))),
        max: max(COLUMNS(*))::VARCHAR,
        min: min(COLUMNS(*))::VARCHAR,
        approx_unique: approx_count_distinct(COLUMNS(*)),
        nulls: count(*) - count(COLUMNS(*)),
    };
```

| main.struct_pack("name" := first(alias(subset."Unnamed: 0")), ...                                  | main.struct_pack("name" := first(alias(subset.track_id)), ...                                                                         | ... | main.struct_pack("name" := first(alias(subset.time_signature)), ...                          | main.struct_pack("name" := first(alias(subset.track_genre)), ...                                              |
| -------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- | --- | -------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| {'name': Unnamed: 0, 'type': BIGINT, 'max': 113999, 'min': 0, 'approx_unique': 114089, 'nulls': 0} | {'name': track_id, 'type': VARCHAR, 'max': 7zz7iNGIWhmfFE7zlXkMma, 'min': 0000vdREvCVMxbQTkS888c, 'approx_unique': 89815, 'nulls': 0} | ... | {'name': time_signature, 'type': BIGINT, 'max': 5, 'min': 0, 'approx_unique': 5, 'nulls': 0} | {'name': track_genre, 'type': VARCHAR, 'max': world-music, 'min': acoustic, 'approx_unique': 115, 'nulls': 0} |


This intermediate result maintains the same number of columns as the original dataset, but only returns a single row of summary statistics.
The names of the columns are truncated due to their length.
The default naming of `COLUMNS` expressions will be improved in DuckDB 0.10.1, so names will be much cleaner!

The data in each column is organized into a `STRUCT` of key-value pairs. 
You can also see that a clean name of the original column is stored within the `STRUCT` thanks to the use of the `alias` function.
While we have calculated the summary statistics, the format of those statistics is difficult to visually interpret. 

The query achieves this structure using the `COLUMNS(*)` expression to apply multiple summary metrics to all columns, and the `{...}` syntax to create a `STRUCT`.
The keys of the struct represent the names of the metrics (and what we want to use as the column names in the final result). 
We use this approach since we want to transpose the columns to rows and then split the summary metrics into their own columns.
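
To inspect this shape on something small, the sketch below applies a reduced version of the same struct to a hypothetical three-row table named `tiny`. The cast to `VARCHAR` mirrors the original query; our reading is that it keeps the structs for differently typed columns identical in type so they can be stacked later.

```sql
-- Hypothetical table used only for this illustration
CREATE OR REPLACE TABLE tiny AS
    SELECT * FROM (VALUES (1, 'a'), (2, NULL), (3, 'c')) t(num, txt);

-- One STRUCT per source column, all in a single row
SELECT
    {
        name: first(alias(COLUMNS(*))),
        max: max(COLUMNS(*))::VARCHAR,
        nulls: count(*) - count(COLUMNS(*))
    }
FROM tiny;
```

The result should have two columns (one per column of `tiny`) and a single row, with the `txt` struct reporting one `NULL`.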

###### `stacked_metrics` CTE

Next, the data is unpivoted to reshape the table from one row and multiple columns to two columns and multiple rows. 

```sql
UNPIVOT metrics 
ON COLUMNS(*);
```

| name                                                                        | value                                                                                                                                 |
| --------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| main.struct_pack("name" := first(alias(spotify_tracks."Unnamed: 0")), ...   | {'name': Unnamed: 0, 'type': BIGINT, 'max': 113999, 'min': 0, 'approx_unique': 114089, 'nulls': 0}                                    |
| main.struct_pack("name" := first(alias(spotify_tracks.track_id)), ...       | {'name': track_id, 'type': VARCHAR, 'max': 7zz7iNGIWhmfFE7zlXkMma, 'min': 0000vdREvCVMxbQTkS888c, 'approx_unique': 89815, 'nulls': 0} |
| ...                                                                         | ...                                                                                                                                   |
| main.struct_pack("name" := first(alias(spotify_tracks.time_signature)), ... | {'name': time_signature, 'type': BIGINT, 'max': 5, 'min': 0, 'approx_unique': 5, 'nulls': 0}                                          |
| main.struct_pack("name" := first(alias(spotify_tracks.track_genre)), ...    | {'name': track_genre, 'type': VARCHAR, 'max': world-music, 'min': acoustic, 'approx_unique': 115, 'nulls': 0}                         |

By unpivoting on `COLUMNS(*)`, we take all columns and pivot them downward into two columns: one for the auto-generated `name` of the column, and one for the `value` that was within that column.
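
The same reshaping is easier to see with plain values instead of structs. The `monthly_sales` table below is a made-up example.

```sql
-- Hypothetical one-row, three-column table
CREATE OR REPLACE TABLE monthly_sales AS
    SELECT 10 AS jan, 20 AS feb, 30 AS mar;

-- Stack every column into (name, value) pairs: one output row per input column
UNPIVOT monthly_sales
ON COLUMNS(*);
```

The output has the two default columns `name` and `value`, with one row each for `jan`, `feb`, and `mar`.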

###### Return the Results

The final step is the most gymnastics-like portion of this query.
We explode the `value` column's struct format so that each key becomes its own column using the [`STRUCT.*` syntax](#docs:lts:sql:data_types:struct::struct).
This is another way to make a query less reliant on column names – the split occurs automatically based on the keys in the struct.

```sql
SELECT value.*
FROM stacked_metrics;
```

We have now split apart the data into multiple columns, so the summary metrics are nice and interpretable.

| name             | type    | max                                                     | min                          | approx_unique | nulls |
| ---------------- | ------- | ------------------------------------------------------- | ---------------------------- | ------------: | ----: |
| Unnamed: 0       | BIGINT  | 113999                                                  | 0                            |        114089 |     0 |
| track_id         | VARCHAR | 7zz7iNGIWhmfFE7zlXkMma                                  | 0000vdREvCVMxbQTkS888c       |         89815 |     0 |
| artists          | VARCHAR | 龍藏Ryuzo                                               | !nvite                       |         31545 |     1 |
| album_name       | VARCHAR | 당신이 잠든 사이에 Pt. 4 Original Television Soundtrack | ! ! ! ! ! Whispers ! ! ! ! ! |         47093 |     1 |
| track_name       | VARCHAR | 행복하길 바래                                           | !I'll Be Back!               |         72745 |     1 |
| popularity       | BIGINT  | 100                                                     | 0                            |            99 |     0 |
| duration_ms      | BIGINT  | 5237295                                                 | 0                            |         50168 |     0 |
| explicit         | BOOLEAN | true                                                    | false                        |             2 |     0 |
| danceability     | DOUBLE  | 0.985                                                   | 0.0                          |          1180 |     0 |
| energy           | DOUBLE  | 1.0                                                     | 0.0                          |          2090 |     0 |
| key              | BIGINT  | 11                                                      | 0                            |            12 |     0 |
| loudness         | DOUBLE  | 4.532                                                   | -49.531                      |         19436 |     0 |
| mode             | BIGINT  | 1                                                       | 0                            |             2 |     0 |
| speechiness      | DOUBLE  | 0.965                                                   | 0.0                          |          1475 |     0 |
| acousticness     | DOUBLE  | 0.996                                                   | 0.0                          |          4976 |     0 |
| instrumentalness | DOUBLE  | 1.0                                                     | 0.0                          |          5302 |     0 |
| liveness         | DOUBLE  | 1.0                                                     | 0.0                          |          1717 |     0 |
| valence          | DOUBLE  | 0.995                                                   | 0.0                          |          1787 |     0 |
| tempo            | DOUBLE  | 243.372                                                 | 0.0                          |         46221 |     0 |
| time_signature   | BIGINT  | 5                                                       | 0                            |             5 |     0 |
| track_genre      | VARCHAR | world-music                                             | acoustic                     |           115 |     0 |
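
The `STRUCT.*` expansion used in the final step can also be tried in isolation. The struct contents below are hand-written stand-ins for the output of the earlier CTEs.

```sql
WITH stacked AS (
    SELECT {name: 'tempo', type: 'DOUBLE', nulls: 0} AS value
    UNION ALL
    SELECT {name: 'key', type: 'BIGINT', nulls: 0} AS value
)
-- Each key of the struct becomes its own column: name, type, nulls
SELECT value.*
FROM stacked;
```

Because the output columns come from the struct keys, adding a new metric to the struct adds a new column to the result without changing this final `SELECT`.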


#### Conclusion

We have shown that it is now possible to build reusable SQL macros in a highly flexible way. 
You can now build a macro that:

* Operates on any dataset
* Selects any columns
* Groups by any columns
* Aggregates any number of columns with any function

Phew! 

Along the way we have covered some useful tricks to have in your toolbox:

* Applying a macro to any dataset using a CTE
* Selecting a dynamic list of columns by combining the `COLUMNS` expression with a lambda and the `list_contains` function
* Passing in an aggregate function as a string using `list_aggregate`
* Applying any custom aggregation function within a macro
* Making list parameters optional using `OR len(list_parameter) = 0`
* Using the `alias` function with a `COLUMNS` expression to store the original name of all columns
* Summarizing all columns and then transposing that summary using `UNPIVOT` and `STRUCT.*`

The combination of these friendly SQL features is more powerful than using any one individually.
We hope that we have inspired you to take your SQL to new limits!

As always, we welcome your feedback and suggestions. 
We also have more flexibility in mind that will be demonstrated in future posts.
Please share the times you have stretched SQL in imaginative ways!

Happy analyzing!

## Dependency Management in DuckDB Extensions

**Publication date:** 2024-03-22

**Author:** Sam Ansmink

**TL;DR:** While core DuckDB has zero external dependencies, building extensions with dependencies is now very simple, with built-in support for vcpkg, an open-source package manager with support for over 2000 C/C++ packages. Interested in building your own? Check out the [extension template](https://github.com/duckdb/extension-template).

#### Introduction

Ever since the birth of DuckDB, one of its main pillars has been its strict no-external-dependencies philosophy.
Paraphrasing [this 2019 SIGMOD paper](https://hannes.muehleisen.org/publications/SIGMOD2019-demo-duckdb.pdf) on DuckDB:
*To achieve the requirement of having practical “embeddability” and portability, the database needs to run in whatever
environment the host does. Dependencies on external libraries (e.g., openssh) for either compile- or runtime have been
found to be problematic.*

In this blog post, we will cover how DuckDB manages to stay true to this philosophy without forcing DuckDB developers
down the path of complete abstinence. Along the way, we will show practical examples of how external dependencies are
possible, and how you can use this when creating your own DuckDB extension.

#### The Difficulties of Complete Abstinence

Having no external dependencies is conceptually very simple. However, in a real-world system with real-world
requirements, it is difficult to achieve. Many features require complex implementations of protocols and algorithms, and
many high-quality libraries exist that implement them. What this means for DuckDB (and most other systems, for that matter)
is that there are basically three options for handling requirements with potential external dependencies:

1. Inlining external code
2. Rewriting the external dependency
3. Breaking the no-dependency rule

The first two options are pretty straightforward: to avoid depending on some external software, just make it part of
the codebase. By doing so, the unpredictable nature of depending on somebody else is now eliminated! DuckDB has applied
both inlining and rewriting to prevent dependencies. For example, the [Postgres parser](https://github.com/duckdb/duckdb/tree/main/third_party/libpg_query) and
[MbedTLS](https://github.com/duckdb/duckdb/tree/main/third_party/mbedtls) libraries are inlined into DuckDB, whereas the S3 support is provided
using a custom implementation of the AWS S3 protocol.

Okay, great – problem solved, right? Well, not so fast. Most people with some software engineering experience will realize
that both inlining and rewriting come with serious drawbacks. The
most fundamental issue is probably related to code maintenance. Every significant piece of software needs some level of
maintenance. Ranging from fixing bugs to dealing with changing (build) environments or requirements, code will
need to be modified to stay functional and relevant. When inlining/rewriting dependencies, this also copies over the
maintenance burden.

For DuckDB, this historically meant that for each dependency, very careful consideration was made to balance the
increased maintenance burden against the necessity of the dependency. Including a dependency meant taking on the responsibility of
maintaining it, so this decision was never taken lightly. This works well in many cases and has the added benefit of forcing
developers to think critically about including a dependency and not mindlessly bolt on library after library. However,
for some dependencies, this just doesn't work. Take, for example, the SDKs of large cloud providers. They tend to be pretty
massive, very frequently updated, and packed with arguably essential functionality for an increasingly mature analytical
database. This leaves an awkward choice: either not provide these essential features or break the no-dependency rule.

#### DuckDB Extensions

This is where extensions come in. Extensions provide an elegant solution to the dilemma of dependencies by allowing
fine-grained breakage of the no-dependency rule. By moving dependencies out of DuckDB's core into extensions, the core
codebase can remain, and does remain, dependency-free.
This means that DuckDB's “practical embeddability and portability” remains unthreatened. At the same time, DuckDB can
still provide features that inevitably require depending on some third-party library. Furthermore, by moving dependencies
to extensions, each extension can have different levels of exposure to instability from dependencies. For example, some
extensions may choose to depend only on highly mature, stable libraries with good portability, whereas others may choose
to include more experimental dependencies with limited portability. This choice is then forwarded to the user by
allowing them to choose which extension to use.

At DuckDB, the realization of the importance of extensions and their relation to the no-dependency rule came
[very early](https://github.com/duckdb/duckdb/pull/594), and consequently extensibility has been ingrained into DuckDB's
design since its early days. Today, many parts of DuckDB can be extended. For example, you can add functions (table,
scalar, copy, aggregation), filesystems, parsers, optimizer rules, and much more. Many new features that are added to
DuckDB are added in extensions and are grouped by either functionality or by set of dependencies. Some examples of
extensions are the [SQLite](#docs:lts:core_extensions:sqlite) extension for reading/writing to/from SQLite files or the
[Spatial](#docs:lts:core_extensions:spatial:overview) extension which offers support for a wide range of geospatial processing
features. DuckDB's extensions are distributed as loadable binaries for most major platforms (including
[DuckDB-Wasm](https://duckdb.org/2023/12/18/duckdb-extensions-in-wasm)), allowing loading and installing extensions with two simple SQL
statements:

```sql
INSTALL spatial;
LOAD spatial;
```

For most core extensions maintained by the DuckDB team, there is even an auto-install and auto-load feature which will detect the required extensions for
a SQL statement and automatically install and load them. For a detailed description of which extensions are available
and how to use them, check out the [docs](#docs:lts:core_extensions:overview).

#### Dependency Management

So far, we've seen how DuckDB avoids external dependencies in its core codebase by moving them out of the core repository into
extensions. However, we're not out of the woods yet. As DuckDB is written in C++, the most natural way to write
extensions is C++. In C++, though, there is no standard tooling like a package manager and the answer to the
question of how to do dependency management in C++ has been, for many years: *“Through much pain and anguish.”* Given
DuckDB's focus on portability and support for many platforms, managing dependencies manually is not feasible: dependencies are generally built from source, each with its own intricacies requiring special build flags and
configuration for different platforms. With a growing ecosystem of extensions, this would quickly turn into an
unmaintainable mess.

Fortunately, much has changed in the C++ landscape over the past few years. Today, good dependency managers do exist.
One of them is Microsoft's [vcpkg](https://vcpkg.io/). It has become a highly notable player among C++ dependency
managers, as proven by its 20k+ GitHub stars and native support
from [CLion](https://blog.jetbrains.com/clion/2023/01/support-for-vcpkg-in-clion/)
and [Visual Studio](https://devblogs.microsoft.com/cppblog/vcpkg-is-now-included-with-visual-studio/). vcpkg contains
over 2000 dependencies such
as [Apache Arrow](https://github.com/microsoft/vcpkg/tree/master/ports/arrow), [yyjson](https://github.com/microsoft/vcpkg/tree/master/ports/yyjson),
and [various](https://github.com/microsoft/vcpkg/tree/master/ports/azure-core-cpp) [cloud](https://github.com/microsoft/vcpkg/tree/master/ports/aws-sdk-cpp) [provider](https://github.com/googleapis/google-cloud-cpp)
SDKs.

For anyone who has ever used a package manager, using vcpkg will feel quite natural. Dependencies are specified in
a `vcpkg.json` file, and vcpkg is hooked into the build system. Now, when building, vcpkg ensures that the dependencies
specified in the `vcpkg.json` are built and available. vcpkg supports integration with multiple build systems, with a
focus on its seamless CMake integration.

#### Using vcpkg with DuckDB

Now that we have covered DuckDB extensions and vcpkg, we have seen how DuckDB can manage dependencies without sacrificing
portability, maintainability and stability more than necessary. Next, we'll make things a bit more tangible by looking at
one of DuckDB's extensions and how it uses vcpkg to manage its dependencies.

##### Example: Azure extension

The [Azure](#docs:lts:core_extensions:azure) extension provides functionality related to [Microsoft Azure](https://azure.microsoft.com/),
one of the major cloud providers. DuckDB's Azure extension depends on the Azure C++ SDK to support reading directly from
Azure Storage. To do so it adds a custom filesystem and [secret type](#docs:lts:configuration:secrets_manager), which can be
used to easily query from authenticated Azure containers:

```sql
CREATE SECRET az1 (
    TYPE azure,
    CONNECTION_STRING '⟨redacted⟩'
);
SELECT column_a, column_b
FROM 'az://my-container/some-file.parquet';
```

To implement these features, the Azure extension depends on different parts of the Azure SDK. These are specified in the
Azure extension's `vcpkg.json`:

```json
{
  "dependencies": [
    "azure-identity-cpp",
    "azure-storage-blobs-cpp",
    "azure-storage-files-datalake-cpp"
  ]
}
```

Then, in the Azure extension's `CMakeLists.txt` file, we find the following lines:

```cmake
find_package(azure-identity-cpp CONFIG)
find_package(azure-storage-blobs-cpp CONFIG)
find_package(azure-storage-files-datalake-cpp CONFIG)

target_link_libraries(${EXTENSION_NAME} Azure::azure-identity Azure::azure-storage-blobs Azure::azure-storage-files-datalake)
target_include_directories(${EXTENSION_NAME} PRIVATE Azure::azure-identity Azure::azure-storage-blobs Azure::azure-storage-files-datalake)
```

And that's basically it! Every time the Azure extension is built, vcpkg will be called first to
ensure `azure-identity-cpp`, `azure-storage-blobs-cpp` and `azure-storage-files-datalake-cpp` are built using the correct platform-specific flags and
are available in CMake through `find_package`.

#### Building Your Own DuckDB Extension

Up until this point, we've focused on managing dependencies from the point of view of core DuckDB
contributors. However, all of this applies to anyone who wants to build an extension. DuckDB maintains a [C++ Extension Template](https://github.com/duckdb/extension-template),
which contains all the necessary build scripts, CI/CD pipeline and vcpkg configuration to build, test and deploy a DuckDB extension in
minutes. It can automatically build the loadable extension binaries for all available platforms, including Wasm.

##### Setting up the Extension Template

To demonstrate how simple this process is, let's go through all the steps of building a DuckDB extension from scratch,
including adding a vcpkg-managed external dependency.

Firstly, you will need to install vcpkg:

```bash
git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake
```

Then, you create a GitHub repository based on [the template](https://github.com/duckdb/extension-template) by clicking “Use this
template”.

Next, clone your newly created extension repo (including its submodules) and initialize the template:

```bash
git clone --recurse-submodules \
    https://github.com/⟨your_username⟩/⟨your_extension_repo⟩
cd ⟨your_extension_repo⟩
./scripts/bootstrap-template.py url_parser
```

Finally, to confirm everything works as expected, run the tests:

```bash
make test
```

##### Adding Functionality

In its current state, the extension is, of course, a little boring. Therefore, let's add some functionality! To keep
things simple, we'll add a scalar function that parses a URL and returns the scheme. We'll call the
function `url_scheme`. We start by adding a dependency to the boost url library in our `vcpkg.json` file:

```json
{
  "dependencies": [
    "boost-url"
  ]
}
```

Then, we follow up by changing our `CMakeLists.txt` to ensure our dependencies are correctly included in the build.

```cmake
find_package(Boost REQUIRED COMPONENTS url)
target_link_libraries(${EXTENSION_NAME} Boost::url)
target_link_libraries(${LOADABLE_EXTENSION_NAME} Boost::url)
```

Then, in `src/url_parser_extension.cpp`, we remove the default example functions and replace them with our
implementation of the `url_scheme` function:

```cpp
inline void UrlParserScalarFun(DataChunk &args, ExpressionState &state, Vector &result) {
  auto &name_vector = args.data[0];
  UnaryExecutor::Execute<string_t, string_t>(
    name_vector, result, args.size(),
    [&](string_t url) {
          string url_string = url.GetString();
          boost::system::result<boost::urls::url_view> parse_result = boost::urls::parse_uri( url_string );
          if (parse_result.has_error() || !parse_result.value().has_scheme()) {
              return string_t();
          }
          string scheme = parse_result.value().scheme();
          return StringVector::AddString(result, scheme);
      });
}

static void LoadInternal(DatabaseInstance &instance) {
  auto url_parser_scalar_function = ScalarFunction("url_scheme", {LogicalType::VARCHAR}, LogicalType::VARCHAR, UrlParserScalarFun);
  ExtensionUtil::RegisterFunction(instance, url_parser_scalar_function);
}
```

With our extension written, we can run `make` to build both DuckDB and the extension. After the build is finished, we
are ready to try out our extension. Since the build process also builds a fresh DuckDB binary with the extension loaded
automatically, all we need to do is run `./build/release/duckdb`, and we can use our newly added scalar function:

```sql
SELECT url_scheme('https://github.com/duckdb/duckdb');
```

Finally, as we are well-behaved developers, we add some tests by overwriting the default test `test/sql/url_parser.test`
with:

```sql
require url_parser

# Confirm the extension works
query I
SELECT url_scheme('https://github.com/duckdb/duckdb')
----
https

# On parser errors or not finding a scheme, the result is also an empty string
query I
SELECT url_scheme('not:\a/valid_url')
----
(empty)
```

Now all that's left to do is confirm everything works as expected with `make test`, and push these changes to the remote
repository. Then, GitHub Actions will take over and ensure the extension is built for all of DuckDB's supported
platforms.

For more details, check out the template repository. Also, the example extension we built in this blog is published
on [GitHub](https://github.com/samansmink/url-parse-extension). Note that in the demo, the Wasm and MinGW builds have been
[disabled](https://github.com/samansmink/url-parse-extension/blob/935c4273eea174d99d25be156d4bfea8f55abfa6/.github/workflows/MainDistributionPipeline.yml#L21)
due to [outstanding](https://github.com/microsoft/vcpkg/issues/35408) [issues](https://github.com/microsoft/vcpkg/issues/35549)
with the boost-url dependency for building on these platforms. Once these issues are fixed upstream, re-enabling their builds
for the extension is very simple. Of course, if you are the author of such an extension, it could make a lot of sense to fix these compile issues
yourself in vcpkg, fixing them not only for your extension but for the whole open-source community!

#### Conclusion

In this blog post, we've explored DuckDB's journey towards managing dependencies in its extension ecosystem while
upholding its core philosophy of zero external dependencies. By leveraging the power of extensions, DuckDB can maintain
its portability and embeddability while still providing essential features that require external dependencies. To
simplify managing dependencies, Microsoft's vcpkg is integrated into DuckDB's extension build systems both for
DuckDB-maintained extensions and third-party extensions.

If this blog post sparked your interest in creating your own DuckDB extension, check out
the [C++ Extension Template](https://github.com/duckdb/extension-template),
the [DuckDB docs on extensions](#docs:lts:core_extensions:overview),
and the very handy [duckdb-extension-radar repository](https://github.com/mehd-io/duckdb-extension-radar) that tracks public DuckDB extensions.
Additionally, DuckDB has a [Discord server](https://discord.duckdb.org) where you can ask for help on
extensions or anything DuckDB-related in general.

## 42.parquet – A Zip Bomb for the Big Data Age

**Publication date:** 2024-03-26

**Author:** Hannes Mühleisen

**TL;DR:** A 42 kB Parquet file can contain over 4 PB of data.

[Apache Parquet](https://parquet.apache.org) has become the de-facto standard for tabular data interchange. It is greatly superior to its scary cousin CSV by using a binary, columnar and *compressed* data representation. In addition, Parquet files come with enough metadata so that files can be correctly interpreted without additional information. Most modern data tools and services support reading and writing Parquet files.

However, Parquet files are not without their dangers: For example, corrupt files can crash readers that are not being very careful in interpreting internal offsets and such. But even perfectly valid files can be problematic and lead to crashes and service downtime as we will show below.

A pretty well-known attack on naive firewalls and virus scanners is a [Zip Bomb](https://en.wikipedia.org/wiki/Zip_bomb), one famous example being [42.zip](https://www.unforgettable.dk), named so because of course [42 is the perfect number](https://en.wikipedia.org/wiki/42_(number)#The_Hitchhiker's_Guide_to_the_Galaxy) and the file is only 42 kilobytes large. This perfectly valid zip file has a bunch of other zip files in it, which again contain other zip files and so on. Eventually, if you tried to unpack all of that, you would end up with 4 petabytes of data. Big Data indeed.

Parquet files support various methods to compress data. How big of a table can one create with a Parquet file that is only 42 kilobytes large in the spirit of a zip bomb? Let's find out! For reasons of portability, we have implemented our own [Parquet reader and writer for DuckDB](#docs:lts:data:parquet:overview). It is unavoidable to learn a great deal about the Parquet format when implementing it.

A Parquet file is made up of one or more row groups, which contain columns, which in turn contain so-called pages that contain the actual data in encoded format. Among other encodings, Parquet supports [dictionary encoding](https://en.wikipedia.org/wiki/Dictionary_coder), where we first have a page with a dictionary, followed by data pages that refer to the dictionary instead of containing plain values. This is more efficient for columns where long values such as categorical strings repeat often, because the dictionary references can be much smaller.

Let's exploit that. We write a dictionary with a single value and refer to it over and over. In our example, we use a single 64-bit integer, the biggest possible value because why not. Then, we refer back to this dictionary entry using the `RLE_DICTIONARY` [run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding) specified in Parquet. The [specified encoding](https://parquet.apache.org/docs/file-format/data-pages/encodings/#run-length-encoding--bit-packing-hybrid-rle--3) is a bit weird because for some reason it combines bit packing and run-length encoding but essentially we can use the biggest run-length possible, which is `2^31-1`, a little over 2 billion. Since the dictionary is tiny (one entry), the value we repeat is 0, referring to the only entry. Including its required metadata headers and footers (like all metadata in Parquet, this is encoded using [Thrift](https://thrift.apache.org)), this file is only 133 bytes large. 133 bytes to represent 2 billion 8-byte integers is not too bad, even if they're all the same.
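To make the size of such a run concrete, here is a minimal Python sketch of a single RLE run in Parquet's RLE/bit-packing hybrid encoding: a ULEB128 varint header holding the run length shifted left by one bit, followed by the repeated value padded to whole bytes. The helper names `uleb128` and `rle_run` and the bit width of 1 are assumptions for illustration; the actual generator script may make different choices.

```python
def uleb128(value: int) -> bytes:
    """Encode an unsigned integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def rle_run(run_length: int, value: int, bit_width: int) -> bytes:
    """Encode one RLE run: varint(run_length << 1), then the repeated value."""
    value_bytes = (bit_width + 7) // 8  # the value is padded to whole bytes
    return uleb128(run_length << 1) + value.to_bytes(value_bytes, "little")


# Largest run length from the text, dictionary index 0,
# and an assumed (hypothetical) bit width of 1.
run = rle_run(2**31 - 1, 0, 1)
print(len(run), run.hex())
```

Under these assumptions, the run itself takes only six bytes; the rest of the file size comes from page headers and the Thrift metadata.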

But we can go up from there. Columns can contain multiple pages referring to *the same* dictionary, so we can just repeat our data page over and over, each time only adding 31 bytes to the file, but 2 billion values to the table the file represents. We can also use another trick to blow up the data size: as mentioned, Parquet files contain one or more row groups, which are stored in a Thrift footer at the end of the file. Each column in a row group contains byte offsets (`data_page_offset` and friends) into the file where the pages for the column are stored. Nothing keeps us from adding multiple row groups that *all refer to the same byte offset*, the one where we stored our slightly mischievous dictionary and data pages. Each row group we add logically repeats all the pages. Of course, adding row groups also requires metadata storage, so there is a trade-off between adding pages (2 billion values each) and adding row groups (each repeating whatever the row group it duplicates contains).

With some fiddling, we found that if we repeat the data page 1000 times and repeat the row group 290 times, we end up with [a Parquet file](https://github.com/hannes/fortytwodotparquet/raw/main/42.parquet) that is 42 kilobytes large, yet contains *622 trillion* values (622,770,257,630,000 to be exact). If one would materialize this table in memory, it would require over *4 petabytes* of memory, finally a real example of [Big Data](https://motherduck.com/blog/big-data-is-dead/), coincidentally roughly the same size as the original `42.zip` mentioned above.
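The numbers above can be checked with a few lines of arithmetic. The sketch below only uses figures stated in the text (maximum run length, 1000 data pages, 290 row groups, 8-byte integers); the variable names are ours.

```python
MAX_RUN_LENGTH = 2**31 - 1  # values per data page
DATA_PAGES = 1000           # repetitions of the data page
ROW_GROUPS = 290            # row groups pointing at the same pages
BYTES_PER_VALUE = 8         # one 64-bit integer

total_values = MAX_RUN_LENGTH * DATA_PAGES * ROW_GROUPS
total_bytes = total_values * BYTES_PER_VALUE

print(f"{total_values:,} values")
print(f"{total_bytes / 1e15:.2f} PB when materialized")
```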

We've made the [script that we use to generate this file available as well](https://github.com/hannes/fortytwodotparquet/blob/main/create-parquet-file.py), and we hope it can be used to test Parquet readers better. We hope to have shown that Parquet files can be considered harmful and should certainly not be shoved into some pipeline without being extra careful. And while DuckDB *can* read data from our file (e.g., with a `LIMIT`), if you make it read through it all, you'd better get some coffee.

## No Memory? No Problem. External Aggregation in DuckDB

**Publication date:** 2024-03-29

**Author:** Laurens Kuiper

**TL;DR:** Since the 0.9.0 release, DuckDB’s fully parallel aggregate hash table can efficiently aggregate over many more groups than fit in memory.

Most grouped aggregation queries yield just a few output rows.
For example, “How many flights departed from each European capital in the past ten years?” yields one row per European capital, even if the table containing all the flight information has millions of rows.
This is not always the case, as “How many orders did each customer place in the past ten years?” yields one row per customer, which could be millions, which significantly increases the memory consumption of the query.
However, even if the aggregation does not fit in memory, DuckDB can still complete the query.

Not interested in the implementation? [Jump straight to the experiments!](#experiments)

#### Introduction

Around two years ago, we published our first blog post on DuckDB’s hash aggregation, titled [“Parallel Grouped Aggregation in DuckDB”](https://duckdb.org/2022/03/07/aggregate-hashtable).
So why are we writing another blog post now?

Unlike most database systems, which are servers, DuckDB is used in all kinds of environments, which may not have much memory.
However, some database queries, like aggregations with many unique groups, require a lot of memory.
The laptop I am writing this on has 16 GB of RAM.
What if a query needs 20 GB?
If this happens:

```console
Out of Memory Error: could not allocate block of size X (Y/Z used)
```

The query is aborted.
Sadly, we can’t [download more RAM](https://knowyourmeme.com/memes/download-more-ram).
But luckily, this laptop also has a fast SSD with 1 TB of storage.
In many cases, we don’t need all 20 GB of data to be in memory simultaneously, and we can temporarily place some data in storage.
If we load it back whenever needed, we can still complete the query.
We must be careful to use storage sparingly because despite modern SSDs being fast, they are still much slower than memory.

In a nutshell, that’s what this post is about.
Since the [0.9.0 release](https://duckdb.org/2023/09/26/announcing-duckdb-090), DuckDB’s hash aggregation can process more unique groups than fit in memory by offloading data to storage.
In this post, we’ll explain how this works.
If you want to know what hash aggregation is, how hash collisions are resolved, or how DuckDB’s hash table is structured, check out [our first blog post on hash aggregation](https://duckdb.org/2022/03/07/aggregate-hashtable).

#### Memory Management

Most database systems store persistent data on “pages”.
Upon request, these pages can be read from the _database file_ in storage, put into memory, and written back again if necessary.
The common wisdom is to make all pages the same size: This allows pages to be swapped and avoids [fragmentation](https://en.wikipedia.org/wiki/Fragmentation_(computing)) in memory and storage.
When the database is started, a portion of memory is allocated and reserved for these pages, called the “buffer pool”.
The database component that is responsible for managing the buffer pool is aptly called the “buffer manager”.

The remaining memory is reserved for short-lived, i.e., _temporary_, memory allocations, such as hash tables for aggregation.
These allocations are done differently, which is good because if there are many unique groups, hash tables may need to be very large, so we wouldn’t have been able to use the fixed-size pages for that anyway.
If we have more temporary data than fits in memory, operators like aggregation have to decide when to selectively write data to a _temporary file_ in storage.

... At least, that’s the traditional way of doing things.
This made little sense for DuckDB.
Why should we manage persistent and temporary data so differently?
The difference is that _persistent_ data should be _persisted_, and _temporary_ data should not.
Why can’t a buffer manager manage both?

DuckDB’s buffer manager is not traditional.
Most persistent and temporary data is stored on fixed-size pages and managed by the buffer manager.
The buffer manager tries to make the best use of your memory.
That means we don’t reserve a portion of memory for a buffer pool.
This allows DuckDB to use all memory for persistent data, not just a portion if that’s what’s best for your workload.
If you’re doing large aggregations that need a lot of memory, DuckDB can evict the persistent data from memory to free up space for a large hash table.

Because DuckDB’s buffer manager manages _all_ memory, both persistent and temporary data, it is much better at choosing when to write temporary data to storage than operators like aggregation could ever be.
Leaving the responsibility of offloading to the buffer manager also saves us the effort of implementing reading and writing data to a temporary file in every operator that needs to process data that does not fit in memory.
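The idea of a single buffer manager for both kinds of data can be illustrated with a toy model. The Python sketch below is not DuckDB's implementation: the class `ToyBufferManager`, its methods, and the page names are hypothetical, capacity is counted in pages, and a dictionary stands in for files in storage. It only shows the pin/unpin contract: pinned pages stay in memory, unpinned pages may be moved to storage when room is needed.

```python
class ToyBufferManager:
    """Keeps at most `capacity` pages in memory; unpinned pages may be offloaded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_memory = {}  # page_id -> data
        self.offloaded = {}  # page_id -> data (stands in for files in storage)
        self.pins = {}       # page_id -> pin count

    def _make_room(self):
        # Offload unpinned pages until there is space for one more page.
        while len(self.in_memory) >= self.capacity:
            victim = next(
                (p for p in self.in_memory if self.pins.get(p, 0) == 0), None
            )
            if victim is None:
                raise MemoryError("all pages are pinned")
            self.offloaded[victim] = self.in_memory.pop(victim)

    def allocate(self, page_id, data):
        self._make_room()
        self.in_memory[page_id] = data
        self.pins[page_id] = 1  # a new page starts out pinned

    def pin(self, page_id):
        if page_id not in self.in_memory:
            self._make_room()
            self.in_memory[page_id] = self.offloaded.pop(page_id)
        self.pins[page_id] = self.pins.get(page_id, 0) + 1
        return self.in_memory[page_id]

    def unpin(self, page_id):
        self.pins[page_id] -= 1  # the page may now be offloaded


bm = ToyBufferManager(capacity=2)
bm.allocate("persistent-1", b"table data")
bm.allocate("temp-1", b"hash table data")
bm.unpin("persistent-1")
bm.allocate("temp-2", b"more hash table data")  # offloads persistent-1
print(sorted(bm.in_memory), sorted(bm.offloaded))
```

In this toy, persistent and temporary pages compete for the same memory, so a large hash table can push table data out of memory, and the other way around.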

Why don’t buffer managers in other database systems manage temporary data?
There are two problems: _Memory Fragmentation_ and _Invalid References_.

##### Memory Fragmentation

Hash tables and other data structures used in query operators don’t exactly have a fixed size like the pages used for persistent data.
We also don’t want to have a lot of pages with variable sizes floating around in memory alongside the pages with a fixed size, as this would cause memory fragmentation.

Ideally, we would use the fixed size for _all_ of our memory allocations, but this is not a good idea: Sometimes, the most efficient way to process a query requires allocating, for example, a large array.
So, we settled for using a fixed size for _almost all_ of our allocations.
These short-lived allocations are immediately deallocated after use, unlike the fixed-size pages for persistent data, which are kept around.
These allocations do not cause fragmentation with each other because [jemalloc](https://jemalloc.net), which DuckDB uses for allocating memory when possible, categorizes allocations using size classes and maintains separate arenas for them.

##### Invalid References

Temporary data usually cannot be written to storage as-is because it often contains pointers.
For example, DuckDB implements the string type proposed by [Umbra](https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf), which has a fixed width.
Strings longer than 12 characters are not stored within the string type, but _somewhere else_, and a pointer to this “somewhere else” is stored instead.
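As a rough illustration, the Python sketch below packs strings into a fixed-width value. The 16-byte layout (a 4-byte length followed by either 12 inlined bytes or a 4-byte prefix plus an 8-byte pointer) is our assumption about this string type, and `pack_string` and the heap address are hypothetical; only the 12-character limit comes from the text.

```python
import struct

INLINE_LIMIT = 12  # from the text: longer strings are stored somewhere else


def pack_string(value: bytes, heap_address: int = 0) -> bytes:
    """Pack a string into an assumed fixed-width 16-byte representation."""
    length = len(value)
    if length <= INLINE_LIMIT:
        # Short string: the bytes are inlined, padded with zero bytes.
        return struct.pack("<I12s", length, value)
    # Long string: keep a 4-byte prefix and a pointer to the full data.
    return struct.pack("<I4sQ", length, value[:4], heap_address)


short = pack_string(b"duck")
long_ = pack_string(b"external aggregation", heap_address=0x42)
print(len(short), len(long_))
```

The pointer in the second case is what becomes invalid when the page holding the full string is moved.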

This creates a problem when we want to offload data to storage.
Let’s say this “somewhere else” where strings longer than 12 characters are stored is one of those pages that the buffer manager can offload to storage at any time to free up some memory.
If the page is offloaded and then loaded back, it will most likely be loaded into a different address in memory.
The pointers that pointed to the long strings are now _invalid_ because they still point to the previous address!

The usual way of writing data containing pointers to storage is by _serializing_ it first.
When reading it back into memory, it has to be _deserialized_ again.
[(De-)serialization can be an expensive operation](https://www.vldb.org/pvldb/vol10/p1022-muehleisen.pdf), hence why data formats like [Arrow Flight](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) exist, which try to minimize the cost.
However, we can’t use Arrow here because Arrow is a column-major layout, but [a row-major layout is more efficient for hash tables](https://ir.cwi.nl/pub/13807/13807B.pdf).

We could create a row-major version of Arrow Flight, but we can just avoid (de-)serialization altogether:
We’ve created a specialized row-major _page layout_ that actually uses the old invalidated pointers to _recompute_ new valid pointers after reading the data back into memory.

The page layout places fixed-size rows and variable-size data like strings on separate pages.
The size of the rows is fixed for a query: After a SQL query is issued, DuckDB creates and executes a query plan.
So, even before executing the said plan, we already know which columns we need, their types, and how wide these types are.

As shown in the image below, a small amount of “MetaData” is needed to recompute the pointers.
The fixed-size rows are stored in “Row Pages”, and the variable-size data in “Var Pages”.

<p align="center">
    ![](../images/external_aggregation/TupleDataCollection-light.svg)

    
</p>

Remember that there are pointers within the fixed-size rows pointing to variable-size data.
The MetaData describes which fixed-size rows point to which Var Page and the last known address of the Var Page.
For example, MetaData 1 describes 5 rows stored in Row Page 1 at offset 0, with variable-size data stored in Var Page 1, which had an address of `0x42`.

Let’s say the buffer manager decides to offload Var Page 1.
When we request Var Page 1 again, it’s loaded into address `0x500`.
The pointers within those 5 rows are now invalid.
For example, one of the rows contains the pointer `0x48`, which means that the data it points to is stored at offset `0x48 - 0x42 = 6` in Var Page 1.
We can recompute the pointer by adding the offset to the new address of the page: `0x500 + 6 = 0x506`.
Pointer recomputation is done for rows with their strings stored on the same Row and Var Page, so we create a new MetaData every time a Row Page or Var Page is full.

The advantage of pointer recomputation over (de-)serialization is that it can be done lazily.
We can check whether the Var Page was offloaded by comparing the pointer in the MetaData with the current pointer to the page.
We don’t have to recompute the pointers if they are the same.
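The recomputation described above can be sketched in a few lines of Python. The `MetaData` class and its field names here are simplified stand-ins of our own, not DuckDB's actual structures; the addresses are the ones from the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetaData:
    row_pointers: List[int]  # pointers stored inside the fixed-size rows
    var_page_address: int    # last known address of the Var Page


def recompute_pointers(meta: MetaData, current_address: int) -> None:
    """Lazily fix up pointers after the Var Page may have moved."""
    if meta.var_page_address == current_address:
        return  # the page was not moved, so the pointers are still valid
    for i, old_pointer in enumerate(meta.row_pointers):
        offset = old_pointer - meta.var_page_address
        meta.row_pointers[i] = current_address + offset
    meta.var_page_address = current_address


meta = MetaData(row_pointers=[0x48], var_page_address=0x42)
recompute_pointers(meta, 0x500)
print([hex(p) for p in meta.row_pointers])
```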

#### External Aggregation

Now that we’ve figured out how to deal with temporary data, it’s finally time to talk about hash aggregation.
The first big challenge is to perform the aggregation in parallel.

DuckDB uses [Morsel-Driven Parallelism](https://db.in.tum.de/~leis/papers/morsels.pdf) to parallelize query execution, which essentially means that query operators, such as aggregation, must be parallelism-aware.
This differs from [plan-driven parallelism](https://dl.acm.org/doi/pdf/10.1145/93605.98720), which keeps operators unaware of parallelism.

To briefly summarize [our first blog post on aggregation](https://duckdb.org/2022/03/07/aggregate-hashtable): In DuckDB, all active threads have their own thread-local hash table, which they sink input data into.
This will keep threads busy until all input data has been read.
Multiple threads will likely have the _exact same group_ in their hash table.
Therefore, the thread-local hash tables must be combined to complete the grouped aggregation.
This can be done in parallel by partitioning the hash tables and assigning each partition to a thread, which combines the data of that partition.
For the most part, we still use this same approach.
You’ll see this in the image below, which illustrates our new implementation.

<p align="center">
    ![](../images/external_aggregation/OOCHA-light.svg)

    
</p>

We call the first phase _Thread-Local Pre-Aggregation_.
The input data are _morsels_, chunks of around 100,000 rows.
These are assigned to active threads, which sink them into their thread-local hash table until all input data has been read.
We use _linear probing_ to resolve collisions and _salt_ to reduce the overhead of dealing with said collisions.
This is explained in [our first blog post on aggregation](https://duckdb.org/2022/03/07/aggregate-hashtable), so I won’t repeat it here.

Now that we’ve explained what _hasn’t_ changed, we can talk about what _has_ changed.
The first difference compared to last time is the way that we partition.
Before, if we had, for example, 32 threads, each thread would create 32 hash tables, one for each partition.
This totals a whopping 1024 hash tables, which did not scale well when even more threads were active.
Now, each thread has one hash table, _but the data within each hash table is partitioned_.
The data is also stored on the specialized page layout we presented earlier so that it can easily be offloaded to storage.

The second difference is that the hash tables are not _resized_ during Thread-Local Pre-Aggregation.
We keep the hash tables’ size small, reducing the number of cache misses during this phase.
This means that the hash table will be full at some point.
When it’s full, we reset it and start over.
We can do this because we’ll finish the aggregation later in the second phase.
When we reset the hash table, we “unpin” the pages that store the actual data, which tells our buffer manager it can write them to storage when it needs to free up memory.

Together, these two changes result in a low memory requirement during the first phase.
Each thread only needs to keep a small hash table in memory.
We may collect a lot of data by filling up the hash table many times, but the buffer manager can offload almost all of it if needed.

For the second phase, _Partition-Wise Aggregation_, the thread-local partitioned data is exchanged, and each thread combines the data of a single partition into a hash table.
This phase is mostly the same as before, except that we now sometimes create many more partitions than threads.
Why? The hash table for one partition might fit in memory, but 8 threads could be combining a partition simultaneously, and we might not be able to fit 8 partitions in memory.
The easy solution to this problem is to _over-partition_.
If we make more partitions than threads, for example, 32 partitions, the size of the partitions will be smaller, and the 8 threads will combine only 8 out of the 32 partitions simultaneously, which won’t require nearly as much memory.
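The two phases can be sketched in Python as follows. This is a heavily simplified, sequential model under our own assumptions: plain dictionaries replace DuckDB's hash tables, lists replace pages managed by the buffer manager, "threads" are processed one after another, and the function names, capacity, and partition count are hypothetical.

```python
from collections import defaultdict


def pre_aggregate(rows, capacity, num_partitions):
    """Phase 1: sum values per key in a small table that is reset when full."""
    table = {}
    partitions = [[] for _ in range(num_partitions)]

    def flush():
        # Move partial results into their partition and start over.
        for key, total in table.items():
            partitions[hash(key) % num_partitions].append((key, total))
        table.clear()

    for key, value in rows:
        if key not in table and len(table) == capacity:
            flush()  # the table is full and is never resized
        table[key] = table.get(key, 0) + value
    flush()
    return partitions


def combine_partition(thread_partitions):
    """Phase 2: combine the partial results of one partition from all threads."""
    result = defaultdict(int)
    for partial in thread_partitions:
        for key, total in partial:
            result[key] += total
    return dict(result)


def aggregate(inputs_per_thread, capacity=4, num_partitions=8):
    local = [pre_aggregate(rows, capacity, num_partitions) for rows in inputs_per_thread]
    result = {}
    for p in range(num_partitions):  # each partition could go to a different thread
        result.update(combine_partition([parts[p] for parts in local]))
    return result


thread_inputs = [
    [(k % 10, 1) for k in range(100)],
    [(k % 7, 2) for k in range(70)],
]
totals = aggregate(thread_inputs)
print(totals)
```

Because a key always hashes to the same partition, every partition can be combined independently, and using more partitions than threads keeps the table built for each partition small.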

<a name="experiments"></a>

#### Experiments

Aggregations that result in only a few unique groups can easily fit in memory.
To evaluate our external hash aggregation implementation, we need aggregations that have many unique groups.
For this purpose, we will use the [H2O.ai database-like ops benchmark](https://duckdblabs.github.io/db-benchmark/), which [we've resurrected](https://duckdb.org/2023/04/14/h2oai), and [now maintain](https://duckdb.org/2023/11/03/db-benchmark-update).
Specifically, we will use the `G1_1e9_2e0_0_0.csv.zst` file, which is 50 GB uncompressed.
The source code for the H2O.ai benchmark can be found on [GitHub](https://github.com/duckdblabs/db-benchmark).
You can download the file yourself from <https://blobs.duckdb.org/data/G1_1e9_2e0_0_0.csv.zst> (18.8 GB compressed).

We use the following queries from the benchmark to load the data:

```sql
SET preserve_insertion_order = false;
CREATE TABLE y (
    id1 VARCHAR, id2 VARCHAR, id3 VARCHAR,
    id4 INTEGER, id5 INTEGER, id6 INTEGER,
    v1 INTEGER, v2 INTEGER, v3 FLOAT);
COPY y FROM 'G1_1e9_2e0_0_0.csv.zst' (FORMAT csv, AUTO_DETECT true);
CREATE TYPE id1ENUM AS ENUM (SELECT id1 FROM y);
CREATE TYPE id2ENUM AS ENUM (SELECT id2 FROM y);
CREATE TABLE x (
    id1 id1ENUM, id2 id2ENUM, id3 VARCHAR,
    id4 INTEGER, id5 INTEGER, id6 INTEGER,
    v1 INTEGER, v2 INTEGER, v3 FLOAT);
INSERT INTO x (SELECT * FROM y);
DROP TABLE IF EXISTS y;
```

The H2O.ai aggregation benchmark consists of 10 queries, which vary in the number of unique groups:

```sql
-- Query 1: ~100 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id1, sum(v1) AS v1
FROM x
GROUP BY id1;
```

```sql
-- Query 2: ~10,000 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id1, id2, sum(v1) AS v1
FROM x
GROUP BY id1, id2;
```

```sql
-- Query 3: ~10,000,000 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id3, sum(v1) AS v1, avg(v3) AS v3
FROM x
GROUP BY id3;
```

```sql
-- Query 4: ~100 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id4, avg(v1) AS v1, avg(v2) AS v2, avg(v3) AS v3
FROM x
GROUP BY id4;
```

```sql
-- Query 5: ~1,000,000 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id6, sum(v1) AS v1, sum(v2) AS v2, sum(v3) AS v3
FROM x
GROUP BY id6;
```

```sql
-- Query 6: ~10,000 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT
    id4,
    id5,
    quantile_cont(v3, 0.5) AS median_v3,
    stddev(v3) AS sd_v3
FROM x
GROUP BY id4, id5;
```

```sql
-- Query 7: ~10,000,000 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id3, max(v1) - min(v2) AS range_v1_v2
FROM x
GROUP BY id3;
```

```sql
-- Query 8: ~10,000,000 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id6, v3 AS largest2_v3
FROM (
    SELECT id6, v3, row_number() OVER (
          PARTITION BY id6
          ORDER BY v3 DESC) AS order_v3
    FROM x
    WHERE v3 IS NOT NULL) sub_query
WHERE order_v3 <= 2;
```

```sql
-- Query 9: ~10,000 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id2, id4, pow(corr(v1, v2), 2) AS r2
FROM x
GROUP BY id2, id4;
```

```sql
-- Query 10: ~1,000,000,000 unique groups
CREATE OR REPLACE TABLE ans AS
SELECT id1, id2, id3, id4, id5, id6, sum(v3) AS v3, count(*) AS count
FROM x
GROUP BY id1, id2, id3, id4, id5, id6;
```

The [results on the benchmark page](https://duckdblabs.github.io/db-benchmark/) are obtained using the `c6id.metal` AWS EC2 instance.
On this instance, all the queries easily fit in memory, and having many threads doesn't hurt performance either.
DuckDB only takes 8.58 seconds to complete even the largest query, query 10, which returns 1 billion unique groups.
However, many people will not use such a beefy machine to crunch numbers.
On my laptop, a 2020 MacBook Pro, some smaller queries will fit in memory, like query 1, but query 10 will definitely not.

The following table is a summary of the hardware used.

| Specs       | `c6id.metal` | Laptop | Ratio |
| :---------- | -----------: | -----: | ----: |
| Memory      |       256 GB |  16 GB |   16× |
| CPU cores   |           64 |      8 |    8× |
| CPU threads |          128 |      8 |   16× |
| Hourly cost |        $6.45 |  $0.00 |   NaN |

Although the CPU cores of the AWS EC2 instance are not directly comparable with those of my laptop, the instance clearly has much more compute power and memory available.
Despite the large differences in hardware, DuckDB can complete all 10 queries without a problem:

| Query | `c6id.metal` | Laptop |  Ratio |
| ----: | -----------: | -----: | -----: |
|     1 |         0.08 |   0.74 |  9.25× |
|     2 |         0.09 |   0.76 |  8.44× |
|     3 |         8.01 | 156.63 | 19.55× |
|     4 |         0.26 |   2.07 |  7.96× |
|     5 |         6.72 | 145.00 | 21.58× |
|     6 |        17.12 |  19.28 |  1.13× |
|     7 |         6.33 | 124.85 | 19.72× |
|     8 |         6.53 | 126.35 | 19.35× |
|     9 |         0.32 |   1.90 |  5.94× |
|    10 |         8.58 | 264.14 | 30.79× |

The runtime of the queries is reported in seconds, and was obtained by taking the median of 3 runs on my laptop using DuckDB 0.10.1.
The `c6id.metal` instance results were obtained from the [benchmark website](https://duckdblabs.github.io/db-benchmark/).
Despite being unable to _fit_ all unique groups in my laptop's memory, DuckDB can _compute_ all unique groups and return them.
The largest query, query 10, takes almost 4.5 minutes to complete.
This is over 30× longer than with the beefy `c6id.metal` instance.
The large difference is, of course, explained by the large differences in hardware.
Interestingly, this is still faster than Spark on the `c6id.metal` instance, which takes 603.05 seconds!

#### Conclusion

DuckDB is constantly improving its larger-than-memory query processing capabilities.
In this blog post, we showed some of the tricks DuckDB uses for spilling and loading data from storage.
These tricks are implemented in DuckDB's external hash aggregation, available since v0.9.0.
We took the hash aggregation for a spin on the H2O.ai benchmark, and DuckDB could complete all queries on the 50 GB dataset on a laptop with only 16 GB of memory.

Interested in reading more? [Read our paper on external aggregation](https://hannes.muehleisen.org/publications/icde2024-out-of-core-kuiper-boncz-muehleisen.pdf).

## duckplyr: dplyr Powered by DuckDB

**Publication date:** 2024-04-02

**Author:** Hannes Mühleisen

**TL;DR:** The new R package duckplyr translates the dplyr API to DuckDB's execution engine.

![](../images/blog/duckplyr/duckplyr.png)


> For the duckplyr documentation, visit [`duckplyr.tidyverse.org`](https://duckplyr.tidyverse.org/).

#### Background

Wrangling tabular data into a form suitable for analysis can be a challenging task. Somehow, every dataset is created differently. Datasets differ in how they organize information into rows and columns, and in more specific choices like the representation of dates, currency, categorical values, missing data, and so on. The task is not simplified by the lack of global consensus on trivial issues like which character to use as a decimal separator. To gain new insights, we also commonly need to combine information from multiple sources, for example by joining two datasets using a common identifier. There are, however, some recurring operations that have been found to be universally useful in reshaping data for analysis. For example, the [Structured (English) Query Language](https://s3.us.cloud-object-storage.appdomain.cloud/res-files/2705-sequel-1974.pdf), or SQL (“See-Quel”) for short, describes a set of common operations that can be applied to tabular data, like selection, projection, joins, aggregation, sorting, windowing, and more. SQL proved to be a huge success: despite its many warts and many attempts to replace it, it is still the de facto language for data transformation, with a gigantic industry behind it.


```R
library("DBI")
con <- dbConnect(...)
df <- dbGetQuery(con, "SELECT something, very, complicated FROM some_table JOIN another_table USING (some_shared_attribute) GROUP BY group_one, group_two ORDER BY some_column, and_another_column;")
```
*A not very ergonomic way of pulling data into R*


For data analysts in interactive programming environments like R or Python possibly from within IDEs such as RStudio or Jupyter Notebooks, using SQL to reshape data was never really a natural choice. Sure, sometimes it was required to use SQL to pull data from operational systems as shown above, but when given a choice, analysts much preferred to use the more ergonomic data reshaping facilities provided by those languages. R had built-in data wrangling from the start as part of the language with the [data.frame class to represent tabular data](https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html). Later on, in 2014, Hadley Wickham defined the logical structure of tabular data for so-called [“tidy” data](https://vita.had.co.nz/papers/tidy-data.pdf) and published the first version of the [dplyr](https://dplyr.tidyverse.org) package designed to unify and simplify the previously unwieldy R commands to reshape data into a singular, unified and consistent API. In Python-land, the widely popular [Pandas project](https://pandas.pydata.org) extended Python with a de-facto tabular data representation along with relational-style operators albeit without any attempt at “tidiness”.

At some point, however, the R and Python data *processing* facilities started to creak under the ever-increasing weight of datasets that people wished to analyze. Datasets quickly grew into millions of rows. For example, one of the early datasets that required [special handling](https://www.r-bloggers.com/2012/12/analyze-the-american-community-survey-acs-with-r-and-monetdb/) was the American Community Survey dataset, because there are just so many Americans. But tools like Pandas and dplyr had been designed for convenience, not necessarily efficiency. For example, they lack the ability to parallelize data reshaping jobs on the now-common multicore processors.

And while there was a whole set of emerging “Big Data” tools, using those from an interactive data analysis environment proved to be a poor developer experience, for example due to multi-second job startup times and very complex setup procedures far beyond the skill set of most data analysts. However, the world of relational data management systems had not stood still in the meantime. Great progress had been made to improve the efficiency of data analysis from SQL: innovations around [columnar data representation](https://ir.cwi.nl/pub/21772/1900000024-Abadi-Vol5-DBS-024.pdf), [efficient query interpretation](https://www.cidrdb.org/cidr2005/papers/P19.pdf) or even [compilation](https://www.vldb.org/pvldb/vol4/p539-neumann.pdf), and [automatic efficient parallelization](https://db.in.tum.de/~leis/papers/morsels.pdf) increased query processing efficiency by several orders of magnitude. Regrettably, those innovations did not find their way into the data analyst's toolkit – even as decades passed – due to lack of communication between communities and siloing of innovations into corporate, commercial, and closed-source products.

There are two possible ways out of this unfortunate scenario:

1. improve the data analysis capabilities of R and Python to be able to handle larger datasets through general efficiency improvements, optimization, and parallelization;
2. somehow integrate existing state-of-the-art technology into interactive data analysis environments.

The main issue with approach one is that building a *competitive* analytical query engine from scratch is a multi-million dollar effort requiring a team of highly specialized experts on query engine construction. There are many highly complex moving parts that all have to play together nicely. There are seemingly obvious questions in query engines for which finding a solution can [earn one a PhD in data management systems](https://hannes.muehleisen.org/publications/ICDE2023-sorting.pdf). Recouping such a massive investment in a space where it is common that tools are built by volunteers in their spare time and released for free is challenging. That being said, there are a few commendable projects in this space like [data.table](https://CRAN.R-project.org/package=data.table) or more recently [pola.rs](https://pola.rs) that offer greatly improved performance over older tools.

Approach two is also not without its challenges: state-of-the-art query engine technology is often hidden behind incompatible architectures. For example, the two-tier architecture where a data management system runs on a dedicated database server and client applications use a client protocol to interact with said server is rather incompatible with interactive analysis. Setting up and maintaining a separate database “server” – even on the same computer – is still painful. Moving data back and forth between the analysis environment and the database server has been [shown to be quite expensive](https://hannes.muehleisen.org/publications/p852-muehleisen.pdf). Unfortunately, those architectural decisions deeply influence the query engine trade-offs and are therefore difficult to change afterwards.

![](../images/blog/duckplyr/generic-dbms-protocol.png)


There has been movement in this space, however: one of the stated goals of DuckDB is to [unshackle state-of-the-art analytical data management technology from system architecture with its in-process architecture](https://hannes.muehleisen.org/publications/CIDR2020-raasveldt-muehleisen-duckdb.pdf). Simply put, this means there is no separate database server and DuckDB instead runs within a “host” process. This host can be any application that requires data management capabilities or just an interactive data analysis environment like Python or R. Running within the host environment has another massive advantage: moving data back and forth between the host and DuckDB is very cheap. For R and Python, DuckDB can directly run complex queries on data frames within the analysis environment without any import or conversion steps. Conversely, DuckDB’s query results can directly be converted to data frames, greatly reducing the overhead of integrating with downstream libraries for plotting, further analysis or machine learning. DuckDB is able to efficiently execute arbitrarily complex relational queries, including recursive and correlated queries. DuckDB is able to handle larger-than-memory datasets both in reading and writing, but also when dealing with large intermediate results, for example resulting from aggregations with millions of groups. DuckDB has a sophisticated full query optimizer that removes the previously common manual optimization steps. DuckDB also offers persistence, with tabular data being stored in files on disk. The tables in those files can be changed, too – while keeping transactional integrity. Those are unheard-of features in interactive data analysis; they are the result of decades of research and engineering in analytical data systems.

![](../images/blog/duckplyr/duckdb-in-r.png)


One issue remains, however: DuckDB speaks SQL. While SQL is a popular language, not all analysts want to express their data transformations in SQL. One of the main issues here is that typically, queries are expressed as strings in R or Python scripts, which are sent to a database system in an opaque way. This means that those queries carry all-or-nothing semantics and it can be challenging to debug problems (“You have an error in your SQL syntax; check the manual…”). APIs like dplyr are often more convenient for the user: they allow an IDE to support things like auto-completion on functions, variable names, etc. In addition, the additive nature of the dplyr API allows building a sequence of data transformations in small steps, which reduces the cognitive load of the analyst considerably compared to writing a hundred-line SQL query. There have been some [early experimental attempts](https://hannes.muehleisen.org/publications/SSDBM2013-databases-and-statistics.pdf) to overload R’s native data frame API in order to map to SQL databases, but those approaches have been found to be too limited in generality, surprising to users and generally too brittle. A better approach is needed.

#### The duckplyr R Package

To address those issues, we have partnered up with the dplyr project team at [Posit](https://posit.co) (formerly RStudio) and [cynkra](https://www.cynkra.com) to develop [**duckplyr**](https://duckplyr.tidyverse.org/). duckplyr is a drop-in replacement for [dplyr](https://dplyr.tidyverse.org), powered by DuckDB for performance. Duckplyr implements several innovations in the interactive analysis space. First of all, installing duckplyr is just as easy as installing dplyr. DuckDB has been packaged for R as a [stand-alone R package](https://cran.r-project.org/package=duckdb) that contains the entire data management system code as well as wrappers for R. Both the DuckDB R package and duckplyr are available on CRAN, making installation on all major platforms straightforward:

```R
install.packages("duckplyr")
```

##### Verbs

Under the hood, duckplyr translates the sort-of-relational [dplyr operations](https://dplyr.tidyverse.org/reference/index.html#data-frame-verbs) (“verbs”) to DuckDB’s relational query processing engine. Apart from some naming confusion, there is a mostly straightforward mapping between dplyr’s verbs such as select, filter, summarise, etc. and DuckDB’s project, filter and aggregate operators. A crucial difference from previous approaches is that duckplyr does not go through DuckDB’s SQL interface to create query plans. Instead, duckplyr uses DuckDB’s so-called “relational” API to directly construct logical query plans. This API bypasses the SQL parser entirely, greatly reducing the difficulty of escaping operators, identifiers, constants, and table names that plagues other approaches such as dbplyr.

![](../images/blog/duckplyr/dplyr-duckdb-plans.png)


We have [exposed the C++-level relational API to R](https://github.com/duckdb/duckdb-r/blob/main/R/relational.R), so that it is possible to directly construct DuckDB query plans from R. This low-level API is not meant to be used directly, but it is used by duckplyr to transform the dplyr verbs to the DuckDB relational API and thus to query plans. Here is an example:

```R
library("duckplyr")
as_duckplyr_df(data.frame(n=1:10)) |>
    mutate(m=n+1) |>
    filter(m > 5) |>
    count() |>
    explain()
```

```text
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             n             │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        count_star()       │
└─────────────┬─────────────┘                                                             
┌─────────────┴─────────────┐
│           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│(+(CAST(n AS DOUBLE), 1.0) │
│           > 5.0)          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 10          │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│     R_DATAFRAME_SCAN      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         data.frame        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             n             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 10          │
└───────────────────────────┘  
```

We can see how a sequence of the dplyr verbs mutate, filter, and count is “magically” transformed into a DuckDB query plan consisting of a scan, a filter, projections and an aggregate. At the very bottom, we can see that an `R_DATAFRAME_SCAN` operator is added. This operator directly reads an R data frame as if it were a table in DuckDB, without requiring actual data import. The new verb `explain()` causes DuckDB’s logical query plan to be printed so that we can inspect what DuckDB intends to execute based on the duckplyr sequence of verbs.
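
To make the idea of building a plan from verbs more concrete, here is a toy sketch in Python. It is not duckplyr's or DuckDB's API: the `Plan` and `Frame` classes are hypothetical, expressions are kept as plain labels, and no optimizer runs, so the printed plan is simpler than the real output shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One node of a toy logical plan; `child` is the input operator."""
    op: str
    detail: str
    child: "Plan | None" = None


class Frame:
    """Hypothetical stand-in for a data frame that records verbs as a plan."""

    def __init__(self, plan):
        self.plan = plan

    @classmethod
    def scan(cls, columns):
        return cls(Plan("SCAN", ", ".join(columns)))

    def mutate(self, name, expr):
        return Frame(Plan("PROJECTION", f"{name} = {expr}", self.plan))

    def filter(self, predicate):
        return Frame(Plan("FILTER", predicate, self.plan))

    def count(self):
        return Frame(Plan("UNGROUPED_AGGREGATE", "count_star()", self.plan))

    def explain(self):
        # Walk from the root operator down to the scan.
        lines, node = [], self.plan
        while node is not None:
            lines.append(f"{node.op}: {node.detail}")
            node = node.child
        return lines


for line in Frame.scan(["n"]).mutate("m", "n + 1").filter("m > 5").count().explain():
    print(line)
```

Each verb only wraps the previous plan in a new operator node, and nothing is executed until something asks for the result.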

##### Expressions

An often overlooked yet crucial component of data transformations is the so-called expression. Expressions are (conceptually) scalar transformations of constants and columns from the data that can be used, for example, to produce derived columns or to transform actual column values to boolean values to be used in filters. For example, one might write an expression like `(amount - discount) * tax` to compute the actual invoiced amount without that amount actually being stored in a column, or use an expression like `value > 42` in a filter to remove all rows where the value is less than or equal to `42`. Dplyr relies on the base R engine to evaluate expressions, with some minor modifications to resolve variable names to columns in the input data.

When moving evaluation of expressions over to DuckDB, the process becomes a little bit more involved. DuckDB has its own independent expression system consisting of a built-in set of functions (e.g., `min`), scalar values and types. To transform R expressions into DuckDB expressions, we use an interesting R feature to capture un-evaluated abstract syntax trees from function arguments. By traversing the tree, we can transform R scalar values into DuckDB scalar values, R function calls into DuckDB function calls, and R-level variable references into DuckDB column references.

It should be clear that this transformation cannot be perfect: there are functions in R that DuckDB simply does not support, for example those coming from the myriad of contributed packages. While we are working on expanding the set of supported expressions, there will always be some that cannot be translated. However, in the case of non-translatable expressions, we would still like to be able to return a result to the user. To achieve this, we have implemented a transparent fallback mechanism that uses the existing R-level expression evaluation method in the case that an expression cannot be translated to DuckDB’s expression language.
For example, the following transformation `m = n + 1` can be translated:

```R
as_duckplyr_df(data.frame(n=1:10)) |>
    mutate(m=n+1) |>
    explain()
```

```text
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             n             │
│             m             │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│     R_DATAFRAME_SCAN      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         data.frame        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             n             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 10          │
└───────────────────────────┘  
```

While the following transformation using an inline lambda function cannot (yet):

```R
as_duckplyr_df(data.frame(n=1:10)) |>
    mutate(m=(\(x) x+1)(n)) |>
    explain()
```

```text
┌───────────────────────────┐
│     R_DATAFRAME_SCAN      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         data.frame        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             n             │
│             m             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 10          │
└───────────────────────────┘           
```

It is a little hard to see (and we are working on improving this), but the `explain()` output differs between the two mutate expressions. In the first case, DuckDB computes the `+ 1` as part of the projection operator; in the second case, the translation failed and a fallback was used, leading to the computation happening in the R engine. The upside of automatic fallback is that things “just work”. The downside is that there will usually be a performance hit from the fallback due to – for example – the lack of automatic parallelization. We are planning to add a debug mode where users can inspect the translation process and get insight into why translations fail.
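
The translate-or-fall-back pattern can be sketched with Python's own `ast` module standing in for R's captured syntax trees. Everything here is an assumption made for illustration: the tuple-based "engine" expression format, the tiny set of supported operators, and the names `translate` and `plan_expression` are hypothetical and unrelated to duckplyr's actual code.

```python
import ast

# Operators that the toy "engine" understands; anything else is untranslatable.
_BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
_CMPOPS = {ast.Gt: ">", ast.LtE: "<="}


class Untranslatable(Exception):
    pass


def translate(node):
    """Recursively map a host-language syntax tree to engine expressions."""
    if isinstance(node, ast.Expression):
        return translate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return ("const", node.value)
    if isinstance(node, ast.Name):
        return ("column", node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return ("call", _BINOPS[type(node.op)],
                translate(node.left), translate(node.right))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _CMPOPS):
        return ("call", _CMPOPS[type(node.ops[0])],
                translate(node.left), translate(node.comparators[0]))
    raise Untranslatable(type(node).__name__)


def plan_expression(source):
    """Return ("engine", expr) if translatable, else ("fallback", callable)."""
    tree = ast.parse(source, mode="eval")
    try:
        return ("engine", translate(tree))
    except Untranslatable:
        # Fallback: let the host language evaluate the expression per row.
        code = compile(tree, "<expr>", "eval")
        return ("fallback", lambda row: eval(code, {}, dict(row)))


print(plan_expression("n + 1")[0])                 # handled by the engine
print(plan_expression("(lambda x: x + 1)(n)")[0])  # falls back to the host
```

As in the text, the caller gets a result either way, and only the place where the computation happens differs.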

##### Eager vs. Lazy Materialization

Dplyr and Pandas follow an execution strategy known as “eager materialization”. Every time an operation is invoked on a data frame, this operation is immediately executed and the result created in memory. This can be problematic. Consider the following example: a ten-million-row dataset is modified by adding 1 to a column. Then, the `top_n` operation is invoked to retrieve the first ten rows only. Because of eager materialization, the addition operation is executed on ten million rows and the result is created in memory, only for almost all of it to be thrown away immediately because only the first ten rows were requested. Duckplyr solves this problem by using a so-called “lazy materialization” strategy where no action is performed initially but instead the user’s intent is captured. This means that the addition of one to ten million rows will not be performed immediately. The system is instead able to optimize the requested computation and will only perform the addition on the first few rows. Also importantly, the intermediate result of the addition is never actually created in memory, greatly reducing the memory pressure.
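
The difference between the two strategies can be demonstrated with a small Python analogy. It uses generators rather than a query optimizer, so it only illustrates the principle. The function names are hypothetical, and each one returns the number of additions it performed alongside the result.

```python
from itertools import islice


def eager_pipeline(values, limit):
    """Add 1 to every value, materialize everything, then keep `limit` rows."""
    additions = 0
    added = []
    for v in values:
        added.append(v + 1)
        additions += 1
    return added[:limit], additions


def lazy_pipeline(values, limit):
    """Describe the addition lazily; only `limit` rows are ever computed."""
    additions = 0

    def add_one():
        nonlocal additions
        for v in values:
            additions += 1
            yield v + 1

    return list(islice(add_one(), limit)), additions


rows = range(10_000_000)
print(lazy_pipeline(rows, 10))   # performs 10 additions
print(eager_pipeline(rows, 10))  # performs 10,000,000 additions
```

Both pipelines return the same ten rows, but the lazy one never builds the ten-million-element intermediate result.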

However, lazy computation presents a possible integration issue: the result of lazy computation has to be some sort of lazy computation placeholder object that can be passed to another lazy operation or forced to be evaluated, e.g., via a special print method. However, this would break backwards compatibility with dplyr, where the result of each dplyr operation is a fully materialized data frame itself. This means that those results can be directly passed on to downstream operations like plotting without the plotting package having to be aware of the “laziness” of the duckplyr result object. To address this, we have creatively used an R feature known as [ALTREP](https://homepage.stat.uiowa.edu/~luke/talks/uiowa-2018.pdf). ALTREP allows R objects to have different in-memory representations, and for custom code to be executed whenever those objects are accessed. Duckplyr results are lazy placeholder objects, yes, but they appear to be bog-standard R data frames at the same time. R data frames are essentially named lists of typed vectors with a special row.names attribute. Because DuckDB’s lazy query planning already knows the names and types of the resulting table, we can export the names into the lazy data frame. We do not, however, know the number of rows or their contents yet. We therefore make both the actual data vectors and the row names vector that contains the data frame length lazy vectors. Those vectors carry a callback that the R engine will invoke whenever downstream code – e.g., plotting code – touches those vectors. The callback will actually trigger computation of the entire pipeline and transformation of the result to an R data frame. Duckplyr’s own operations will refrain from touching those vectors; they instead continue lazily using a special lazy computation object that is also stored in the lazy data frame.
This method allows duckplyr to be both lazy and not at the same time, which allows full drop-in replacement with the eagerly evaluated dplyr while keeping the lazy evaluation that is crucial for DuckDB to be able to do a full-query optimization of the various transformation steps.
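
A rough Python analogy of this "lazy, yet an ordinary vector" duality is a sequence that knows its name up front but only runs its pipeline when somebody touches its length or contents. This is not how ALTREP is implemented; the `LazyColumn` class and its `materialize` callback are hypothetical names used for illustration.

```python
from collections.abc import Sequence


class LazyColumn(Sequence):
    """Looks like a normal sequence; computes its data on first access."""

    def __init__(self, name, materialize):
        self.name = name                  # known without running anything
        self._materialize = materialize   # callback that runs the pipeline
        self._data = None

    def _force(self):
        if self._data is None:
            self._data = list(self._materialize())
        return self._data

    def __len__(self):
        return len(self._force())

    def __getitem__(self, index):
        return self._force()[index]


runs = []


def pipeline():
    runs.append("executed")
    return [n + 1 for n in range(1, 11)]


m = LazyColumn("m", pipeline)
print(m.name, runs)   # metadata is available, nothing has run yet
print(sum(m), runs)   # downstream code touches the data and triggers the run
```

Downstream code that only expects a sequence does not need to know about the laziness, and the pipeline runs at most once.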

Here is an example of the duality of the result of duckplyr operations using R’s `inspect()` method:

```R
dd <- as_duckplyr_df(data.frame(n=1:10)) |> mutate(m=n+1)
.Internal(inspect(dd))
```

```text
@12daad988 19 VECSXP g0c2 [OBJ,REF(2),ATT] (len=2, tl=0)
  @13e0c9d60 13 INTSXP g0c0 [REF(4)] DUCKDB_ALTREP_REL_VECTOR n (INTEGER)
  @13e0ca1c0 14 REALSXP g0c0 [REF(4)] DUCKDB_ALTREP_REL_VECTOR m (DOUBLE)
ATTRIB:
  @12817a838 02 LISTSXP g0c0 [REF(1)]
    TAG: @13d80d420 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "names" (has value)
    @12daada08 16 STRSXP g0c2 [REF(65535)] (len=2, tl=0)
      @13d852ef0 09 CHARSXP g1c1 [MARK,REF(553),gp=0x61] [ASCII] [cached] "n"
      @13e086338 09 CHARSXP g1c1 [MARK,REF(150),gp=0x61] [ASCII] [cached] "m"
    TAG: @13d80d9d0 01 SYMSXP g1c0 [MARK,REF(56009),LCK,gp=0x4000] "class" (has value)
    @12da9e208 16 STRSXP g0c2 [REF(65535)] (len=2, tl=0)
      @11ff15708 09 CHARSXP g0c2 [MARK,REF(423),gp=0x60] [ASCII] [cached] "duckplyr_df"
      @13d892308 09 CHARSXP g1c2 [MARK,REF(1513),gp=0x61,ATT] [ASCII] [cached] "data.frame"
    TAG: @13d80d1f0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "row.names" (has value)
    @13e0c9970 13 INTSXP g0c0 [REF(65535)] DUCKDB_ALTREP_REL_ROWNAMES
```

We can see that the internal structure of the data frame indeed reflects a data frame, but we can also see the special vectors `DUCKDB_ALTREP_REL_VECTOR` that hide the un-evaluated data vectors, as well as `DUCKDB_ALTREP_REL_ROWNAMES`, which hides the fact that the true dimensions of the data frame are not yet known.

#### Benchmark: TPC-H Q1

Let’s finish with a quick demonstration of duckplyr’s performance improvements. We use the data generator from the well known TPC-H benchmark, which is helpfully available as a DuckDB extension. With the “scale factor” of 1, the following DuckDB/R one-liner will generate a data set with a little over 6 million rows and store it in the R data frame named “lineitem”:

```R
lineitem <- duckdb:::sql("INSTALL tpch; LOAD tpch; CALL dbgen(sf=1); FROM lineitem;")
```

We have transformed the TPC-H benchmark query 1 from its original SQL formulation to dplyr syntax:

```R
tpch_01 <- function() {
  lineitem |>
    select(l_shipdate, l_returnflag, l_linestatus, l_quantity, l_extendedprice, l_discount, l_tax) |>
    filter(l_shipdate <= !!as.Date("1998-09-02")) |>
    select(l_returnflag, l_linestatus, l_quantity, l_extendedprice, l_discount, l_tax) |>
    summarise(
      sum_qty = sum(l_quantity),
      sum_base_price = sum(l_extendedprice),
      sum_disc_price = sum(l_extendedprice * (1 - l_discount)),
      sum_charge = sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
      avg_qty = mean(l_quantity),
      avg_price = mean(l_extendedprice),
      avg_disc = mean(l_discount),
      count_order = n(),
      .by = c(l_returnflag, l_linestatus)
    ) |>
    arrange(l_returnflag, l_linestatus)
}
```

We can now execute this function with both dplyr and duckplyr and observe the time required to compute the result. "Stock" dplyr takes ca. 400 milliseconds on my MacBook for this query, while duckplyr requires only ca. 70 milliseconds. Again, this time includes all the magic transforming the sequence of dplyr verbs into a relational operator tree, optimizing said tree, converting the input R data frame into a DuckDB intermediate on-the-fly, and transforming the (admittedly small) result back to an R data frame. Of course, the dataset used here is still relatively small and the query is not that complex either, essentially a single grouped aggregation. The differences will be much more pronounced for more complex transformations on larger datasets. duckplyr can also directly access large collections of, e.g., Parquet files on storage, and push down filters into those scans, which can also greatly improve performance.

#### Conclusion

The duckplyr package for R wraps DuckDB's state-of-the-art analytical query processing techniques in a dplyr-compatible API. We have gone to great lengths to ensure compatibility despite switching execution paradigms from eager to lazy and having to translate expressions to a different environment. We continue to work to expand duckplyr's capabilities but would love to hear your experiences trying it out.

Here are two recordings from last year's posit::conf where we present DuckDB for R and duckplyr:

* [In-Process Analytical Data Management with DuckDB – posit::conf(2023)](https://www.youtube.com/watch?v=9OFzOvV-to4)
* [duckplyr: Tight Integration of duckdb with R and the tidyverse – posit::conf(2023)](https://www.youtube.com/watch?v=V9GwSPjKMKw)

## Vector Similarity Search in DuckDB

**Publication date:** 2024-05-03

**Author:** Max Gabrielsson

**TL;DR:** This blog post shows a preview of DuckDB's new `vss` extension, which introduces support for HNSW (Hierarchical Navigable Small Worlds) indexes to accelerate vector similarity search.

In DuckDB v0.10.0, we introduced the [`ARRAY` data type](#docs:lts:sql:data_types:array), which stores fixed-sized lists, to complement the existing variable-size [`LIST` data type](#docs:lts:sql:data_types:list).

The initial motivation for adding this data type was to provide optimized operations for lists that can utilize the positional semantics of their child elements and avoid branching as all lists have the same length. Think e.g., the sort of array manipulations you'd do in NumPy: stacking, shifting, multiplying – you name it. Additionally, we wanted to improve our interoperability with Apache Arrow, as previously Arrow's fixed-size list types would be converted to regular variable-size lists when ingested into DuckDB, losing some type information.

However, as the hype for __vector embeddings__ and __semantic similarity search__ was growing, we also snuck in a couple of distance metric functions for this new `ARRAY` type:
[`array_distance`](#docs:lts:sql:functions:array::array_distancearray1-array2),
[`array_negative_inner_product`](#docs:lts:sql:functions:array::array_negative_inner_productarray1-array2) and
[`array_cosine_distance`](#docs:lts:sql:functions:array::array_cosine_distancearray1-array2).

> If you're one of today's [lucky 10,000](https://xkcd.com/1053/) and haven't heard of word embeddings or vector search, the short version is that it's a technique used to represent documents, images, entities – _data_ as high-dimensional _vectors_ and then search for _similar_ vectors in a vector space, using some sort of mathematical "distance" expression to measure similarity. This is used in a wide range of applications, from natural language processing to recommendation systems and image recognition, and has recently seen a surge in popularity due to the advent of generative AI and availability of pre-trained models.

This got the community really excited! While we (DuckDB Labs) initially went on record saying that we would not be adding a vector similarity search index to DuckDB as we deemed it to be too far out of scope, we were very interested in supporting custom indexes through extensions in general. Shoot, I've been _personally_ nagging on about wanting to plug in an "R-Tree" index since the inception of DuckDB's [spatial extension](#docs:lts:core_extensions:spatial:overview)! So when one of our client projects evolved into creating a proof-of-concept custom "HNSW" index extension, we said that we'd give it a shot. And... well, one thing led to another.

Fast forward to now and we're happy to announce the availability of the `vss` vector similarity search extension for DuckDB! While some may say we're late to the vector search party, [we'd like to think the party is just getting started!](https://www.gartner.com/en/newsroom/press-releases/2023-10-11-gartner-says-more-than-80-percent-of-enterprises-will-have-used-generative-ai-apis-or-deployed-generative-ai-enabled-applications-by-2026)

Alright, so what's in `vss`?

#### The Vector Similarity Search (VSS) Extension

On the surface, `vss` seems like a comparatively small DuckDB extension. It does not provide any new data types, scalar functions or copy functions, but rather a single new index type: `HNSW` ([Hierarchical Navigable Small Worlds](https://en.wikipedia.org/wiki/Hierarchical_Navigable_Small_World_graphs)), which is a graph-based index structure that is particularly well-suited for high-dimensional vector similarity search.

```sql
-- Create a table with an array column
CREATE TABLE embeddings (vec FLOAT[3]);

-- Create an HNSW index on the column
CREATE INDEX idx ON embeddings USING HNSW (vec);
```

This index type can't be used to enforce constraints or uniqueness like the built-in [`ART` index](#docs:lts:sql:indexes), and can't be used to speed up joins or index regular columns either. Instead, the `HNSW` index is only applicable to columns of the `ARRAY` type containing `FLOAT` elements and will only be used to accelerate queries calculating the "distance" between a constant `FLOAT` `ARRAY` and the `FLOAT` `ARRAY`s in the indexed column, ordered by the resulting distance and returning the top-n results. That is, queries of the form:

```sql
SELECT *
FROM embeddings
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```

will have their logical plan optimized to become a projection over a new `HNSW` index scan operator, removing the limit and sort altogether. We can verify this by checking the `EXPLAIN` output:

```sql
EXPLAIN
SELECT *
FROM embeddings
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```

```text
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            vec            │
│array_distance(vec, [1.0, 2│
│         .0, 3.0])         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      HNSW_INDEX_SCAN      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   t1 (HNSW INDEX SCAN :   │
│            idx)           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            vec            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 3           │
└───────────────────────────┘
```
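
To try this end to end, a minimal sketch might look as follows. The table name `my_vectors`, the index name `my_idx` and the sample vectors are made up for illustration, and the sketch assumes that the `vss` extension is available for your DuckDB version:

```sql
INSTALL vss;
LOAD vss;

-- Hypothetical table, kept separate from the `embeddings` example above
CREATE TABLE my_vectors (vec FLOAT[3]);

-- Made-up sample data
INSERT INTO my_vectors VALUES
    ([1, 2, 3]::FLOAT[3]),
    ([1, 2, 4]::FLOAT[3]),
    ([9, 9, 9]::FLOAT[3]),
    ([0, 0, 0]::FLOAT[3]);

CREATE INDEX my_idx ON my_vectors USING HNSW (vec);

-- Top-3 nearest neighbors of the constant query vector
SELECT vec, array_distance(vec, [1, 2, 3]::FLOAT[3]) AS dist
FROM my_vectors
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```

Keep in mind that HNSW is an approximate index structure, so on larger tables the indexed result may differ slightly from an exact sort.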

You can pass the `HNSW` index creation statement a `metric` parameter to decide what kind of distance metric to use. The supported metrics are `l2sq`, `cosine` and `ip`, matching the three built-in distance functions: `array_distance`, `array_cosine_distance` and `array_negative_inner_product`.
The default is `l2sq`, which uses Euclidean distance (`array_distance`):

```sql
CREATE INDEX l2sq_idx ON embeddings USING HNSW (vec)
WITH (metric = 'l2sq');
```

To use cosine distance (`array_cosine_distance`):

```sql
CREATE INDEX cos_idx ON embeddings USING HNSW (vec)
WITH (metric = 'cosine');
```

To use inner product (`array_negative_inner_product`):

```sql
CREATE INDEX ip_idx ON embeddings USING HNSW (vec)
WITH (metric = 'ip');
```
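
Since each metric corresponds to one distance function, a query should order by the function matching the index's metric in order to benefit from that index. A sketch, assuming the `cos_idx` and `ip_idx` indexes created above:

```sql
-- Orders by cosine distance, matching cos_idx (metric = 'cosine')
SELECT *
FROM embeddings
ORDER BY array_cosine_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;

-- Orders by negative inner product, matching ip_idx (metric = 'ip')
SELECT *
FROM embeddings
ORDER BY array_negative_inner_product(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```

Prefixing either query with `EXPLAIN` shows whether the plan contains an `HNSW_INDEX_SCAN` operator.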

#### Implementation

The `vss` extension is based on the [`usearch`](https://github.com/unum-cloud/usearch) library, which provides a flexible C++ implementation of the HNSW index data structure boasting very impressive performance benchmarks. While we currently only use a subset of all the functionality and tuning options provided by `usearch`, we're excited to explore how we can leverage more of its features in the future. So far we're mostly happy that it aligns so nicely with DuckDB's development ethos. Much like DuckDB itself, `usearch` is written in portable C++11 with no external dependencies and released under a permissive license, making it super smooth to integrate into our extension build and distribution pipeline.

#### Limitations

The big limitation as of now is that the `HNSW` index can only be created in in-memory databases, unless the `hnsw_enable_experimental_persistence` configuration parameter is set to `true` (via `SET hnsw_enable_experimental_persistence = ⟨bool⟩`). If this parameter is not set, any attempt to create an `HNSW` index in a disk-backed database will result in an error message. If the parameter is set, the index will not only be created in memory, but also persisted to disk as part of the DuckDB database file during checkpointing. After restarting or loading a database file with a persisted `HNSW` index, the index will be lazily loaded back into memory whenever the associated table is first accessed, which is significantly faster than having to re-create the index from scratch.

The reasoning for locking this feature behind an experimental flag is that we still have some known issues related to persistence of custom indexes that we want to address before enabling it by default. In particular, WAL recovery is not yet properly implemented for custom indexes, meaning that if a crash occurs or the database is shut down unexpectedly while there are uncommitted changes to an `HNSW`-indexed table, you can end up with data loss or corruption of the index. While it is technically possible to recover from an unexpected shutdown manually by first starting DuckDB separately, loading the `vss` extension and then `ATTACH`ing the database file, which ensures that the `HNSW` index functionality is available during WAL-playback, you should not rely on this for production workloads.

We're actively working on addressing this and other issues related to index persistence, which will hopefully make it into [DuckDB v0.10.3](#release_calendar::upcoming-releases), but for now we recommend using the `HNSW` index in in-memory databases only.
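
For experimentation only, opting in to persistence on a disk-backed database could look like the sketch below. The index name `persisted_idx` is hypothetical, and the caveats described above still apply:

```sql
LOAD vss;

-- Opt in to the experimental feature for this session
SET hnsw_enable_experimental_persistence = true;

CREATE TABLE IF NOT EXISTS embeddings (vec FLOAT[3]);
CREATE INDEX persisted_idx ON embeddings USING HNSW (vec);
```

The manual recovery procedure described above can be sketched in the same spirit. The file name `my_vectors.duckdb` and the alias `recovered` are placeholders:

```sql
-- In a freshly started in-memory session, load the extension first...
LOAD vss;

-- ...and only then attach the database file, so that the HNSW
-- functionality is available during WAL playback
ATTACH 'my_vectors.duckdb' AS recovered;
```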

At runtime, however, much like the `ART` index, the `HNSW` index must be able to fit into RAM in its entirety. The memory used by the `HNSW` index at runtime is also allocated "outside" of the DuckDB memory management system, meaning that it won't respect DuckDB's `memory_limit` configuration parameter.

Another current limitation of the `HNSW` index is that it only supports the `FLOAT` (a 32-bit, single-precision floating point) type for the array elements and only distance metrics corresponding to the three built-in distance functions, `array_distance`, `array_negative_inner_product` and `array_cosine_distance`. But this is also something we're looking to expand upon in the near future as it is much less of a technical limitation and more of a "we haven't gotten around to it yet" limitation.

#### Conclusion

The `vss` extension for DuckDB is a new extension that adds support for creating HNSW indexes on fixed-size list columns in DuckDB, accelerating vector similarity search queries. The extension can currently be installed on DuckDB v0.10.2 on all supported platforms (including Wasm!) by running `INSTALL vss; LOAD vss`. The `vss` extension treads new ground for DuckDB extensions by providing a custom index type and we're excited to refine and expand on this functionality going forward.

While we're still working on addressing some of the limitations above, particularly those related to persistence (and performance), we still really want to share this early version of the `vss` extension as we believe this will open up a lot of cool opportunities for the community. So make sure to check out the [`vss` extension documentation](#docs:lts:core_extensions:vss) for more information on how to work with this extension!

This work was made possible by the sponsorship of a DuckDB Labs customer! If you are interested in similar work for specific capabilities, please reach out to [DuckDB Labs](https://duckdblabs.com/). Alternatively, we're happy to welcome contributors! Please reach out to the DuckDB Labs team over on Discord or on the [`vss` extension GitHub repository](https://github.com/duckdb/duckdb-vss) to keep up with the latest developments.

## Access 150k+ Datasets from Hugging Face with DuckDB

**Publication date:** 2024-05-29

**Authors:** The Hugging Face and DuckDB teams

**TL;DR:** DuckDB can now read data from [Hugging Face](https://huggingface.co/) via the `hf://` prefix.

We are excited to announce that we added support for `hf://` paths in DuckDB, providing access to more than 150,000 datasets for artificial intelligence. We worked with Hugging Face to democratize the access, manipulation, and exploration of datasets used to train and evaluate AI models.

#### Dataset Repositories

[Hugging Face](https://huggingface.co/) is a popular central platform where users can store, share, and collaborate on machine learning models, datasets, and other resources.

A dataset typically includes the following content:

* A `README` file: This plain text file provides an overview of the repository and its contents. It often describes the purpose, usage, and specific requirements or dependencies.
* Data files: Depending on the type of repository, it can include data files like CSV, Parquet, JSONL, etc. These are the core components of the repository.

A typical repository looks like this:

![Hugging face repository](../images/blog/hugging-face-example-repository.png)

#### Read Using `hf://` Paths

You often need to read files in various formats (such as CSV, JSONL, and Parquet) when working with data. As of version v0.10.3, DuckDB has native support for `hf://` paths as part of the [`httpfs` extension](#docs:lts:core_extensions:httpfs:overview), allowing easy access to all these formats.

Now, it is possible to query them using the URL pattern below:

```text
hf://datasets/⟨my_username⟩/⟨my_dataset⟩/⟨path_to_file⟩
```

For example, to read a CSV file, you can use the following query:

```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
```

Where:

* `datasets-examples` is the name of the user/organization
* `doc-formats-csv-1` is the name of the dataset repository
* `data.csv` is the file path in the repository

The result of the query is:

| kind    | sound |
| ------- | ----- |
| dog     | woof  |
| cat     | meow  |
| pokemon | pika  |
| human   | hello |

To read a JSONL file, you can run:

```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-jsonl-1/data.jsonl';
```

Finally, for reading a Parquet file, use the following query:

```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-parquet-1/data/train-00000-of-00001.parquet';
```

Each of these commands reads the data from the specified file format and displays it in a structured tabular format. Choose the appropriate command based on the file format you are working with.
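
The queries above rely on DuckDB inferring the reader from the file extension. If you prefer to be explicit, or need to pass reader options, the same files can be read through the corresponding table functions. A sketch using the example datasets above:

```sql
-- Explicit CSV reader
SELECT *
FROM read_csv('hf://datasets/datasets-examples/doc-formats-csv-1/data.csv');

-- Explicit JSON reader, stating that the file is newline-delimited
SELECT *
FROM read_json(
    'hf://datasets/datasets-examples/doc-formats-jsonl-1/data.jsonl',
    format = 'newline_delimited'
);

-- Explicit Parquet reader
SELECT *
FROM read_parquet('hf://datasets/datasets-examples/doc-formats-parquet-1/data/train-00000-of-00001.parquet');
```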

#### Creating a Local Table

To avoid accessing the remote endpoint for every query, you can save the data in a DuckDB table by running a [`CREATE TABLE ... AS` command](#docs:lts:sql:statements:create_table::create-table--as-select-ctas). For example:

```sql
CREATE TABLE data AS
    SELECT *
    FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
```

Then, simply query the `data` table as follows:

```sql
SELECT *
FROM data;
```

#### Multiple Files

You might need to query multiple files simultaneously when working with large datasets. Let's see a quick sample using the [cais/mmlu](https://huggingface.co/datasets/cais/mmlu) (Measuring Massive Multitask Language Understanding) dataset. This dataset captures a test consisting of multiple-choice questions from various branches of knowledge. It covers 57 tasks, including elementary mathematics, US history, computer science, law, and more. To attain high accuracy on this test, AI models must possess extensive world knowledge and problem-solving ability.

First, let's count the number of rows in individual files. To get the row count from a single file in the cais/mmlu dataset, use the following query:

```sql
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/dev-00000-of-00001.parquet';
```

| count |
| ----: |
|     5 |

Similarly, for another file (`test-00000-of-00001.parquet`) in the same dataset, we can run:

```sql
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/test-00000-of-00001.parquet';
```

| count |
| ----: |
|   152 |

To query all files under a specific format, you can use a [glob pattern](#docs:lts:data:multiple_files:overview::multi-file-reads-and-globs). Here’s how you can count the rows in all files that match the pattern `*.parquet`:

```sql
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet';
```

| count |
| ----: |
|   173 |

By using glob patterns, you can efficiently handle large datasets and perform comprehensive queries across multiple files, simplifying your data inspections and processing tasks.
Here, you can see how you can look for questions that contain the word “planet” in astronomy:

```sql
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet'
WHERE question LIKE '%planet%';
```

| count |
| ----: |
|    21 |

And see some examples:

```sql
SELECT question
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet'
WHERE question LIKE '%planet%'
LIMIT 3;
```

| question                                                             |
| -------------------------------------------------------------------- |
| Why isn't there a planet where the asteroid belt is located?         |
| On which planet in our solar system can you find the Great Red Spot? |
| The lithosphere of a planet is the layer that consists of            |
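
When a glob matches several files, it can be useful to see how each file contributes to the result. One way to do this, sketched here with the `filename` option of `read_parquet`, is to group by the source file:

```sql
SELECT filename, count(*) AS count
FROM read_parquet(
    'hf://datasets/cais/mmlu/astronomy/*.parquet',
    filename = true  -- adds a column with the path of the source file
)
GROUP BY filename
ORDER BY filename;
```

The per-file counts should add up to the total returned by the glob query above.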

#### Versioning and Revisions

In Hugging Face repositories, dataset versions (or revisions) are successive updates of a dataset. Each version is a snapshot at a specific point in time, allowing you to track changes and improvements. In Git terms, a revision can be understood as a branch or a specific commit.

You can query different dataset versions/revisions by using the following URL:

```text
hf://datasets/⟨my_username⟩/⟨my_dataset⟩@⟨my_branch⟩/⟨path_to_file⟩
```

For example:

```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet';
```

| kind    | sound |
| ------- | ----- |
| dog     | woof  |
| cat     | meow  |
| pokemon | pika  |
| human   | hello |

The previous query will read all Parquet files under the `~parquet` revision. This is a special branch where Hugging Face automatically generates the Parquet files of every dataset to enable efficient scanning.

#### Authentication

Configure your Hugging Face Token in the DuckDB Secrets Manager to access private or gated datasets.
First, visit [Hugging Face Settings – Tokens](https://huggingface.co/settings/tokens) to obtain your access token.
Second, set it in your DuckDB session using DuckDB’s [Secrets Manager](#docs:lts:configuration:secrets_manager). DuckDB supports two providers for managing secrets:

* `CONFIG`: The user must pass all configuration information into the `CREATE SECRET` statement. To create a secret using the `CONFIG` provider, use the following command:

  ```sql
  CREATE SECRET hf_token (
     TYPE huggingface,
     TOKEN 'your_hf_token'
  );
  ```

* `credential_chain`: Automatically tries to fetch credentials. For the Hugging Face token, it will try to get it from `~/.cache/huggingface/token`. To create a secret using the `credential_chain` provider, use the following command:

  ```sql
  CREATE SECRET hf_token (
     TYPE huggingface,
     PROVIDER credential_chain
  );
  ```
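
Once a secret exists, `hf://` queries in the same session can use it. The sketch below inspects the registered secrets, reads from a private dataset and removes the secret again. The dataset path is a placeholder for one of your own repositories:

```sql
-- List the secrets known to this session
SELECT name, type, provider
FROM duckdb_secrets();

-- Hypothetical private dataset; replace with your own path
SELECT *
FROM 'hf://datasets/my_username/my_private_dataset/data.csv'
LIMIT 5;

-- Remove the secret when it is no longer needed
DROP SECRET hf_token;
```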

#### Conclusion

The integration of `hf://` paths in DuckDB significantly streamlines accessing and querying over 150,000 datasets available on Hugging Face. This feature democratizes data manipulation and exploration, making it easier for users to interact with various file formats such as CSV, JSON, JSONL, and Parquet. By utilizing `hf://` paths, users can execute complex queries, efficiently handle large datasets, and harness the extensive resources of Hugging Face repositories.

The integration supports seamless access to individual files, multiple files using glob patterns, and different dataset versions. DuckDB's robust capabilities ensure a flexible and streamlined data processing experience. This integration is a significant leap forward in making AI dataset access more accessible and efficient for researchers and developers, fostering innovation and accelerating progress in machine learning.

Want to learn more about leveraging DuckDB with Hugging Face datasets? Explore the [detailed guide](https://huggingface.co/docs/hub/datasets-duckdb).

## Analyzing Railway Traffic in the Netherlands

**Publication date:** 2024-05-31

**Author:** Gábor Szárnyas

**TL;DR:** We use a real-world railway dataset to demonstrate some of DuckDB's key features, including querying different file formats, connecting to remote endpoints, and using advanced SQL features.

#### Introduction

The Netherlands, the birthplace of DuckDB, has an area of about 42,000&nbsp;km² with a population of about 18 million people.
The high density of the country is a key factor in its [extensive railway network](https://en.wikipedia.org/wiki/Rail_transport_in_the_Netherlands),
which consists of 3,223 km of tracks and 397 stations.

Information about this network's stations and services is available in the form of [open datasets](https://www.rijdendetreinen.nl/en/open-data/).
These high-quality datasets are maintained by the team behind the [Rijden de Treinen _(Are the trains running?)_ application](https://www.rijdendetreinen.nl/en/about).

In this post, we'll demonstrate some of DuckDB's analytical capabilities on the Dutch railway network dataset.
Unlike most of our other blog posts, this one doesn't introduce a new feature or release: instead, it demonstrates several existing features using a single domain.
Some of the queries explained in this blog post are shown in simplified form on [DuckDB's landing page](https://duckdb.org/).

#### Loading the Data

For our initial queries, we'll use the 2023 [railway services dataset](https://www.rijdendetreinen.nl/en/open-data/train-archive).
To get this dataset, download the [`services-2023.csv.gz` file](https://blobs.duckdb.org/nl-railway/services-2023.csv.gz) (330 MB) and load it into DuckDB.

First, start the [DuckDB command line client](#docs:lts:clients:cli:overview) on a persistent database:

```bash
duckdb railway.db
```

Then, load the `services-2023.csv.gz` file into the `services` table.

```sql
CREATE TABLE services AS
    FROM 'services-2023.csv.gz';
```

Despite the seemingly simple query, there is quite a lot going on here.
Let's deconstruct the query:

* First, there is no need to explicitly define a schema for our `services` table, nor is it necessary to use a [`COPY ... FROM` statement](#docs:lts:sql:statements:copy::copy--from).
DuckDB automatically detects that the `'services-2023.csv.gz'` refers to a gzip-compressed CSV file, so it calls the [`read_csv` function](#docs:lts:data:csv:overview::csv-functions),
which decompresses the file and infers its schema from its content using the [CSV sniffer](#docs:lts:data:csv:auto_detection).

* Second, the query makes use of DuckDB's [`FROM`-first syntax](#docs:lts:sql:query_syntax:from::from-first-syntax), which allows users to omit the `SELECT *` clause.
Hence, the SQL statement `FROM 'services-2023.csv.gz';` is a shorthand for `SELECT * FROM 'services-2023.csv.gz';`.

* Third, the query creates a table called `services` and populates it with the result from the CSV reader. This is achieved using a [`CREATE TABLE ... AS` statement](#docs:lts:sql:statements:create_table::create-table--as-select-ctas).
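
Putting these three points together, the short statement above can be spelled out in a more explicit form. The sketch below uses `CREATE OR REPLACE` so that it can be run even if the `services` table already exists:

```sql
CREATE OR REPLACE TABLE services AS
    SELECT *
    FROM read_csv('services-2023.csv.gz');

-- Inspect the schema inferred by the CSV sniffer
DESCRIBE services;
```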

Using [DuckDB v0.10.3](https://duckdb.org/install/index.html), loading the dataset takes approximately 5&nbsp;seconds on an M2 MacBook Pro. To check the amount of data loaded, we can run the following query which [pretty-prints](#docs:lts:sql:functions:text::print-numbers-with-thousand-separators) the number of rows in the `services` table:

```sql
SELECT format('{:,}', count(*)) AS num_services
FROM services;
```

| num_services |
| -----------: |
|   21,239,393 |

We can see that more than 21&nbsp;million train services ran in the Netherlands in 2023.

#### Finding the Busiest Station per Month

Let's ask a simple query first: _What were the busiest railway stations in the Netherlands in the first 6 months of 2023?_

First, for every month, let's compute the number of services passing through each station.
To do so, we extract the month from the service's date using the [`month` function](#docs:lts:sql:functions:datepart::monthdate),
then perform a group-by aggregation with a `count(*)`:

```sql
SELECT
    month("Service:Date") AS month,
    "Stop:Station name" AS station,
    count(*) AS num_services
FROM services
GROUP BY month, station
LIMIT 5;
```

Note that this query showcases a common redundancy in SQL: we list the names of non-aggregated columns in both the `SELECT` and the `GROUP BY` clauses.
Using DuckDB's [`GROUP BY ALL` feature](#docs:lts:sql:query_syntax:groupby::group-by-all), we can eliminate this.
At the same time, let's also turn this result into an intermediate table called `services_per_month` using a `CREATE TABLE ... AS` statement:

```sql
CREATE TABLE services_per_month AS
    SELECT
        month("Service:Date") AS month,
        "Stop:Station name" AS station,
        count(*) AS num_services
    FROM services
    GROUP BY ALL;
```

To answer the question, we can use the [`arg_max(arg, val)` aggregation function](#docs:lts:sql:functions:aggregates::arg_maxarg-val),
which returns the column `arg` in the row with the maximum value `val`.
We filter on the month and return the results:

```sql
SELECT
    month,
    arg_max(station, num_services) AS station,
    max(num_services) AS num_services
FROM services_per_month
WHERE month <= 6
GROUP BY ALL;
```

| month | station            | num_services |
| ----: | ------------------ | -----------: |
|     1 | Utrecht Centraal   |        34760 |
|     2 | Utrecht Centraal   |        32300 |
|     3 | Utrecht Centraal   |        37386 |
|     4 | Amsterdam Centraal |        33426 |
|     5 | Utrecht Centraal   |        35383 |
|     6 | Utrecht Centraal   |        35632 |

Maybe surprisingly, in most months, the busiest railway station is not in Amsterdam but in the country's 4th largest city, [Utrecht](https://en.wikipedia.org/wiki/Utrecht), thanks to its central geographic location.
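
To see what `arg_max` does in isolation, here is a small self-contained sketch on made-up values (the station names `A`, `B` and `C` are placeholders):

```sql
SELECT
    arg_max(station, num_services) AS busiest_station,
    max(num_services) AS num_services
FROM (VALUES ('A', 10), ('B', 30), ('C', 20)) AS t(station, num_services);
```

This returns `B` with 30 services: `arg_max` picks the `station` value from the row where `num_services` is largest.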

#### Finding the Top-3 Busiest Stations for Each Summer Month

Let's change the question to: _Which are the top-3 busiest stations for each summer month?_
The `arg_max()` function only helps us find the top-1 value but it is not sufficient for finding top-k results.

##### Using a Window Function (`OVER`)

DuckDB has extensive support for SQL features, including [window functions](#docs:lts:sql:functions:window_functions) and we can use the [`rank()` function](#docs:lts:sql:functions:window_functions::rank) to find top-k values.
Additionally, we use [`make_date`](#docs:lts:sql:functions:date::make_dateyear-month-day) to reconstruct the date, [`strftime`](#docs:lts:sql:functions:timestamptz::strftimetimestamptz-format) to turn it into the month's name and [`array_agg`](#docs:lts:sql:functions:aggregates::array_aggarg):

```sql
SELECT month, month_name, array_agg(station) AS top3_stations
FROM (
    SELECT
        month,
        strftime(make_date(2023, month, 1), '%B') AS month_name,
        rank() OVER
            (PARTITION BY month ORDER BY num_services DESC) AS rank,
        station,
        num_services
    FROM services_per_month
    WHERE month BETWEEN 6 AND 8
)
WHERE rank <= 3
GROUP BY ALL
ORDER BY month;
```

This gives the following result:

| month | month_name | top3_stations                                                |
| ----: | ---------- | ------------------------------------------------------------ |
|     6 | June       | [Utrecht Centraal, Amsterdam Centraal, Schiphol Airport]     |
|     7 | July       | [Utrecht Centraal, Amsterdam Centraal, Schiphol Airport]     |
|     8 | August     | [Utrecht Centraal, Amsterdam Centraal, Amsterdam Sloterdijk] |

We can see that the top 3 spots are shared between four stations: Utrecht Centraal, Amsterdam Centraal, Schiphol Airport, and Amsterdam Sloterdijk.
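
If you do not need the stations aggregated into a list, the subquery can be avoided with DuckDB's `QUALIFY` clause, which filters on the result of a window function. A sketch that returns one row per station instead of one row per month:

```sql
SELECT month, station, num_services
FROM services_per_month
WHERE month BETWEEN 6 AND 8
-- Keep only the three highest-ranked stations within each month
QUALIFY rank() OVER (PARTITION BY month ORDER BY num_services DESC) <= 3
ORDER BY month, num_services DESC;
```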

##### Using the `max_by(arg, val, n)` Function

Starting with DuckDB version 1.1.0, you can use a variant of the [`max_by` function](#docs:lts:sql:functions:aggregates::max_byarg-val-n) that accepts a third parameter, `n`, for the number of rows.
The resulting code is more concise and faster than the one using a window function.

```sql
SELECT
    month,
    strftime(make_date(2023, month, 1), '%B') AS month_name,
    max_by(station, num_services, 3) AS stations,
FROM services_per_month
WHERE month BETWEEN 6 AND 8
GROUP BY ALL
ORDER BY month;
```
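
As a quick illustration of the three-argument form, consider the following sketch on made-up values. It requires DuckDB v1.1.0 or later:

```sql
SELECT max_by(station, num_services, 2) AS top2
FROM (VALUES ('A', 10), ('B', 30), ('C', 20)) AS t(station, num_services);
```

This should return the list `[B, C]`, i.e., the `station` values of the two rows with the highest `num_services`, ordered from highest to lowest.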

##### Directly Querying Parquet Files through HTTPS or S3

DuckDB supports querying remote files, including CSV and Parquet, via [the HTTP(S) protocol and the S3 API](#docs:lts:core_extensions:httpfs:overview).
For example, we can run the following query:

```sql
SELECT "Service:Date", "Stop:Station name"
FROM 'https://blobs.duckdb.org/nl-railway/services-2023.parquet'
LIMIT 3;
```

It returns the following result:

| Service:Date | Stop:Station name  |
| ------------ | ------------------ |
| 2023-01-01   | Rotterdam Centraal |
| 2023-01-01   | Delft              |
| 2023-01-01   | Den Haag HS        |

The query for answering [_Which are the top-3 busiest stations for each summer month?_](#::finding-the-top-3-busiest-stations-for-each-summer-month) can be run directly on the remote Parquet file without creating any local tables.
To do this, we can define the `services_per_month` table as a [common table expression in the `WITH` clause](#docs:lts:sql:query_syntax:with).
The rest of the query remains the same:

```sql
WITH services_per_month AS (
    SELECT
        month("Service:Date") AS month,
        "Stop:Station name" AS station,
        count(*) AS num_services
    FROM 'https://blobs.duckdb.org/nl-railway/services-2023.parquet'
    GROUP BY ALL
)
SELECT month, month_name, array_agg(station) AS top3_stations
FROM (
    SELECT
        month,
        strftime(make_date(2023, month, 1), '%B') AS month_name,
        rank() OVER
            (PARTITION BY month ORDER BY num_services DESC) AS rank,
        station,
        num_services
    FROM services_per_month
    WHERE month BETWEEN 6 AND 8
)
WHERE rank <= 3
GROUP BY ALL
ORDER BY month;
```

This query yields the same result as the query above, and completes (depending on the network speed) in about 1–2 seconds.
This speed is possible because DuckDB doesn't need to download the whole Parquet file to evaluate the query:
while the file size is 309&nbsp;MB, it only uses about 20&nbsp;MB of network traffic, approximately 6% of the total file size.

The reduction in network traffic is possible because of [partial reading](#docs:lts:data:parquet:overview::partial-reading) along both the columns and the rows of the data.
First, Parquet's columnar layout allows the reader to only access the required columns.
Second, the [zonemaps](#docs:lts:guides:performance:indexing::zonemaps) available in the Parquet file's metadata allow the filter pushdown optimization (e.g., the reader only fetches [row groups](#docs:lts:internals:storage::row-groups) with dates in the summer months).
Both of these optimizations are implemented via [HTTP range requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests),
saving considerable traffic and time when running queries on remote Parquet files.
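
To get a feeling for the statistics that make this possible, you can inspect the file's metadata with the `parquet_metadata` function. The sketch below lists the minimum and maximum values stored per row group for the `Service:Date` column:

```sql
SELECT
    row_group_id,
    row_group_num_rows,
    stats_min_value,
    stats_max_value
FROM parquet_metadata('https://blobs.duckdb.org/nl-railway/services-2023.parquet')
WHERE path_in_schema = 'Service:Date'
ORDER BY row_group_id;
```

Row groups whose minimum and maximum dates fall outside the range requested by a filter are candidates for being skipped entirely.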

#### Largest Distance between Train Stations in the Netherlands

Let's answer the following question: _Which two train stations in the Netherlands have the largest distance between them when traveling via rail?_
For this, we'll use two datasets.
The first, [`stations-2022-01.csv`](https://blobs.duckdb.org/data/stations-2022-01.csv), contains information on the [railway stations](https://www.rijdendetreinen.nl/en/open-data/stations) (station name, country, etc.). We can simply load and query this dataset as follows:

```sql
CREATE TABLE stations AS
    FROM 'https://blobs.duckdb.org/data/stations-2022-01.csv';

SELECT
    id,
    name_short,
    name_long,
    country,
    printf('%.2f', geo_lat) AS latitude,
    printf('%.2f', geo_lng) AS longitude
FROM stations
LIMIT 5;
```

|   id | name_short | name_long             | country | latitude | longitude |
| ---: | ---------- | --------------------- | ------- | -------: | --------: |
|  266 | Den Bosch  | 's-Hertogenbosch      | NL      |    51.69 |      5.29 |
|  269 | Dn Bosch O | 's-Hertogenbosch Oost | NL      |    51.70 |      5.32 |
|  227 | 't Harde   | 't Harde              | NL      |    52.41 |      5.89 |
|    8 | Aachen     | Aachen Hbf            | D       |    50.77 |      6.09 |
|  818 | Aachen W   | Aachen West           | D       |    50.78 |      6.07 |

The second dataset, [`tariff-distances-2022-01.csv`](https://blobs.duckdb.org/data/tariff-distances-2022-01.csv), contains the [station distances](https://www.rijdendetreinen.nl/en/open-data/station-distances). The distances are defined as the shortest route on the railway network and they are used to calculate the tariffs for tickets.
Let's peek into this file:

```bash
head -n 9 tariff-distances-2022-01.csv | cut -d, -f1-9
```

```csv
Station,AC,AH,AHP,AHPR,AHZ,AKL,AKM,ALM
AC,XXX,82,83,85,90,71,188,32
AH,82,XXX,1,3,8,77,153,98
AHP,83,1,XXX,2,9,78,152,99
AHPR,85,3,2,XXX,11,80,150,101
AHZ,90,8,9,11,XXX,69,161,106
AKL,71,77,78,80,69,XXX,211,96
AKM,188,153,152,150,161,211,XXX,158
ALM,32,98,99,101,106,96,158,XXX
```

We can see that the distances are encoded as a matrix with the diagonal entries set to `XXX`.
As explained in the [dataset's description](https://www.rijdendetreinen.nl/en/open-data/station-distances#description), this string implies that the two stations are the same station.
If we just load the values as `XXX`, the CSV reader will assume that all columns have the type `VARCHAR` instead of numeric values.
While this can be cleaned up later, it's a lot easier to avoid this problem altogether.
To do so, we use the `read_csv` function and set the [`nullstr` parameter](#docs:lts:data:csv:overview::parameters) to `XXX`:

```sql
CREATE TABLE distances AS
    FROM read_csv(
        'https://blobs.duckdb.org/data/tariff-distances-2022-01.csv',
        nullstr = 'XXX'
    );
```
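To see why the `XXX` marker affects type detection, here is a minimal sketch in plain Python. It is not DuckDB's CSV sniffer: `infer_type` and `infer_schema` are hypothetical helpers that only distinguish integers from strings, and the sample is a trimmed copy of the matrix shown above.

```python
import csv
import io

# Trimmed copy of the distance matrix; the diagonal holds the 'XXX' marker.
SAMPLE = """Station,AC,AH,AHP
AC,XXX,82,83
AH,82,XXX,1
AHP,83,1,XXX
"""

def infer_type(values, nullstr=None):
    """Return 'BIGINT' if every non-null value parses as an integer, else 'VARCHAR'."""
    for value in values:
        if nullstr is not None and value == nullstr:
            continue  # treated as NULL, so it does not influence the type
        try:
            int(value)
        except ValueError:
            return "VARCHAR"
    return "BIGINT"

def infer_schema(text, nullstr=None):
    """Infer a (very simplified) column type for every column of a CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body))  # transpose rows into columns
    return {name: infer_type(col, nullstr) for name, col in zip(header, columns)}

without_nullstr = infer_schema(SAMPLE)
with_nullstr = infer_schema(SAMPLE, nullstr="XXX")
print(without_nullstr)
print(with_nullstr)
```

Without the null marker, a single `XXX` in a column is enough to push the whole column to a string type. Once the marker is treated as a missing value, the remaining values are all integers.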

To make the `NULL` values visible in the command line output, we set the [`.nullvalue` dot command](#docs:lts:clients:cli:dot_commands) to `NULL`:

```sql
.nullvalue NULL
```

Then, using the [`DESCRIBE` statement](#docs:lts:guides:meta:describe), we can confirm that DuckDB has inferred the columns correctly as `BIGINT`:

```sql
FROM (DESCRIBE distances)
LIMIT 5;
```

| column_name | column_type | null | key  | default | extra |
| ----------- | ----------- | ---- | ---- | ------- | ----- |
| Station     | VARCHAR     | YES  | NULL | NULL    | NULL  |
| AC          | BIGINT      | YES  | NULL | NULL    | NULL  |
| AH          | BIGINT      | YES  | NULL | NULL    | NULL  |
| AHP         | BIGINT      | YES  | NULL | NULL    | NULL  |
| AHPR        | BIGINT      | YES  | NULL | NULL    | NULL  |

To show the first 9 columns, we can run the following query with the [`#1`, `#2`, etc. column indexes in the `SELECT` statement](#docs:lts:sql:statements:select):

```sql
SELECT #1, #2, #3, #4, #5, #6, #7, #8, #9
FROM distances
LIMIT 8;
```

| Station |   AC |   AH |  AHP | AHPR |  AHZ |  AKL |  AKM |  ALM |
| ------- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| AC      | NULL |   82 |   83 |   85 |   90 |   71 |  188 |   32 |
| AH      |   82 | NULL |    1 |    3 |    8 |   77 |  153 |   98 |
| AHP     |   83 |    1 | NULL |    2 |    9 |   78 |  152 |   99 |
| AHPR    |   85 |    3 |    2 | NULL |   11 |   80 |  150 |  101 |
| AHZ     |   90 |    8 |    9 |   11 | NULL |   69 |  161 |  106 |
| AKL     |   71 |   77 |   78 |   80 |   69 | NULL |  211 |   96 |
| AKM     |  188 |  153 |  152 |  150 |  161 |  211 | NULL |  158 |
| ALM     |   32 |   98 |   99 |  101 |  106 |   96 |  158 | NULL |

We can see that the data was loaded correctly but the wide table format is a bit unwieldy for further processing:
to query for pairs of stations, we need to first turn it into a long table using the [`UNPIVOT`](#docs:lts:sql:statements:unpivot) statement.
Naïvely, we would write something like the following:

```sql
CREATE TABLE distances_long AS
    UNPIVOT distances
    ON AC, AH, AHP, ...
```

However, we have almost 400 stations, so spelling out their names would be quite tedious.
Fortunately, DuckDB has a trick to help with this:
the [`COLUMNS(*)` expression](#docs:lts:sql:expressions:star::columns-expression) lists all columns
and its optional `EXCLUDE` clause can remove given column names from the list.
Therefore, the expression `COLUMNS(* EXCLUDE station)` lists all column names except `station`, precisely what we need for the `UNPIVOT` command:

```sql
CREATE TABLE distances_long AS
    UNPIVOT distances
    ON COLUMNS (* EXCLUDE station)
    INTO NAME other_station VALUE distance;
```

This results in the following table:

```sql
SELECT station, other_station, distance
FROM distances_long
LIMIT 3;
```

| Station | other_station | distance |
| ------- | ------------- | -------: |
| AC      | AH            |       82 |
| AC      | AHP           |       83 |
| AC      | AHPR          |       85 |
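
Conceptually, the unpivot step turns every cell of the matrix into one `(station, other_station, distance)` row. The following plain-Python sketch mimics this on a trimmed copy of the matrix. It assumes that `NULL` cells are dropped, which is consistent with the output above (there is no row pairing `AC` with itself). The `unpivot` helper is hypothetical and only illustrates the reshaping, not DuckDB's implementation.

```python
# Wide matrix: one row per station, one column per other station.
# None stands in for the NULL diagonal entries.
header = ["Station", "AC", "AH", "AHP"]
wide_rows = [
    ["AC", None, 82, 83],
    ["AH", 82, None, 1],
    ["AHP", 83, 1, None],
]

def unpivot(header, rows, key="Station"):
    """Turn a wide table into (key, column_name, value) rows, skipping NULLs."""
    key_idx = header.index(key)
    long_rows = []
    for row in rows:
        for idx, column_name in enumerate(header):
            if idx == key_idx:
                continue  # the key column is kept as-is, not unpivoted
            value = row[idx]
            if value is None:
                continue  # assumption: NULL cells produce no output row
            long_rows.append((row[key_idx], column_name, value))
    return long_rows

distances_long = unpivot(header, wide_rows)
for row in distances_long[:3]:
    print(row)
```

A 3×3 matrix with a `NULL` diagonal yields six rows, two for every pair of distinct stations. This duplication is why the final query needs to deduplicate the pairs.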

Now we can join the `distances_long` table on the `stations` table along both the start and end stations,
then filter for stations which are located in the Netherlands.
We introduce symmetry breaking (`station < other_station`) to ensure that the same pair of stations only occurs once in the output.
Finally, we select the top-3 results:

```sql
SELECT
    s1.name_long AS station1,
    s2.name_long AS station2,
    distances_long.distance
FROM distances_long
JOIN stations s1 ON distances_long.station = s1.code
JOIN stations s2 ON distances_long.other_station = s2.code
WHERE s1.country = 'NL'
  AND s2.country = 'NL'
  AND station < other_station
ORDER BY distance DESC
LIMIT 3;
```

The results show that there are pairs of train stations that are at least 425 km apart – quite the distance for such a small country!

| station1         | station2           | distance |
| ---------------- | ------------------ | -------: |
| Eemshaven        | Vlissingen         |      426 |
| Eemshaven        | Vlissingen Souburg |      425 |
| Bad Nieuweschans | Vlissingen         |      425 |
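
The symmetry-breaking predicate works because the distance matrix is symmetric: every pair shows up twice in the long table, once in each direction. A small Python sketch of the same filter-sort-limit logic, using three station pairs from the sample matrix shown earlier (`top_pairs` is a hypothetical helper):

```python
# Each pair appears in both directions, as in the unpivoted table.
distances_long = [
    ("AC", "AH", 82), ("AH", "AC", 82),
    ("AC", "ALM", 32), ("ALM", "AC", 32),
    ("AH", "ALM", 98), ("ALM", "AH", 98),
]

def top_pairs(rows, limit):
    """Keep one direction per pair, then return the `limit` largest distances."""
    # Symmetry breaking: keep a row only if the first code sorts before the second.
    unique_pairs = [row for row in rows if row[0] < row[1]]
    return sorted(unique_pairs, key=lambda row: row[2], reverse=True)[:limit]

top = top_pairs(distances_long, limit=2)
print(top)
```

Without the `row[0] < row[1]` condition, the two largest rows would be the same pair listed in both directions.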

#### Conclusion

In this post, we demonstrated some of DuckDB's key features,
including
[automatic detection of formats based on filenames](#docs:lts:data:overview),
[auto-inferencing the schema of CSV files](https://duckdb.org/2023/10/27/csv-sniffer),
[direct Parquet querying](https://duckdb.org/2021/06/25/querying-parquet),
[remote querying](#docs:lts:core_extensions:httpfs:overview),
[window functions](https://duckdb.org/2021/10/13/windowing),
[unpivot](#docs:lts:sql:statements:unpivot),
[several friendly SQL features](#docs:lts:sql:dialect:friendly_sql) (such as `FROM`-first, `GROUP BY ALL`, and `COLUMNS(*)`),
and so on.
The combination of these allows for formulating queries using different file formats (CSV, Parquet), data sources (local, HTTPS, S3), and SQL features.
This helps users answer queries quickly and efficiently.

In the next installment, we'll take a look at
temporal data using [AsOf joins](https://duckdb.org/2023/09/15/asof-joins-fuzzy-temporal-lookups)
and
geospatial data using the DuckDB [`spatial` extension](https://duckdb.org/2023/04/28/spatial).

## Announcing DuckDB 1.0.0

**Publication date:** 2024-06-03

**Authors:** Mark Raasveldt, Hannes Mühleisen

**TL;DR:** The DuckDB team is _very happy_ to announce that today we’re releasing DuckDB version 1.0.0, codename “Snow Duck” (anas nivis).

To install the new version, please visit the [installation guide](https://duckdb.org/install/index.html).
For the release notes, see the [release page](https://github.com/duckdb/duckdb/releases/tag/v1.0.0).

![](../images/blog/paddling-of-ducks.svg)


It has been almost six years since the first source code was written for the project back in 2018, and a _lot_ has happened since: There are now over 300&nbsp;000 lines of C++ engine code, over 42&nbsp;000 commits and almost 4&nbsp;000 issues were opened and closed again. DuckDB has also gained significant popularity: the project has attracted tens of thousands of stars and followers on GitHub and social media platforms. Download counts are in the millions each month, and download traffic just for extensions is upwards of four terabytes _each day_. There are even [books](https://www.manning.com/books/duckdb-in-action) [being](https://www.amazon.com/Getting-Started-DuckDB-practical-efficiently/dp/1803241004) [written](https://www.oreilly.com/library/view/duckdb-up-and/9781098159689/) about DuckDB, and – most importantly – now even [Wikipedia considers DuckDB notable](https://en.wikipedia.org/wiki/DuckDB), albeit barely.

#### Why Now?

Of course, version numbers are somewhat arbitrary and “feely”, despite [attempts](https://semver.org/spec/v2.0.0.html) at making them more mechanical. We could have released DuckDB 1.0.0 back in 2018, or we could have waited ten more years. There is never a great moment, because software (with the exception of [TeX](https://x.com/fermatslibrary/status/1740324503308169507)) is never “done”. Why choose today?

Data management systems – even purely analytical ones – are such core components of any application that there is always an implicit contract of trust between their developers and users. Users rely on databases to provide correct query results and to not lose their data. At the same time, system developers need to be aware of their responsibility of not breaking people’s applications willy-nilly. Intuitively, version 1.0.0 means something else for a data management system than it means for an egg timer app (no offense). From the very beginning, we were committed to making DuckDB a reliable base for people to build their applications on. This is also why the 1.0.0 release is named after the non-existent _snow duck (anas nivis),_ harking back to Apple’s [Snow Leopard](https://arstechnica.com/gadgets/2009/08/mac-os-x-10-6/) release some years ago.

For us, one of the major blockers to releasing 1.0.0 was the storage format. DuckDB has its own custom-built data storage format. This format allows users to manage many (possibly very large) tables in a single file with full transactional semantics and state-of-the-art compression. Of course, designing a new file format is not without its challenges, and we had to make significant changes to the format over time. This led to the suboptimal situation that whenever a new DuckDB version was released, the files created with the old version did not work with the new DuckDB version and had to be manually upgraded. This problem was addressed in v0.10.0 back in February – where we introduced [backward compatibility and limited forward compatibility for DuckDB’s storage format](https://duckdb.org/2024/02/13/announcing-duckdb-0100#backward-compatibility). This feature has now been used in the wild for a while without serious issues – providing us with the confidence to offer a guarantee that DuckDB files created with DuckDB 1.0.0 will be compatible with future DuckDB versions.

#### Stability

The core theme of the 1.0.0 release is stability. This contrasts with previous releases, where our blog posts talked about long lists of new features. The 1.0.0 release has very few new features (a [few](https://github.com/duckdb/duckdb/pull/11677) [might](https://github.com/duckdb/duckdb/pull/11918) [have](https://github.com/duckdb/duckdb/pull/11831) [snuck](https://github.com/duckdb/duckdb/pull/11835) in). Instead, our focus has been on stability.

We’ve observed the frankly staggering growth in the amount and breadth of use of DuckDB in the wild, and have not seen an increase in serious issues being reported. Meanwhile, there are thousands of test cases with millions of test queries being run every night. We run loads of microbenchmarks and standardized benchmark suites to spot performance regressions. DuckDB is constantly being tortured by various fuzzers that construct all manners of wild SQL queries to make sure we don’t miss weird corner cases. All told, this has built the necessary confidence in us to release a 1.0.0.

Another core aspect of stability with the 1.0.0 release is stability across versions. While [never breaking anyone's workflow is likely impossible](https://xkcd.com/1172/), we plan to be much more careful with user-facing changes going forward. In particular, we plan to focus on providing stability for the SQL dialect, as well as the C API. While we do not guarantee that we will never change semantics in these layers in the future – we will try to provide ample warning when doing so, as well as providing workarounds that allow previously working code to keep on working.

#### Looking ahead

Unlike many open-source projects, DuckDB also has a healthy long-term funding strategy. [DuckDB Labs](https://duckdblabs.com/), the company that employs DuckDB’s core contributors, has not had any outside investments, and as a result, the company is fully owned by the team. Labs’ business model is to provide consulting and support services for DuckDB, and we’re happy to report that this is going well. With the revenue from contracts, we fund long-term and strategic DuckDB development with a team of almost 20 people. At the same time, the intellectual property in the project is guarded by the independent [DuckDB Foundation](https://duckdb.org/foundation/index.html). This non-profit foundation ensures that DuckDB will be around long-term under the MIT license.

Regarding long-term plans, there are, of course, many things on the roadmap still. One thing we’re very excited about is the ability to expand the extension environment around DuckDB. Extensions are plug-ins that can add new SQL-level functions, file formats, optimizers, etc. while keeping the DuckDB core mean and lean. There are already an impressive number of third-party extensions to DuckDB, and we’re working hard to streamline the process of building and distributing community-contributed extensions. We think DuckDB can become the basis for the next revolution in data through community extensions connected by a high-performance data fabric accessible through a unified SQL interface.

Of course, there will be issues found in today’s release. But rest assured, there will be a 1.0.1 release. There will be a 1.1.0. And there might also be a 2.0.0 at some point. We’re in this for the long run, all of us, together. We have the team and the structures and resources to do so.

#### Acknowledgments

First of all, we are very, very grateful to you all. Our massive and heartfelt thanks go to everyone who has contributed code, filed issues or engaged in discussions, promoted DuckDB in their environment, and, of course, all DuckDB users. We could not have done it without you!

We would also like to thank the [CWI Database Architectures group](https://www.cwi.nl/en/groups/database-architectures/) for providing us with the environment and expertise to build DuckDB, the organizations that provided us with research grants early on, the excellent [customers of DuckDB Labs](https://duckdblabs.com/#collaborators) that make it all work (especially the early ones), and the generous donors to the [DuckDB Foundation](https://duckdb.org/foundation/index.html). We are particularly grateful to our long-standing Gold sponsors [MotherDuck](https://motherduck.com/), [Voltron Data](https://voltrondata.com/) and [Posit](https://posit.co/).

Finally, we would like to thank the [excellent and amazing team at DuckDB Labs](https://duckdblabs.com/#about).

So join us now in being nostalgic, teary-eyed and excited for what’s to come for DuckDB and celebrate the release of DuckDB 1.0.0 with us. We certainly will.

Mark and Hannes

## Native Delta Lake Support in DuckDB

**Publication date:** 2024-06-10

**Author:** Sam Ansmink

**TL;DR:** DuckDB now has native support for [Delta Lake](https://delta.io/), an open-source lakehouse framework, with the `delta` extension.

Over the past few months, DuckDB Labs has teamed up with Databricks to add first-party support for Delta Lake in DuckDB using
the new [`delta-kernel-rs`](https://github.com/delta-incubator/delta-kernel-rs) project. In this blog post we'll give you a short
overview of Delta Lake, Delta Kernel and, of course, present the new DuckDB Delta extension.

If you're already dearly familiar with Delta Lake and Delta Kernel, or you are just here to know how to boogie, feel free to
[skip to the juicy bits](#::how-to-use-delta-in-duckdb) on how to use DuckDB with Delta.

#### Intro

[Delta Lake](https://delta.io/) is an open-source storage framework that enables building a lakehouse architecture. So to understand Delta Lake,
we need to understand what the lakehouse architecture is. The lakehouse is a data management architecture that strives to combine the cost-effectiveness
of cheap object storage with a smart management layer. In simple terms, lakehouse architectures are a collection of files in various formats,
with some additional metadata layers on top. These metadata layers aim to provide extra functionality on top of the raw collection of files
such as ACID transactions, time travel, partition- and schema evolution, statistics, and much more.
A lakehouse architecture makes it possible to run various types of data-intensive applications
such as data analytics and machine learning applications, directly on a vast collection of structured, semi-structured and
unstructured data, without the need for an intermediate data warehousing step. If you're ready for the deep dive, we recommend reading the
CIDR 2021 paper ["Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics"](https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf) by Michael Armbrust et al.
However, if you're (understandably) hesitant to dive into dense scientific literature, this image sums it up pretty well:

![](../images/blog/delta/lakehouse_arch.png)

Lakehouse architecture (image source: <a href="https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf#page=2">Armbrust et al., CIDR 2021</a>)

#### Delta Lake

Now let’s zoom in a little on our star of the show for tonight, Delta Lake. Delta Lake (or simply "Delta") is currently one of the leading open-source
lakehouse formats, along with [Apache Iceberg](https://iceberg.apache.org/) and [Apache Hudi](https://hudi.apache.org/). The easiest way to get a feeling for what a Delta table is, is to think of
a Delta table as a "collection of Parquet files with some metadata". With this slight oversimplification in mind, we will now create a
Delta table and examine the files that are created, to improve our understanding. To do this, we'll set up Python with the packages: [`duckdb`](https://pypi.org/project/duckdb/), [`pandas`](https://pypi.org/project/pandas/) and [`deltalake`](https://pypi.org/project/deltalake/):

```bash
pip install duckdb pandas deltalake
```

Then, we use DuckDB to create some dataframes with test data, and write that to a Delta table using the `deltalake` package:

```python
import duckdb
from deltalake import DeltaTable, write_deltalake
con = duckdb.connect()
df1 = con.query("SELECT i AS id, i % 2 AS part, 'value-' || i AS value FROM range(0, 5) tbl(i)").df()
df2 = con.query("SELECT i AS id, i % 2 AS part, 'value-' || i AS value FROM range(5, 10) tbl(i)").df()
write_deltalake("./my_delta_table", df1, partition_by=["part"])
write_deltalake("./my_delta_table", df2, partition_by=["part"], mode="append")
```

With this script run, we have created a basic Delta table containing 10 rows, split across two partitions that we added in
two separate steps. To double-check that everything is going to plan, let’s use DuckDB to query the table:

```sql
SELECT *
FROM delta_scan('./my_delta_table')
ORDER BY id;
```

|   id | part | value   |
| ---: | ---: | ------- |
|    0 |    0 | value-0 |
|    1 |    1 | value-1 |
|    2 |    0 | value-2 |
|    3 |    1 | value-3 |
|    4 |    0 | value-4 |
|    5 |    1 | value-5 |
|    6 |    0 | value-6 |
|    7 |    1 | value-7 |
|    8 |    0 | value-8 |
|    9 |    1 | value-9 |

That looks great! All our expected data is there. Now let’s take a look at what files have actually been created using `tree`:

```bash
tree ./my_delta_table
```

```text
my_delta_table
├── _delta_log
│   ├── 00000000000000000000.json
│   └── 00000000000000000001.json
├── part=0
│   ├── 0-f45132f6-2231-4dbd-aabb-1af29bf8724a-0.parquet
│   └── 1-76c82535-d1e7-4c2f-b700-669019d94a0a-0.parquet
└── part=1
    ├── 0-f45132f6-2231-4dbd-aabb-1af29bf8724a-0.parquet
    └── 1-76c82535-d1e7-4c2f-b700-669019d94a0a-0.parquet
```

The `tree` output shows 2 different types of files. While a Delta table can contain various other types of files, these form
the basis of any Delta table.

Firstly, there are **data files** in Parquet format. The data files contain all the data that is stored in the table. This is very similar
to how data is stored when DuckDB is used to write [partitioned Parquet files](#docs:lts:data:partitioning:partitioned_writes).

Secondly, there are **Delta files** in JSON format. The Delta files contain a
log of the changes that have been made to the table. By replaying this log, a reader can construct a valid view of the table. To illustrate this, let’s
take a small peek into one of the first Delta log files:

```bash
cat my_delta_table/_delta_log/00000000000000000000.json
```

```json
...
{ "add": {
    "path": "part=1/0-f45132f6-2231-4dbd-aabb-1af29bf8724a-0.parquet",
    "partitionValues": { "part": "1" }
  },
  ...
}
{ "add": {
    "path": "part=0/0-f45132f6-2231-4dbd-aabb-1af29bf8724a-0.parquet",
    "partitionValues": { "part": "0" }
  },
  ...
}
...
```

As we can see, this log file contains two `add` objects that describe some data being added to the `1` and `0` partitions, respectively. Note also
that the partition values themselves are stored in these Delta files explicitly, so even though the file structure looks very similar
to a [Hive-style](#docs:lts:data:partitioning:hive_partitioning) partitioning scheme, the folder names are not actually used by Delta internally. Instead, the partition values are read from the metadata.
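
To make the idea of replaying the log more concrete, here is a minimal Python sketch. It is not how the Delta extension or `delta-kernel-rs` is implemented. The commit contents are hypothetical and heavily trimmed: real commits also carry other actions (such as `metaData`, `protocol` and `commitInfo`), and the `remove` action is added here purely for illustration, since the table created above only contains `add` actions.

```python
import json

# Hypothetical, trimmed commit files: one JSON action per line.
commits = {
    "00000000000000000000.json": "\n".join([
        json.dumps({"add": {"path": "part=1/a.parquet", "partitionValues": {"part": "1"}}}),
        json.dumps({"add": {"path": "part=0/a.parquet", "partitionValues": {"part": "0"}}}),
    ]),
    "00000000000000000001.json": "\n".join([
        json.dumps({"add": {"path": "part=1/b.parquet", "partitionValues": {"part": "1"}}}),
        json.dumps({"remove": {"path": "part=0/a.parquet"}}),
    ]),
}

def replay(commits):
    """Replay commits in order and return the active files with their partition values."""
    active = {}
    for name in sorted(commits):  # zero-padded names sort in commit order
        for line in commits[name].splitlines():
            action = json.loads(line)
            if "add" in action:
                add = action["add"]
                # Partition values come from the log, not from the folder name.
                active[add["path"]] = add["partitionValues"]
            elif "remove" in action:
                active.pop(action["remove"]["path"], None)
    return active

snapshot = replay(commits)
print(snapshot)
```

The resulting dictionary is the "valid view" mentioned above: the list of data files a reader has to scan, together with the partition values recorded for each file.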

Now with this simple example, we've shown the basics of how Delta works. For a more thorough understanding of the internals,
we refer to the [official Delta specification](https://github.com/delta-io/delta/blob/master/PROTOCOL.md), which is, by protocol specification standards,
quite easy to read. The official specification describes in detail how Delta handles every detail, from the basics described here to more complex things like checkpointing, deletes, schema evolution, and much more.

#### Implementation

##### The Delta Kernel

Supporting a relatively complex protocol such as Delta requires significant development and maintenance effort. For this reason, when looking to
add support for such a protocol to an engine, the logical choice would be to look for a ready-to-use library to take care of this. In the case of Delta Lake, we could, for example, opt for the [`delta-rs` library](https://github.com/delta-io/delta-rs).
However, when it comes to implementing a native DuckDB Delta extension, this is problematic:
if we were to use the `delta-rs` library for implementing the DuckDB extension, all interaction with the Delta tables would go through the `delta-rs` library. But remember,
a Delta table is effectively *"just a bunch of Parquet files with some metadata"*. Therefore, this would mean that when DuckDB wants to read a Delta table,
the data files will be read by the `delta-rs` Parquet reader, using the `delta-rs` filesystem. But that's annoying: DuckDB already comes shipped with
an [excellent Parquet reader](#docs:lts:data:parquet:overview). Also, DuckDB already has support for a [variety](#docs:lts:core_extensions:httpfs:hugging_face) [of](#docs:lts:core_extensions:httpfs:s3api) [filesystems](#docs:lts:core_extensions:azure) with its own [credential management system](#docs:lts:configuration:secrets_manager). By using a library like
`delta-rs` for DuckDB's Delta extension, we would actually run into a variety of problems:

- increased extension binary size
- inconsistent user experience between `delta_scan` and `read_parquet`
- increased maintenance load

Now to solve these problems, we would prefer to have some library that implements **only the Delta protocol** while letting DuckDB handle all the things it already knows how to handle.

Fortunately for us, this library exists and it's called the [Delta Kernel Project](https://delta.io/blog/delta-kernel).
The Delta Kernel is a "set of libraries for building Delta connectors that can read from and write into Delta tables without the need to understand the Delta protocol details".
This is done by exposing two relatively simple sets of APIs that an engine would implement, as shown in the image below:

![](../images/blog/delta/kernel.png)


For more details on the `delta-kernel-rs` project, we refer to this [excellent blog post](https://delta.io/blog/delta-kernel/), which goes in-depth into
the internals and design rationale.

Now while the `delta-kernel-rs` library is still experimental, it has [recently launched its v0.1.0 version](https://github.com/delta-incubator/delta-kernel-rs/releases/tag/v0.1.0), and already offers a lot of functionality.
Furthermore, because `delta-kernel-rs` exposes a C/C++ foreign function interface, integrating it into a DuckDB extension has been very straightforward.

##### DuckDB Delta Extension `delta_scan`

Now we're ready to dive into the nitty-gritty of the DuckDB Delta extension internals. To start, the Delta extension
currently implements a single table function: `delta_scan`. It's a simple but powerful function that scans a Delta Table.

To understand how this function is implemented, we first need to establish the four main components involved:

| Component         | Description                                                            |
| ----------------- | ---------------------------------------------------------------------- |
| Delta kernel      | The [delta-kernel-rs](#::the-delta-kernel) library                     |
| Delta extension   | DuckDB's loadable [Delta extension](#docs:lts:core_extensions:delta)   |
| Parquet extension | DuckDB's loadable [Parquet extension](#docs:lts:data:parquet:overview) |
| DuckDB            | Super cool duck-themed analytical database                             |

Additionally, we need to understand that there are four main APIs involved:

| API                    | Description                                                                                                                                                                        |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `FileSystem`           | DuckDB's API for I/O (for local files, [Azure](#docs:lts:core_extensions:azure), [S3](#docs:lts:core_extensions:httpfs:s3api), etc.)               |
| `TableFunction`        | DuckDB's API for table functions (e.g., [`read_parquet`](#docs:lts:data:parquet:overview), [`read_csv`](#docs:lts:guides:file_formats:csv_import)) |
| `MultiFileReader`      | DuckDB's API for handling multi-file scans                                                                                                                                         |
| Delta Kernel C/C++ FFI | Delta Kernel [FFI](https://github.com/delta-incubator/delta-kernel-rs) for Delta Lake                                                                                              |

Now that we have all the links, let's tie them all together. When a user runs a query with a `delta_scan` table function,
DuckDB will call into the `delta_scan` function from the Delta extension using the `TableFunction` API. The `delta_scan` table function, however,
is actually just an exact copy of the regular `read_parquet` function.
To change the `read_parquet` into a `delta_scan`, it will replace the regular `MultiFileReader` of the `parquet_scan` (which simply scans a list or glob of files), with
a custom `DeltaMultiFileReader` that will generate a list of files based on the Delta Table metadata. Finally, whenever the Parquet extension requires
any IO, it will call into DuckDB using the `FileSystem` API to handle the I/O. This entire interaction is captured in the diagram below.

![](../images/blog/delta/delta_ext_overview-light.svg)

In this diagram, we can see all four components involved in the processing of a query containing a `delta_scan` table function. The arrows represent the communication that occurs
between the components across the four APIs. When reading a Delta table, we can see that the metadata is handled on the right side, going through the Delta kernel. On the left side,
we can see how the Parquet data flows through the Parquet extension.
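
The core design idea, keeping the scan logic untouched and swapping only the component that decides which files to read, can be sketched in a few lines of Python. This is an analogy and not DuckDB's C++ API: the classes `ListFileReader` and `DeltaLogFileReader`, the `scan` function and the in-memory `fake_storage` are all hypothetical names invented for this illustration.

```python
class ListFileReader:
    """Stand-in for the regular multi-file reader: scans an explicit list of files."""
    def __init__(self, paths):
        self.paths = list(paths)

    def files(self):
        return self.paths

class DeltaLogFileReader:
    """Stand-in for the Delta variant: derives the file list from table metadata."""
    def __init__(self, table_root, active_paths):
        self.table_root = table_root
        self.active_paths = active_paths  # in reality supplied via the Delta kernel

    def files(self):
        return [f"{self.table_root}/{path}" for path in self.active_paths]

def scan(file_reader, read_file):
    """Shared scan logic: identical no matter where the file list comes from."""
    for path in file_reader.files():
        yield from read_file(path)

# Pretend storage: file path -> rows. One file is no longer part of the table.
fake_storage = {
    "tbl/part=0/a.parquet": [(0, "value-0"), (2, "value-2")],
    "tbl/part=1/a.parquet": [(1, "value-1")],
    "tbl/part=1/stale.parquet": [(99, "removed")],
}

def read_file(path):
    return fake_storage[path]

plain_rows = sorted(scan(ListFileReader(fake_storage), read_file))
delta_rows = sorted(scan(DeltaLogFileReader("tbl", ["part=0/a.parquet", "part=1/a.parquet"]), read_file))
print(plain_rows)
print(delta_rows)
```

Scanning every file in the directory would return the stale row, while the metadata-driven reader only visits the files that are part of the current table version. The `scan` function itself never changes.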

While there are obviously some important details missing here, such as the handling of deletion vectors and column mappings, we have now covered the basic concept of the DuckDB
Delta extension. Also, we have demonstrated how the current implementation achieves a very natural logical separation, with component internals being abstracted away by connecting
through clearly defined APIs. In doing so, the implementation achieves the following key properties:

1. **The details of the Delta protocol remain largely opaque to any DuckDB component.** The only point of contact with the internals of the Delta protocol is the narrow
   FFI exposed by the Delta kernel. This is fully handled by the Delta extension, whose only job is to translate this into native DuckDB APIs.

2. **Full reuse of existing Parquet scanning logic** of DuckDB, without any code duplication or compile time dependencies between extensions. Because all interaction between the Delta and Parquet extension is done over DuckDB APIs through the running DuckDB instance, the extensions only interface over the `TableFunction` and `MultiFileReader` APIs. This also means that any future optimizations that are made to the Parquet extension will automatically be available in the Delta extension.

3. **All I/O will go through DuckDB's `FileSystem` API.** This means that all file systems ([Azure](#docs:lts:core_extensions:azure), [S3](#docs:lts:core_extensions:httpfs:s3api), etc.) that are available to DuckDB, are available to scan with.
   This means that any DuckDB file system that can read and list files can be used for Delta. This is also useful in DuckDB-Wasm, where custom filesystem implementations are used. Two caveats apply here.
   Firstly, the DuckDB Delta extension currently still lets a small part of the I/O be handled by the Delta kernel through internal filesystem libraries. This is because the FFI does not yet expose the `FileSystem` APIs, but this will change very soon. Secondly, while the architectural
   design of the Delta extension was made with DuckDB-Wasm in mind, the Wasm version of the extension is not yet available.

#### How to Use Delta in DuckDB

Using the Delta extension in DuckDB is very simple, as it is distributed as one of the core DuckDB extensions, and available for [autoloading](#docs:lts:core_extensions:overview::autoloading-extensions).
What this means is that you can simply start DuckDB (using v0.10.3 or higher) and run:

```sql
SELECT * FROM delta_scan('./my_delta_table');
```

DuckDB will automatically install and load the Delta Extension. Then it will query the local Delta table `./my_delta_table`.

In case your Delta table lives on S3, there are probably some S3 credentials that you want to set. Are these credentials already in
one of the [default places](https://github.com/aws/aws-sdk-cpp/blob/main/docs/Credentials_Providers.md), such as an environment variable or the `~/.aws/credentials` file? Simply run:

```sql
CREATE SECRET delta_s1 (
    TYPE s3,
    PROVIDER credential_chain
);
SELECT * FROM delta_scan('s3://⟨some-bucket⟩/⟨path/to/a/delta/table⟩');
```

Do you prefer remembering your AWS tokens by heart, and would like to type them out? Go with:

```sql
CREATE SECRET delta_s2 (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
    REGION '⟨eu-west-1⟩'
);
SELECT * FROM delta_scan('s3://⟨some-bucket⟩/⟨path/to/a/delta/table⟩');
```

Do you have multiple Delta tables, with different credentials? No problem, you can use scoped secrets:

```sql
CREATE SECRET delta_s3 (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE1⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY1⟩',
    REGION '⟨eu-west-1⟩',
    SCOPE 's3://⟨some-bucket-1⟩'
);
CREATE SECRET delta_s4 (
    TYPE s3,
    KEY_ID '⟨AKIAIOSFODNN7EXAMPLE2⟩',
    SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY2⟩',
    REGION '⟨us-west-1⟩',
    SCOPE 's3://⟨some-bucket-2⟩'
);
SELECT * FROM delta_scan('s3://⟨some-bucket-1⟩/⟨table1⟩');
SELECT * FROM delta_scan('s3://⟨some-bucket-2⟩/⟨table2⟩');
```

Finally, is your table public, but outside the default AWS region? Make sure you set the region using an empty S3 Secret:

```sql
CREATE SECRET delta_s5 (
    TYPE s3,
    REGION '⟨eu-west-2⟩'
);
SELECT * FROM delta_scan('s3://⟨some-public-bucket⟩/⟨table1⟩');
```

#### Current State of the Delta Extension

Currently, the Delta Extension is still considered **experimental**. This is partly because the Delta extension itself is still very new,
but also because the `delta-kernel-rs` project it relies on is still experimental. Nevertheless, core Delta scanning features are
already supported by the current version of the Delta extension, such as:

- All data types
- Filter and projection pushdown
- File skipping based on filter pushdown
- Deletion vectors
- Partitioned tables
- Fully parallel scanning

The Delta extension is available on the platforms `linux_amd64`, `linux_arm64`, `osx_amd64` and `osx_arm64`. Support for the remaining platforms is coming soon. Additionally, we will continue to work together with Databricks on further improving the Delta Extension to add more features such as:

- Write support
- Column mapping
- Time travel
- Variant, RowIds
- Wasm support

For details and info on newly added features, keep an eye on the Delta extension [docs](#docs:lts:core_extensions:delta) and [repository](https://github.com/duckdb/duckdb-delta).

#### Conclusion

In this blog post, we presented DuckDB's new Delta extension, enabling easy interaction with
Delta Lake directly from the comfort of your own DuckDB environment. To do so, we demonstrated what the Delta
Lake format looks like by creating a Delta table and analyzing it using DuckDB.

We want to emphasize the fact that by implementing the Delta extension with the [`delta-kernel-rs`](https://github.com/delta-incubator/delta-kernel-rs) library, both DuckDB and
the Delta extension have been kept relatively simple and largely agnostic to the internals of the Delta protocol.

We hope you give the [Delta extension](#docs:lts:core_extensions:delta) a try and look forward to any feedback from the community! Also, if you're attending the
[2024 Databricks Data + AI Summit](https://www.databricks.com/dataaisummit) be sure to check out DuckDB co-founder [Hannes Mühleisen's](https://hannes.muehleisen.org/) talk on Thursday during
the keynote and the in-depth [breakout session](https://www.databricks.com/dataaisummit/session/delta-lake-meets-duckdb-delta-kernel), also on Thursday, for more details on the DuckDB–Delta integration.

## Command Line Data Processing: Using DuckDB as a Unix Tool

**Publication date:** 2024-06-20

**Author:** Gábor Szárnyas

**TL;DR:** DuckDB's CLI client is portable to many platforms and architectures. It handles CSV files conveniently and offers users the same rich SQL syntax everywhere. These characteristics make DuckDB an ideal tool to complement traditional Unix tools for data processing in the command line.

In this blog post, we dive into the terminal to compare DuckDB with traditional tools used in Unix shells (Bash, Zsh, etc.).
We solve several problems requiring operations such as projection and filtering to demonstrate the differences between using SQL queries in DuckDB versus specialized command line tools.
In the process, we will show off some cool features such as DuckDB's [powerful CSV reader](#docs:lts:data:csv:overview) and the [positional join operator](#::duckdb-positional-join).
Let's get started!

#### The Unix Philosophy

To set the stage, let's recall the [Unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy). This states that programs should:

* do one thing and do it well,
* work together, and
* handle text streams.

Unix-like systems such as macOS, Linux and [WSL in Windows](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux) have embraced this philosophy.
Tools such as
[`grep`](https://man7.org/linux/man-pages/man1/grep.1.html),
[`sed`](https://man7.org/linux/man-pages/man1/sed.1.html), and
[`sort`](https://man7.org/linux/man-pages/man1/sort.1.html)
are ubiquitous and widely used in [shell scripts](https://en.wikipedia.org/wiki/Shell_script).

As a purpose-built data processing tool, DuckDB fits the Unix philosophy quite well.
First, it was designed to be a fast in-process analytical SQL database system _(do one thing and do it well)._
Second, it has a standalone [command line client](#docs:lts:clients:cli:overview), which can consume and produce CSV files _(work together),_
and also supports reading and writing text streams _(handle text streams)_.
Thanks to these, DuckDB works well in the ecosystem of Unix CLI tools, as
shown
[in](https://x.com/jooon/status/1781401858411565473)
[several](https://www.pgrs.net/2024/03/21/duckdb-as-the-new-jq/)
[posts](https://x.com/MarginaliaNu/status/1701532341225583044).

#### Portability and Usability

While Unix CLI tools are fast, robust, and available on all major platforms, they often have cumbersome syntax that's difficult to remember.
To make matters worse, these tools often come with slight differences between systems – think of the [differences between GNU `sed` and macOS's `sed`](https://unix.stackexchange.com/a/131940/315847) or the differences between regex syntax among programs, which is aptly captured by Donald Knuth's quip [_“I define Unix as 30 definitions of regular expressions living under one roof.”_](https://en.wikiquote.org/wiki/Donald_Knuth#Quotes)

While there are shells specialized for dataframe processing, such as the [Nushell project](https://github.com/nushell/nushell), older Unix shells (e.g., the Bourne shell `sh` and Bash) are still the most widespread, especially on servers.

At the same time, we have DuckDB, an extremely portable database system which uses the same SQL syntax on all platforms.
With [version 1.0.0 released recently](https://duckdb.org/2024/06/03/announcing-duckdb-100), DuckDB's syntax – based on the proven and widely used PostgreSQL dialect – is now in a stable state.
Another attractive feature of DuckDB is that it offers an interactive shell, which aids quick debugging. Moreover, DuckDB is available in [several host languages](#docs:lts:clients:overview) as well as in the browser [via WebAssembly](https://shell.duckdb.org/), so if you ever decide to use your SQL scripts outside of the shell, DuckDB SQL scripts can be ported to a wide variety of environments without any changes.

#### Data Processing with Unix Tools and DuckDB

In the following, we give examples for implementing simple data processing tasks using the CLI tools provided in most Unix shells and using DuckDB SQL queries.
We use DuckDB v1.0.0 and run it in [in-memory mode](#docs:lts:connect:overview::in-memory-database).
This mode makes sense for the problems we are tackling, as we do not create any tables and the operations are not memory-intensive, so there is no data to persist or to spill to disk.

##### Datasets

We use four input files capturing information on cities and airports in the Netherlands.

<details markdown='1'>
<summary markdown='span'>
    [`pop.csv`](https://duckdb.org/data/cli/pop.csv), the population of each of the top-10 most populous cities.
</summary>
```csv
city,province,population
Amsterdam,North Holland,905234
Rotterdam,South Holland,656050
The Hague,South Holland,552995
Utrecht,Utrecht,361924
Eindhoven,North Brabant,238478
Groningen,Groningen,234649
Tilburg,North Brabant,224702
Almere,Flevoland,218096
Breda,North Brabant,184716
Nijmegen,Gelderland,179073
```
</details>

<details markdown='1'>
<summary markdown='span'>
    [`area.csv`](https://duckdb.org/data/cli/area.csv), the area of each of the top-10 most populous cities.
</summary>
```csv
city,area
Amsterdam,219.32
Rotterdam,324.14
The Hague,98.13
Utrecht,99.21
Eindhoven,88.92
Groningen,197.96
Tilburg,118.13
Almere,248.77
Breda,128.68
Nijmegen,57.63
```
</details>

<details markdown='1'>
<summary markdown='span'>
    [`cities-airports.csv`](https://duckdb.org/data/cli/cities-airports.csv), the [IATA codes](https://en.wikipedia.org/wiki/IATA_airport_code) of civilian airports serving given cities.
</summary>
```csv
city,IATA
Amsterdam,AMS
Haarlemmermeer,AMS
Eindhoven,EIN
Groningen,GRQ
Eelde,GRQ
Maastricht,MST
Beek,MST
Rotterdam,RTM
The Hague,RTM
```
</details>

<details markdown='1'>
<summary markdown='span'>
    [`airport-names.csv`](https://duckdb.org/data/cli/airport-names.csv), the airport names belonging to given IATA codes.
</summary>
```csv
IATA,airport name
AMS,Amsterdam Airport Schiphol
EIN,Eindhoven Airport
GRQ,Groningen Airport Eelde
MST,Maastricht Aachen Airport
RTM,Rotterdam The Hague Airport
```
</details>

You can download all input files as a [single zip file](https://duckdb.org/data/cli/duckdb-cli-data.zip).

##### Projecting Columns

Projecting columns is a very common data processing step. Let's take the `pop.csv` file and project the first and last columns, `city` and `population`.

###### Unix Shell: `cut`

In the Unix shell, we use the [`cut` command](https://man7.org/linux/man-pages/man1/cut.1.html) and specify the file's delimiter (`-d`) and the columns to be projected (`-f`).

```bash
cut -d , -f 1,3 pop.csv
```

This produces the following output:

```csv
city,population
Amsterdam,905234
Rotterdam,656050
The Hague,552995
Utrecht,361924
Eindhoven,238478
Groningen,234649
Tilburg,224702
Almere,218096
Breda,184716
Nijmegen,179073
```

###### DuckDB: `SELECT`

In DuckDB, we can use the CSV reader to load the data, then use the `SELECT` clause with column indexes (`#i`) to designate the columns to be projected:

```sql
SELECT #1, #3 FROM 'pop.csv';
```

Note that we did not have to define any schema or load the data to a table.
Instead, we simply used `'pop.csv'` in the `FROM` clause as we would do with a regular table.
DuckDB detects that this is a CSV file and invokes the [`read_csv` function](#docs:lts:data:csv:overview::csv-functions), which automatically infers the CSV file's dialect (delimiter, presence of quotes, etc.) as well as the schema of the table.
This allows us to simply project columns using `SELECT #1, #3`.
We could also use the more readable syntax `SELECT city, population`.
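
The dialect detection matters once fields are quoted. As a small illustration with made-up data (the `sample` rows below are an assumption and not part of the datasets in this post), `cut` splits on every comma, including one inside a quoted field:

```bash
# Hypothetical two-column CSV whose first data field contains a quoted comma.
sample='name,population
"Hague, The",552995'

# cut treats the comma inside the quotes as a separator,
# so the first "field" of the data row is truncated.
first_field=$(printf '%s\n' "$sample" | cut -d , -f 1)
printf '%s\n' "$first_field"
```

The second output line is `"Hague` rather than the full quoted value, whereas a quote-aware CSV reader treats `"Hague, The"` as a single field.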

To make the output of the solutions using Unix tools and DuckDB equivalent, we wrap the query into a [`COPY ... TO` statement](#docs:lts:sql:statements:copy::copy--to):

```sql
COPY (
    SELECT #1, #3 FROM 'pop.csv'
  ) TO '/dev/stdout/';
```

This query produces the same result as the Unix command's output shown [above](#::unix-shell-cut).

To turn this into a standalone CLI command, we can invoke the DuckDB command line client with the `-c ⟨query⟩` argument, which runs the SQL query and exits once it's finished.
Using this technique, the query above can be turned into the following one-liner:

```bash
duckdb -c "COPY (SELECT #1, #3 FROM 'pop.csv') TO '/dev/stdout/'"
```

In the following, we'll omit the code blocks using the standalone `duckdb` command: all solutions can be executed in the `duckdb -c ⟨query⟩` template and yield the same result as the solutions using Unix tools.

##### Sorting Files

Another common task is to sort files based on given columns.
Let's rank the cities within provinces based on their populations.
To do so, we need to sort the `pop.csv` file first based on the name of the `province` using an ascending order, then on the `population` using a descending order.
We then return the `province` column first, followed by the `city` and the `population` columns.

###### Unix Shell: `sort`

In the Unix shell, we rely on the [`sort`](https://man7.org/linux/man-pages/man1/sort.1.html) tool.
We specify the CSV file's separator with the `-t` argument and set the keys to sort on using `-k` arguments.
We first sort on the second column (`province`) with `-k 2,2`.
Then, we sort on the third column (`population`), setting the ordering to be reversed (`r`) and numeric (`n`) with `-k 3rn`.
Note that we need to handle the header of the file separately: we take the first row with `head -n 1` and the rest of the rows with `tail -n +2`, sort the latter, and glue them back together with the header.
Finally, we perform a projection to reorder the columns.
Unfortunately, the [`cut` command cannot reorder the columns](https://stackoverflow.com/questions/2129123/rearrange-columns-using-cut), so we use [`awk`](https://man7.org/linux/man-pages/man1/awk.1p.html) instead:

```bash
(head -n 1 pop.csv; tail -n +2 pop.csv \
    | sort -t , -k 2,2 -k 3rn) \
    | awk -F , '{ print $2 "," $1 "," $3 }'
```

The result is the following:

```csv
province,city,population
Flevoland,Almere,218096
Gelderland,Nijmegen,179073
Groningen,Groningen,234649
North Brabant,Eindhoven,238478
North Brabant,Tilburg,224702
North Brabant,Breda,184716
North Holland,Amsterdam,905234
South Holland,Rotterdam,656050
South Holland,The Hague,552995
Utrecht,Utrecht,361924
```
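
Since the header has to be handled separately in every such pipeline, it can help to wrap the pattern in a small function. The following is a minimal sketch, where `sort_keep_header` is a hypothetical helper name and the sample rows are a subset of `pop.csv` written to a temporary file:

```bash
# Print the header of a CSV file unchanged, then its remaining rows sorted.
# Usage: sort_keep_header FILE [sort options...]
sort_keep_header() {
    csv_file=$1
    shift
    head -n 1 "$csv_file"
    tail -n +2 "$csv_file" | sort "$@"
}

# A few rows taken from pop.csv, written to a temporary file.
sample_file=$(mktemp)
printf '%s\n' \
    'city,province,population' \
    'Tilburg,North Brabant,224702' \
    'Almere,Flevoland,218096' \
    'Eindhoven,North Brabant,238478' > "$sample_file"

# Province ascending, then population descending (numeric).
sorted=$(sort_keep_header "$sample_file" -t , -k 2,2 -k 3rn)
printf '%s\n' "$sorted"
rm -f "$sample_file"
```

The helper only forwards its remaining arguments to `sort`, so any reordering of columns still has to be done in a separate step.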

###### DuckDB: `ORDER BY`

In DuckDB, we simply load the CSV and specify the column ordering via `SELECT province, city, population`, then set the sorting criteria on the selected columns (`province ASC` and `population DESC`).
The CSV reader automatically detects types, so the sorting is numeric by default. Finally, we surround the query with a `COPY` statement to print the results to the standard output.

```sql
COPY (
    SELECT province, city, population
    FROM 'pop.csv'
    ORDER BY province ASC, population DESC
  ) TO '/dev/stdout/';
```

##### Intersecting Columns

A common task is to calculate the intersection of two columns, i.e., to find entities that are present in both.
Let's find the cities that are both in the top-10 most populous cities and have their own airports.

###### Unix Shell: `comm`

The Unix solution for intersection uses the [`comm` tool](https://linux.die.net/man/1/comm), intended to compare two _sorted_ files line-by-line.
We first `cut` the relevant column from both files.
Due to the sorting requirement, we apply `sort` on both inputs before performing the intersection.
The intersection is performed using `comm -12` where the argument `-12` means that we only want to keep lines that are in both files.
We again rely on `head` and `tail` to treat the headers and the rest of the files separately during processing and glue them together at the end.

```bash
head -n 1 pop.csv | cut -d , -f 1; \
    comm -12 \
        <(tail -n +2 pop.csv | cut -d , -f 1 | sort) \
        <(tail -n +2 cities-airports.csv | cut -d , -f 1 | sort) 
```

The script produces the following output:

```csv
city
Amsterdam
Eindhoven
Groningen
Rotterdam
The Hague
```
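
The `<(...)` process substitution used above is a feature of shells such as Bash and Zsh rather than of POSIX `sh`. As a sketch of a more portable variant, the sorted columns can be written to temporary files first. The file names `left.txt` and `right.txt` are arbitrary, and the inputs are a few rows taken from the two datasets:

```bash
# Sample rows taken from the two input files, stored in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'city,province,population' \
    'Amsterdam,North Holland,905234' \
    'Utrecht,Utrecht,361924' \
    'Eindhoven,North Brabant,238478' > "$workdir/pop.csv"
printf '%s\n' 'city,IATA' \
    'Amsterdam,AMS' \
    'Eindhoven,EIN' \
    'Maastricht,MST' > "$workdir/cities-airports.csv"

# Materialize the sorted city columns instead of using <(...).
tail -n +2 "$workdir/pop.csv" | cut -d , -f 1 | sort > "$workdir/left.txt"
tail -n +2 "$workdir/cities-airports.csv" | cut -d , -f 1 | sort > "$workdir/right.txt"

# Keep only the lines that are present in both sorted files.
common=$(comm -12 "$workdir/left.txt" "$workdir/right.txt")
printf 'city\n%s\n' "$common"
rm -rf "$workdir"
```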

###### DuckDB: `INTERSECT ALL`

The DuckDB solution reads the CSV files, projects the `city` fields and applies the [`INTERSECT ALL` clause](#docs:lts:sql:query_syntax:setops::intersect-all-bag-semantics) to calculate the intersection:

```sql
COPY (
    SELECT city FROM 'pop.csv'
    INTERSECT ALL
    SELECT city FROM 'cities-airports.csv'
  ) TO '/dev/stdout/';
```

##### Pasting Rows Together

Pasting rows together line-by-line is a recurring task.
In our example, we know that the `pop.csv` and the `area.csv` files have an equal number of rows, so we can produce a single file that contains both the population and the area of every city in the dataset.

###### Unix Shell: `paste`

In the Unix shell, we use the [`paste`](https://man7.org/linux/man-pages/man1/paste.1.html) command and remove the duplicate `city` field using `cut`:

```bash
paste -d , pop.csv area.csv | cut -d , -f 1,2,3,5
```

The output is the following:

```csv
city,province,population,area
Amsterdam,North Holland,905234,219.32
Rotterdam,South Holland,656050,324.14
The Hague,South Holland,552995,98.13
Utrecht,Utrecht,361924,99.21
Eindhoven,North Brabant,238478,88.92
Groningen,Groningen,234649,197.96
Tilburg,North Brabant,224702,118.13
Almere,Flevoland,218096,248.77
Breda,North Brabant,184716,128.68
Nijmegen,Gelderland,179073,57.63
```
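
Because `paste` combines lines purely by position, it may be worth verifying that both files list the same cities in the same order before trusting the result. A minimal sketch of such a guard follows, where `same_first_column` is a hypothetical helper and the input files are small made-up samples:

```bash
# Succeeds only if both CSV files have identical first columns
# (same values, same order, same number of rows).
same_first_column() {
    left=$(cut -d , -f 1 "$1")
    right=$(cut -d , -f 1 "$2")
    [ "$left" = "$right" ]
}

workdir=$(mktemp -d)
printf '%s\n' 'city,population' 'Breda,184716' 'Nijmegen,179073' > "$workdir/pop.csv"
printf '%s\n' 'city,area' 'Breda,128.68' 'Nijmegen,57.63' > "$workdir/area.csv"
printf '%s\n' 'city,area' 'Nijmegen,57.63' 'Breda,128.68' > "$workdir/shuffled.csv"

# Matching files: paste them and drop the duplicate city column.
if same_first_column "$workdir/pop.csv" "$workdir/area.csv"; then
    pasted=$(paste -d , "$workdir/pop.csv" "$workdir/area.csv" | cut -d , -f 1,2,4)
    printf '%s\n' "$pasted"
fi

# A reordered file is rejected instead of silently producing wrong pairs.
if same_first_column "$workdir/pop.csv" "$workdir/shuffled.csv"; then
    shuffled_ok=yes
else
    shuffled_ok=no
fi
rm -rf "$workdir"
```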

###### DuckDB: `POSITIONAL JOIN`

In DuckDB, we can use a [`POSITIONAL JOIN`](#docs:lts:sql:query_syntax:from::positional-joins).
This join type is one of DuckDB's [SQL extensions](#docs:lts:sql:dialect:friendly_sql) and it provides a concise syntax to combine tables row-by-row based on each row's position in the table.
Joining the two tables together using `POSITIONAL JOIN` results in two `city` columns – we use the [`EXCLUDE` clause](#docs:lts:sql:expressions:star::exclude-clause) to remove the duplicate column:

```sql
COPY (
    SELECT pop.*, area.* EXCLUDE city
    FROM 'pop.csv'
    POSITIONAL JOIN 'area.csv'
  ) TO '/dev/stdout/';
```

##### Filtering

Filtering is another very common operation. For this, we'll use the [`cities-airports.csv` file](https://duckdb.org/data/cli/cities-airports.csv).
For each airport, this file contains its `IATA` code and the main cities that it serves:

```csv
city,IATA
Amsterdam,AMS
Haarlemmermeer,AMS
Eindhoven,EIN
...
```

Let's try to formulate two queries:

1. Find all cities whose name ends in `dam`.

2. Find all airports whose IATA code is equivalent to the first three letters of a served city's name, but the city's name does _not_ end in `dam`.

###### Unix Shell: `grep`

To answer the first question in the Unix shell, we use `grep` and the regular expression `^[^,]*dam,`:

```bash
grep "^[^,]*dam," cities-airports.csv
```

In this expression, `^` denotes the start of the line, `[^,]*` searches for a string that does not contain the comma character (the separator).
The expression `dam,` ensures that the end of the string in the first field is `dam`.
The output is:

```csv
Amsterdam,AMS
Rotterdam,RTM
```

Let's try to answer the second question. For this, we need to match the first three characters in the `city` field to the `IATA` field but we need to do so in a case-insensitive manner.
We also need to use a negative condition to exclude the lines where the city's name ends in `dam`.
Both of these requirements are difficult to achieve with a single `grep` or `egrep` command as they lack support for two features.
First, they do not support case-insensitive matching _using a backreference_ (`grep -i` alone is not sufficient to ensure this).
Second, they do not support [negative lookbehinds](https://www.regular-expressions.info/lookaround.html).
Therefore, we use [`pcregrep`](https://man7.org/linux/man-pages/man1/pcregrep.1.html), and formulate our question as follows:

```bash
pcregrep -i '^([a-z]{3}).*?(?<!dam),\1$' cities-airports.csv
```

Here, we call `pcregrep` with the case-insensitive flag (`-i`), which in `pcregrep` also affects backreferences such as `\1`.
We capture the first three letters with `([a-z]{3})` (e.g., `Ams`) and match it to the second field with the backreference: `,\1$`.
We use a non-greedy `.*?` to seek to the end of the first field, then apply a negative lookbehind with the `(?<!dam)` expression to ensure that the field does not end in `dam`.
The result is a single line:

```csv
Eindhoven,EIN
```
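
On systems where `pcregrep` is not installed, the same condition can also be expressed without lookbehinds. The following sketch uses `awk` as an alternative (it is not the approach benchmarked or used elsewhere in this post) and runs on a few rows taken from `cities-airports.csv`:

```bash
# Rows taken from cities-airports.csv.
rows='city,IATA
Amsterdam,AMS
Haarlemmermeer,AMS
Eindhoven,EIN
Groningen,GRQ
Rotterdam,RTM'

# Keep rows whose IATA code equals the upper-cased first three letters
# of the city name, unless the city name ends in "dam".
matches=$(printf '%s\n' "$rows" \
    | awk -F , 'toupper(substr($1, 1, 3)) == $2 && $1 !~ /dam$/')
printf '%s\n' "$matches"
```

Splitting the line into fields first keeps the two conditions separate, which is close in spirit to a SQL `WHERE` clause with two predicates.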

###### DuckDB: `WHERE ... LIKE`

Let's answer the questions now in DuckDB.
To answer the first question, we can use [`LIKE` for pattern matching](#docs:lts:sql:functions:pattern_matching).
The header should not be part of the output, so we disable it with `HEADER false`.
The complete query looks as follows:

```sql
COPY (
    FROM 'cities-airports.csv'
    WHERE city LIKE '%dam'
  ) TO '/dev/stdout/' (HEADER false);
```

For the second question, we use [string slicing](#docs:lts:sql:functions:text::stringbeginend) to extract the first three characters, [`upper`](#docs:lts:sql:functions:text::upperstring) to ensure case-insensitivity, and `NOT LIKE` for the negative condition:

```sql
COPY (
    FROM 'cities-airports.csv'
    WHERE upper(city[1:3]) = IATA
      AND city NOT LIKE '%dam'
  ) TO '/dev/stdout/' (HEADER false);
```

These queries return exactly the same results as the solutions using `grep` and `pcregrep`.

In both of these queries, we used the [`FROM`-first syntax](#docs:lts:sql:query_syntax:from::from-first-syntax).
If the `SELECT` clause is omitted, the query is executed as if `SELECT *` was used, i.e., it returns all columns.

##### Joining Files

Joining tables is an essential task in data processing. Our next example is going to use a join to return city name–airport name combinations.
This is achieved by joining the `cities-airports.csv` and the `airport-names.csv` files on their IATA code fields.

###### Unix Shell: `join`

Unix tools support joining files via the [`join` command](https://man7.org/linux/man-pages/man1/join.1.html), which joins lines of two _sorted_ inputs on a common field.
To make this work, we sort the files based on their `IATA` fields, then perform the join on the first file's 2nd column (`-1 2`) and the second file's 1st column (`-2 1`).
We have to omit the header for the `join` command to work, so we do just that and construct a new header with an `echo` command:

```bash
echo "IATA,city,airport name"; \
    join -t , -1 2 -2 1 \
        <(tail -n +2 cities-airports.csv | sort -t , -k 2,2) \
        <(tail -n +2 airport-names.csv   | sort -t , -k 1,1)
```

The result is the following:

```csv
IATA,city,airport name
AMS,Amsterdam,Amsterdam Airport Schiphol
AMS,Haarlemmermeer,Amsterdam Airport Schiphol
EIN,Eindhoven,Eindhoven Airport
GRQ,Eelde,Groningen Airport Eelde
GRQ,Groningen,Groningen Airport Eelde
MST,Beek,Maastricht Aachen Airport
MST,Maastricht,Maastricht Aachen Airport
RTM,Rotterdam,Rotterdam The Hague Airport
RTM,The Hague,Rotterdam The Hague Airport
```

###### DuckDB

In DuckDB, we load the CSV files and connect them using the [`NATURAL JOIN` clause](#docs:lts:sql:query_syntax:from::natural-joins), which joins on column(s) with the same name.
To ensure that the result matches with that of the Unix solution, we use the [`ORDER BY ALL` clause](#docs:lts:sql:query_syntax:orderby::order-by-all), which sorts the result on all columns, starting from the first one, and stepping through them for tie-breaking to the last column.

```sql
COPY (
    SELECT "IATA", "city", "airport name"
    FROM 'cities-airports.csv'
    NATURAL JOIN 'airport-names.csv'
    ORDER BY ALL
  ) TO '/dev/stdout/';
```

##### Replacing Strings

You may have noticed that we are using very clean datasets. This is of course very unrealistic, so in an evil twist, let's reduce the data quality a bit:

* Replace the space in the province's name with an underscore, e.g., turning `North Holland` to `North_Holland`.
* Add thousand separating commas, e.g., turning `905234` to `905,234`.
* Change the CSV's separator to the semicolon character (`;`).

And while we're at it, also fetch the data set via HTTPS this time, using the URL [`https://duckdb.org/data/cli/pop.csv`](https://duckdb.org/data/cli/pop.csv).

###### Unix Shell: `curl` and `sed`

In Unix, remote data sets are typically fetched via [`curl`](https://man7.org/linux/man-pages/man1/curl.1.html).
The output of `curl` is piped into the subsequent processing steps, in this case, a bunch of [`sed`](https://man7.org/linux/man-pages/man1/sed.1.html) commands.

```bash
curl -s https://duckdb.org/data/cli/pop.csv \
    | sed 's/\([^,]*,.*\) \(.*,[^,]*\)/\1_\2/g' \
    | sed 's/,/;/g' \
    | sed 's/\([0-9][0-9][0-9]\)$/,\1/'
```

This results in the following output:

```csv
city;province;population
Amsterdam;North_Holland;905,234
Rotterdam;South_Holland;656,050
The Hague;South_Holland;552,995
Utrecht;Utrecht;361,924
Eindhoven;North_Brabant;238,478
Groningen;Groningen;234,649
Tilburg;North_Brabant;224,702
Almere;Flevoland;218,096
Breda;North_Brabant;184,716
Nijmegen;Gelderland;179,073
```
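
Note that the last `sed` command inserts a single comma before the final three digits, which is sufficient here because every population in `pop.csv` is below one million. The sketch below illustrates this limitation on a made-up seven-digit value and shows one possible looping variant for longer numbers. It assumes the input lines contain nothing but the number:

```bash
# The single substitution from the pipeline above inserts at most one comma.
one_pass=$(printf '%s\n' 905234 1234567 \
    | sed 's/\([0-9][0-9][0-9]\)$/,\1/')
printf '%s\n' "$one_pass"

# Looping variant: repeat the substitution (label "a", branch "ta")
# until no digit is directly followed by a group of three digits.
looped=$(printf '%s\n' 905234 1234567 \
    | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e ta)
printf '%s\n' "$looped"
```

The first command turns `1234567` into `1234,567`, while the looping variant produces `1,234,567`.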

###### DuckDB: `httpfs` and `regexp_replace`

In DuckDB, we use the following query:

```sql
COPY (
    SELECT
        city,
        replace(province, ' ', '_') AS province,
        regexp_replace(population::VARCHAR, '([0-9][0-9][0-9])$', ',\1')
            AS population
    FROM 'https://duckdb.org/data/cli/pop.csv'
  ) TO '/dev/stdout/' (DELIMITER ';');
```

Note that the `FROM` clause now has an HTTPS URL instead of a simple CSV file.
The presence of the `https://` prefix triggers DuckDB to load the [`httpfs` extension](#docs:lts:core_extensions:httpfs:overview) and use it to fetch the CSV file.
We use the [`replace` function](#docs:lts:sql:functions:text::replacestring-source-target) to substitute the spaces with underscores,
and the [`regexp_replace` function](#docs:lts:sql:functions:text::regexp_replacestring-pattern-replacement) for the replacement using a regular expression.
(We could have also used string formatting functions such as [`format`](#docs:lts:sql:functions:text::fmt-syntax) and [`printf`](#docs:lts:sql:functions:text::printf-syntax).)
To change the separator to a semicolon, we serialize the file using the `COPY` statement with the `DELIMITER ';'` option.

##### Reading JSON

As a final exercise, let's query the number of stars given to the [`duckdb/duckdb` repository on GitHub](https://github.com/duckdb/duckdb).

###### Unix Shell: `curl` and `jq`

In Unix tools, we can use `curl` to get the JSON file from `https://api.github.com` and pipe its output to [`jq`](https://jqlang.github.io/jq/manual/) to query the JSON object.

```bash
curl -s https://api.github.com/repos/duckdb/duckdb \
    | jq ".stargazers_count"
```

###### DuckDB: `read_json`

In DuckDB, we use the [`read_json` function](#docs:lts:data:json:overview), invoking it with the remote HTTPS endpoint's URL.
The schema of the JSON file is detected automatically, so we can simply use `SELECT` to return the required field.

```sql
SELECT stargazers_count
  FROM read_json('https://api.github.com/repos/duckdb/duckdb');
```

###### Output

Both of these commands return the current number of stars of the repository.

#### Performance

At this point, you might be wondering about the performance of the DuckDB solutions.
After all, all of our prior examples have only consisted of a few lines, so benchmarking them against each other will not result in any measurable performance differences.
So, let's switch to the Dutch railway services dataset that we used in a [previous blog post](https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands) and formulate a different problem.

We'll use the [2023 railway services file (`services-2023.csv.gz`)](https://blobs.duckdb.org/nl-railway/services-2023.csv.gz) and count the number of Intercity services that operated in that year.

In Unix, we can use the [`gzcat`](https://man7.org/linux/man-pages/man1/zcat.1p.html) command to decompress the `csv.gz` file into a pipeline. Then, we can use `grep` or `pcregrep` (which is more performant), and top it off with the [`wc`](https://man7.org/linux/man-pages/man1/wc.1.html) command to count the number of lines (`-l`).
In DuckDB, the built-in CSV reader also supports [compressed CSV files](#docs:lts:data:csv:overview::parameters), so we can use that without any extra configuration.

```bash
gzcat services-2023.csv.gz | grep '^[^,]*,[^,]*,Intercity,' | wc -l
gzcat services-2023.csv.gz | pcregrep '^[^,]*,[^,]*,Intercity,' | wc -l
duckdb -c "SELECT count(*) FROM 'services-2023.csv.gz' WHERE \"Service:Type\" = 'Intercity';"
```

We also test the tools on uncompressed input:

```bash
gunzip -k services-2023.csv.gz
grep '^[^,]*,[^,]*,Intercity,' services-2023.csv | wc -l
pcregrep '^[^,]*,[^,]*,Intercity,' services-2023.csv | wc -l
duckdb -c "SELECT count(*) FROM 'services-2023.csv' WHERE \"Service:Type\" = 'Intercity';"
```

To reduce the noise in the measurements, we used the [`hyperfine`](https://github.com/sharkdp/hyperfine) benchmarking tool and took the mean execution time of 10 runs.
The experiments were carried out on a MacBook Pro with a 12-core M2 Pro CPU and 32 GB RAM, running macOS Sonoma 14.5.
To reproduce them, run the [`grep-vs-duckdb-microbenchmark.sh` script](https://duckdb.org/microbenchmarks/grep-vs-duckdb-microbenchmark.sh).
The following table shows the runtimes of the solutions on both compressed and uncompressed inputs:

| Tool               | Runtime (compressed) | Runtime (uncompressed) |
| ------------------ | -------------------: | ---------------------: |
| grep 2.6.0-FreeBSD |               20.9 s |                 20.5 s |
| pcregrep 8.45      |                3.1 s |                  2.9 s |
| DuckDB 1.0.0       |                4.2 s |                  1.2 s |

The results show that on compressed input, `grep` was the slowest, while DuckDB was slightly edged out by `gzcat`+`pcregrep`, which ran in 3.1 seconds compared to DuckDB's 4.2 seconds.
On uncompressed input, DuckDB can utilize all CPU cores from the get-go (instead of starting with a single-threaded decompression step), allowing it to outperform both `grep` and `pcregrep` by a significant margin: 2.5× faster than `pcregrep` and more than 15× faster than `grep`.

While this example is quite simple, as queries get more complex, there are more opportunities for optimization and larger intermediate datasets may be produced. While both of these can be tackled within a shell script (by manually implementing optimizations and writing the intermediate datasets to disk), these will likely be less efficient than what a DBMS can come up with. Shell scripts implementing complex pipelines can also be very brittle and need to be rethought even for small changes, making the performance advantage of using a database even more significant for more complex problems.

#### Summary

In this post, we used DuckDB as a standalone CLI application, and explored its abilities to complement or substitute existing command line tools (`sort`, `grep`, `comm`, `join`, etc.).
While we obviously like DuckDB a lot and prefer to use it in many cases, we also believe Unix tools have their place:
on most systems, they are already pre-installed and a well-chosen toolchain of Unix commands _can_ be
[fast](https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html),
[efficient](https://pesin.space/posts/2019-07-02/),
and portable (thanks to [POSIX-compliance](https://en.wikipedia.org/wiki/POSIX#POSIX-oriented_operating_systems)).
Additionally, they can be very concise for certain problems.
However, to reap their benefits, you will need to learn the syntax and quirks of each tool such as `grep` variants, [`awk`](https://man7.org/linux/man-pages/man1/awk.1p.html)
as well as advanced ones such as [`xargs`](https://man7.org/linux/man-pages/man1/xargs.1.html) and [`parallel`](https://www.gnu.org/software/parallel/).
In the meantime, DuckDB's SQL is easy to learn (you likely know quite a bit of it already) and DuckDB handles most of the optimization for you.

If you have a favorite CLI use case for DuckDB, let us know on social media or submit it to [DuckDB snippets](https://duckdbsnippets.com/). Happy hacking!

## 20 000 Stars on GitHub

**Publication date:** 2024-06-22

**Author:** The DuckDB team

DuckDB reached 20&nbsp;000 stars today on [GitHub](https://github.com/duckdb/duckdb).
We would like to thank our amazing community of users and [contributors](https://github.com/duckdb/duckdb/graphs/contributors).

To this day, we continue to be amazed by the adoption of DuckDB.
We hit the previous milestone, [10&nbsp;000 stars](https://duckdb.org/2023/05/12/github-10k-stars), just a little over a year ago.
Since then, the growth of stars has been slowly increasing, until the [release of version 1.0.0 in early June](https://duckdb.org/2024/06/03/announcing-duckdb-100) gave the boost that propelled the star count to 20&nbsp;000.

![](../images/blog/github-20k-stars-duckdb.png)

(image source: [star-history.com](https://star-history.com/))

#### What Else Happened in June?

The last few weeks since the release were quite eventful:

1. MotherDuck, a DuckDB-based cloud warehouse, just [reached General Availability](https://motherduck.com/blog/announcing-motherduck-general-availability-data-warehousing-with-duckdb/) last week.
    Congratulations to the team on the successful release!

2. We added support to DuckDB for [Delta Lake](https://delta.io/), an open-source lakehouse framework.
    This feature was described in Sam Ansmink's [blog post](https://duckdb.org/2024/06/10/delta) and Hannes Mühleisen's [keynote segment at the DATA+AI summit](https://www.youtube.com/watch?v=wuP6iEYH11E).

    With extensions for both [Delta Lake](#docs:lts:core_extensions:delta) and [Iceberg](#docs:lts:core_extensions:iceberg:overview),
    DuckDB can now read the two most popular data lake formats.

3. We ran a poster campaign for DuckDB in Amsterdam:

    ![](../images/blog/duckdb-poster-campaign-amsterdam.jpg)


4. [DuckDB Labs](https://duckdblabs.com) sponsored the [Hack4Her event](https://hack4her.github.io/), a female-focused student hackathon in the Netherlands. During the DuckDB Challenge of the event, teams built a community-driven app providing safe walking routes in Amsterdam using DuckDB and its [geospatial library](#docs:lts:core_extensions:spatial:overview).

    ![](../images/blog/hack4her-duckdb-amsterdam.jpg)


#### Looking Ahead

There are several interesting events lined up for the summer.

First, two books about DuckDB are expected to be released:

* [**Getting Started with DuckDB**](https://www.packtpub.com/product/getting-started-with-duckdb/9781803241005), authored by Simon Aubury and Ned Letcher, and published by Packt Publishing
* [**DuckDB in Action**](https://www.manning.com/books/duckdb-in-action), authored by Mark Needham, Michael Hunger and Michael Simons, and published by Manning Publications

Second, we are holding our next user community meeting, [DuckCon #5](#_events:2024-08-15-duckcon5) in Seattle on August 15 with the regular "State of the Duck" update as well as three regular talks and several lightning talks.

[![](../images/duckcon5-splashscreen.svg)](#_events:2024-08-15-duckcon5)

Third, we will improve DuckDB's extension ecosystem and streamline the publication process for community extensions.

Finally, we have a series of blog posts lined up for publication.
These will discuss DuckDB's performance over time, the results of the user survey we conducted during the spring, DuckDB's storage format, and many more.
Stay tuned!

We are looking forward to the next part of our journey and, of course, the next 10&nbsp;000 stars on GitHub.

## Benchmarking Ourselves over Time at DuckDB

**Publication date:** 2024-06-26

**Author:** Alex Monahan

**TL;DR:** In the last 3 years, DuckDB has become 3–25× faster and can analyze ~10× larger datasets, all on the same hardware.


<script src="/js/plotly-1.58.5.min.js"></script>


<script>
    fetch('/data/perf_over_time_overall_results_by_time.json')
        .then(res => res.json())
        .then(parsed_json => {
            let overall_results_by_time_header = document.getElementById('overall_results_by_time_header');
            parsed_json.layout = {...parsed_json.layout, "title": "Benchmark results over time"};
            Plotly.plot( overall_results_by_time_header, parsed_json.data, parsed_json.layout );
            });
</script>

A big part of DuckDB's focus is on the developer experience of working with data.
However, performance is an important consideration when investigating data management systems.
Fairly comparing data processing systems using benchmarks is [very difficult](https://mytherin.github.io/papers/2018-dbtest.pdf).
Whoever creates the benchmark is likely to know one system better than the rest, influencing benchmark selection, how much time is spent tuning parameters, and more.

Instead, this post focuses on benchmarking *our own* performance over time.

This approach avoids many comparison pitfalls, and also provides several valuable data points to consider when selecting a system.

* **How fast is it improving?**
    Learning a new tool is an investment.
    Picking a vibrant, rapidly improving database ensures your choice pays dividends for years to come.
    Plus, if you haven't experimented with a tool in a while, you can see how much faster it has become since you last checked!

* **What is it especially good at?**
    The choice of benchmark is an indicator of what types of workloads a tool is useful for.
    The higher the variety of analyses in the benchmark, the more broadly useful the tool can be.

* **What scale of data can it handle?**
    Many benchmarks are deliberately smaller than typical workloads.
    This allows the benchmark to complete in a reasonable amount of time when run with many configurations.
    However, an important question to answer when selecting a system is whether the size of your data can be handled within the size of your compute resources.



There are some limitations when looking at the performance of a system over time.
If a feature is brand new, there is no prior performance to compare to!
As a result, this post focuses on fundamental workloads rather than DuckDB's ever-increasing set of integrations with different lakehouse data formats, cloud services, and more.

The code used to run the benchmark also avoids many of DuckDB's [Friendlier SQL](#docs:lts:sql:dialect:friendly_sql) additions, as those have also been added more recently.
(When writing these queries, it felt like going back in time!)

#### Benchmark Design Summary

This post measures DuckDB's performance over time using the [H2O.ai benchmark](https://duckdblabs.github.io/db-benchmark/), plus some new benchmarks added for importing, exporting, and using window functions.
Please see our previous [blog](https://duckdb.org/2023/04/14/h2oai) [posts](https://duckdb.org/2023/11/03/db-benchmark-update) for details on why we believe the H2O.ai benchmark is a good approach! The full details of the benchmark design are in the appendix.

* H2O.ai, plus import/export and window function tests
* Python instead of R
* 5 GB scale for everything, plus 50 GB scale for group bys and joins
* Median of 3 runs
* Using a MacBook Pro M1 with 16 GB RAM
* DuckDB versions 0.2.7 through 1.0.0
    * Nearly 3 years, from 2021-06-14 to 2024-06-03
* Default settings
* Pandas for DuckDB versions before 0.5.1, Apache Arrow from 0.5.1 onward (for in-memory import and export)

#### Overall Benchmark Results

The latest DuckDB can complete one run of the full benchmark suite in under 35 seconds, while version 0.2.7 required nearly 500 seconds for the same task in June 2021.
**That is 14 times faster, in only 3 years!**

##### Performance over Time


<script>
    fetch('/data/perf_over_time_overall_results_by_time.json')
        .then(res => res.json())
        .then(parsed_json => {
            let overall_results_by_time = document.getElementById('overall_results_by_time');
            parsed_json.layout = {...parsed_json.layout, "title": "Benchmark results over time"};
            Plotly.plot( overall_results_by_time, parsed_json.data, parsed_json.layout );
            });
</script>

> **Note.** These graphs are interactive, thanks to [Plotly.js](https://plotly.com/javascript/)!
> Feel free to filter the various series (single click to hide, double click to show only that series) and click-and-drag to zoom in.
> Individual benchmark results are visible on hover.

The above plot shows the median runtime in seconds for all tests.
Due to the variety of uses for window functions, and their relative algorithmic complexity, the 16 window function tests require the most time of any category.


<script>
    fetch('/data/perf_over_time_overall_results_by_time_relative.json')
        .then(res => res.json())
        .then(parsed_json => {
            let overall_results_by_time_relative = document.getElementById('overall_results_by_time_relative');
            parsed_json.layout = {...parsed_json.layout, "title": "Relative benchmark results over time"};
            Plotly.plot( overall_results_by_time_relative, parsed_json.data, parsed_json.layout );
            });
</script>

This plot normalizes performance to the latest version of DuckDB to show relative improvements over time.
If you look at the point in time when you most recently measured DuckDB performance, that number will show you how many times faster DuckDB is now!
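
As a rough sketch of that normalization, each version's total runtime is divided by the total of the latest version. The two totals below are the rounded figures quoted earlier in this post ("nearly 500" and "under 35" seconds), not exact measurements, and the function name is purely illustrative.

```python
def relative_runtimes(total_seconds_by_version, baseline_version):
    """Express each version's runtime as a multiple of the baseline version's runtime."""
    baseline = total_seconds_by_version[baseline_version]
    return {
        version: seconds / baseline
        for version, seconds in total_seconds_by_version.items()
    }


# Rounded totals quoted in this post, used here for illustration only.
totals = {"0.2.7": 500.0, "1.0.0": 35.0}
relative = relative_runtimes(totals, baseline_version="1.0.0")
print(f"0.2.7 took about {relative['0.2.7']:.1f}x as long as 1.0.0")
```

With these rounded inputs the ratio comes out slightly above 14, which matches the headline figure.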

A portion of the overall improvement is DuckDB's addition of multi-threading, which became the default in November 2021 with version 0.3.1.
DuckDB also moved to a push-based execution model in that version for additional gains.
Parallel data loading boosted performance in December 2022 with version 0.6.1, as did improvements to the core `JOIN` algorithm.
We will explore other improvements in detail later in the post.

However, we see that all aspects of the system have seen improvements, not just raw query performance!
DuckDB focuses on the entire data analysis workflow, not just aggregate or join performance.
CSV parsing has seen significant gains, import and export have improved significantly, and window functions have improved the most of all.

What was the slight regression from December 2022 to June 2023?
Window functions received additional capabilities and experienced a slight performance degradation in the process.
However, from June 2023 onward we see substantial performance improvement across the board for window functions.
If window functions are filtered out of the chart, we see a smoother trend.

You may also notice that starting with version 0.9 in September 2023, the performance appears to plateau.
What is happening here?
First, don't forget to zoom in!
Over the last year, DuckDB has still improved over 3×!
More recently, the DuckDB Labs team focused on scalability by developing algorithms that support larger-than-memory calculations.
We will see the fruits of those labors in the scale section later on!
In addition, DuckDB focused exclusively on bug fixes in versions 0.10.1, 0.10.2, and 0.10.3 in preparation for an especially robust DuckDB 1.0.
Now that those two major milestones (larger than memory calculations and DuckDB 1.0) have been accomplished, performance improvements will resume!
It is worth noting that the boost from moving to multi-threading will only occur once, but there are still many opportunities moving forward.

##### Performance by Version

We can also recreate the overall plot by version rather than by time.
This demonstrates that DuckDB has been doing more frequent releases recently.
See [DuckDB's release calendar](#release_calendar) for the full version history.


<script>
    fetch('/data/perf_over_time_overall_results_by_version.json')
        .then(res => res.json())
        .then(parsed_json => {
            let overall_results_by_version = document.getElementById('overall_results_by_version');
            parsed_json.layout = {...parsed_json.layout, "title": "Benchmark results by version"};
            Plotly.plot( overall_results_by_version, parsed_json.data, parsed_json.layout );
            });
</script>

If you remember the version that you last tested, you can compare how much faster things are now with 1.0!


<script>
    fetch('/data/perf_over_time_overall_results_by_version_relative.json')
        .then(res => res.json())
        .then(parsed_json => {
            let overall_results_by_version_relative = document.getElementById('overall_results_by_version_relative');
            parsed_json.layout = {...parsed_json.layout, "title": "Relative benchmark results by version"};
            Plotly.plot( overall_results_by_version_relative, parsed_json.data, parsed_json.layout );
            });
</script>

#### Results by Workload

##### CSV Reader


<script>
    fetch('/data/perf_over_time_csv_reader_area.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_csv_reader_area = document.getElementById('perf_over_time_csv_reader_area');
            Plotly.plot( perf_over_time_csv_reader_area, parsed_json.data, parsed_json.layout );
            });
</script>

DuckDB has invested substantially in building a [fast and robust CSV parser](https://duckdb.org/2023/10/27/csv-sniffer).
This is often the first task in a data analysis workload, and it tends to be undervalued and underbenchmarked.
DuckDB has **improved CSV reader performance by nearly 3×**, while adding the ability to handle many more CSV dialects automatically.

##### Group By


<script>
    fetch('/data/perf_over_time_group_by_area.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_group_by_area = document.getElementById('perf_over_time_group_by_area');
            Plotly.plot( perf_over_time_group_by_area, parsed_json.data, parsed_json.layout );
            });
</script>

Group by or aggregation operations are critical steps in OLAP workloads, and have therefore received substantial focus in DuckDB, **improving over 12× in the last 3 years**.

In November 2021, version 0.3.1 enabled multithreaded aggregation by default, providing a significant speedup.

In December 2022, data loads into tables were parallelized with the release of version 0.6.1.
This is another example of improving the entire data workflow, as this group by benchmark actually stressed the insertion performance substantially.
Inserting the results was taking the majority of the time!

Enums were also used in place of strings for categorical columns in version 0.6.1.
This means that DuckDB was able to use integers rather than strings when operating on those columns, further boosting performance.
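
The idea can be sketched in plain Python: each distinct string is mapped to a small integer once, and later operations work on the integers instead of the strings. This is a conceptual illustration only, not DuckDB's `ENUM` implementation, and the helper names and sample values are made up for the example.

```python
from collections import defaultdict


def encode_as_enum(column):
    """Map each distinct string to an integer code; return (categories, codes)."""
    categories = sorted(set(column))
    code_of = {category: code for code, category in enumerate(categories)}
    return categories, [code_of[value] for value in column]


def sum_by_code(codes, values):
    """Group-by sum keyed on integer codes rather than on strings."""
    totals = defaultdict(int)
    for code, value in zip(codes, values):
        totals[code] += value
    return dict(totals)


# Hypothetical low-cardinality column and a numeric column to aggregate.
id1 = ["id016", "id039", "id016", "id047", "id039"]
v1 = [5, 2, 3, 1, 4]

categories, codes = encode_as_enum(id1)
totals = sum_by_code(codes, v1)
# Translate the integer codes back to strings only when presenting results.
result = {categories[code]: total for code, total in totals.items()}
```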

Despite what appears at first glance to be a performance plateau, zooming in to 2023 and 2024 reveals a ~20% improvement.
In addition, aggregations have received significant attention in the most recent versions to enable larger-than-memory aggregations.
You can see that this was achieved while continuing to improve performance for the smaller-than-memory case.


##### Join


<script>
    fetch('/data/perf_over_time_join_area.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_join_area = document.getElementById('perf_over_time_join_area');
            Plotly.plot( perf_over_time_join_area, parsed_json.data, parsed_json.layout );
            });
</script>

Join operations are another area of focus for analytical databases, and DuckDB in particular.
Join speeds have **improved by 4× in the last 3 years**!

Version 0.6.1 in December 2022 introduced improvements to the out-of-core hash join that actually improved the smaller-than-memory case as well.
Parallel data loading from 0.6.1 also helps in this benchmark, as some results are the same size as the input table.

In recent versions, joins have also been upgraded to support larger-than-memory capabilities.
This focus has also benefitted the smaller-than-memory case and has led to the improvements in 0.10, launched in February 2024.

##### Window Functions


<script>
    fetch('/data/perf_over_time_window_area.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_window_area = document.getElementById('perf_over_time_window_area');
            Plotly.plot( perf_over_time_window_area, parsed_json.data, parsed_json.layout );
            });
</script>

Over the time horizon studied, window functions have **improved a dramatic 25×**!

Window function performance was improved substantially with the 0.9.0 release in September 2023.
[14 different performance optimizations contributed](https://github.com/duckdb/duckdb/issues/7809#issuecomment-1679387022).
Aggregate computation was vectorized (with special focus on the [segment tree data structure](https://www.vldb.org/pvldb/vol8/p1058-leis.pdf)).
Work stealing enabled multi-threaded processing, and sorting was adapted to run in parallel.
Care was also taken to pre-allocate memory in larger batches.
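
For readers unfamiliar with segment trees, the following minimal Python sketch shows why they suit windowed aggregates: once the tree is built, the sum over any frame can be answered in logarithmic time instead of rescanning the frame for every row. It is a textbook, non-vectorized version written for this explanation, not DuckDB's implementation.

```python
class SegmentTree:
    """Iterative segment tree supporting range sums over a fixed list of values."""

    def __init__(self, values):
        self.n = len(values)
        # Leaves live in tree[n:], internal nodes in tree[1:n].
        self.tree = [0] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def range_sum(self, lo, hi):
        """Return sum(values[lo:hi]), with hi exclusive."""
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and move right
                total += self.tree[lo]
                lo += 1
            if hi & 1:  # hi is a right boundary: step left and take that node
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total


values = [3, 1, 4, 1, 5, 9]
tree = SegmentTree(values)
# Moving sum over a frame of "2 preceding to current row" for every row.
moving_sum = [tree.range_sum(max(i - 2, 0), i + 1) for i in range(len(values))]
```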

DuckDB's window functions are also capable of processing larger-than-memory datasets.
We leave benchmarking that feature for future work!

##### Export


<script>
    fetch('/data/perf_over_time_export_area.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_export_area = document.getElementById('perf_over_time_export_area');
            Plotly.plot( perf_over_time_export_area, parsed_json.data, parsed_json.layout );
            });
</script>

Often DuckDB is not the final step in a workflow, so export performance has an impact.
Exports are **10× faster now!**
Until recently, the DuckDB format was not backward compatible, so the recommended long term persistence format was Parquet.
Parquet is also critical to interoperability with many other systems, especially data lakes.
DuckDB works well as a workflow engine, so exporting to other in-memory formats is quite common as well.

In the September 2022 release (version 0.5.1) we see significant improvements driven by switching from Pandas to Apache Arrow as the recommended in-memory export format.
DuckDB's underlying data types share many similarities with Arrow, so data transfer is quite quick.

Parquet export performance has improved by 4–5× over the course of the benchmark, with dramatic improvements in versions 0.8.1 (June 2023) and 0.10.2 (April 2024).
Version 0.8.1 added [parallel Parquet writing](https://github.com/duckdb/duckdb/pull/7375) while continuing to preserve insertion order.

The change driving the improvement in 0.10.2 was more subtle.
When exporting strings with high cardinality, DuckDB decides whether to use dictionary compression based on whether it reduces file size.
From 0.10.2 onward, the [compression ratio is tested after a sample of the values is inserted into the dictionary](https://github.com/duckdb/duckdb/pull/11461), rather than after all values are added.
This prevents substantial unnecessary processing for high-cardinality columns where dictionary compression is unhelpful.
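
A simplified sketch of the sampling idea follows. The sample size and the ratio threshold are hypothetical values chosen for illustration; the actual heuristic in DuckDB's Parquet writer is the one in the linked pull request and differs in detail.

```python
def should_use_dictionary(values, sample_size=1000, max_distinct_ratio=0.5):
    """Decide on dictionary encoding after inspecting only the first sample_size values.

    sample_size and max_distinct_ratio are hypothetical parameters for this sketch.
    """
    sample = values[:sample_size]
    if not sample:
        return False
    distinct_ratio = len(set(sample)) / len(sample)
    # Many distinct values in the sample means the dictionary would be nearly
    # as large as the data, so give up early instead of processing every value.
    return distinct_ratio <= max_distinct_ratio


low_cardinality = ["red", "green", "blue"] * 10_000
high_cardinality = [f"user-{i}" for i in range(30_000)]

use_for_low = should_use_dictionary(low_cardinality)
use_for_high = should_use_dictionary(high_cardinality)
```

The point of the sketch is only that the decision cost is bounded by the sample size rather than by the column length.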

###### Exporting Apache Arrow vs. Pandas vs. Parquet


<script>
    fetch('/data/perf_over_time_export_arrow_pandas_parquet.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_export_arrow_pandas_parquet = document.getElementById('perf_over_time_export_arrow_pandas_parquet');
            Plotly.plot( perf_over_time_export_arrow_pandas_parquet, parsed_json.data, parsed_json.layout );
            });
</script>

This plot shows the performance of all three export formats over the entire time horizon (rather than picking the winner between Pandas and Arrow).
It allows us to see at what point Apache Arrow passes Pandas in performance.

Pandas export performance has improved substantially over the course of the benchmark.
However, Apache Arrow has proven to be the more efficient data format, so Arrow is now preferred for in-memory exports.
Interestingly, DuckDB's Parquet export is now so efficient that it is faster to write a persistent Parquet file than it is to write to an in-memory Pandas dataframe!
It is even competitive with Apache Arrow.

##### Scan Other Formats


<script>
    fetch('/data/perf_over_time_scan_other_formats_area.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_scan_other_formats_area = document.getElementById('perf_over_time_scan_other_formats_area');
            Plotly.plot( perf_over_time_scan_other_formats_area, parsed_json.data, parsed_json.layout );
            });
</script>

In some use cases, DuckDB does not need to store the raw data, but instead should simply read and analyze it.
This allows DuckDB to fit seamlessly into other workflows.
This benchmark measures how fast DuckDB can scan and aggregate various data formats.

To enable comparisons over time, we switch from Pandas to Arrow at version 0.5.1 as mentioned.
DuckDB is **over 8× faster in this workload**, and the absolute time required is very short.
DuckDB is a great fit for this type of work!

###### Scanning Apache Arrow vs. Pandas vs. Parquet


<script>
    fetch('/data/perf_over_time_scan_other_formats_arrow_pandas_parquet.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_scan_other_formats_arrow_pandas_parquet = document.getElementById('perf_over_time_scan_other_formats_arrow_pandas_parquet');
            Plotly.plot( perf_over_time_scan_other_formats_arrow_pandas_parquet, parsed_json.data, parsed_json.layout );
            });
</script>

Once again, we examine all three formats over the entire time horizon.

When scanning data, Apache Arrow and Pandas are more comparable in performance.
As a result, while Arrow is clearly preferable for exports, DuckDB will happily read Pandas with similar speed.
However, in this case, the in-memory nature of both Arrow and Pandas allows them to perform 2–3× faster than Parquet.
In absolute terms, the time required to complete this operation is a very small fraction of the benchmark, so other operations should be the deciding factor.

#### Scale Tests

Analyzing larger-than-memory data is a superpower for DuckDB, allowing it to be used for substantially larger data analysis tasks than were previously possible.


<script>
    fetch('/data/perf_over_time_scale_by_time.json')
        .then(res => res.json())
        .then(parsed_json => {
            let perf_over_time_scale_by_time = document.getElementById('perf_over_time_scale_by_time');
            Plotly.plot( perf_over_time_scale_by_time, parsed_json.data, parsed_json.layout );
            });
</script>

In version 0.9.0, launched in September 2023, [DuckDB's hash aggregate was enhanced to handle out-of-core (larger than memory) intermediates](https://github.com/duckdb/duckdb/pull/7931).
The details of the algorithm, along with some benchmarks, are available in [this blog post](https://duckdb.org/2024/03/29/external-aggregation).
This allows DuckDB to aggregate one billion rows of data (50 GB in size) on a MacBook Pro with only 16 GB of RAM, even when the number of unique groups in the group by is large.
This represents at least a 10× improvement in aggregate processing scale over the course of the 3 years of the benchmark.
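
To give a feel for how an aggregation can exceed memory, here is a deliberately simplified partition-and-spill sketch in Python. It is not DuckDB's algorithm (the linked blog post describes that); it only illustrates the general principle that rows can be hashed into partitions on disk so that each partition's groups are aggregated independently. The function name, the partition count, and the assumption that keys contain no newlines are all specific to this example.

```python
import os
import tempfile
import zlib
from collections import Counter


def spilling_count_by_key(keys, num_partitions=4):
    """Count rows per key while holding only one partition's groups in memory."""
    with tempfile.TemporaryDirectory() as spill_dir:
        paths = [os.path.join(spill_dir, f"part_{p}.txt") for p in range(num_partitions)]
        files = [open(path, "w", encoding="utf-8") for path in paths]
        try:
            # Phase 1: route every row to a partition file by hashing its key,
            # so all rows of a given group end up in the same partition.
            for key in keys:
                partition = zlib.crc32(key.encode("utf-8")) % num_partitions
                files[partition].write(key + "\n")
        finally:
            for f in files:
                f.close()

        # Phase 2: aggregate one partition at a time. Groups never span
        # partitions, so the partial results can simply be merged.
        result = {}
        for path in paths:
            partition_counts = Counter()
            with open(path, encoding="utf-8") as f:
                for line in f:
                    partition_counts[line.rstrip("\n")] += 1
            result.update(partition_counts)
        return result


rows = ["a", "b", "a", "c", "b", "a"]
counts = spilling_count_by_key(rows)
```

In this toy version the input list itself still sits in memory; in a real system the rows would be streamed from storage.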

DuckDB's hash join operator has supported larger-than-memory joins since version 0.6.1 in December 2022.
However, the scale of this benchmark, coupled with the limited RAM of the benchmarking hardware, meant that this benchmark could still not complete successfully.
In version 0.10.0, launched in February 2024, [DuckDB's memory management received a significant upgrade](https://github.com/duckdb/duckdb/pull/10147) to handle multiple concurrent operators all requiring significant memory.
The [0.10.0 release blog post](https://duckdb.org/2024/02/13/announcing-duckdb-0100#temporary-memory-manager) shares additional details about this feature.

As a result, by version 0.10.0 DuckDB was able to handle calculations on data that is significantly larger than memory, even if the intermediate calculations are large in size.
All operators are supported, including sorting, aggregating, joining, and windowing.
Future work can further test the boundaries of what is possible with DuckDB's out-of-core support, including window functions and even larger data sizes.

##### Hardware Capabilities over Time

DuckDB's performance on the same hardware has improved dramatically, and at the same time, the capabilities of hardware are increasing rapidly as well.

![ram-prices](../images/blog/performance_over_time/historical-cost-of-computer-memory-and-storage-memory.png)
![ssd-prices](../images/blog/performance_over_time/historical-cost-of-computer-memory-and-storage-SSDs.png)

Source: [Our World in Data](https://ourworldindata.org/grapher/historical-cost-of-computer-memory-and-storage?yScale=linear&time=2021..latest&facet=metric&uniformYAxis=0)

The price of RAM has declined by 2.2× and the price of SSD storage has decreased by 2.7× from 2021 to 2023 alone.
Thanks to the combination of DuckDB enhancements and hardware prices, the scale of analysis possible on a single node has increased by substantially more than an order of magnitude in just 3 years!

#### Analyzing the Results Yourself

A DuckDB 1.0 database containing the results of these benchmarks is available at <https://blobs.duckdb.org/data/duckdb_perf_over_time.duckdb>.
You can even use the DuckDB Wasm web shell to **[query the file directly from your browser](https://shell.duckdb.org/#queries=v0,ATTACH-'https%3A%2F%2Fblobs.duckdb.org%2Fdata%2Fduckdb_perf_over_time.duckdb'-AS-performance_results~,FROM-performance_results.benchmark_results-SELECT-%22DuckDB-Version%22%2C-sum(%22Time-(seconds)%22)%3A%3ADECIMAL(15%2C2)-as-sum_time-GROUP-BY-%22DuckDB-Version%22-ORDER-BY-any_value(%22Release-Date%22)~)** (with the queries pre-populated and automatically executed!).

Any DuckDB client with the `httpfs` extension can also read that file by attaching it:

```sql
LOAD httpfs;
ATTACH 'https://blobs.duckdb.org/data/duckdb_perf_over_time.duckdb' AS performance_results;
USE performance_results;
```

The file contains two tables: `benchmark_results` and `scale_benchmark_results`.
Please let us know if you uncover any interesting findings!

#### Conclusion

In summary, not only is DuckDB's feature set growing substantially with each release, DuckDB is getting faster very fast!
Overall, performance has **improved by 14× in only 3 years!**

Yet query performance is only part of the story!
The variety of workloads that DuckDB can handle is wide and growing wider thanks to a full-featured SQL dialect, including high performance window functions.
Additionally, critical workloads like data import, CSV parsing, and data export have improved dramatically over time.
The complete developer experience is critical for DuckDB!

Finally, DuckDB now supports larger-than-memory calculations across all operators: sorting, aggregating, joining, and windowing.
The size of the problem that you can handle on your current compute resources just got 10× bigger, or more!

If you have made it this far, welcome to the flock! 🦆
[Join us on Discord](https://discord.duckdb.org/), we value your feedback!

#### Appendix

##### Benchmark Design

###### H2O.ai as the Foundation

This post measures DuckDB's performance over time on the H2O.ai benchmark for both joins and group by queries.

The result of each H2O.ai query is written to a table in a persistent DuckDB file.
This does require additional work when compared with an in-memory workflow (especially the burden on the SSD rather than RAM), but improves scalability and is a common approach for larger analyses.

As in the current H2O.ai benchmark, categorical-type columns (`VARCHAR` columns with low cardinality) were converted to the `ENUM` type as a part of the benchmark.
The time for converting into `ENUM` columns was included in the benchmark time, and resulted in a lower total amount of time (so the upfront conversion was worthwhile).
However, the `ENUM` data type was not fully operational in DuckDB until version 0.6.1 (December 2022), so earlier versions skip this step.

###### Python Client

To measure interoperability with other dataframe formats, we have used Python rather than R (used by H2O.ai) for this analysis.
We do continue to use R for the data generation step for consistency with the benchmark.
Python is DuckDB's most popular client, great for data science, and also the author's favorite language for this type of work.

###### Export and Replacement Scans

We now extend this benchmark in several important ways.
In addition to considering raw query performance, we measure import and export performance with several formats: Pandas, Apache Arrow, and Apache Parquet.
The results of both the join and group by benchmarks are exported to each format.

When exporting to dataframes, we measured the performance for both Pandas and Apache Arrow.
However, when summarizing the total performance, we chose the best performing format at the time.
This likely mirrors the behavior of performance-sensitive users (as they would likely not write to both formats!).
In version 0.5.1, released September 2022, DuckDB's performance when writing to and reading from the Apache Arrow format surpassed Pandas.
As a result, versions 0.2.7 to 0.4.0 use Pandas, and 0.5.1 onward uses Arrow.

On the import side, replacement scans allow DuckDB to read those same formats without a prior import step.
In the replacement scan benchmark, the data that is scanned is the output of the final H2O.ai group by benchmark query.
At the 5 GB scale it is a 100 million row dataset.
Only one column is read, and a single aggregate is calculated.
This focuses the benchmark on the speed of scanning the data rather than DuckDB's aggregation algorithms or speed of outputting results.
The query used follows the format:

```sql
SELECT 
    sum(v3) AS v3 
FROM ⟨dataframe_or_Parquet_file⟩;
```

###### Window Functions

We also added an entire series of window function benchmarks.
Window functions are a critical workload in real world data analysis scenarios, and can stress test a system in other ways.
DuckDB has implemented state of the art algorithms to quickly process even the most complex window functions.
We use the largest table from the join benchmark as the raw data for these new tests to help with comparability to the rest of the benchmark.

Window function benchmarks are much less common than more traditional joins and aggregations, and we were unable to find a suitable suite off the shelf.
These queries were designed to showcase the variety of uses for window functions, but there are certainly more that could be added.
We are open to your suggestions for queries to add, and hope these queries could prove useful for other systems!

Since the window functions benchmark is new, the window functions from each of the queries included are shown in the appendix at the end of the post.
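
Several of those queries rely on `ROWS BETWEEN ...` frames, for example expressing lag and lead as one-row frames. As an aid to reading them, the sketch below emulates in plain Python which rows such a frame selects. It assumes the values are already sorted by the `ORDER BY` column, uses `None` where SQL would return `NULL` for an empty frame, and its helper names are invented for this illustration.

```python
def rows_frame(values, i, start_offset, end_offset):
    """Values in rows i+start_offset .. i+end_offset (inclusive), clipped to the data."""
    lo = max(i + start_offset, 0)
    hi = min(i + end_offset, len(values) - 1)
    return values[lo:hi + 1] if lo <= hi else []


def first_or_none(frame):
    """Mimic first() over a frame: None stands in for NULL on an empty frame."""
    return frame[0] if frame else None


v2 = [10, 20, 30, 40]  # assumed to be already sorted by the ORDER BY column
n = len(v2)

# ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING behaves like a lag of one row.
my_lag = [first_or_none(rows_frame(v2, i, -1, -1)) for i in range(n)]
# ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING behaves like a lead of one row.
my_lead = [first_or_none(rows_frame(v2, i, 1, 1)) for i in range(n)]
# ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW gives a rolling sum.
my_rolling_sum = [sum(rows_frame(v2, i, -n, 0)) for i in range(n)]
```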

###### Workload Size

We test only the middle 5 GB dataset size for the workloads mentioned thus far, primarily because some import and export operations to external formats like Pandas must fit in memory (and we used a MacBook Pro M1 with only 16 GB of RAM).
Additionally, running the tests for 21 DuckDB versions was time-intensive even at that scale, due to the performance of older versions.

###### Scale Tests

Using only 5 GB of data does not answer our third key question: “What scale of data can it handle?”
We also ran only the group by and join related operations (avoiding in-memory imports and exports) at the 5 GB and the 50 GB scale.
Older versions of DuckDB could not handle the 50 GB dataset when joining or aggregating, but modern versions can handle both, even on a memory-constrained 16 GB RAM laptop.
Instead of measuring performance, we measure the size of the benchmark that was able to complete on a given version.

###### Summary Metrics

With the exception of the scale tests, each benchmark was run 3 times and the median time was used for reporting results.
The scale tests were run once and produced a binary metric, success or failure, at each data size tested.
As older versions would not fail gracefully, the scale metrics were accumulated across multiple partial runs.
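
The timing protocol for the regular benchmarks can be sketched as follows. This is a minimal illustration of "run 3 times, report the median" using the Python standard library; the function names and the stand-in workload are hypothetical and are not taken from the actual benchmark code.

```python
import statistics
import time


def median_runtime(run_benchmark, runs=3):
    """Run run_benchmark `runs` times and return the median wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_benchmark()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def example_workload():
    # Hypothetical stand-in for executing one benchmark query.
    sum(i * i for i in range(100_000))


seconds = median_runtime(example_workload)
```

Using the median rather than the mean keeps a single unusually slow or fast run from skewing the reported time.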

###### Computing Resources

All tests use a MacBook Pro M1 with 16 GB of RAM.
In 2024, this is far from state of the art!
If you have more powerful hardware, you will see both improved performance and scalability.

###### DuckDB Versions

Version 0.2.7, published in June 2021, was the first version to include a Python client compiled for ARM64, so it was the first version that could easily run on the benchmarking compute resources.
Version 1.0.0 is the latest available at the time of publication (June 2024), although we also provide a sneak preview of an in-development feature branch.

###### Default Settings

All versions were run with the default settings.
As a result, improvements from a new feature only appear in these results once that feature became the default and was therefore ready for production workloads.

##### Window Functions Benchmark

Each benchmark query follows the format below, but with different sets of window functions in the `⟨window_function(s)⟩` placeholder.
The table in use is the largest table from the H2O.ai join benchmark, and in this case the 5 GB scale was used.

```sql
DROP TABLE IF EXISTS windowing_results;

CREATE TABLE windowing_results AS
    SELECT 
        id1,
        id2,
        id3,
        v2,
        ⟨window_function(s)⟩
    FROM join_benchmark_largest_table;
```

The various window functions that replace the placeholder are below and are labelled to match the result graphs.
These were selected to showcase the variety of use cases for window functions, as well as the variety of algorithms required to support the full range of the syntax.
The DuckDB documentation contains a [full railroad diagram of the available syntax](#docs:lts:sql:functions:window_functions::syntax).
If there are common use cases for window functions that are not well-covered in this benchmark, please let us know!

```sql
/* 302 Basic Window */ 
sum(v2) OVER () AS window_basic

/* 303 Sorted Window */
first(v2) OVER (ORDER BY id3) AS first_order_by,
row_number() OVER (ORDER BY id3) AS row_number_order_by

/* 304 Quantiles Entire Dataset */
quantile_cont(v2, [0, 0.25, 0.50, 0.75, 1]) OVER ()
    AS quantile_entire_dataset

/* 305 PARTITION BY */
sum(v2) OVER (PARTITION BY id1) AS sum_by_id1,
sum(v2) OVER (PARTITION BY id2) AS sum_by_id2,
sum(v2) OVER (PARTITION BY id3) AS sum_by_id3

/* 306 PARTITION BY ORDER BY */
first(v2) OVER
    (PARTITION BY id2 ORDER BY id3) AS first_by_id2_ordered_by_id3

/* 307 Lead and Lag */
first(v2) OVER
    (ORDER BY id3 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
    AS my_lag,
first(v2) OVER
    (ORDER BY id3 ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
    AS my_lead

/* 308 Moving Averages */
avg(v2) OVER
    (ORDER BY id3 ROWS BETWEEN 100 PRECEDING AND CURRENT ROW)
    AS my_moving_average,
avg(v2) OVER
    (ORDER BY id3 ROWS BETWEEN id1 PRECEDING AND CURRENT ROW)
    AS my_dynamic_moving_average

/* 309 Rolling Sum */
sum(v2) OVER
    (ORDER BY id3 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS my_rolling_sum

/* 310 RANGE BETWEEN */
sum(v2) OVER
    (ORDER BY v2 RANGE BETWEEN 3 PRECEDING AND CURRENT ROW)
    AS my_range_between,
sum(v2) OVER
    (ORDER BY v2 RANGE BETWEEN id1 PRECEDING AND CURRENT ROW)
    AS my_dynamic_range_between

/* 311 Quantiles PARTITION BY */
quantile_cont(v2, [0, 0.25, 0.50, 0.75, 1])
    OVER (PARTITION BY id2)
    AS my_quantiles_by_id2

/* 312 Lead and Lag PARTITION BY */
first(v2) OVER
    (PARTITION BY id2 ORDER BY id3 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
    AS my_lag_by_id2,
first(v2) OVER
    (PARTITION BY id2 ORDER BY id3 ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
    AS my_lead_by_id2

/* 313 Moving Averages PARTITION BY */
avg(v2) OVER
    (PARTITION BY id2 ORDER BY id3 ROWS BETWEEN 100 PRECEDING AND CURRENT ROW)
    AS my_moving_average_by_id2,
avg(v2) OVER
    (PARTITION BY id2 ORDER BY id3 ROWS BETWEEN id1 PRECEDING AND CURRENT ROW)
    AS my_dynamic_moving_average_by_id2

/* 314 Rolling Sum PARTITION BY */
sum(v2) OVER
    (PARTITION BY id2 ORDER BY id3 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS my_rolling_sum_by_id2

/* 315 RANGE BETWEEN PARTITION BY */
sum(v2) OVER
    (PARTITION BY id2 ORDER BY v2 RANGE BETWEEN 3 PRECEDING AND CURRENT ROW)
    AS my_range_between_by_id2,
sum(v2) OVER
    (PARTITION BY id2 ORDER BY v2 RANGE BETWEEN id1 PRECEDING AND CURRENT ROW)
    AS my_dynamic_range_between_by_id2

/* 316 Quantiles PARTITION BY ROWS BETWEEN */
quantile_cont(v2, [0, 0.25, 0.50, 0.75, 1]) OVER
    (PARTITION BY id2 ORDER BY id3 ROWS BETWEEN 100 PRECEDING AND CURRENT ROW)
    AS my_quantiles_by_id2_rows_between
```
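
To make the frame clauses above more concrete, here is a small Python sketch of how a `ROWS BETWEEN` frame selects rows for each output row. It assumes the input is already sorted by the `ORDER BY` key and ignores partitioning. It is a toy model for illustration, not how DuckDB evaluates window functions; `rows_frame`, `first_over` and `avg_over` are hypothetical helpers.

```python
def rows_frame(values, i, start_offset, end_offset):
    """Rows from position i + start_offset to i + end_offset (inclusive), clipped to the input."""
    start = max(i + start_offset, 0)
    end = min(i + end_offset, len(values) - 1)
    return values[start:end + 1] if start <= end else []


def first_over(values, start_offset, end_offset):
    """first(v) over the frame; an empty frame yields None (SQL NULL)."""
    result = []
    for i in range(len(values)):
        frame = rows_frame(values, i, start_offset, end_offset)
        result.append(frame[0] if frame else None)
    return result


def avg_over(values, start_offset, end_offset):
    """avg(v) over the frame; an empty frame yields None (SQL NULL)."""
    result = []
    for i in range(len(values)):
        frame = rows_frame(values, i, start_offset, end_offset)
        result.append(sum(frame) / len(frame) if frame else None)
    return result


v2 = [10, 20, 30, 40]             # already sorted by the ORDER BY key
my_lag = first_over(v2, -1, -1)   # ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
my_lead = first_over(v2, 1, 1)    # ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING
moving_avg = avg_over(v2, -2, 0)  # ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
print(my_lag)      # [None, 10, 20, 30]
print(my_lead)     # [20, 30, 40, None]
print(moving_avg)  # [10.0, 15.0, 20.0, 30.0]
```

This also shows why a one-row frame with `first` behaves like lag and lead: at the edges of the input the frame is empty, so the result is `NULL`.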

## DuckDB Community Extensions

**Publication date:** 2024-07-05

**Author:** The DuckDB team

**TL;DR:** DuckDB extensions can now be published as [DuckDB Community Extensions](https://duckdb.org/community_extensions/). The repository makes it easier for users to install extensions using the `INSTALL extension_name FROM community` syntax. Extension developers avoid the burdens of compilation and distribution.

> To browse existing community extensions, visit the [DuckDB Community Extensions documentation page](https://duckdb.org/community_extensions/).

#### DuckDB Extensions

##### Design Philosophy

One of the main design goals of DuckDB is *simplicity*, which – to us – implies that the system should be rather nimble, very light on dependencies, and generally small enough to run on constrained platforms like [WebAssembly](#docs:lts:clients:wasm:overview). This goal is in direct conflict with very reasonable user requests to support advanced features like spatial data analysis, vector indexes, connectivity to various other databases, support for data formats, etc. Baking all those features into a monolithic binary is certainly possible and the route some systems take. But we want to preserve DuckDB’s simplicity. Also, shipping all possible features would be quite excessive for most users because no use cases require *all* extensions at the same time (the “Microsoft Word paradox”, where even power users only use a few features of the system, but the exact set of features varies between users).

To achieve this, DuckDB has a powerful extension mechanism, which allows users to add new functionalities to DuckDB. This mechanism allows for registering new functions, supporting new file formats and compression methods, handling new network protocols, etc. In fact, many of DuckDB’s popular features are implemented as extensions: the [Parquet reader](#docs:lts:data:parquet:overview), the [JSON reader](#docs:lts:data:json:overview), and the [HTTPS/S3 connector](#docs:lts:core_extensions:httpfs:overview) all use the extension mechanism.

##### Using Extensions

Since [version 0.3.2](https://github.com/duckdb/duckdb/releases/tag/v0.3.2), we have greatly simplified the discovery and installation of extensions by hosting them in a centralized extension repository. For example, to install the [spatial extension](#docs:lts:core_extensions:spatial:overview), one can just run the following commands using DuckDB’s SQL interface:

```sql
INSTALL spatial; -- once
LOAD spatial;    -- on each use
```

What happens behind the scenes is that DuckDB downloads an extension binary suitable to the current operating system and processor architecture (e.g., macOS on ARM64) and stores it in the `~/.duckdb` folder. On each `LOAD`, this file is loaded into the running DuckDB instance, and things happily continue from there. Of course, for this to work, we compile, sign and host the extensions for a rather large and growing list of processor architecture – operating system combinations. This mechanism is already heavily used: currently, we see around six million extension downloads *each week* with a corresponding data transfer volume of around 40 terabytes!

Until now, publishing third-party extensions has been a *difficult process*, which required extension developers to build the extensions in their own repositories for a host of platforms. Moreover, they were unable to sign the extensions using official keys, forcing users to set the `allow_unsigned_extensions` option, which disables signature checks and is problematic in itself.

#### DuckDB Community Extensions

Distributing software in a safe way has never been easier, allowing us to reach a wide base of users across pip, conda, cran, npm, brew, etc. We want to provide a similar experience both to users, who should be able to easily grab the extensions they want to use, and to developers, who should not be burdened with distribution details. We are also interested in lowering the bar to package utilities and scripts as a DuckDB extension, empowering users to package useful functionality connected to their area of expertise (or pain points).

We believe that fostering a community extension ecosystem is the next logical step for DuckDB. That’s why we’re very excited about launching our [Community Extensions repository](https://github.com/duckdb/community-extensions/) which was [announced at the Data + AI Summit](https://youtu.be/wuP6iEYH11E?t=275).

For users, this repository allows for easy discovery, installation and maintenance of community extensions directly from the DuckDB SQL prompt. For developers, it greatly streamlines the publication process of extensions. In the following, we’ll discuss how the new extension repository enhances the experiences of these groups.

##### User Experience

We are going to use the [`h3` extension](https://github.com/isaacbrodsky/h3-duckdb) as our example. This extension implements [hierarchical hexagonal indexing](https://github.com/uber/h3) for geospatial data.

Using the DuckDB Community Extensions repository, you can now install and load the `h3` extension as follows:

```sql
INSTALL h3 FROM community;
LOAD h3;
```

Then, you can instantly start using it. Note that the sample data is 500 MB:

```sql
SELECT
    h3_latlng_to_cell(pickup_latitude, pickup_longitude, 9) AS cell_id,
    h3_cell_to_boundary_wkt(cell_id) AS boundary,
    count() AS cnt
FROM read_parquet('https://blobs.duckdb.org/data/yellow_tripdata_2010-01.parquet')
GROUP BY cell_id
HAVING cnt > 10;
```

On load, the extension’s signature is checked, both to ensure that the platform and version are compatible and to verify that the source of the binary is the Community Extensions repository. Extensions are built, signed and distributed for Linux, macOS, Windows, and WebAssembly. This allows extensions to be available to any DuckDB client using version 1.0.0 and upcoming versions.

The `h3` extension’s documentation is available at <https://duckdb.org/community_extensions/extensions/h3>.

##### Developer Experience

From the developer’s perspective, the Community Extensions repository performs the steps required for publishing extensions, including building the extensions for all relevant [platforms](#docs:lts:dev:building:overview::supported-platforms), signing the extension binaries and serving them from the repository.

For the [maintainer of `h3`](https://github.com/isaacbrodsky/), the publication process required performing the following steps:

1. Sending a PR with a metadata file, `description.yml`, which contains the description of the extension:

   ```yaml
   extension:
     name: h3
     description: Hierarchical hexagonal indexing for geospatial data
     version: 1.0.0
     language: C++
     build: cmake
     license: Apache-2.0
     maintainers:
       - isaacbrodsky

   repo:
     github: isaacbrodsky/h3-duckdb
     ref: 3c8a5358e42ab8d11e0253c70f7cc7d37781b2ef
   ```

2. The CI will build and test the extension. The checks performed by the CI are aligned with the [`extension-template` repository](https://github.com/duckdb/extension-template), so iterations can be done independently.

3. Wait for approval from the DuckDB Community Extensions repository’s maintainers and for the build process to complete.

#### Published Extensions

To show that it’s feasible to publish extensions, we reached out to a few developers of key extensions. At the time of the publication of this blog post, the DuckDB Community Extensions repository already contains the following extensions.

| Name                                                                | Description                                                                       |
| ------------------------------------------------------------------- | --------------------------------------------------------------------------------- |
| [crypto](https://github.com/rustyconover/duckdb-crypto-extension)   | Adds cryptographic hash functions and [HMAC](https://en.wikipedia.org/wiki/HMAC). |
| [h3](https://github.com/isaacbrodsky/h3-duckdb)                     | Implements hierarchical hexagonal indexing for geospatial data.                   |
| [lindel](https://github.com/rustyconover/duckdb-lindel-extension)   | Implements linearization/delinearization, Z-Order, Hilbert and Morton curves.     |
| [prql](https://github.com/ywelsch/duckdb-prql)                      | Allows running [PRQL](https://prql-lang.org/) commands directly within DuckDB.    |
| [scrooge](https://github.com/pdet/Scrooge-McDuck)                   | Supports a set of aggregation functions and data scanners for financial data.     |
| [shellfs](https://github.com/rustyconover/duckdb-shellfs-extension) | Allows shell commands to be used for input and output.                            |

DuckDB Labs and the DuckDB Foundation do not vet the code within community extensions and, therefore, cannot guarantee that DuckDB community extensions are safe to use. The loading of community extensions can be explicitly disabled with the following one-way configuration option:

```sql
SET allow_community_extensions = false;
```

For more details, see the documentation’s [Securing DuckDB page](#docs:lts:operations_manual:securing_duckdb:securing_extensions::community-extensions).

#### Summary and Looking Ahead

In this blog post, we introduced the DuckDB Community Extensions repository, which allows easy installation of third-party DuckDB extensions.

We are looking forward to continuously extending this repository. If you have an idea for creating an extension, take a look at the source code of the already published extensions, which provides good examples of how to package community extensions, and join the `#extensions` channel on our [Discord](https://discord.duckdb.org/).
Once you have an extension, please contribute it via a [pull request](https://github.com/duckdb/community-extensions/pulls).

Finally, we would like to thank the early adopters of DuckDB’s extension mechanism and Community Extensions repository. Thanks for iterating with us and providing feedback to us.

## Memory Management in DuckDB

**Publication date:** 2024-07-09

**Author:** Mark Raasveldt

Memory is an important resource when processing large amounts of data. Memory is a fast caching layer that can provide immense speed-ups to query processing. However, memory is finite and expensive, and when working with large data sets there is generally not enough memory available to keep all necessary data structures cached. Managing memory effectively is critical for a high-performance query engine – as memory must be utilized in order to provide that high performance, but we must be careful so that we do not use excessive memory which can cause out-of-memory errors or can cause the ominous [OOM killer](https://en.wikipedia.org/wiki/Out_of_memory#Recovery) to zap the process out of existence.

DuckDB is built to effectively utilize available memory while avoiding running out of memory:

* The streaming execution engine allows small chunks of data to flow through the system without requiring entire data sets to be materialized in memory.
* Data from intermediates can be spilled to disk temporarily in order to free up space in memory, allowing computation of complex queries that would otherwise exceed the available memory.
* The buffer manager caches as many pages as possible from any attached databases without exceeding the pre-defined memory limits.

In this blog post we will cover these aspects of memory management within DuckDB – and provide examples of where they are utilized.

#### Streaming Execution

DuckDB uses a streaming execution engine to process queries. Data sources, such as tables, CSV files or Parquet files, are never fully materialized in memory. Instead, data is read and processed one chunk at a time. For example, consider the execution of the following query:

```sql
SELECT
    UserAgent,
    count(*)
FROM 'hits.csv'
GROUP BY UserAgent;
```

Instead of reading the entire CSV file at once, DuckDB reads data from the CSV file in pieces, and computes the aggregation incrementally using the data read from those pieces. This happens continuously until the entire CSV file is read, at which point the entire aggregation result is computed. 

![](../images/blog/streamingexecution.png)


In the above example we are only showing a single data stream. In practice, DuckDB uses multiple data streams to enable multi-threaded execution – each thread executes its own data stream. The aggregation results of the different threads are combined to compute the final result.
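
The idea can be sketched in plain Python. This is a simplified model and not DuckDB's implementation: the names `read_chunks`, `streaming_count` and `combine` are made up for this example, the tiny chunk size is only for demonstration, and the two "streams" are processed one after the other rather than by separate threads.

```python
import itertools
from collections import Counter


def read_chunks(rows, chunk_size):
    """Yield lists of at most `chunk_size` rows from any iterable."""
    iterator = iter(rows)
    while chunk := list(itertools.islice(iterator, chunk_size)):
        yield chunk


def streaming_count(rows, chunk_size=2):
    """Incrementally compute count(*) per group over a stream of rows."""
    counts = Counter()
    for chunk in read_chunks(rows, chunk_size):
        counts.update(chunk)  # only one chunk plus the hash table is held at a time
    return counts


def combine(partials):
    """Merge per-stream partial aggregates into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


stream_a = ["Firefox", "Chrome", "Chrome"]
stream_b = ["Safari", "Chrome", "Firefox"]
result = combine([streaming_count(stream_a), streaming_count(stream_b)])
print(dict(result))  # {'Firefox': 2, 'Chrome': 3, 'Safari': 1}
```

The memory footprint is bounded by the chunk size plus the size of the aggregate hash table, not by the size of the input.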

While streaming execution is conceptually simple, it is powerful, and is sufficient to provide larger-than-memory support for many simple use cases. For example, streaming execution enables larger-than-memory support for:

* Computing aggregations where the total number of groups is small
* Reading data from one file and writing to another (e.g., reading from CSV and writing to Parquet)
* Computing a Top-N over the data (where N is small)

Note that nothing needs to be done to enable streaming execution – DuckDB always processes queries in this manner. 

#### Intermediate Spilling

While streaming execution enables larger-than-memory processing for simple queries, there are many cases where streaming execution alone is not sufficient.

In the previous example, streaming execution enabled larger-than-memory processing because the computed aggregate result was very small – as there are very few unique user agents in comparison to the total number of web requests. As a result, the aggregate hash table would always remain small, and never exceed the amount of available memory.

Streaming execution is not sufficient if the intermediates required to process a query are larger than memory. For example, suppose we group by the source IP in the previous example:

```sql
SELECT
    IPNetworkID,
    count(*)
FROM 'hits.csv'
GROUP BY IPNetworkID;
```

Since there are many more unique source IPs, the hash table we need to maintain is significantly larger. If the size of the aggregate hash table exceeds memory, the streaming execution engine is not sufficient to prevent out-of-memory issues.

Larger-than-memory intermediates can happen in many scenarios, in particular when executing more complex queries. For example, the following scenarios can lead to larger-than-memory intermediates:

* Computing an aggregation with many unique groups
* Computing an exact distinct count of a column with many distinct values
* Joining two tables together that are both larger than memory
* Sorting a larger-than-memory dataset
* Computing a complex window over a larger-than-memory table

DuckDB deals with these scenarios by disk spilling. Larger-than-memory intermediates are (partially) written to disk in the temporary directory when required. While powerful, disk spilling reduces performance – as additional I/O must be performed. For that reason, DuckDB tries to minimize disk spilling. Disk spilling is adaptively used only when the size of the intermediates increases past the memory limit. Even in those scenarios, as much data is kept in memory as possible to maximize performance. The exact way this is done depends on the operators and is detailed in other blog posts
([aggregation](https://duckdb.org/2024/03/29/external-aggregation),
[sorting](https://duckdb.org/2021/08/27/external-sorting)).
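
As a rough intuition for spilling, the following Python sketch counts rows per group while keeping only a bounded number of groups in memory. It uses simple hash partitioning into temporary files, which is a toy stand-in and not DuckDB's actual external aggregation algorithm (see the linked posts for that). The function `spilling_count` and its parameters are hypothetical, and keys are assumed to be strings without tabs or newlines.

```python
import os
import tempfile
from collections import Counter


def spilling_count(keys, max_groups_in_memory, num_partitions=4):
    """Count occurrences per key while holding at most `max_groups_in_memory` groups."""
    table = Counter()
    spilled = False
    with tempfile.TemporaryDirectory() as tmp_dir:
        paths = [os.path.join(tmp_dir, f"part-{p}.tmp") for p in range(num_partitions)]

        def spill():
            # Append every partial aggregate to the file of its hash partition.
            for key, count in table.items():
                with open(paths[hash(key) % num_partitions], "a") as f:
                    f.write(f"{key}\t{count}\n")
            table.clear()

        for key in keys:
            table[key] += 1
            if len(table) > max_groups_in_memory:
                spill()
                spilled = True

        if not spilled:
            return dict(table)  # everything fit in memory, no I/O needed

        spill()  # flush the remainder so each partition file is complete
        result = {}
        for path in paths:
            if not os.path.exists(path):
                continue
            partition = Counter()  # only one partition is aggregated at a time
            with open(path) as f:
                for line in f:
                    key, count = line.rstrip("\n").split("\t")
                    partition[key] += int(count)
            result.update(partition)
        return result


keys = [f"network-{i % 50}" for i in range(1_000)]
counts = spilling_count(keys, max_groups_in_memory=10)
print(len(counts), counts["network-0"])  # 50 20
```

Note that the sketch only touches the disk once the in-memory table grows past its budget, mirroring the adaptive behavior described above. For simplicity it still collects the final result in a single dictionary.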

The `memory_limit` setting controls how much data DuckDB is allowed to keep in memory. By default, this is set to `80%` of the physical RAM of your system (e.g., if your system has 16 GB RAM, this defaults to 12.8 GB). The memory limit can be changed using the following command:

```sql
SET memory_limit = '4GB';
```

The location of the temporary directory can be chosen using the `temp_directory` setting, and is by default the connected database with a `.tmp` suffix (e.g., `database.db.tmp`), or only `.tmp` if connecting to an in-memory database. The maximum size of the temporary directory can be limited using the `max_temp_directory_size` setting, which defaults to `90%` of the remaining disk space on the drive where the temporary files are stored. These settings can be adjusted as follows:

```sql
SET temp_directory = '/tmp/duckdb_swap';
SET max_temp_directory_size = '100GB';
```

If the memory limit is exceeded and disk spilling cannot be used, an out-of-memory error is reported and the query is canceled. This can happen because disk spilling is explicitly disabled, because the temporary directory size exceeds the provided limit, or because a system limitation prevents disk spilling for a given query.

#### Buffer Manager

Another core component of memory management in DuckDB is the buffer manager. The buffer manager is responsible for caching pages from DuckDB's own persistent storage. Conceptually the buffer manager works in a similar fashion to the intermediate spilling. Pages are kept in memory as much as possible, and evicted from memory when space is required for other data structures. The buffer manager abides by the same memory limit as any intermediate data structures. Pages in the buffer manager can be freed up to make space for intermediate data structures, or vice versa.

There are two main differences between the buffer manager and intermediate data structures:

* As the buffer manager caches pages that already exist on disk (in DuckDB's persistent storage), they do not need to be written to the temporary directory when evicted. Instead, when they are required again, they can be re-read from the attached storage file directly.
* Query intermediates have a natural life cycle: once the query has finished processing, the intermediates are no longer required. Pages that the buffer manager caches from persistent storage, on the other hand, are useful across queries. As such, these pages are kept cached until either the persistent database is closed, or until space must be freed up for other operations.
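
The first difference can be illustrated with a toy page cache in Python. The eviction order (least recently used) and the page-count capacity are assumptions made for this sketch, since the post does not describe DuckDB's eviction policy; `PageCache` and `read_page` are hypothetical names.

```python
from collections import OrderedDict


class PageCache:
    """Toy page cache; LRU eviction is an assumption made for this sketch."""

    def __init__(self, read_page, capacity):
        self.read_page = read_page  # callback that reads a page from the database file
        self.capacity = capacity    # maximum number of cached pages
        self.pages = OrderedDict()
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.disk_reads += 1
        data = self.read_page(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            # Evicted pages are simply dropped: they still exist in the database file,
            # so nothing is written to a temporary directory.
            self.pages.popitem(last=False)
        return data


cache = PageCache(read_page=lambda page_id: f"contents of page {page_id}", capacity=2)
for page_id in [1, 2, 1, 3, 2]:
    cache.get(page_id)
print(cache.disk_reads)  # 4
```

Only the third access is served from the cache here. With a larger capacity, repeated accesses across queries would avoid more reads from the underlying storage.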

The performance boost of the buffer manager depends on the speed of the underlying storage medium. When data is stored on a very fast disk, reading data is fast and the speed-up is minimal. When data is stored on a network drive or read over HTTP/S3, reading requires performing network requests, and the speed-up can be very large.

#### Profiling Memory Usage

DuckDB contains a number of tools that can be used to profile memory usage.

The `duckdb_memory()` function can be used to inspect which components of the system are using memory. Memory used by the buffer manager is labeled as `BASE_TABLE`, while query intermediates are divided into separate groups.

```sql
FROM duckdb_memory();
```

```text
┌──────────────────┬────────────────────┬─────────────────────────┐
│       tag        │ memory_usage_bytes │ temporary_storage_bytes │
│     varchar      │       int64        │          int64          │
├──────────────────┼────────────────────┼─────────────────────────┤
│ BASE_TABLE       │          168558592 │                       0 │
│ HASH_TABLE       │                  0 │                       0 │
│ PARQUET_READER   │                  0 │                       0 │
│ CSV_READER       │                  0 │                       0 │
│ ORDER_BY         │                  0 │                       0 │
│ ART_INDEX        │                  0 │                       0 │
│ COLUMN_DATA      │                  0 │                       0 │
│ METADATA         │                  0 │                       0 │
│ OVERFLOW_STRINGS │                  0 │                       0 │
│ IN_MEMORY_TABLE  │                  0 │                       0 │
│ ALLOCATOR        │                  0 │                       0 │
│ EXTENSION        │                  0 │                       0 │
├──────────────────┴────────────────────┴─────────────────────────┤
│ 12 rows                                               3 columns │
└─────────────────────────────────────────────────────────────────┘
```

The `duckdb_temporary_files()` function can be used to examine the current contents of the temporary directory.

```sql
FROM duckdb_temporary_files();
```

```text
┌────────────────────────────────┬───────────┐
│              path              │   size    │
│            varchar             │   int64   │
├────────────────────────────────┼───────────┤
│ .tmp/duckdb_temp_storage-0.tmp │ 967049216 │
└────────────────────────────────┴───────────┘
```

#### Conclusion

Memory management is critical for a high-performance analytics engine. DuckDB is built to take advantage of any available memory to speed up query processing, while gracefully dealing with larger-than-memory datasets using intermediate spilling. Memory management is still an active area of development and has [continuously improved across DuckDB versions](https://duckdb.org/2024/06/26/benchmarks-over-time#scale-tests). Amongst others, we are working on improving memory management for complex queries that involve multiple operators with larger-than-memory intermediates.

## Friendly Lists and Their Buddies, the Lambdas

**Publication date:** 2024-08-08

**Authors:** Tania Bogatsch, Maia de Graaf

#### Introduction

Nested data types, such as lists and structs, are widespread in analytics.
Several popular formats, such as Parquet and JSON, support nested types.
Traditionally, working with nested types requires normalizing steps before any analysis.
Then, to return nested results, systems need to (re-)aggregate their data.
Normalization and aggregation are undesirable from both a usability and performance perspective.
To streamline the operation on nested data, analytical systems, including DuckDB, provide native functionality on these nested types.

In this blog post, we'll first cover the basics of [lists](#::lists) and [lambdas](#::lambdas).
Then, we dive into their [technical details](#::zooming-in-list-transformations).
Finally, we'll show some [examples](#::lists-and-lambdas-in-the-community) from the community.
Feel free to skip ahead if you're already familiar with lists and lambdas and are just here for our out-of-the-box examples!

#### Lists

Before jumping into lambdas, let's take a quick detour into DuckDB's [`LIST` type](#docs:lts:sql:data_types:list).
A list contains any number of elements with the same data type.
Below is a table containing two columns, `l` and `n`.
`l` contains lists of integers, and `n` contains integers.

```sql
CREATE OR REPLACE TABLE my_lists (l INTEGER[], n INTEGER);
INSERT INTO my_lists VALUES ([1], 1), ([1, 2, 3], 2), ([-1, NULL, 2], 2);
FROM my_lists;
```

```text
┌───────────────┬───────┐
│       l       │   n   │
│    int32[]    │ int32 │
├───────────────┼───────┤
│ [1]           │     1 │
│ [1, 2, 3]     │     2 │
│ [-1, NULL, 2] │     2 │
└───────────────┴───────┘
```

Internally, all data moves through DuckDB's execution engine in `Vectors`.
For more details on `Vectors` and vectorized execution, please refer to the [documentation](#docs:lts:internals:vector) and respective research papers ([1](https://15721.courses.cs.cmu.edu/spring2016/papers/p5-sompolski.pdf) and [2](https://drive.google.com/file/d/1LJeys01Ho9DREfRJhb9wHu3ssSC22Lll/view)).
In this case, we get two vectors, as depicted below.
This representation is mostly similar to [Arrow's](https://arrow.apache.org) physical list representation.

When examined closely, we can observe that the nested child vector of `l` looks suspiciously similar to the vector `n`.
These nested vector representations enable our execution engine to reuse existing components on nested types.
We'll elaborate more on why this is relevant later.

![](../images/blog/lambda/vectors.png)


#### Lambdas

A **lambda function** is an anonymous function, i.e., a function without a name.
In DuckDB, a lambda function's syntax is `lambda param1, param2, ...: expression`.
The parameters can have any name, and the `expression` can be any SQL expression.

Currently, DuckDB has three scalar functions for working with lambdas:
[`list_transform`](#docs:lts:sql:functions:lambda::list_transformlist-lambda),
[`list_filter`](#docs:lts:sql:functions:lambda::list_filterlist-lambda),
and
[`list_reduce`](#docs:lts:sql:functions:lambda::list_reducelist-lambda),
along with their aliases.
Each accepts a `LIST` as its first argument and a lambda function as its second argument.

Lambdas were the guest star in our [SQL Gymnastics: Bending SQL into Flexible New Shapes](https://duckdb.org/2024/03/01/sql-gymnastics.html#creating-the-macro!) blog post.
This time, we want to put them in the spotlight.

#### Zooming In: List Transformations

To return to our previous example, let's say we want to add `n` to each element of the corresponding list `l`.

##### Pure Relational Solution

Using pure relational operators, i.e., avoiding list-native functions, we would need to perform the following steps:

1. Unnest the lists while keeping the connection to their respective rows.
   We can achieve this by inventing a temporary unique identifier, such as a [`rowid`](#docs:lts:sql:statements:select::row-ids) or a [`UUID`](#docs:lts:sql:data_types:numeric::universally-unique-identifiers-uuids).
2. Transform each element by adding `n`.
3. Using our temporary identifier `rowid`, we can reaggregate the transformed elements by grouping them into lists.

In SQL, it would look like this:

```sql
WITH flattened_tbl AS (
    SELECT unnest(l) AS elements, n, rowid
    FROM my_lists
)
SELECT array_agg(elements + n) AS result
FROM flattened_tbl
GROUP BY rowid
ORDER BY rowid;
```

```text
┌──────────────┐
│    result    │
│   int32[]    │
├──────────────┤
│ [2]          │
│ [3, 4, 5]    │
│ [1, NULL, 4] │
└──────────────┘
```

While the above example is reasonably readable, more complex transformations can become lengthy queries, which are difficult to compose and maintain.
More importantly, this query adds an `unnest` operation and an aggregation (`array_agg`) with a `GROUP BY`.
Adding a `GROUP BY` can be costly, especially for large datasets.

We have to dive into the technical implications to fully understand why the above query yields suboptimal performance.
Internally, the query execution performs the steps depicted in the diagram below.
We can directly emit the child vector for the `unnest` operation, i.e., without copying any data.
For the correlated columns `rowid` and `n`, we use [selection vectors](https://duckdb.org/docs/internals/vector.html), which again prevents the copying of data.
This way, we can fire our expression execution on the child vector, another nested vector, and the expanded vector `n`.

![](../images/blog/lambda/relational.png)


The heavy-hitting operation is the last one, reaggregating the transformed elements into their respective lists.
As we don't propagate the parent vector, we have no information about the resulting element's correlation to the initial lists.
Recreating these lists requires a full copy of the data and partitioning, which impacts performance even with [DuckDB's high-performance aggregation operator](https://duckdb.org/2024/03/29/external-aggregation).

As a consequence, the normalized approach is both cumbersome to write and inefficient: it produces significant (and unnecessary) overhead despite the relative simplicity of the query.
This is yet another example of how shaping nested data into relational forms or [forcing it through rectangles](https://open.substack.com/pub/lloydtabb/p/data-is-rectangular-and-other-limiting?utm_campaign=post&utm_medium=web) can have a significant negative performance impact.

##### Native List Functions

With support for native list functions, DuckDB mitigates these drawbacks by operating directly on the `LIST` data structure.
Since, as we've seen, lists are essentially nested columns, we can reshape these functions into concepts already understood by our execution engine and leverage their full potential.

In the case of transformations, the corresponding list-native function is `list_transform`.
Here is the rewritten query:

```sql
SELECT list_transform(l, lambda x: x + n) AS result
FROM my_lists;
```

Alternatively, with Python-style list comprehension syntax:

```sql
SELECT [x + n FOR x IN l] AS result
FROM my_lists;
```

Internally, this query expands all related vectors, which is just `n` in this case.
Just like before, we employ selection vectors to avoid any data copies.
Then, we use the lambda function `lambda x: x + n` to fire our expression execution on the child vector and the expanded vector `n`.
As this is a list-native function, we’re aware of the existence of a parent vector and keep it alive.
So, once we get the result from the transformation, we can completely omit the reaggregation step.

![](../images/blog/lambda/native.png)
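
The difference between the two strategies can be sketched in plain Python. This is a simplified model and not DuckDB's implementation: a list column is modeled as a flat child vector plus one `(offset, length)` entry per row (the parent vector), and all helper names are hypothetical. The normalized variant uses a row identifier to regroup the elements, which is an assumption about how the relational query is written.

```python
def to_lists(offsets, child):
    """Materialize Python lists from a parent vector and a flat child vector."""
    return [child[off:off + length] for off, length in offsets]


def transform_native(offsets, child, n):
    """List-native path: transform the child vector, keep the parent vector."""
    new_child = []
    for (off, length), n_row in zip(offsets, n):
        # n is "expanded": every element of a row sees that row's n.
        for k in range(off, off + length):
            new_child.append(child[k] + n_row)
    # The parent vector is reused as-is, so no regrouping is needed.
    return offsets, new_child


def transform_normalized(offsets, child, n):
    """Relational path: unnest, transform, then regroup by row id."""
    unnested = [
        (row_id, child[k] + n[row_id])
        for row_id, (off, length) in enumerate(offsets)
        for k in range(off, off + length)
    ]
    groups = {}
    for row_id, value in unnested:  # the costly reaggregation step
        groups.setdefault(row_id, []).append(value)
    return [groups.get(row_id, []) for row_id in range(len(offsets))]


offsets = [(0, 3), (3, 2)]   # two lists: [1, 2, 3] and [4, 5]
child = [1, 2, 3, 4, 5]
n = [10, 20]
native_offsets, native_child = transform_native(offsets, child, n)
```

Both functions produce the same lists, but only the normalized one has to rebuild the grouping from scratch.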


To see the efficiency of `list_transform` in action, we executed a simple benchmark.
First, we added 1M rows to our table `my_lists`, each containing a list of five elements.

```sql
INSERT INTO my_lists
    SELECT [r, r % 10, r + 5, r + 11, r % 2], r
    FROM range(1_000_000) AS tbl(r);
```

Then, we ran both our normalized and list-native queries on this data.
Both queries were run in the CLI with DuckDB v1.0.0 on a MacBook Pro 2021 with an M1 Max chip.

| Normalized |  Native |
| ---------: | ------: |
|    0.522 s | 0.027 s |

As we can see, the native query is roughly 19× faster (0.522 s vs. 0.027 s). Amazing!
If we look at the execution plan of the normalized query using `EXPLAIN ANALYZE` (not shown in this blog post), we can see that DuckDB spends most of its time in the `HASH_GROUP_BY` and `UNNEST` operators.
In comparison, these operators no longer exist in the list-native query plan.

#### Lists and Lambdas in the Community

To better present what's possible by combining our `LIST` type and lambda functions, we've scoured the community Discord and GitHub, as well as some far corners of the internet, for exciting use cases.

##### `list_transform`

As established earlier, [`list_transform`](#docs:lts:sql:functions:lambda::list_transformlist-lambda) applies a lambda function to each element of the input list and returns a new list with the transformed elements.
Here, one of our [users](https://discord.com/channels/909674491309850675/1032659480539824208/1248004651983573162) implemented a `list_shuffle` function by nesting different `LIST` native functions.

```sql
CREATE OR REPLACE MACRO list_shuffle(l) AS (
    list_select(l, list_grade_up([random() FOR _ IN l]))
);
```
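
To make the nesting in this macro easier to follow, here is a rough Python sketch of the same idea. The helper functions only mimic the behavior of `list_grade_up` and `list_select` as we understand them (1-based positions); they are illustrative stand-ins, not DuckDB's implementation.

```python
import random


def list_grade_up(keys):
    """Return the 1-based positions that would sort `keys` in ascending order."""
    return [i + 1 for i in sorted(range(len(keys)), key=keys.__getitem__)]


def list_select(values, positions):
    """Pick the elements of `values` at the given 1-based positions."""
    return [values[p - 1] for p in positions]


def list_shuffle(values, rng=random):
    # One random sort key per element; sorting the keys yields a random order.
    keys = [rng.random() for _ in values]
    return list_select(values, list_grade_up(keys))


shuffled = list_shuffle([1, 2, 3, 4, 5], random.Random(0))
```

The result is always a permutation of the input, because every position appears exactly once in the output of `list_grade_up`.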

Another [user](https://til.simonwillison.net/duckdb/remote-parquet) investigated querying remote Parquet files using DuckDB.
In their query, they first use `list_transform` to generate a list of URLs for Parquet files.
This is followed by the `read_parquet` function, which reads the Parquet files and calculates the total size of the data.
The query looks like this:

```sql
SELECT
    sum(size) AS size
FROM read_parquet(
    ['https://huggingface.co/datasets/vivym/midjourney-messages/resolve/main/data/' ||
        format('{:06d}', n) || '.parquet'
        FOR n IN generate_series(0, 55)
    ]
);
```

##### `list_filter`

The [`list_filter` function](#docs:lts:sql:functions:lambda::list_filterlist-lambda) returns a new list containing only the elements of the input list for which the lambda function returns `true`.

Here is an example using `list_filter` from a [discussion on our Discord](https://discord.com/channels/909674491309850675/921073327009853451/1235818484047544371) where the user wanted to remove the element at index `idx` from each list.

```sql
CREATE OR REPLACE MACRO remove_idx(l, idx) AS (
    list_filter(l, lambda _, i: i != idx)
);
```

So far, we've primarily focused on showcasing our lambda function support in this blog post.
Yet, there are often many possible paths with SQL and its rich dialects.
We couldn't help but show how we can achieve the same functionality with some of our other native list functions.
In this case, we used [`list_slice`](#docs:lts:sql:functions:list::list_slicelist-begin-end) and [`list_concat`](#docs:lts:sql:functions:list::list_concatlist1-list2).

```sql
CREATE OR REPLACE MACRO remove_idx(l, idx) AS (
    l[:idx - 1] || l[idx + 1:]
);
```
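
As a sanity check, the two formulations can be compared in a small Python sketch. It assumes 1-based indexes and inclusive slice bounds on the SQL side and only covers valid indexes (`1 <= idx <= len(l)`); the function names are hypothetical.

```python
def remove_idx_filter(l, idx):
    """Mirror of the list_filter variant: keep elements whose index differs."""
    return [x for i, x in enumerate(l, start=1) if i != idx]


def remove_idx_slice(l, idx):
    """Mirror of the slicing variant: l[:idx - 1] || l[idx + 1:].

    With 1-based inclusive bounds, the SQL slice l[idx + 1:] starts at the
    element that Python addresses as l[idx].
    """
    return l[:idx - 1] + l[idx:]


example = ["a", "b", "c", "d"]
results = [
    (remove_idx_filter(example, idx), remove_idx_slice(example, idx))
    for idx in range(1, len(example) + 1)
]
```

For every valid index, both variants return the same list.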

##### `list_reduce`

Most recently, we've added [`list_reduce`](#docs:lts:sql:functions:lambda::list_reducelist-lambda), which folds a list into a single value by repeatedly applying a lambda function to an accumulator and the next list element.
The accumulator holds the result of the previous lambda call and is also what the function ultimately returns.

We took the following example from a [discussion on GitHub](https://github.com/duckdb/duckdb/discussions/9752).
The user wanted to use a lambda to validate [BSN numbers](https://www.netherlandsworldwide.nl/bsn), the Dutch equivalent of social security numbers.
A BSN must be 8 or 9 digits, but to limit our scope we'll just focus on BSNs that are 9 digits long.
After multiplying each digit by its index, from 9 down to 2, and the last digit by -1, the sum must be divisible by 11 to be valid.

###### Setup

For our example, we assume that input BSNs are of type `INTEGER[]`.

```sql
CREATE OR REPLACE TABLE bsn_tbl AS
    FROM VALUES
        ([2, 4, 6, 7, 4, 7, 5, 9, 6]),
        ([1, 2, 3, 4, 5, 6, 7, 8, 9]),
        ([7, 6, 7, 4, 4, 5, 2, 1, 1]),
        ([8, 7, 9, 0, 2, 3, 4, 1, 7]),
        ([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])
        tbl(bsn);
```

###### Solution

```sql
CREATE OR REPLACE MACRO valid_bsn(bsn) AS (
    bsn.list_reverse().list_reduce(
        lambda x, y, i: IF (i = 2, -x, x) + y * i
    ) % 11 = 0
    AND len(bsn) = 9
);
```

Using our macro with the example table we get the following result:

```sql
SELECT bsn, valid_bsn(bsn) AS valid
FROM bsn_tbl;
```

```text
┌────────────────────────────────┬─────────┐
│              bsn               │  valid  │
│            int32[]             │ boolean │
├────────────────────────────────┼─────────┤
│ [2, 4, 6, 7, 4, 7, 5, 9, 6]    │ true    │
│ [1, 2, 3, 4, 5, 6, 7, 8, 9]    │ false   │
│ [7, 6, 7, 4, 4, 5, 2, 1, 1]    │ true    │
│ [8, 7, 9, 0, 2, 3, 4, 1, 7]    │ true    │
│ [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] │ false   │
└────────────────────────────────┴─────────┘
```
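
To trace what the reduction computes, the following Python sketch mirrors the macro with `functools.reduce`. It assumes that the first lambda call receives the first two elements of the reversed list with index `2`, which is what the `i = 2` check in the macro relies on.

```python
from functools import reduce


def valid_bsn(bsn):
    """Check a BSN given as a list of digits (9-digit BSNs only)."""
    if len(bsn) != 9:
        return False
    reversed_digits = bsn[::-1]

    def step(acc, indexed_digit):
        i, digit = indexed_digit
        # On the first call, the accumulator is the last BSN digit,
        # which enters the sum with weight -1.
        acc = -acc if i == 2 else acc
        return acc + digit * i

    # Remaining digits get the weights 2, 3, ..., 9.
    total = reduce(step, enumerate(reversed_digits[1:], start=2), reversed_digits[0])
    return total % 11 == 0


bsns = [
    [2, 4, 6, 7, 4, 7, 5, 9, 6],
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [7, 6, 7, 4, 4, 5, 2, 1, 1],
    [8, 7, 9, 0, 2, 3, 4, 1, 7],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
]
validity = [valid_bsn(b) for b in bsns]
```

For the first row, the weighted sum is `-6 + 9*2 + 5*3 + 7*4 + 4*5 + 7*6 + 6*7 + 4*8 + 2*9 = 209`, which is divisible by 11.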

#### Conclusion

Native nested type support is critical for analytical systems.
As such, DuckDB offers native nested type support and many functions to work with these types directly.
These functions make working with nested types easier and substantially faster.
In this blog post, we looked at the technical details of working with nested types by diving into our `list_transform` function.
Additionally, we highlighted various use cases that we came across in our community.

## DuckDB Tricks – Part 1

**Publication date:** 2024-08-19

**Author:** Gábor Szárnyas

**TL;DR:** We use a simple example data set to present a few tricks that are useful when using DuckDB.

In this blog post, we present five simple DuckDB operations that we found particularly useful for interactive use cases.
The operations are summarized in the following table:

| Operation                                                                   | Snippet                                                         |
| --------------------------------------------------------------------------- | --------------------------------------------------------------- |
| [Pretty-printing floats](#::pretty-printing-floating-point-numbers)         | `SELECT (10 / 9)::DECIMAL(15, 3)`                               |
| [Copying the schema](#::copying-the-schema-of-a-table)                      | `CREATE TABLE tbl AS FROM example LIMIT 0`                      |
| [Shuffling data](#::shuffling-data)                                         | `FROM example ORDER BY hash(rowid + 42)`                        |
| [Specifying types when reading CSVs](#::specifying-types-in-the-csv-loader) | `FROM read_csv('example.csv', types = {'x': 'DECIMAL(15, 3)'})` |
| [Updating CSV files in-place](#::updating-csv-files-in-place)               | `COPY (SELECT s FROM 'example.csv') TO 'example.csv'`           |

#### Creating the Example Data Set

We start by creating a data set that we'll use in the rest of the blog post. To this end, we define a table, populate it with some data and export it to a CSV file.

```sql
CREATE TABLE example (s STRING, x DOUBLE);
INSERT INTO example VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4);
COPY example TO 'example.csv';
```

Wait a bit, that’s way too verbose! DuckDB’s syntax has several SQL shorthands including the [“friendly SQL” clauses](#docs:lts:sql:dialect:friendly_sql).
Here, we combine the [`VALUES` clause](#docs:lts:sql:query_syntax:values) with the [`FROM`-first syntax](#docs:lts:sql:query_syntax:from::from-first-syntax), which makes the `SELECT` clause optional.
With these, we can compress the data creation script to ~60% of its original size.
The new formulation omits the schema definition and creates the CSV with a single command:

```sql
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';
```

Regardless of which script we run, the resulting CSV file will look like this:

```csv
s,x
foo,1.1111111111111112
bar,7.142857142857143
qux,2.25
```

Let’s continue with the code snippets and their explanations.

#### Pretty-Printing Floating-Point Numbers

When printing a floating-point number to the output, the fractional parts can be difficult to read and compare. For example, the following query returns three numbers between 1 and 8 but their printed widths are very different due to their fractional parts.

```sql
SELECT x
FROM 'example.csv';
```

```text
┌────────────────────┐
│         x          │
│       double       │
├────────────────────┤
│ 1.1111111111111112 │
│  7.142857142857143 │
│               2.25 │
└────────────────────┘
```

By casting a column to a `DECIMAL` with a fixed number of digits after the decimal point, we can pretty-print it as follows:

```sql
SELECT x::DECIMAL(15, 3) AS x
FROM 'example.csv';
```

```text
┌───────────────┐
│       x       │
│ decimal(15,3) │
├───────────────┤
│         1.111 │
│         7.143 │
│         2.250 │
└───────────────┘
```

A typical alternative solution is to use the [`printf`](#docs:lts:sql:functions:text::printf-syntax) or [`format`](#docs:lts:sql:functions:text::fmt-syntax) functions, e.g.:

```sql
SELECT printf('%.3f', x)
FROM 'example.csv';
```

However, these approaches require us to specify a formatting string that's easy to forget.
What's worse, the statement above returns string values, which makes subsequent operations (e.g., sorting) more difficult.
Therefore, unless keeping the full precision of the floating-point numbers is a concern, casting to `DECIMAL` values should be the preferred solution for most use cases.
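
The sorting problem with formatted strings is easy to reproduce outside of SQL. The Python sketch below uses the standard `decimal` module as a stand-in for the `DECIMAL(15, 3)` cast; the extra value `10.0` is not part of the example data set and is added only to make the difference visible, and the rounding mode is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

values = [1.1111111111111112, 7.142857142857143, 2.25, 10.0]


def to_decimal3(x):
    """Round a float to three fractional digits, keeping a numeric type."""
    return Decimal(repr(x)).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


# Numeric values sort by magnitude ...
as_decimals = sorted(to_decimal3(v) for v in values)
# ... while formatted strings sort character by character.
as_strings = sorted("%.3f" % v for v in values)
```

Here, `as_decimals` is ordered numerically (`1.111`, `2.250`, `7.143`, `10.000`), while `as_strings` places `'10.000'` before `'2.250'`.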

#### Copying the Schema of a Table

To copy the schema from a table without copying its data, we can use `LIMIT 0`.

```sql
CREATE TABLE example AS
    FROM 'example.csv';
CREATE TABLE tbl AS
    FROM example
    LIMIT 0;
```

This will result in an empty table with the same schema as the source table:

```sql
DESCRIBE tbl;
```

```text
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ s           │ VARCHAR     │ YES     │         │         │         │
│ x           │ DOUBLE      │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
```

Alternatively, in the CLI client, we can run the `.schema` [dot command](#docs:lts:clients:cli:dot_commands):

```sql
.schema
```

This will return the schema of the table.

```sql
CREATE TABLE example (s VARCHAR, x DOUBLE);
```

After editing the table’s name (e.g., `example` to `tbl`), this query can be used to create a new table with the same schema.

#### Shuffling Data

Sometimes, we need to introduce some entropy into the ordering of the data by shuffling it.
To shuffle _non-deterministically_, we can simply sort on a random value provided by the [`random()` function](#docs:lts:sql:functions:numeric::random):

```sql
FROM 'example.csv' ORDER BY random();
```

Shuffling _deterministically_ is a bit more tricky. To achieve this, we can order on the [hash](#docs:lts:sql:functions:utility::hashvalue) of the [`rowid` pseudocolumn](#docs:lts:sql:statements:select::row-ids). Note that this column is only available in physical tables, so we first have to load the CSV into a table, then perform the shuffle operation as follows:

```sql
CREATE OR REPLACE TABLE example AS FROM 'example.csv';
FROM example ORDER BY hash(rowid + 42);
```

The result of this shuffle operation is deterministic – if we run the script repeatedly, it will always return the following table:

```text
┌─────────┬────────────────────┐
│    s    │         x          │
│ varchar │       double       │
├─────────┼────────────────────┤
│ bar     │  7.142857142857143 │
│ qux     │               2.25 │
│ foo     │ 1.1111111111111112 │
└─────────┴────────────────────┘
```

Note that the `+ 42` is only necessary to nudge the first row from its position – as `hash(0)` returns `0`, the smallest possible value, using it for ordering leaves the first row in its place.
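
The same idea, ordering rows by a hash of their position plus a constant, can be sketched in Python. Note that `hashlib.sha256` is used here only as a reproducible stand-in; it is not the hash function DuckDB uses, so the resulting order differs from the table above.

```python
import hashlib


def stable_hash(value):
    """Deterministic 64-bit hash of an integer (stand-in for SQL hash())."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def deterministic_shuffle(rows, seed=42):
    """Order rows by the hash of (row position + seed)."""
    indexed = sorted(enumerate(rows), key=lambda pair: stable_hash(pair[0] + seed))
    return [row for _, row in indexed]


rows = [("foo", 1.1111111111111112), ("bar", 7.142857142857143), ("qux", 2.25)]
first_run = deterministic_shuffle(rows)
second_run = deterministic_shuffle(rows)
```

Running the function twice yields the same permutation, and changing `seed` yields a different but equally reproducible one.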

#### Specifying Types in the CSV Loader

DuckDB’s CSV loader auto-detects types from a [short list](#docs:lts:data:csv:auto_detection::type-detection) of `BOOLEAN`, `BIGINT`, `DOUBLE`, `TIME`, `DATE`, `TIMESTAMP` and `VARCHAR`.
In some cases, it’s desirable to override the detected type of a given column with a type outside of this list.
For example, we may want to treat column `x` as a `DECIMAL` value from the get-go.
We can do this on a per-column basis with the `types` argument of the `read_csv` function:

```sql
CREATE OR REPLACE TABLE example AS
    FROM read_csv('example.csv', types = {'x': 'DECIMAL(15, 3)'});
```

Then, we can simply query the table to see the result:

```sql
FROM example;
```

```text
┌─────────┬───────────────┐
│    s    │       x       │
│ varchar │ decimal(15,3) │
├─────────┼───────────────┤
│ foo     │         1.111 │
│ bar     │         7.143 │
│ qux     │         2.250 │
└─────────┴───────────────┘
```
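
Conceptually, the `types` argument is a per-column override applied on top of the default parsing. A minimal Python sketch of that idea, using the standard `csv` module and a hypothetical `decimal_15_3` converter (the rounding mode is an assumption), could look like this:

```python
import csv
import io
from decimal import Decimal

CSV_TEXT = "s,x\nfoo,1.1111111111111112\nbar,7.142857142857143\nqux,2.25\n"


def decimal_15_3(text):
    """Parse text as a decimal with three fractional digits."""
    return Decimal(text).quantize(Decimal("0.001"))


def read_csv_with_types(text, types):
    """Read CSV text; columns without an override stay as strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column: types.get(column, str)(value) for column, value in record.items()}
        for record in reader
    ]


rows = read_csv_with_types(CSV_TEXT, types={"x": decimal_15_3})
```

Only column `x` is converted; column `s` falls back to the default (string) handling.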

#### Updating CSV Files In-Place

In DuckDB, it is possible to read, process and write CSV files in-place. For example, to project the column `s` into the same file, we can simply run:

```sql
COPY (SELECT s FROM 'example.csv') TO 'example.csv';
```

The resulting `example.csv` file will have the following content:

```csv
s
foo
bar
qux
```

Note that this trick is not possible in Unix shells without a workaround.
One might be tempted to run the following command on the `example.csv` file and expect the same result:

```bash
cut -d, -f1 example.csv > example.csv
```

However, the shell truncates `example.csv` when it sets up the output redirection, before `cut` gets a chance to read it, so executing this command leaves us with an empty `example.csv` file.
The solution is to use different file names, then perform a rename operation:

```bash
cut -d, -f1 example.csv > tmp.csv && mv tmp.csv example.csv
```
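
The write-then-rename workaround can also be scripted. The following Python sketch is one possible way to do it with the standard library; the function name is hypothetical, and it is not a description of how DuckDB's `COPY` works internally.

```python
import csv
import os
import tempfile


def project_column_in_place(path, column):
    """Rewrite a CSV file so that it only contains the given column."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, suffix=".tmp", delete=False
    ) as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst, lineterminator="\n")
        writer.writerow([column])
        for record in reader:
            writer.writerow([record[column]])
    # Replace the original only after the new content is fully written.
    os.replace(dst.name, path)
```

Because the original file is read completely before it is replaced, the input is never truncated while it is still being processed.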

#### Closing Thoughts

That’s it for today. The tricks shown in this post are available on [duckdbsnippets.com](https://duckdbsnippets.com/page/1/most-recent). If you have a trick that you would like to share, please submit it there, or send it to us via social media or [Discord](https://discord.duckdb.org/). Happy hacking!

## Announcing DuckDB 1.1.0

**Publication date:** 2024-09-09

**Author:** The DuckDB team

**TL;DR:** The DuckDB team is happy to announce that today we're releasing DuckDB version 1.1.0, codenamed “Eatoni”.

To install the new version, please visit the [installation guide](https://duckdb.org/install/index.html).
For the release notes, see the [release page](https://github.com/duckdb/duckdb/releases/tag/v1.1.0).

> Some packages (R, Java) take a few extra days to release due to the reviews required in the release pipelines.

We are proud to release DuckDB 1.1.0, our first release since we released version 1.0.0 three months ago.
This release is codenamed “Eatoni” after the [Eaton's pintail (Anas eatoni)](https://en.wikipedia.org/wiki/Eaton%27s_pintail),
a dabbling duck that occurs only on two very remote island groups in the southern Indian Ocean.

#### What's New in 1.1.0

There have been far too many changes to discuss them each in detail, but we would like to highlight several particularly exciting features!
Below is a summary of those new features with examples.

#### Breaking SQL Changes

[**IEEE-754 semantics for division by zero.**](https://github.com/duckdb/duckdb/pull/13493) The [IEEE-754 floating point standard](https://en.wikipedia.org/wiki/IEEE_754) states that division by zero returns `inf`. Previously, DuckDB would return `NULL` when dividing by zero, also for floating point division. Starting with this release, DuckDB will return `inf` instead.

```sql
SELECT 1 / 0 AS division_by_zero;
```

```text
┌──────────────────┐
│ division_by_zero │
│      double      │
├──────────────────┤
│              inf │
└──────────────────┘
```

The `ieee_floating_point_ops` setting can be set to `false` to revert this behavior:

```sql
SET ieee_floating_point_ops = false;
SELECT 1 / 0 AS division_by_zero;
```

```text
┌──────────────────┐
│ division_by_zero │
│      double      │
├──────────────────┤
│             NULL │
└──────────────────┘
```
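
The two behaviors can be contrasted in a small Python sketch. The function below models the IEEE-754 rules for division by zero and reuses the setting's name as a parameter; it illustrates the semantics only and says nothing about DuckDB's internals. `None` stands in for SQL `NULL`.

```python
import math


def divide(a, b, ieee_floating_point_ops=True):
    """Floating-point division with configurable division-by-zero behavior."""
    if b != 0:
        return a / b
    if not ieee_floating_point_ops:
        return None  # legacy behavior: division by zero yields NULL
    if a == 0 or math.isnan(a):
        return math.nan  # IEEE-754: 0 / 0 is not a number
    # IEEE-754: the sign of the result combines the signs of both operands.
    return math.copysign(math.inf, a) * math.copysign(1.0, b)


ieee_result = divide(1, 0)
legacy_result = divide(1, 0, ieee_floating_point_ops=False)
```

Python itself raises `ZeroDivisionError` for `1 / 0`, which is why the zero-divisor case is handled explicitly here.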

[**Error when scalar subquery returns multiple values.**](https://github.com/duckdb/duckdb/pull/13514) Scalar subqueries can only return a single value per input row. Previously, DuckDB would match SQLite's behavior and select an arbitrary row to return when multiple rows were returned. In practice this behavior often led to confusion. Starting with this release, an error is returned instead, matching the behavior of Postgres. The subquery can be wrapped with `array` to collect all of the results of the subquery in a list.

```sql
SELECT (SELECT unnest(range(10)));
```

```console
Invalid Input Error: More than one row returned by a subquery used as
an expression - scalar subqueries can only return a single row.
```

```sql
SELECT array(SELECT unnest(range(10))) AS subquery_result;
```

```text
┌────────────────────────────────┐
│        subquery_result         │
│            int64[]             │
├────────────────────────────────┤
│ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] │
└────────────────────────────────┘
```

The `scalar_subquery_error_on_multiple_rows` setting can be set to `false` to revert this behavior.

```sql
SET scalar_subquery_error_on_multiple_rows = false;
SELECT (SELECT unnest(range(10))) AS result;
```

```text
┌────────┐
│ result │
│ int64  │
├────────┤
│      0 │
└────────┘
```
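
The three behaviors described above (error, collect into a list, pick one row) can be summarized in a short Python sketch. The function names are hypothetical, `None` stands in for `NULL`, and returning the first row is just one possible choice of "arbitrary" row.

```python
def scalar_subquery(rows, error_on_multiple_rows=True):
    """Evaluate a subquery result as a single scalar value."""
    rows = list(rows)
    if not rows:
        return None  # an empty scalar subquery yields NULL
    if len(rows) > 1 and error_on_multiple_rows:
        raise ValueError(
            "More than one row returned by a subquery used as an expression"
        )
    return rows[0]


def array_subquery(rows):
    """Collect all rows of the subquery into a list, like array(...)."""
    return list(rows)


collected = array_subquery(range(10))
legacy = scalar_subquery(range(10), error_on_multiple_rows=False)
```

Wrapping the subquery in `array` is usually the safer fix, since it makes the multi-row result explicit instead of silently discarding rows.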

#### Community Extensions

Recently we introduced [Community Extensions](https://duckdb.org/2024/07/05/community-extensions). Community extensions allow anyone to build extensions for DuckDB, that are then built and distributed by us. The [list of community extensions](#community_extensions:list_of_extensions) has been growing since then.

In this release, we have been working towards making community extensions easier to build and produce. This release includes a new method of registering extensions [using the C API](https://github.com/duckdb/duckdb/pull/12682), in addition to many additions to the C API that allow [scalar functions](https://github.com/duckdb/duckdb/pull/11786), [aggregate functions](https://github.com/duckdb/duckdb/pull/13229) and [custom types](https://github.com/duckdb/duckdb/pull/13499) to be defined. These changes will enable building extensions against a stable API; such extensions are smaller in size and work across different DuckDB versions. In addition, these changes will enable building extensions in other programming languages in the future.

#### Friendly SQL

[**Histogram.**](https://github.com/duckdb/duckdb/pull/12590) This version introduces the `histogram` function that can be used to compute histograms over columns of a dataset. The histogram function works for columns of any type, and allows for various binning strategies and a custom number of bins.

```sql
FROM histogram(
    'https://blobs.duckdb.org/data/ontime.parquet',
    UniqueCarrier,
    bin_count := 5
);
```

```text
┌────────────────┬────────┬──────────────────────────────────────────────────────────────────────────────────┐
│      bin       │ count  │                                       bar                                        │
│    varchar     │ uint64 │                                     varchar                                      │
├────────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ AA             │ 677215 │ ██████████████████████████████████████████████████████▏                          │
│ DL             │ 696931 │ ███████████████████████████████████████████████████████▊                         │
│ OO             │ 521956 │ █████████████████████████████████████████▊                                       │
│ UA             │ 435757 │ ██████████████████████████████████▉                                              │
│ WN             │ 999114 │ ████████████████████████████████████████████████████████████████████████████████ │
│ (other values) │ 945484 │ ███████████████████████████████████████████████████████████████████████████▋     │
└────────────────┴────────┴──────────────────────────────────────────────────────────────────────────────────┘
```

[**SQL variables.**](https://github.com/duckdb/duckdb/pull/13084) This release introduces support for variables that can be defined in SQL. Variables can hold a single value of any type – including nested types like lists or structs. Variables can be set as literals, or from scalar subqueries.

The value stored within variables can be read using `getvariable`. When used in a query, `getvariable` is treated as a literal during query planning and optimization. This allows variables to be used in places where we normally cannot read values from within tables, for example, when specifying which CSV files to read:

```sql
SET VARIABLE list_of_files = (SELECT LIST(file) FROM csv_files);
SELECT * FROM read_csv(getvariable('list_of_files'), filename := true);
```

```text
┌───────┬───────────┐
│   a   │ filename  │
│ int64 │  varchar  │
├───────┼───────────┤
│    42 │ test.csv  │
│    84 │ test2.csv │
└───────┴───────────┘
```

##### Unpacked Columns

The [`COLUMNS` expression](#docs:lts:sql:expressions:star::columns-expression) allows users to write dynamic SQL over a set of columns without needing to explicitly list the columns in the SQL string. Instead, the columns can be selected through either a regex or computed with a [lambda function](https://duckdb.org/2024/08/08/friendly-lists-and-their-buddies-the-lambdas).

This release expands this capability by [allowing the `COLUMNS` expression to be *unpacked* into a function](https://github.com/duckdb/duckdb/pull/11872).
This is especially useful when combined with nested functions like `struct_pack` or `list_value`.

```sql
CREATE TABLE many_measurements (
    id INTEGER, m1 INTEGER, m2 INTEGER, m3 INTEGER
);
INSERT INTO many_measurements VALUES (1, 10, 100, 20);

SELECT id, struct_pack(*COLUMNS('m\d')) AS measurements
FROM many_measurements;
```

```text
┌───────┬────────────────────────────────────────────┐
│  id   │                measurements                │
│ int32 │ struct(m1 integer, m2 integer, m3 integer) │
├───────┼────────────────────────────────────────────┤
│     1 │ {'m1': 10, 'm2': 100, 'm3': 20}            │
└───────┴────────────────────────────────────────────┘
```
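
Readers familiar with Python may recognize the pattern: unpacking `COLUMNS` into a function is similar in spirit to argument unpacking with `**`. The sketch below is only an analogy with hypothetical helpers; using `re.search` for column matching is an assumption about the matching semantics.

```python
import re


def columns(row, pattern):
    """Select the columns of a row whose names match the regex."""
    return {name: value for name, value in row.items() if re.search(pattern, name)}


def struct_pack(**fields):
    """Build a struct (modeled as a dict) from named arguments."""
    return dict(fields)


row = {"id": 1, "m1": 10, "m2": 100, "m3": 20}
# Without unpacking, the matched columns would be passed as a single argument;
# with unpacking, each one becomes its own named argument.
measurements = struct_pack(**columns(row, r"m\d"))
```

The result, `{'m1': 10, 'm2': 100, 'm3': 20}`, corresponds to the struct shown in the query output above.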

##### `query` and `query_table` Functions

The [`query` and `query_table` functions](https://github.com/duckdb/duckdb/pull/10586) take a string literal, and convert it into a `SELECT` subquery or a table reference. Note that these functions can only take literal strings. As such, they are not as powerful (or dangerous) as a generic `eval`.

These functions are conceptually simple, but enable powerful and more dynamic SQL. For example, they allow passing in a table name as a prepared statement parameter:

```sql
CREATE TABLE my_table (i INTEGER);
INSERT INTO my_table VALUES (42);

PREPARE select_from_table AS SELECT * FROM query_table($1);
EXECUTE select_from_table('my_table');
```

```text
┌───────┐
│   i   │
│ int32 │
├───────┤
│    42 │
└───────┘
```

When combined with the `COLUMNS` expression, we can write very generic SQL-only macros. For example, below is a custom version of `SUMMARIZE` that computes the `min` and `max` of every column in a table:

```sql
CREATE OR REPLACE MACRO my_summarize(table_name) AS TABLE
SELECT
    unnest([*COLUMNS('alias_.*')]) AS column_name,
    unnest([*COLUMNS('min_.*')]) AS min_value,
    unnest([*COLUMNS('max_.*')]) AS max_value
FROM (
    SELECT
        any_value(alias(COLUMNS(*))) AS "alias_\0",
        min(COLUMNS(*))::VARCHAR AS "min_\0",
        max(COLUMNS(*))::VARCHAR AS "max_\0"
    FROM query_table(table_name::VARCHAR)
);

SELECT *
FROM my_summarize('https://blobs.duckdb.org/data/ontime.parquet')
LIMIT 3;
```

```text
┌─────────────┬───────────┬───────────┐
│ column_name │ min_value │ max_value │
│   varchar   │  varchar  │  varchar  │
├─────────────┼───────────┼───────────┤
│ year        │ 2017      │ 2017      │
│ quarter     │ 1         │ 3         │
│ month       │ 1         │ 9         │
└─────────────┴───────────┴───────────┘
```

#### Performance

##### Dynamic Filter Pushdown from Joins

This release adds a *very cool* optimization for joins: DuckDB now [automatically creates filters](https://github.com/duckdb/duckdb/pull/12908) for the larger table in the join during execution. Say we are joining two tables `A` and `B`. `A` has 100 rows, and `B` has one million rows. We are joining on a shared key `i`. If there were any filter on `i`, DuckDB would already push that filter into the scan, greatly reducing the cost to complete the query. But we are now filtering on another column from `A`, namely `j`:

```sql
CREATE TABLE A AS
    SELECT range AS i, range AS j
    FROM range(100);

CREATE TABLE B AS
    SELECT t1.range AS i
    FROM range(100) t1, range(10_000) t2;

SELECT count(*)
FROM A
JOIN B
USING (i) WHERE j > 90;
```

DuckDB will execute this join by building a hash table on the smaller table `A`, and then probing said hash table with the contents of `B`. DuckDB will now observe the values of `i` during construction of the hash table on `A`. Since only the rows of `A` with `j > 90` enter the hash table, these values cover a narrow range. DuckDB will then create a min-max range filter of those values of `i` and *automatically* apply that filter to the values of `i` in `B`! That way, we remove (in this case) roughly 90% of the data from the large table early, before even looking at the hash table. In this example, this leads to a roughly 10× improvement in query performance. The optimization can also be observed in the output of `EXPLAIN ANALYZE`.
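
The mechanism can be illustrated with a simplified, single-threaded Python sketch. It is a toy model with hypothetical names rather than DuckDB's join implementation, and the probe side is scaled down to 100,000 rows to keep the example fast.

```python
def join_count_with_dynamic_filter(a_rows, b_keys, j_threshold):
    """Count join matches, pre-filtering the probe side with a min-max range."""
    # Build phase: hash table on the filtered rows of the smaller table.
    build = {}
    for i, j in a_rows:
        if j > j_threshold:
            build[i] = build.get(i, 0) + 1
    if not build:
        return 0, 0

    # The min-max range of the build keys becomes a filter on the probe side.
    lo, hi = min(build), max(build)

    probed = matches = 0
    for key in b_keys:
        if key < lo or key > hi:
            continue  # removed by the range filter, no hash lookup needed
        probed += 1
        matches += build.get(key, 0)
    return matches, probed


a_rows = [(i, i) for i in range(100)]                   # columns (i, j)
b_keys = [i for i in range(100) for _ in range(1_000)]  # column i
matches, probed = join_count_with_dynamic_filter(a_rows, b_keys, j_threshold=90)
```

In this run, only 9,000 of the 100,000 probe-side rows reach the hash table, and all of them find a match.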

##### Automatic CTE Materialization

Common Table Expressions (CTEs) are a convenient way to break up complex queries into manageable pieces without endless nesting of subqueries. Here is a small example of a CTE:

```sql
WITH my_cte AS (SELECT range AS i FROM range(10))
SELECT i FROM my_cte WHERE i > 5;
```

Sometimes, the same CTE is referenced multiple times in the same query. Previously, the CTE would be “copied” wherever it appeared. This creates a potential performance problem: if computing the CTE is computationally expensive, it would be better to cache (“materialize”) its results instead of computing the result multiple times in different places within the same query. However, different filter conditions might apply for different instantiations of the CTE, which could drastically reduce their computation cost. A classical no-win scenario in databases. It was [already possible](#docs:lts:sql:query_syntax:with) to explicitly mark a CTE as materialized using the `MATERIALIZED` keyword, but that required manual intervention.

This release adds a feature where DuckDB [automatically decides](https://github.com/duckdb/duckdb/pull/12290) whether a CTE result should be materialized or not using a heuristic. The heuristic currently is that if the CTE performs aggregation and is queried more than once, it should be materialized. We plan to expand that heuristic in the future.

##### Parallel Streaming Queries

DuckDB has two different methods for fetching results: *materialized* results and *streaming* results. Materialized results fetch all of the data that is present in a result at once, and return it. Streaming results instead allow iterating over the data in incremental steps. Streaming results are critical when working with large result sets as they do not require the entire result set to fit in memory. However, in previous releases, the final streaming phase was limited to a single thread.

Parallelism is critical for obtaining good query performance on modern hardware, and this release adds support for [parallel streaming of query results](https://github.com/duckdb/duckdb/pull/11494). The system will use all available threads to fill up a query result buffer of a limited size (a few megabytes). When data is consumed from the result buffer, the threads will restart and start filling up the buffer again. The size of the buffer can be configured through the `streaming_buffer_size` parameter.

Below is a small benchmark using [`ontime.parquet`](https://blobs.duckdb.org/data/ontime.parquet) to illustrate the performance benefits that can be obtained using the Python streaming result interface:

```python
import duckdb
duckdb.sql("SELECT * FROM 'ontime.parquet' WHERE flightnum = 6805;").fetchone()
```

|   v1.0 |   v1.1 |
| -----: | -----: |
| 1.17 s | 0.12 s |
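
The buffering scheme described above, where several threads fill a bounded buffer and pause when it is full, resembles a classic producer-consumer setup. The following Python sketch shows that pattern with the standard `queue` and `threading` modules. It is a conceptual illustration with hypothetical names, not DuckDB's implementation, and the buffer size is counted in chunks rather than bytes.

```python
import queue
import threading


def stream_results(tasks, num_workers=4, buffer_size=8):
    """Yield chunks produced by worker threads through a bounded buffer."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel: one per worker

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                break
            # put() blocks while the buffer is full, pausing this worker
            # until the consumer has fetched some data.
            buffer.put(task())
        buffer.put(done)

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    finished = 0
    while finished < num_workers:
        item = buffer.get()
        if item is done:
            finished += 1
        else:
            yield item


# Each task produces one chunk of ten values.
tasks = [lambda k=k: list(range(k * 10, k * 10 + 10)) for k in range(20)]
chunks = list(stream_results(tasks, num_workers=4, buffer_size=2))
```

A consumer that stops after the first chunk simply leaves the workers paused on the full buffer, which is what keeps memory usage bounded. Chunks may arrive in any order, since the workers run concurrently.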

##### Parallel `union_by_name`

The `union_by_name` parameter allows combination of – for example – CSV files that have the same columns in them but not in the same order. This release [adds support for parallelism](https://github.com/duckdb/duckdb/pull/12957) when using `union_by_name`. This greatly improves reading performance when using the union by name feature on multiple files.

##### Nested ART Rework (Foreign Key Load Speed-Up)

We have [greatly improved](https://github.com/duckdb/duckdb/pull/13373) index insertion and deletion performance for foreign keys. Normally, we directly inline row identifiers into the tree structure. However, this is impossible for indexes that contain a lot of duplicates, as is the case with foreign keys. Instead, we now actually create another index entry for each key that is itself another “recursive” index tree in its own right. That way, we can achieve good insertion and deletion performance inside index entries. The performance results of this change are drastic. Consider the following example, where `a` has 100 rows and `b` has one million rows that all reference `a`:

```sql
CREATE TABLE a (i INTEGER, PRIMARY KEY (i));
CREATE TABLE b (i INTEGER, FOREIGN KEY (i) REFERENCES a(i));

INSERT INTO a FROM range(100);
INSERT INTO b SELECT a.range FROM range(100) a, range(10_000) b;
```

On the previous version, this would take ca. 10 seconds on a MacBook to complete. It now takes 0.2 seconds thanks to the new index structure, a ca. 50× improvement!

##### Window Function Improvements

Window functions see a lot of use in DuckDB, which is why we are continuously improving the performance of executing window functions over large datasets.

The [`DISTINCT`](https://github.com/duckdb/duckdb/pull/12311) and [`FILTER`](https://github.com/duckdb/duckdb/pull/12250) window function modifiers can now be executed in streaming mode. Streaming mode means that the input data for the operator does not need to be completely collected and buffered before the operator can execute. For large intermediate results, this can have a very large performance impact. For example, the following query will now use the streaming window operator:

```sql
SELECT
    sum(DISTINCT i)
        FILTER (i % 3 = 0)
        OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM range(10) tbl(i);
```
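
To see why this query is amenable to streaming, consider the following Python generator. It is a sketch of the idea, not the actual operator: for a frame that grows from the first row to the current row, each output value depends only on the rows seen so far, so results can be emitted one row at a time. `None` stands in for `NULL`.

```python
def streaming_sum_distinct_filtered(values, predicate):
    """Running sum(DISTINCT v) FILTER (predicate) over all rows so far."""
    seen = set()
    running = None  # SQL sum over no rows is NULL
    for v in values:
        if v is not None and predicate(v) and v not in seen:
            seen.add(v)
            running = v if running is None else running + v
        # Emit the result for the current row without buffering later rows.
        yield running


result = list(streaming_sum_distinct_filtered(range(10), lambda i: i % 3 == 0))
```

For the input `range(10)`, the generator yields `[0, 0, 0, 3, 3, 3, 9, 9, 9, 18]`.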

We have [also implemented streaming mode](https://github.com/duckdb/duckdb/pull/12685) for positive `lead` offsets.

We can now [push filters on columns through window functions that are partitioned by the same column](https://github.com/duckdb/duckdb/pull/10932). For example, consider the following scenario:

```sql
CREATE TABLE tbl AS SELECT range AS i FROM range(10);
SELECT i
FROM (SELECT i, SUM(i) OVER (PARTITION BY i) FROM tbl)
WHERE i > 5;
```

Previously, the filter on `i` could not be pushed into the scan on `tbl`. But we now recognize that pushing this filter “through” the window is safe and the optimizer will do so. This can be verified through `EXPLAIN`:

```text
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
              …
┌─────────────┴─────────────┐
│           WINDOW          │
│    ────────────────────   │
│        Projections:       │
│ sum(i) OVER (PARTITION BY │
│             i)            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         SEQ_SCAN          │
│    ────────────────────   │
│            tbl            │
│                           │
│       Projections: i      │
│                           │
│          Filters:         │
│   i>5 AND i IS NOT NULL   │
│                           │
│          ~2 Rows          │
└───────────────────────────┘
```
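
A minimal sketch for reproducing a plan like the one above, assuming the base table is named `tbl` as in the query and the plan. The exact rendering and the cardinality estimates may differ between versions:

```sql
-- (Re)create the base table so that the snippet is self-contained
CREATE OR REPLACE TABLE tbl AS SELECT range AS i FROM range(10);

-- Prefix the query with EXPLAIN to print the physical plan;
-- the filter on i should appear inside the SEQ_SCAN node
EXPLAIN
SELECT i
FROM (SELECT i, sum(i) OVER (PARTITION BY i) FROM tbl)
WHERE i > 5;
```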

The blocking (non-streaming) version of the window operator [now processes input data in parallel](https://github.com/duckdb/duckdb/pull/12907). This greatly reduces the footprint of the window operator.

See also [Richard's talk on the topic](https://www.youtube.com/watch?v=QubE0u8Kq7Y&list=PLzIMXBizEZjhbacz4PWGuCUSxizmLei8Y&index=8) at [DuckCon #5](#_events:2024-08-15-duckcon5) in Seattle a few weeks ago.

#### Spatial Features

##### GeoParquet

GeoParquet is an extension format of the ubiquitous Parquet format that standardizes how to encode vector geometries and their metadata in Parquet files. This can be used to store geographic data sets in Parquet files efficiently. When the [`spatial` extension](#docs:lts:core_extensions:spatial:overview) is installed and loaded, reading from a GeoParquet file through DuckDB's normal Parquet reader will now [automatically convert geometry columns to the `GEOMETRY` type](https://github.com/duckdb/duckdb/pull/12503), for example:

```sql
INSTALL spatial;
LOAD spatial;

FROM 'https://blobs.duckdb.org/data/geoparquet-example.parquet'
SELECT GEOMETRY g
LIMIT 10;
```

```text
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                       g                                                        │
│                                                    geometry                                                    │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ MULTIPOLYGON (((180 -16.067132663642447, 180 -16.555216566639196, 179.36414266196414 -16.801354076946883, 17…  │
│ POLYGON ((33.90371119710453 -0.95, 34.07261999999997 -1.059819999999945, 37.69868999999994 -3.09698999999994…  │
│ POLYGON ((-8.665589565454809 27.656425889592356, -8.665124477564191 27.589479071558227, -8.684399786809053 2…  │
│ MULTIPOLYGON (((-122.84000000000003 49.000000000000114, -122.97421000000001 49.00253777777778, -124.91024 49…  │
│ MULTIPOLYGON (((-122.84000000000003 49.000000000000114, -120 49.000000000000114, -117.03121 49, -116.04818 4…  │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
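
Because the column already arrives as `GEOMETRY`, spatial functions can be applied to it without a manual conversion step. The following is a small sketch, assuming the geometry column of the example file is named `geometry` as in the query above:

```sql
INSTALL spatial;
LOAD spatial;

-- Count the rows per geometry type directly on the converted column
SELECT ST_GeometryType(geometry) AS geom_type, count(*) AS n
FROM 'https://blobs.duckdb.org/data/geoparquet-example.parquet'
GROUP BY geom_type
ORDER BY n DESC;
```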

##### R-Tree

The spatial extension accompanying this release also implements initial support for creating “R-Tree” spatial indexes. An R-Tree index stores the approximate bounding boxes of each geometry in a column into an auxiliary hierarchical tree-like data structure where every “node” contains a bounding box covering all of its child nodes. This makes it really fast to check what geometries intersect a specific region of interest as you can quickly prune out a lot of candidates by recursively moving down the tree.

Support for spatial indexes has been a long-requested feature on the spatial extension roadmap, and now that we have one, a ton of new use cases and directions for further development are opening up. However, as of now they are only used to accelerate simple queries that select from a table with a filter using one out of a hardcoded set of spatial predicate functions applied on an indexed geometry column and a constant geometry. This makes R-Tree indexes useful when you have a very large table of geometries that you repeatedly query, but you don't want to perform a full table scan when you're only interested in the rows whose geometries intersect or fit within a certain region anyway. Here is an example where the `RTREE_INDEX_SCAN` operator is used:

```sql
INSTALL spatial;
LOAD spatial;

-- Create a table with 10_000_000 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints(
        {min_x: 0, min_y: 0, max_x: 10_000, max_y: 10_000}::BOX_2D,
        10_000_000,
        1337
    );

-- Create an index on the table
CREATE INDEX my_idx ON t1 USING RTREE (geom);

-- Perform a query with a "spatial predicate" on the indexed geometry
-- column. Note how the second argument in this case,
-- the ST_MakeEnvelope call is a "constant"
SELECT count(*)
FROM t1
WHERE ST_Within(geom, ST_MakeEnvelope(450, 450, 650, 650));
```

```text
3986
```
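
The query result alone does not show which operator was used. One way to check, sketched here under the assumption that the table `t1` and the index `my_idx` from the example above exist, is to prefix the same query with `EXPLAIN` and look for an `RTREE_INDEX_SCAN` node in the printed plan:

```sql
-- Same query as above, but printing the physical plan instead of the result
EXPLAIN
SELECT count(*)
FROM t1
WHERE ST_Within(geom, ST_MakeEnvelope(450, 450, 650, 650));
```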

R-Tree indexes mostly share the same feature set as DuckDB's built-in ART index. They are buffer-managed, persistent, lazily loaded from disk and support inserts, updates and deletes to the base table. However, they cannot be used to enforce constraints.

#### Final Thoughts

These were a few highlights – but there are many more features and improvements in this release. The full release notes can be [found on GitHub](https://github.com/duckdb/duckdb/releases/tag/v1.1.0).

We would like to thank again our amazing community for using DuckDB, building cool projects on DuckDB and improving DuckDB by providing us feedback. Your contributions truly mean a lot!

## Changing Data with Confidence and ACID

**Publication date:** 2024-09-25

**Authors:** Hannes Mühleisen, Mark Raasveldt

**TL;DR:** Transactions are key features in database management systems and are also beneficial for data analysis workloads. DuckDB supports fully ACID transactions, confirmed by the TPC-H benchmark's test suite.

The great quote “Everything changes and nothing stays the same” from [Heraclitus, according to Socrates, according to Plato](https://latin.stackexchange.com/a/9473) is not very controversial: change is as old as the universe. Yet somehow, when dealing with data, we often consider change as merely an afterthought.

Static datasets are split-second snapshots of whatever the world looked like at one moment. But very quickly, the world moves on, and the dataset needs to catch up to remain useful. In the world of tables, new rows can be added, old rows may be deleted and sometimes rows have to be changed to reflect a new situation. Often, changes are interconnected. A row in a table that maps orders to customers is not very useful without the corresponding entry in the `orders` table. Most, if not all, datasets eventually get changed. As a data management system, managing change is thus not optional. However, managing changes properly is difficult.

#### ACID Guarantees

Early data management systems researchers invented a concept called “transactions”, the notions of which were [first formalized](https://dl.acm.org/doi/abs/10.5555/48751.48761) [in the 1980s](https://dl.acm.org/doi/10.1145/289.291). In essence, transactionality and the well-known ACID principles describe a set of guarantees that a data management system has to provide in order to be considered safe. ACID is an acronym that stands for Atomicity, Consistency, Isolation and Durability.

The ACID principles are not a theoretical exercise. Much like the rules governing airplanes or trains, they have been “written in blood” – they are hard-won lessons from decades of data management practice. It is very hard for an application to reason correctly when dealing with non-ACID systems. The end result of such problems is often corrupted data or data that no longer reflects reality accurately. For example, rows can be duplicated or missing.

DuckDB provides full ACID guarantees by default without additional configuration. In this blog post, we will describe in detail what that means together with concrete examples, and show how you can take advantage of this functionality.

##### Atomicity

**Atomicity** means that *either all changes in a set of updates happen or none of them happen*. Consider the example below, where we insert two rows in two separate tables. The inserts themselves are separate statements, but they can be made atomic by wrapping them in a transaction:

```sql
CREATE TABLE customer (id INTEGER, name VARCHAR);
CREATE TABLE orders (customer_id INTEGER, item VARCHAR);

BEGIN TRANSACTION;
INSERT INTO customer VALUES (42, 'DuckDB Labs');
INSERT INTO orders VALUES (42, 'stale bread');
COMMIT;

SELECT * FROM orders;
```

```text
┌─────────────┬─────────────┐
│ customer_id │    item     │
│    int32    │   varchar   │
├─────────────┼─────────────┤
│          42 │ stale bread │
└─────────────┴─────────────┘
```

By wrapping the changes in a transaction, we can be sure that *either both rows are written, or none of them are written*. The `BEGIN TRANSACTION` statement signifies all following statements belong to that transaction. The `COMMIT` signifies the end of the transaction – and will persist the changes to disk.

It is also possible to undo a set of changes by issuing a `ROLLBACK` at the end of a transaction. This will ensure that none of the changes made in the transaction are persisted.

```sql
BEGIN TRANSACTION;
INSERT INTO orders VALUES (42, 'iceberg lettuce');
INSERT INTO orders VALUES (42, 'dried worms');
ROLLBACK;
SELECT * FROM orders;
```

```text
┌─────────────┬─────────────┐
│ customer_id │    item     │
│    int32    │   varchar   │
├─────────────┼─────────────┤
│          42 │ stale bread │
└─────────────┴─────────────┘
```

As we can see, the two new rows have not been inserted permanently.

Atomicity is great to have because it allows the application to move the database from one consistent state to another consistent state without ever having to worry about intermediate states being visible to an application.

We should note that queries by default run in the so-called “auto-commit” mode, where each query will automatically be run in its own transaction. That said, even for these single-statement queries, transactions are very useful. For example, when bulk loading data into a table using an `INSERT` or `COPY` command, either _all_ of the data is loaded, or _none_ of the data is loaded. The system will not partially load a CSV file into a table.
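
The all-or-nothing behavior of a single statement can be sketched as follows. The table `readings` and its values are hypothetical. The third value cannot be converted to an integer, so the whole `INSERT` fails and the final count should report zero rows:

```sql
CREATE TABLE readings (v INTEGER);

-- 'oops' cannot be cast to INTEGER, so the statement raises a conversion error...
INSERT INTO readings
SELECT CAST(s AS INTEGER)
FROM (VALUES ('1'), ('2'), ('oops')) src(s);

-- ...and the values that could be converted were not kept either
SELECT count(*) AS row_count FROM readings;
```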

We should also note that in DuckDB *schema changes are also transactional*. This means that you can create or delete tables, as well as alter the schema of a table, all within the safety of a transaction. It also means that you can undo any of these operations by issuing a `ROLLBACK`.
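
A minimal sketch of transactional schema changes, using the hypothetical tables `inventory`, `obsolete` and `scratch`. After the `ROLLBACK`, the catalog query should list only the original columns of `inventory` and `obsolete`, and no `scratch` table:

```sql
CREATE TABLE inventory (id INTEGER, item VARCHAR);
CREATE TABLE obsolete (id INTEGER);

BEGIN TRANSACTION;
CREATE TABLE scratch (id INTEGER);                  -- create a table
ALTER TABLE inventory ADD COLUMN quantity INTEGER;  -- alter a table
DROP TABLE obsolete;                                -- drop a table
ROLLBACK;

-- Inspect the catalog: all three schema changes have been undone
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_name IN ('inventory', 'obsolete', 'scratch')
ORDER BY table_name, ordinal_position;
```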

##### Consistency

**Consistency** means that all of [the constraints that are defined in the database](#docs:lts:sql:constraints) must always hold, both before and after a transaction. The constraints can never be violated. Examples of constraints are `PRIMARY KEY` or `FOREIGN KEY` constraints.

```sql
CREATE TABLE customer (id INTEGER, name VARCHAR, PRIMARY KEY (id));

INSERT INTO customer VALUES (42, 'DuckDB Labs');
INSERT INTO customer VALUES (42, 'Wilbur the Duck');
```

In the example above, the `customer` table requires the `id` column to be unique for all entries, otherwise multiple customers would be associated with the same orders. We can enforce this constraint by defining a so-called `PRIMARY KEY` on that column. When we insert two entries with the same id, the consistency check fails, and we get an error message:

```console
Constraint Error: Duplicate key "id: 42" violates primary key
constraint. (...)
```

Having these kinds of constraints in place is a great way to make sure data *remains* consistent even after many updates have taken place.
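
A `FOREIGN KEY` constraint works along the same lines. In the sketch below, the tables `customers` and `customer_orders` are hypothetical names chosen to avoid clashing with the tables above. The last statement is expected to fail with a constraint error because no customer with id 43 exists:

```sql
CREATE TABLE customers (id INTEGER, name VARCHAR, PRIMARY KEY (id));
CREATE TABLE customer_orders (
    customer_id INTEGER,
    item VARCHAR,
    FOREIGN KEY (customer_id) REFERENCES customers (id)
);

INSERT INTO customers VALUES (42, 'DuckDB Labs');

-- Accepted: customer 42 exists
INSERT INTO customer_orders VALUES (42, 'stale bread');

-- Rejected: there is no customer 43 to reference
INSERT INTO customer_orders VALUES (43, 'dried worms');
```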

##### Isolation

**Isolation** means that concurrent transactions are isolated from one another. A database can have many clients interacting with it *at the same time,* causing many transactions to happen all at once. An easy way of isolating these transactions is to execute them one after another. However, that would be prohibitively slow. Thousands of requests might have to wait for one particularly slow one.

To avoid this problem, transactions are typically executed *interleaved*. However, as those transactions change data, one must ensure that each transaction is logically *isolated* – it only ever sees a consistent state of the database and can – for example – never read data from a transaction that has not yet committed.

DuckDB does not have connections in the typical sense – as it is not a client/server database that allows separate applications to connect to it. However, DuckDB has [full multi-client support](#docs:lts:connect:concurrency) within a single application. The user can create multiple clients that all connect to the same DuckDB instance. The transactions can be run concurrently and they are isolated using [Snapshot Isolation](https://jepsen.io/consistency/models/snapshot-isolation).

The way that multiple connections are created differs per client. Below is an example where we showcase the transactionality of the system using the Python client.

```python
import duckdb

con1 = duckdb.connect(":memory:mydb")
con1.sql("CREATE TABLE customer (id INTEGER, name VARCHAR)")

con1.sql("INSERT INTO customer VALUES (42, 'DuckDB Labs')")

con1.begin()
con1.sql("INSERT INTO customer VALUES (43, 'Wilbur the Duck')")
# no commit!

# start a new connection
con2 = duckdb.connect(":memory:mydb")
con2.sql("SELECT name FROM customer").show()

# ┌─────────────┐
# │    name     │
# │   varchar   │
# ├─────────────┤
# │ DuckDB Labs │
# └─────────────┘

# commit from the first connection
con1.commit()

# now the changes are visible
con2.sql("SELECT name FROM customer").show()

# ┌─────────────────┐
# │      name       │
# │     varchar     │
# ├─────────────────┤
# │ DuckDB Labs     │
# │ Wilbur the Duck │
# └─────────────────┘
```

As you can see, we have two connections to the same database, and the first connection inserts the `Wilbur the Duck` customer but *does not yet commit the change*. Meanwhile, the second connection reads from the customer table. The result does not yet show the new entry, because the two transactions are isolated from each other with regard to uncommitted changes. After the first connection commits, the second connection can read its changes.

##### Durability

Finally, **durability** is the behavior of a system under failure. This is important as a process might crash or power to a computer may be lost. A database system now needs to ensure that _all committed transactions_ are durable, meaning their effects will be visible after restarting the database. Transactions that have not yet completed cannot leave any visible traces behind. Databases typically guarantee this property by keeping close tabs on the various caches, for example by using `fsync` to force changes to disk as transactions complete. Skipping the `fsync` is a common “optimization” that endangers durability.

Here is an example, again using Python:

```python
import duckdb
import os
import signal

con = duckdb.connect("mydb.duckdb")
con.sql("CREATE TABLE customer (id INTEGER, name VARCHAR)")
con.sql("INSERT INTO customer VALUES (42, 'DuckDB Labs')")

# begin a transaction
con.begin()
con.sql("INSERT INTO customer VALUES (43, 'Wilbur the Duck')")
# no commit!

os.kill(os.getpid(), signal.SIGKILL)
```

After restarting, we can check the `customer` table:

```python
import duckdb

con = duckdb.connect("mydb.duckdb")
con.sql("SELECT name FROM customer").show()
```

```text
┌─────────────┐
│    name     │
│   varchar   │
├─────────────┤
│ DuckDB Labs │
└─────────────┘
```

In this example, we first create the customer table in the database file `mydb.duckdb`. We then insert a single row with DuckDB Labs as a first transaction. Then, we begin but *do not commit* a second transaction that adds the `Wilbur the Duck` entry. If we then kill the process and with it the database, we can see that upon restart only the `DuckDB Labs` entry has survived. This is because the second transaction was not committed and hence not subject to durability. Of course, this gets more complicated when non-clean exits such as operating system crashes have to be considered. DuckDB also guarantees durability in those circumstances; more on this below.

#### Why ACID in OLAP?

There are two main classes of data management systems, transactional systems (OLTP) and analytical systems (OLAP). As the name implies, transactional systems are far more concerned with guaranteeing the ACID properties than analytical ones. Systems like the venerable PostgreSQL deservedly pride themselves on doing the “right thing” with regard to providing transactional guarantees by default. Even NoSQL transactional systems such as MongoDB that swore off guaranteeing the ACID principles “for performance” early on had to eventually [“roll back” to offering ACID guarantees](https://www.mongodb.com/resources/basics/databases/acid-transactions) with [one or two hurdles along the way](https://jepsen.io/analyses/mongodb-4.2.6).

Analytical systems such as DuckDB – in principle – have less of an imperative to provide strong transactional guarantees. They are often not the so-called “system of record”, which is the data management system that is considered the source of truth. In fact, DuckDB offers various connectors to load data from systems of record, like the [PostgreSQL scanner](#docs:lts:core_extensions:postgres). If an OLAP database were to become corrupted, it is often possible to recover from that source of truth. Of course, that first requires that users notice that something has gone wrong, which is not always simple to detect. For example, a common mistake is ingesting data from the same CSV file twice into a database because the first attempt went wrong at some point. This can lead to duplicate rows causing incorrect aggregate results. ACID prevents these kinds of problems. ACID properties also enable useful functionality in OLAP systems. For example:

**Concurrent Ingestion and Reporting.** As change is continuous, we often have data ingestion streams adding new data to a database system. In analytical systems, it is common to have a single connection append new data to a database, while other connections read from the database in order to e.g., generate graphs and reports. If these connections are isolated, then the generated graphs and aggregates will always be executed over a complete and consistent snapshot of the database, ensuring that the generated graphs and aggregates are correct.

**Rolling Back Incorrect Transformations.** When analyzing data, a common pattern is loading data from data sets stored in flat files followed by performing a number of transformations on that data. For example, we might load a data set from a CSV file, followed by cleaning up `NULL` values and then deleting incomplete rows. If we make an incorrect transformation, it is possible we accidentally delete too many rows.

This is not the end of the world, as we can recover by re-reading from the original CSV files. However, we can save ourselves a lot of time by wrapping the transformations in a transaction and rolling back when something goes wrong. For example:

```sql
CREATE TABLE people AS SELECT * FROM 'people.csv';

BEGIN TRANSACTION;
UPDATE people SET age = NULL WHERE age = -99;
-- oops, we deleted all rows!
DELETE FROM people WHERE name <> 'non-existent name';
-- we can recover our original table by rolling back the delete
ROLLBACK;
```

**SQL Assertions.** When a (non-syntax) error occurs in a transaction, the transaction is automatically aborted, and the changes cannot be committed. We can use this property of transactions to add assertions to our transactions. When one of these assertions is triggered, an error is raised, and the transaction cannot be committed. We can use the `error` function to define our own `assert` macro:

```sql
CREATE MACRO assert(condition, message) AS
    CASE WHEN NOT condition THEN error(message) END;
```

We can then use this `assert` macro to assert that the `people` table is not empty:

```sql
CREATE TABLE people AS SELECT * FROM 'people.csv';

BEGIN TRANSACTION;
UPDATE people SET age = NULL WHERE age = -99;
-- oops, we deleted all rows!
DELETE FROM people WHERE name <> 'non-existent name';
SELECT assert(
           (SELECT count(*) FROM people) > 0,
           'People should not be empty'
       );
COMMIT;
```

When the assertion triggers, the transaction is automatically aborted, and the changes are rolled back.

#### Full TPC-H Benchmark Implementation

The [Transaction Processing Performance Council (TPC)](https://www.tpc.org/tpch/) is an industry association of data management systems and hardware vendors. TPC publishes database benchmark specifications and oversees auditing of benchmark results, which it then publishes on its website. There are various benchmarks aimed at different use cases. The [TPC-H decision support benchmark](https://www.tpc.org/tpch/) is specifically aimed at analytical query processing on large volumes of data. Its famous 22 SQL queries and data generator specifics have been thoroughly analyzed by both database vendors and [academics](https://homepages.cwi.nl/~boncz/snb-challenge/chokepoints-tpctc.pdf) ad nauseam.

It is less well known that the official TPC-H benchmark includes *data modification transactions* that require ACID compliance, which is not too surprising given the name of the organization. For one-off performance shoot-outs, the updates are typically ignored and only the run-times of the 22 queries on a static dataset are reported. Such results are purely informational and cannot be audited or formally published by the TPC. But as we have argued above, change is inevitable, so let's perform the TPC-H experiments *with updates* with DuckDB.

TPC-H generates data for a fictional company selling things. The largest tables are `orders` and `lineitem`, the latter of which contains the elements of each order. The benchmark can generate data of different sizes; the size is controlled by a so-called “scale factor” (SF). The specification defines two “refresh functions” that modify the database. The first refresh function will add `SF * 1500` new rows into the `orders` table, and randomly between 1 and 7 new entries for each order into the `lineitem` table. The second refresh function will delete `SF * 1500` entries from the `orders` table along with the associated `lineitem` entries. The benchmark data generator `dbgen` can generate an arbitrary number of refresh function CSV files with new entries for `orders` and `lineitem` along with rows to be deleted.
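
To get a feeling for the size of one refresh set, the arithmetic above can be worked out for a few scale factors. This is only a sketch of the `SF * 1500` rule and the 1 to 7 `lineitem` entries per order as described here; the chosen scale factors are examples:

```sql
-- Rows touched by a single refresh function at different scale factors
SELECT
    sf,
    sf * 1500     AS orders_rows,
    sf * 1500 * 1 AS lineitem_rows_min,  -- every new order has 1 lineitem
    sf * 1500 * 7 AS lineitem_rows_max   -- every new order has 7 lineitems
FROM (VALUES (1), (10), (100)) scale_factors(sf);
```

At SF100, for example, each refresh function touches 150,000 orders and between 150,000 and 1,050,000 lineitems.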

##### Metrics

TPC-H's main benchmark metric is combined from both a “power” and a “throughput” test result.

The power test will run the first refresh function and time it, then run the 22 queries, then run the second refresh function, and calculate the geometric mean of all timings. With a scale factor of 100 and DuckDB 1.1.1 on a MacBook Pro with an M3 Max CPU and 64 GB of RAM, we get a *Power@Size value of 650&nbsp;536*.
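
The geometric mean can be computed in SQL as the exponential of the average logarithm. The sketch below uses four made-up timings rather than the 24 timings of a real power test (22 queries and 2 refresh functions). The conversion into a Power@Size value assumes the definition `3600 * SF / geometric mean`, which should be checked against the TPC-H specification:

```sql
-- Hypothetical timings in seconds, not measurements
SELECT
    exp(avg(ln(seconds)))              AS geometric_mean_seconds,
    3600 * 100 / exp(avg(ln(seconds))) AS power_at_sf100  -- assumed formula, SF = 100
FROM (
    VALUES ('RF1', 2.0), ('Q1', 0.5), ('Q2', 0.125), ('RF2', 4.0)
) timings(step, seconds);
```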

The throughput test will run a number of concurrent query “streams” that execute the 22 benchmark queries in shuffled order in parallel. In addition, a single refresh stream will run both refresh functions a number of times. The number of query streams and refresh sets is derived from the scale factor. For SF100, there are 5 query streams and 10 refresh sets. For our experiment, we get a *Throughput@Size of 452&nbsp;571*. Results are hard to compare, but the result does not look too shabby when compared with the [official result list](https://www.tpc.org/tpch/results/tpch_results5.asp?print=false&orderby=tpm&sortby=desc&version=3%).

##### ACID Tests

Section 3 of the TPC-H benchmark specification discusses the ACID properties in detail. The specification defines a set of tests to stress the ACID guarantees of a data management system. The spec duly notes that no test can prove that the ACID properties are fully supported; passing them is a “necessary but not sufficient condition” of compliance. Below, we will give an overview of what is tested.

The tests specify an “ACID Transaction”, which modifies the `lineitem` and `orders` tables in such a way that an invariant holds: the `orders` table contains a total sum of all the prices of all the lineitems that belong to this order. The transaction picks a random order, and modifies the last lineitem to have a new price. It then re-calculates the order total price and updates the `orders` table with that. Finally, the transaction inserts information about which row was updated when and the price delta used in a `history` table.

To test *atomicity*, the ACID transaction is run for a random order and then committed. It is verified that the database has been changed accordingly with the specified values. The test is repeated but this time the transaction is aborted. It is verified that the database has not been changed.

For *consistency*, a number of threads run the ACID transaction in parallel 100 times on random orders. Before and after the test, a consistency condition is checked, which essentially makes sure that the sum of all lineitem prices for an order is consistent with the sum in the order.

To test *isolation*, one thread will run the transaction, but not commit or rollback yet. Another connection will make sure the changes are not visible to it. Another set of tests will have two threads running transactions on the same order, and ensure that one of them is aborted by the system due to the conflict.

Finally, to test *durability*, a number of threads run the ACID transaction and log the results. They are allowed to complete at least 100 transactions each. Then, a failure is caused; in our case, we simply killed the process (using `SIGKILL`). Then, the database system is allowed to recover the committed changes from the [write-ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging). The log is checked to ensure that there are no log entries that are not reflected in the `history` table and there are no history entries that don't have log entries, minus very few that might have been lost in flight (i.e., persisted by the database but not yet logged by the benchmark driver). Finally, the consistency is checked again.

**We're happy to report that DuckDB passed all tests.**

Our scripts to run the benchmark are [available on GitHub](https://github.com/hannes/duckdb-tpch-power-test). We are planning to perform a formal audit of our results in the future. We will update this post when that happens.

#### Conclusion

Change in datasets is inevitable, and data management systems need to be able to safely manage change. DuckDB supports strong ACID guarantees that allow for safe and concurrent data modification. We have run extensive experiments with TPC-H's transactional validation tests and found that they pass.

## Creating a SQL-Only Extension for Excel-Style Pivoting in DuckDB

**Publication date:** 2024-09-27

**Author:** Alex Monahan

**TL;DR:** Easily create sharable extensions using only SQL macros that can apply to any table and any columns. We demonstrate the power of this capability with the pivot_table extension that provides Excel-style pivoting.

#### The Power of SQL-Only Extensions

SQL is not a new language.
As a result, it has historically been missing some of the modern luxuries we take for granted.
With version 1.1, DuckDB has launched community extensions, bringing the incredible power of a package manager to the SQL language.
A bold goal of ours is for DuckDB to become a convenient way to wrap any C++ library, much the way that Python does today, but across any language with a DuckDB client.

For extension builders, compilation and distribution are much easier.
For the user community, installation is as simple as two commands:

```sql
INSTALL pivot_table FROM community;
LOAD pivot_table;
```

The extension can then be used in any query through SQL functions.

However, **not all of us are C++ developers**!
Can we, as a SQL community, build up a set of SQL helper functions?
What would it take to build these extensions with *just SQL?*

##### Reusability

Traditionally, SQL is highly customized to the schema of the database on which it was written.
Can we make it reusable?
Some techniques for reusability were discussed in the [SQL Gymnastics post](https://duckdb.org/2024/03/01/sql-gymnastics), but now we can go even further.
With version 1.1, DuckDB's world-class friendly SQL dialect makes it possible to create macros that can be applied:

* To any tables
* On any columns
* Using any functions

The new ability to work **on any tables** is thanks to the [`query` and `query_table` functions](https://duckdb.org/2024/09/09/announcing-duckdb-110#query-and-query_table-functions)!
The `query` function is a safe way to execute `SELECT` statements defined by SQL strings, while `query_table` is a way to make a `FROM` clause pull from multiple tables at once.
They are very powerful when used in combination with other friendly SQL features like the `COLUMNS` expression and `LIST` lambda functions.
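
A short sketch of how these pieces fit together. The tables `sales_2023` and `sales_2024` and the macro `max_of_all_columns` are hypothetical names used only for illustration, and the cast to `VARCHAR[]` inside the macro is a precaution rather than a documented requirement:

```sql
-- Hypothetical example tables with the same schema
CREATE TABLE sales_2023 AS SELECT 'EU' AS region, 10 AS amount;
CREATE TABLE sales_2024 AS SELECT 'US' AS region, 20 AS amount;

-- query: run a SELECT statement that is given as a string
SELECT * FROM query('SELECT 42 AS answer');

-- query_table: read from a table whose name is given as a string...
SELECT * FROM query_table('sales_2023');

-- ...or from a list of tables, stacked on top of each other
SELECT * FROM query_table(['sales_2023', 'sales_2024']);

-- Wrapped in a table macro, the same logic can be pointed at any table(s);
-- COLUMNS(*) applies the aggregate to every column that is found
CREATE MACRO max_of_all_columns(table_names) AS TABLE
    SELECT max(COLUMNS(*))
    FROM query_table(table_names::VARCHAR[]);

SELECT * FROM max_of_all_columns(['sales_2023', 'sales_2024']);
```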

##### Community Extensions as a Central Repository

Traditionally, there has been no central repository for SQL functions across databases, let alone across companies!
DuckDB's community extensions can be that knowledge base.
DuckDB extensions can be used across all languages with a DuckDB client, including Python, NodeJS, Java, Rust, Go, and even WebAssembly (Wasm)!

If you are a DuckDB fan and a SQL user, you can share your expertise back to the community with an extension.
This post will show you how!
No C++ knowledge is needed – just a little bit of copy/paste and GitHub Actions handles all the compilation. 
If I can do it, you can do it!

##### Powerful SQL

All that said, just how valuable can a SQL `MACRO` be?
Can we do more than make small snippets?
I'll make the case that you can do quite complex and powerful operations in DuckDB SQL using the `pivot_table` extension as an example.
The `pivot_table` function allows for Excel-style pivots, including `subtotals`, `grand_totals`, and more.
It is also very similar to the Pandas `pivot_table` function, but with all the scalability and speed benefits of DuckDB.
It contains over **250 tests**, so it is intended to be useful beyond just an example!

To achieve this level of flexibility, the `pivot_table` extension uses many friendly and advanced SQL features:

* The [`query` function](https://duckdb.org/2024/09/09/announcing-duckdb-110#query-and-query_table-functions) to execute a SQL string
* The [`query_table` function](https://duckdb.org/2024/09/09/announcing-duckdb-110#query-and-query_table-functions) to query a list of tables
* The [`COLUMNS` expression](#docs:lts:sql:expressions:star::columns-expression) to select a dynamic list of columns
* [List lambda functions](#docs:lts:sql:functions:lambda) to build up the SQL statement passed into `query`
    * [`list_transform`](#docs:lts:sql:functions:lambda::list_transformlist-lambda) for string manipulation like quoting
    * [`list_reduce`](#docs:lts:sql:functions:lambda::list_reducelist-lambda) to concatenate strings together
    * [`list_aggregate`](#docs:lts:sql:functions:list::list_aggregatelist-name) to sum multiple columns and identify subtotal and grand total rows
* [Bracket notation for string slicing](#docs:lts:sql:functions:text::stringbeginend)
* [`UNION ALL BY NAME`](#docs:lts:sql:query_syntax:setops::union-all-by-name) to stack data by column name for subtotals and grand totals
* [`SELECT * REPLACE`](#docs:lts:sql:expressions:star::replace-clause) to dynamically clean up subtotal columns
* [`SELECT * EXCLUDE`](#docs:lts:sql:expressions:star::exclude-clause) to remove internally generated columns from the final result
* [`GROUPING SETS` and `ROLLUP`](#docs:lts:sql:query_syntax:grouping_sets) to generate subtotals and grand totals
* [`UNNEST`](#docs:lts:sql:query_syntax:unnest) to convert lists into separate rows for `values_axis := 'rows'`
* [`MACRO`s](#docs:lts:sql:statements:create_macro) to modularize the code
* [`ORDER BY ALL`](#docs:lts:sql:query_syntax:orderby::order-by-all) to order the result dynamically
* [`ENUM`s](#docs:lts:sql:statements:create_type) to determine what columns to pivot horizontally
* And of course the [`PIVOT` function](#docs:lts:sql:statements:pivot) for horizontal pivoting!

DuckDB's innovative syntax makes this extension possible!

So, we now have all 3 ingredients we will need: a central package manager, reusable macros, and enough syntactic flexibility to do valuable work.

#### Create Your Own SQL Extension

Let's walk through the steps for creating your own SQL-only extension.

##### Writing the Extension

###### Extension Setup

The first step is to create your own GitHub repo from the [DuckDB Extension Template for SQL](https://github.com/duckdb/extension-template-sql) by clicking _Use this template._

Then clone your new repository onto your local machine using the terminal:

```bash
git clone --recurse-submodules \
    https://github.com/⟨your_github_username⟩/⟨your_extension_repo⟩.git
```

Note that `--recurse-submodules` ensures that DuckDB itself is pulled in as a submodule, which is required to build the extension.

Next, replace the name of the example extension with the name of your extension in all the right places by running the Python script below.

> **Note.** If you don't have Python installed, head to [python.org](https://python.org) and follow the installation instructions there.
> This script doesn't require any libraries, so Python is all you need! (No need to set up any environments.)

```bash
python3 ./scripts/bootstrap-template.py ⟨extension_name_you_want⟩
```

###### Initial Extension Test

At this point, you can follow the directions in the README to build and test locally if you would like.
However, even easier, you can simply commit your changes to git and push them to GitHub, and GitHub Actions can do the compilation for you!
GitHub Actions will also run tests on your extension to validate it is working properly.

> **Note.** The instructions are not written for a Windows audience, so we recommend GitHub Actions in that case!

```bash
git add -A
git commit -m "Initial commit of my SQL extension!"
git push
```

###### Write Your SQL Macros

It is likely a bit faster to iterate if you test your macros directly in DuckDB. 
After you have written your SQL, we will move it into the extension.
The example we will use demonstrates how to pull a dynamic set of columns from a dynamic table name (or a view name!).

```sql
CREATE OR REPLACE MACRO select_distinct_columns_from_table(table_name, columns_list) AS TABLE (
    SELECT DISTINCT
        COLUMNS(lambda column_name: list_contains(columns_list, column_name))
    FROM query_table(table_name)
    ORDER BY ALL
);

FROM select_distinct_columns_from_table('duckdb_types', ['type_category']);
```

| type_category |
| ------------- |
| BOOLEAN       |
| COMPOSITE     |
| DATETIME      |
| NUMERIC       |
| STRING        |
| NULL          |

###### Add SQL Macros

Technically, this is the C++ part, but we are going to do some copy/paste and use GitHub Actions for compiling so it won't feel that way!

DuckDB supports both scalar and table macros, and they have slightly different syntax.
The extension template has an example for each (and code comments too!) inside the file named `⟨your_extension_name⟩.cpp`.
Let's add a table macro here since it is the more complex one.
We will copy the example and modify it!

```cpp
static const DefaultTableMacro ⟨your_extension_name⟩_table_macros[] = {
    {DEFAULT_SCHEMA, "times_two_table", {"x", nullptr}, {{"two", "2"}, {nullptr, nullptr}}, R"(SELECT x * two AS output_column;)"},
    {
        DEFAULT_SCHEMA, // Leave the schema as the default
        "select_distinct_columns_from_table", // Function name
        {"table_name", "columns_list", nullptr}, // Parameters
        {{nullptr, nullptr}}, // Optional parameter names and values (we choose not to have any here)
        // The SQL text inside of your SQL Macro, wrapped in R"( )", which is a raw string in C++
        R"(
            SELECT DISTINCT
                COLUMNS(lambda column_name: list_contains(columns_list, column_name))
            FROM query_table(table_name)
            ORDER BY ALL
        )"
    },
    {nullptr, nullptr, {nullptr}, {{nullptr, nullptr}}, nullptr}
};
```

That's it!
All we had to provide were the name of the function, the names of the parameters, and the text of our SQL macro.
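
For comparison, here is roughly what the two macro flavors look like as plain SQL statements. This is a sketch rather than the template's own code: `times_two_table` mirrors the table macro entry shown in the C++ snippet above (including its optional `two` parameter), while the scalar macro `times_two` is assumed here purely for illustration.

```sql
-- Scalar macro: returns a single value per call
CREATE OR REPLACE MACRO times_two(x) AS x * 2;

-- Table macro: returns a table, so it is called in a FROM clause.
-- `two := 2` declares an optional parameter with a default value.
CREATE OR REPLACE MACRO times_two_table(x, two := 2) AS TABLE
    SELECT x * two AS output_column;

SELECT times_two(21) AS output_column;
FROM times_two_table(21);
FROM times_two_table(21, two := 3);
```

In the C++ array, the same pieces show up as separate fields: the function name, the required parameter names, the optional parameters with their defaults, and the SQL text.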

##### Testing the Extension

We also recommend adding some tests for your extension to the `⟨your_extension_name⟩.test` file.
This uses [sqllogictest](#docs:lts:dev:sqllogictest:intro) to test with just SQL!
Let's add the example from above.

> **Note.** In sqllogictest, `query I` indicates that there will be 1 column in the result.
> We then add `----` and the result set in tab-separated format with no column names.

```sql
query I
FROM select_distinct_columns_from_table('duckdb_types', ['type_category']);
----
BOOLEAN
COMPOSITE
DATETIME
NUMERIC
STRING
NULL
```

Now, just add, commit, and push your changes to GitHub like before, and GitHub Actions will compile your extension and test it!

If you would like to do further ad-hoc testing of your extension, you can download the extension from your GitHub Actions run's artifacts and then [install it locally using these steps](#docs:lts:core_extensions:overview::unsigned-extensions).
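
As a rough sketch of such an ad-hoc test, assume the DuckDB CLI was started with the `-unsigned` flag (locally built extensions are unsigned and are rejected otherwise) and that the artifact was unpacked somewhere on disk. The path below is a placeholder, not a real location.

```sql
-- Load the locally built, unsigned extension binary by file path
LOAD '⟨path_to_artifact⟩/⟨your_extension_name⟩.duckdb_extension';

-- The macro added earlier should now be available
FROM select_distinct_columns_from_table('duckdb_types', ['type_category']);
```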

##### Uploading to the Community Extensions Repository

Once you are happy with your extension, it's time to share it with the DuckDB community!
Follow the steps in the [Community Extensions post](https://duckdb.org/2024/07/05/community-extensions#developer-experience).
A summary of those steps is:

1. Send a PR with a metadata file `description.yml` that contains the description of the extension. For example, the [`h3` community extension](#community_extensions:extensions:h3) uses the following YAML configuration:

   ```yaml
   extension:
     name: h3
     description: Hierarchical hexagonal indexing for geospatial data
     version: 1.0.0
     language: C++
     build: cmake
     license: Apache-2.0
     maintainers:
       - isaacbrodsky

   repo:
     github: isaacbrodsky/h3-duckdb
     ref: 3c8a5358e42ab8d11e0253c70f7cc7d37781b2ef
   ```

2. Wait for approval from the maintainers.

And there you have it!
You have created a shareable DuckDB community extension.
Now let's have a look at the `pivot_table` extension as an example of just how powerful a SQL-only extension can be.

#### Capabilities of the `pivot_table` Extension

The `pivot_table` extension supports advanced pivoting functionality that was previously only available in spreadsheets, dataframe libraries, or custom host language functions.
It uses the Excel pivoting API: `values`, `rows`, `columns`, and `filters` – handling 0 or more of each of those parameters.
Beyond that, it also supports `subtotals` and `grand_totals`.
If multiple `values` are passed in, the `values_axis` parameter allows the user to choose if each value should get its own column or its own row.

Why is this a good example of how DuckDB moves beyond traditional SQL?
The Excel pivoting API requires dramatically different SQL syntax depending on which parameters are in use.
If no `columns` are pivoted outward, a `GROUP BY` is all that is needed.
However, once `columns` are involved, a `PIVOT` is required.
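
To make the difference concrete, the sketch below runs both forms in plain SQL on a tiny made-up table. The table `sales_sketch` and its values are assumptions for illustration and are not part of the extension.

```sql
CREATE OR REPLACE TABLE sales_sketch AS
    SELECT *
    FROM (VALUES
        ('Duck boats', 2022, 100),
        ('Duck boats', 2023, 500),
        ('Duck suits', 2022, 10),
        ('Duck suits', 2023, 50)
    ) t(product, year, revenue);

-- Only `rows` in use: one output row per product, so GROUP BY is enough
SELECT product, sum(revenue) AS revenue
FROM sales_sketch
GROUP BY product
ORDER BY product;

-- `rows` and `columns` in use: each distinct year becomes its own column
PIVOT sales_sketch
ON year
USING sum(revenue)
GROUP BY product
ORDER BY product;
```

The first query returns the columns `product` and `revenue`, while the second returns `product` plus one column per year (`2022` and `2023`). The two statements have very different shapes, which is why the extension has to generate its SQL dynamically.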

This function can operate on one or more `table_names` that are passed in as a parameter.
Any set of tables (or views!) will first be vertically stacked and then pivoted.
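
The vertical stacking can be pictured with plain `UNION ALL BY NAME`, which matches columns by name rather than by position. The two tables below are hypothetical and exist only for this sketch.

```sql
CREATE OR REPLACE TABLE metrics_2022_sketch AS
    SELECT 'Duck boats' AS product, 2022 AS year, 100 AS revenue;

-- Same columns, but declared in a different order
CREATE OR REPLACE TABLE metrics_2023_sketch AS
    SELECT 500 AS revenue, 2023 AS year, 'Duck boats' AS product;

-- Columns are aligned by name, so the differing order does not matter
SELECT * FROM metrics_2022_sketch
UNION ALL BY NAME
SELECT * FROM metrics_2023_sketch;
```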

#### Example Using `pivot_table`



[Check out a live example using the extension in the DuckDB Wasm shell here](https://shell.duckdb.org/#queries=v0,CREATE-OR-REPLACE-TABLE-business_metrics-(-----product_line-VARCHAR%2C-product-VARCHAR%2C-year-INTEGER%2C-quarter-VARCHAR%2C-revenue-integer%2C-cost-integer-)~,INSERT-INTO-business_metrics-VALUES-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q1'%2C-100%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q2'%2C-200%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q3'%2C-300%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2022%2C-'Q4'%2C-400%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q1'%2C-500%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q2'%2C-600%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q3'%2C-700%2C-100)%2C-----('Waterfowl-watercraft'%2C-'Duck-boats'%2C-2023%2C-'Q4'%2C-800%2C-100)%2C------('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q1'%2C-10%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q2'%2C-20%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q3'%2C-30%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2022%2C-'Q4'%2C-40%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q1'%2C-50%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q2'%2C-60%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q3'%2C-70%2C-10)%2C-----('Duck-Duds'%2C-'Duck-suits'%2C-2023%2C-'Q4'%2C-80%2C-10)%2C------('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q1'%2C-1%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q2'%2C-2%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q3'%2C-3%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2022%2C-'Q4'%2C-4%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q1'%2C-5%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q2'%2C-6%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q3'%2C-7%2C-1)%2C-----('Duck-Duds'%2C-'Duck-neckties'%2C-2023%2C-'Q4'%2C-8%2C-1)%2C~,FROM-business_metrics~,INSTALL-pivot_table-from-community~,LOAD-'pivot_table'~,DROP-TYPE-IF-EXISTS-columns_parameter_enum~,CREATE-TYPE-columns_parameter_enum-AS-ENUM-(FROM-build_my_enum(['business_metrics']%2C-['year'%2C-'quarter']%2C-[]))~,FROM-pivot_table(['business_metrics']%2C['sum(revenue)'%2C-'sum(cost)']%2C-['product_line'%2C-'product']%2C-['year'%2C-'quarter']%2C-[]%2C-subtotals-%3A%3D-1%2C-grand_totals-%3A%3D-1%2C-values_axis-%3A%3D-'rows')~)!



<details markdown='1'>
<summary markdown='span'>
    First we will create an example data table. We are a duck product distributor, and we are tracking our fowl finances.
</summary>

```sql
CREATE OR REPLACE TABLE business_metrics (
    product_line VARCHAR,
    product VARCHAR,
    year INTEGER,
    quarter VARCHAR,
    revenue INTEGER,
    cost INTEGER
);

INSERT INTO business_metrics VALUES
    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q1', 100, 100),
    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q2', 200, 100),
    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q3', 300, 100),
    ('Waterfowl watercraft', 'Duck boats', 2022, 'Q4', 400, 100),
    ('Waterfowl watercraft', 'Duck boats', 2023, 'Q1', 500, 100),
    ('Waterfowl watercraft', 'Duck boats', 2023, 'Q2', 600, 100),
    ('Waterfowl watercraft', 'Duck boats', 2023, 'Q3', 700, 100),
    ('Waterfowl watercraft', 'Duck boats', 2023, 'Q4', 800, 100),

    ('Duck Duds', 'Duck suits', 2022, 'Q1', 10, 10),
    ('Duck Duds', 'Duck suits', 2022, 'Q2', 20, 10),
    ('Duck Duds', 'Duck suits', 2022, 'Q3', 30, 10),
    ('Duck Duds', 'Duck suits', 2022, 'Q4', 40, 10),
    ('Duck Duds', 'Duck suits', 2023, 'Q1', 50, 10),
    ('Duck Duds', 'Duck suits', 2023, 'Q2', 60, 10),
    ('Duck Duds', 'Duck suits', 2023, 'Q3', 70, 10),
    ('Duck Duds', 'Duck suits', 2023, 'Q4', 80, 10),

    ('Duck Duds', 'Duck neckties', 2022, 'Q1', 1, 1),
    ('Duck Duds', 'Duck neckties', 2022, 'Q2', 2, 1),
    ('Duck Duds', 'Duck neckties', 2022, 'Q3', 3, 1),
    ('Duck Duds', 'Duck neckties', 2022, 'Q4', 4, 1),
    ('Duck Duds', 'Duck neckties', 2023, 'Q1', 5, 1),
    ('Duck Duds', 'Duck neckties', 2023, 'Q2', 6, 1),
    ('Duck Duds', 'Duck neckties', 2023, 'Q3', 7, 1),
    ('Duck Duds', 'Duck neckties', 2023, 'Q4', 8, 1),
;

FROM business_metrics;
```
</details>

| product_line         | product       | year | quarter | revenue | cost |
| -------------------- | ------------- | ---: | ------- | ------: | ---: |
| Waterfowl watercraft | Duck boats    | 2022 | Q1      |     100 |  100 |
| Waterfowl watercraft | Duck boats    | 2022 | Q2      |     200 |  100 |
| Waterfowl watercraft | Duck boats    | 2022 | Q3      |     300 |  100 |
| Waterfowl watercraft | Duck boats    | 2022 | Q4      |     400 |  100 |
| Waterfowl watercraft | Duck boats    | 2023 | Q1      |     500 |  100 |
| Waterfowl watercraft | Duck boats    | 2023 | Q2      |     600 |  100 |
| Waterfowl watercraft | Duck boats    | 2023 | Q3      |     700 |  100 |
| Waterfowl watercraft | Duck boats    | 2023 | Q4      |     800 |  100 |
| Duck Duds            | Duck suits    | 2022 | Q1      |      10 |   10 |
| Duck Duds            | Duck suits    | 2022 | Q2      |      20 |   10 |
| Duck Duds            | Duck suits    | 2022 | Q3      |      30 |   10 |
| Duck Duds            | Duck suits    | 2022 | Q4      |      40 |   10 |
| Duck Duds            | Duck suits    | 2023 | Q1      |      50 |   10 |
| Duck Duds            | Duck suits    | 2023 | Q2      |      60 |   10 |
| Duck Duds            | Duck suits    | 2023 | Q3      |      70 |   10 |
| Duck Duds            | Duck suits    | 2023 | Q4      |      80 |   10 |
| Duck Duds            | Duck neckties | 2022 | Q1      |       1 |    1 |
| Duck Duds            | Duck neckties | 2022 | Q2      |       2 |    1 |
| Duck Duds            | Duck neckties | 2022 | Q3      |       3 |    1 |
| Duck Duds            | Duck neckties | 2022 | Q4      |       4 |    1 |
| Duck Duds            | Duck neckties | 2023 | Q1      |       5 |    1 |
| Duck Duds            | Duck neckties | 2023 | Q2      |       6 |    1 |
| Duck Duds            | Duck neckties | 2023 | Q3      |       7 |    1 |
| Duck Duds            | Duck neckties | 2023 | Q4      |       8 |    1 |

Next, we install the extension from the community repository:

```sql
INSTALL pivot_table FROM community;
LOAD pivot_table;
```

Now we can build pivot tables like the one below. 
There is a little bit of boilerplate required, and the details of how this works will be explained shortly.

```sql
DROP TYPE IF EXISTS columns_parameter_enum;

CREATE TYPE columns_parameter_enum AS ENUM (
    FROM build_my_enum(['business_metrics'],    -- table_names
                       ['year', 'quarter'],     -- columns
                       [])                      -- filters
);

FROM pivot_table(['business_metrics'],          -- table_names
                 ['sum(revenue)', 'sum(cost)'], -- values
                 ['product_line', 'product'],   -- rows
                 ['year', 'quarter'],           -- columns
                 [],                            -- filters
                 subtotals := 1,
                 grand_totals := 1,
                 values_axis := 'rows'
                 );
```

| product_line         | product       | value_names  | 2022_Q1 | 2022_Q2 | 2022_Q3 | 2022_Q4 | 2023_Q1 | 2023_Q2 | 2023_Q3 | 2023_Q4 |
| :------------------- | :------------ | :----------- | :------ | :------ | :------ | :------ | :------ | :------ | :------ | :------ |
| Duck Duds            | Duck neckties | sum(cost)    | 1       | 1       | 1       | 1       | 1       | 1       | 1       | 1       |
| Duck Duds            | Duck neckties | sum(revenue) | 1       | 2       | 3       | 4       | 5       | 6       | 7       | 8       |
| Duck Duds            | Duck suits    | sum(cost)    | 10      | 10      | 10      | 10      | 10      | 10      | 10      | 10      |
| Duck Duds            | Duck suits    | sum(revenue) | 10      | 20      | 30      | 40      | 50      | 60      | 70      | 80      |
| Duck Duds            | Subtotal      | sum(cost)    | 11      | 11      | 11      | 11      | 11      | 11      | 11      | 11      |
| Duck Duds            | Subtotal      | sum(revenue) | 11      | 22      | 33      | 44      | 55      | 66      | 77      | 88      |
| Waterfowl watercraft | Duck boats    | sum(cost)    | 100     | 100     | 100     | 100     | 100     | 100     | 100     | 100     |
| Waterfowl watercraft | Duck boats    | sum(revenue) | 100     | 200     | 300     | 400     | 500     | 600     | 700     | 800     |
| Waterfowl watercraft | Subtotal      | sum(cost)    | 100     | 100     | 100     | 100     | 100     | 100     | 100     | 100     |
| Waterfowl watercraft | Subtotal      | sum(revenue) | 100     | 200     | 300     | 400     | 500     | 600     | 700     | 800     |
| Grand Total          | Grand Total   | sum(cost)    | 111     | 111     | 111     | 111     | 111     | 111     | 111     | 111     |
| Grand Total          | Grand Total   | sum(revenue) | 111     | 222     | 333     | 444     | 555     | 666     | 777     | 888     |

#### How the `pivot_table` Extension Works

The `pivot_table` extension is a collection of multiple scalar and table SQL macros.
This allows the logic to be modularized.
You can see below that the functions are used as building blocks to create more complex functions.
This is typically difficult to do in SQL, but it is easy in DuckDB!

The functions and a brief description of each follow.

##### Building Block Scalar Functions

* `nq`: “No quotes” – Escape semicolons in a string to prevent SQL injection
* `sq`: “Single quotes” – Wrap a string in single quotes and escape embedded single quotes
* `dq`: “Double quotes” – Wrap in double quotes and escape embedded double quotes
* `nq_list`: Escape semicolons for each string in a list. Uses `nq`.
* `sq_list`: Wrap each string in a list in single quotes. Uses `sq`.
* `dq_list`: Wrap each string in a list in double quotes. Uses `dq`.
* `nq_concat`: Concatenate a list of strings together with semicolon escaping. Uses `nq_list`.
* `sq_concat`: Concatenate a list of strings together, wrapping each in single quotes. Uses `sq_list`.
* `dq_concat`: Concatenate a list of strings together, wrapping each in double quotes. Uses `dq_list`.
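
To give a feel for how such building blocks stack on top of each other, here is a minimal sketch of the quoting helpers. The `_sketch` macros are written for this illustration and are not the extension's actual definitions; they only follow the descriptions in the list above.

```sql
-- Wrap in single quotes, doubling any embedded single quotes
CREATE OR REPLACE MACRO sq_sketch(s) AS
    '''' || replace(s, '''', '''''') || '''';

-- Wrap in double quotes, doubling any embedded double quotes
CREATE OR REPLACE MACRO dq_sketch(s) AS
    '"' || replace(s, '"', '""') || '"';

-- Apply dq_sketch to every element of a list
CREATE OR REPLACE MACRO dq_list_sketch(my_list) AS
    list_transform(my_list, lambda x: dq_sketch(x));

-- Join the quoted elements into one string
CREATE OR REPLACE MACRO dq_concat_sketch(my_list, joiner) AS
    list_reduce(dq_list_sketch(my_list), lambda x, y: x || joiner || y);

SELECT
    sq_sketch('it''s') AS single_quoted,
    dq_concat_sketch(['year', 'my "col"'], ', ') AS double_quoted_list;
```

Each macro only calls the one below it, so the quoting rule lives in a single place and the list and concatenation helpers inherit it.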

##### Functions Created During Refactoring for Modularity

* `totals_list`: Build up a list as a part of enabling `subtotals` and `grand_totals`.
* `replace_zzz`: Rename `subtotal` and `grand_total` indicators after sorting so they are more friendly.

##### Core Pivoting Logic Functions

* `build_my_enum`: Determine which new columns to create when pivoting horizontally. Returns a table. See below for details.
* `pivot_table`: Based on inputs, decide whether to call `no_columns`, `columns_values_axis_columns` or `columns_values_axis_rows`. Execute `query` on the SQL string that is generated. Returns a table. See below for details.
    * `no_columns`: Build up the SQL string for `query` to execute when no `columns` are pivoted out.
    * `columns_values_axis_columns`: Build up the SQL string for `query` to execute when pivoting horizontally with each entry in `values` receiving a separate column.
    * `columns_values_axis_rows`: Build up the SQL string for `query` to execute when pivoting horizontally with each entry in `values` receiving a separate row.
* `pivot_table_show_sql`: Return the SQL string that would have been executed by `query` for debugging purposes.

##### The `build_my_enum` Function

The first step in using the `pivot_table` extension's capabilities is to define an `ENUM` (a user-defined type) called `columns_parameter_enum` that contains all of the new column names to create when pivoting horizontally.
DuckDB's automatic `PIVOT` syntax can define this for you, but in our case, we need 2 explicit steps.
The reason for this is that automatic pivoting runs 2 statements behind the scenes, but a `MACRO` must be a single statement.
If the `columns` parameter is not in use, this step is essentially a no-op, so it can be omitted or included for consistency (recommended).
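
As a sketch of what those 2 explicit steps look like in plain SQL (outside of any macro): the first statement collects the distinct values into an `ENUM`, and the second pivots using that type. The table `sales_sketch` and the type name `year_enum_sketch` are made up for this illustration, and the `IN ⟨enum_name⟩` form is taken from the description of automatic pivoting in the `PIVOT` documentation, so treat it as an assumption about your DuckDB version.

```sql
CREATE OR REPLACE TABLE sales_sketch AS
    SELECT *
    FROM (VALUES
        ('Duck boats', 2022, 100),
        ('Duck boats', 2023, 500)
    ) t(product, year, revenue);

DROP TYPE IF EXISTS year_enum_sketch;

-- Statement 1: collect the distinct values that will become column names
CREATE TYPE year_enum_sketch AS ENUM (
    SELECT DISTINCT year::VARCHAR
    FROM sales_sketch
    ORDER BY ALL
);

-- Statement 2: pivot, using the enum to define the output columns
PIVOT sales_sketch
ON year IN year_enum_sketch
USING sum(revenue)
GROUP BY product;
```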

The `query` and `query_table` functions only support `SELECT` statements (for security reasons), so the dynamic portion of the `ENUM` creation occurs in the function `build_my_enum`.
If this type of usage becomes common, features could be added to DuckDB to enable a `CREATE OR REPLACE` syntax for `ENUM` types, or possibly even temporary enums.
That would reduce this pattern from 3 statements down to 2.
Please let us know!

The `build_my_enum` function uses a combination of `query_table` to pull from multiple input tables, and the `query` function so that double quotes (and correct character escaping) can be completed prior to passing in the list of table names.
It uses a similar pattern to the core `pivot_table` function: build up a SQL query as a string, then call it with `query`.
The SQL string is constructed using list lambda functions and the building block functions for quoting.
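
The "build a string, then run it with `query`" pattern can be sketched in a few lines. The macro below is hypothetical and much simpler than `build_my_enum`; it only shows identifiers being double-quoted before they are spliced into the SQL text.

```sql
CREATE OR REPLACE MACRO distinct_values_sketch(table_name, column_name) AS TABLE
    SELECT *
    FROM query(
        'SELECT DISTINCT '
        || '"' || replace(column_name, '"', '""') || '"'
        || ' FROM '
        || '"' || replace(table_name, '"', '""') || '"'
        || ' ORDER BY ALL'
    );

-- Builds and runs: SELECT DISTINCT "type_category" FROM "duckdb_types" ORDER BY ALL
FROM distinct_values_sketch('duckdb_types', 'type_category');
```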

##### The `pivot_table` Function

At its core, the `pivot_table` function determines the SQL required to generate the desired pivot based on which parameters are in use.

Since this SQL statement is a string at the end of the day, we can use a hierarchy of scalar SQL macros rather than a single large macro.
Traditionally, SQL tends not to be very modular or reusable, but DuckDB's syntax lets us compartmentalize our logic.

> **Note.** If a non-optional parameter is not in use, an empty list (`[]`) should be passed in.

* `table_names`: A list of table or view names to aggregate or pivot. Multiple tables are combined with `UNION ALL BY NAME` prior to any other processing.
* `values`: A list of aggregation metrics in the format `['agg_fn_1(col_1)', 'agg_fn_2(col_2)', ...]`.
* `rows`: A list of column names to `SELECT` and `GROUP BY`.
* `columns`: A list of column names to `PIVOT` horizontally into a separate column per value in the original column. If multiple column names are passed in, only unique combinations of data that appear in the dataset are pivoted.
    * Example: If passing in a `columns` parameter like `['continent', 'country']`, only valid `continent` / `country` pairs will be included (no `Europe_Canada` column would be generated).
* `filters`: A list of `WHERE` clause expressions to be applied to the raw dataset prior to aggregating in the format `['col_1 = 123', 'col_2 LIKE ''woot%''', ...]`.
    * The `filters` are combined with `AND`.
* `values_axis` (Optional): If multiple `values` are passed in, determine whether to create a separate row or column for each value. Either `rows` or `columns`, defaulting to `columns`.
* `subtotals` (Optional): If enabled, calculate the aggregate metric at multiple levels of detail based on the `rows` parameter. Either 0 or 1, defaulting to 0.
* `grand_totals` (Optional): If enabled, calculate the aggregate metric across all rows in the raw data in addition to at the granularity defined by `rows`. Either 0 or 1, defaulting to 0.
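
As an illustration of the parameter list above, the call below passes two `filters` and leaves the optional parameters at their defaults. It assumes the extension is loaded and the `business_metrics` table from earlier exists; the filter expressions themselves are made up for this sketch.

```sql
FROM pivot_table(['business_metrics'],          -- table_names
                 ['sum(revenue)'],              -- values
                 ['product_line'],              -- rows
                 [],                            -- columns (not in use)
                 ['year = 2023',
                  'product LIKE ''Duck%''']     -- filters, combined with AND
                 );
```

Note the doubled single quotes inside the second filter: each filter is itself a SQL string, so quotes within it must be escaped.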

###### No Horizontal Pivoting (No `columns` in Use)

If not using the `columns` parameter, no columns need to be pivoted horizontally.
As a result, a `GROUP BY` statement is used.
If `subtotals` are in use, the `ROLLUP` expression is used to calculate the `values` at the different levels of granularity.
If `grand_totals` are in use, but not `subtotals`, the `GROUPING SETS` expression is used instead of `ROLLUP` to evaluate across all rows.

In this example, we build a summary of the `revenue` and `cost` of each `product_line` and `product`.

```sql
FROM pivot_table(['business_metrics'],
                 ['sum(revenue)', 'sum(cost)'],
                 ['product_line', 'product'],
                 [],
                 [],
                 subtotals := 1,
                 grand_totals := 1,
                 values_axis := 'columns'
                 );
```

| product_line         | product       | sum(revenue) | sum("cost") |
| -------------------- | ------------- | -----------: | ----------: |
| Duck Duds            | Duck neckties |           36 |           8 |
| Duck Duds            | Duck suits    |          360 |          80 |
| Duck Duds            | Subtotal      |          396 |          88 |
| Waterfowl watercraft | Duck boats    |         3600 |         800 |
| Waterfowl watercraft | Subtotal      |         3600 |         800 |
| Grand Total          | Grand Total   |         3996 |         888 |
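
For readers less familiar with these grouping expressions, the following plain SQL sketches the two variants described above against the same `business_metrics` table. It is not the SQL string that the extension generates, only the underlying technique.

```sql
-- Subtotals and grand total: ROLLUP adds the (product_line) and () groupings
SELECT product_line, product, sum(revenue), sum(cost)
FROM business_metrics
GROUP BY ROLLUP (product_line, product)
ORDER BY ALL;

-- Grand total only: the detail grouping plus the empty grouping
SELECT product_line, product, sum(revenue), sum(cost)
FROM business_metrics
GROUP BY GROUPING SETS ((product_line, product), ())
ORDER BY ALL;
```

In these raw queries, the subtotal and grand total rows carry `NULL` in the rolled-up columns, whereas the extension's output shows the `Subtotal` and `Grand Total` labels instead.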

###### Pivot Horizontally, One Column per Metric in `values`

Build up a `PIVOT` statement that will pivot out all valid combinations of raw data values within the `columns` parameter. 
If `subtotals` or `grand_totals` are in use, make multiple copies of the input data, but replace appropriate column names in the `rows` parameter with a string constant.
Pass all expressions in `values` to the `PIVOT` statement's `USING` clause so they each receive their own column.

We enhance our previous example to pivot out a separate column for each `year` / `value` combination:

```sql
DROP TYPE IF EXISTS columns_parameter_enum;

CREATE TYPE columns_parameter_enum AS ENUM (
    FROM build_my_enum(['business_metrics'],
                       ['year'],
                       [])
);

FROM pivot_table(['business_metrics'],
                 ['sum(revenue)', 'sum(cost)'],
                 ['product_line', 'product'],
                 ['year'],
                 [],
                 subtotals := 1,
                 grand_totals := 1,
                 values_axis := 'columns'
                 );
```

| product_line         | product       | 2022_sum(revenue) | 2022_sum("cost") | 2023_sum(revenue) | 2023_sum("cost") |
| -------------------- | ------------- | ----------------: | ---------------: | ----------------: | ---------------: |
| Duck Duds            | Duck neckties |                10 |                4 |                26 |                4 |
| Duck Duds            | Duck suits    |               100 |               40 |               260 |               40 |
| Duck Duds            | Subtotal      |               110 |               44 |               286 |               44 |
| Waterfowl watercraft | Duck boats    |              1000 |              400 |              2600 |              400 |
| Waterfowl watercraft | Subtotal      |              1000 |              400 |              2600 |              400 |
| Grand Total          | Grand Total   |              1110 |              444 |              2886 |              444 |
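
The "multiple copies of the input data" idea can be sketched by hand. The temporary table `stacked_sketch` is a name invented for this example, and the statements below are a simplified approximation rather than the extension's generated SQL.

```sql
-- Stack three copies of the data: detail rows, per-product_line subtotals, grand total
CREATE OR REPLACE TEMP TABLE stacked_sketch AS
    SELECT * FROM business_metrics
    UNION ALL BY NAME
    SELECT * REPLACE ('Subtotal' AS product) FROM business_metrics
    UNION ALL BY NAME
    SELECT * REPLACE ('Grand Total' AS product_line, 'Grand Total' AS product)
    FROM business_metrics;

-- Both aggregates go in USING, so each year gets one column per metric
PIVOT stacked_sketch
ON year
USING sum(revenue), sum(cost)
GROUP BY product_line, product
ORDER BY product_line, product;
```

Sorting by the labels directly places `Grand Total` between the two product lines, which is presumably why the extension sorts first and only afterwards renames the total indicators with `replace_zzz`.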

###### Pivot Horizontally, One Row per Metric in `values`

Build up a separate `PIVOT` statement for each metric in `values` and combine them with `UNION ALL BY NAME`. 
If `subtotals` or `grand_totals` are in use, make multiple copies of the input data, but replace appropriate column names in the `rows` parameter with a string constant.

To simplify the appearance slightly, we adjust one parameter in our previous query and set `values_axis := 'rows'`:

```sql
DROP TYPE IF EXISTS columns_parameter_enum;

CREATE TYPE columns_parameter_enum AS ENUM (
    FROM build_my_enum(['business_metrics'],
                       ['year'],
                       [])
);

FROM pivot_table(['business_metrics'],
                 ['sum(revenue)', 'sum(cost)'],
                 ['product_line', 'product'],
                 ['year'],
                 [],
                 subtotals := 1,
                 grand_totals := 1,
                 values_axis := 'rows'
                 );
```

| product_line         | product       | value_names  | 2022 | 2023 |
| -------------------- | ------------- | ------------ | ---: | ---: |
| Duck Duds            | Duck neckties | sum(cost)    |    4 |    4 |
| Duck Duds            | Duck neckties | sum(revenue) |   10 |   26 |
| Duck Duds            | Duck suits    | sum(cost)    |   40 |   40 |
| Duck Duds            | Duck suits    | sum(revenue) |  100 |  260 |
| Duck Duds            | Subtotal      | sum(cost)    |   44 |   44 |
| Duck Duds            | Subtotal      | sum(revenue) |  110 |  286 |
| Waterfowl watercraft | Duck boats    | sum(cost)    |  400 |  400 |
| Waterfowl watercraft | Duck boats    | sum(revenue) | 1000 | 2600 |
| Waterfowl watercraft | Subtotal      | sum(cost)    |  400 |  400 |
| Waterfowl watercraft | Subtotal      | sum(revenue) | 1000 | 2600 |
| Grand Total          | Grand Total   | sum(cost)    |  444 |  444 |
| Grand Total          | Grand Total   | sum(revenue) | 1110 | 2886 |
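
A hand-written approximation of this "one `PIVOT` per metric" approach, without the subtotal and grand total rows, might look as follows. The subquery aliases and the literal `value_names` labels are chosen for this sketch.

```sql
SELECT 'sum(revenue)' AS value_names, *
FROM (
    PIVOT business_metrics
    ON year
    USING sum(revenue)
    GROUP BY product_line, product
) revenue_pivot
UNION ALL BY NAME
SELECT 'sum(cost)' AS value_names, *
FROM (
    PIVOT business_metrics
    ON year
    USING sum(cost)
    GROUP BY product_line, product
) cost_pivot
ORDER BY product_line, product, value_names;
```

Because each `PIVOT` has a single aggregate, the pivoted columns are named just `2022` and `2023`, and `UNION ALL BY NAME` can stack the two results cleanly.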

#### Conclusion

With DuckDB 1.1, sharing your SQL knowledge with the community has never been easier!
DuckDB's community extension repository is truly a package manager for the SQL language.
Macros in DuckDB are now highly reusable (thanks to `query` and `query_table`), and DuckDB's SQL syntax provides plenty of power to accomplish complex tasks.

Please let us know if the `pivot_table` extension is helpful to you – we are open to both contributions and feature requests!
Together we can write the ultimate pivoting capability just once and use it everywhere.

In the future, we have plans to further simplify the creation of SQL extensions.
Of course, we would love your feedback!
[Join us on Discord](https://discord.duckdb.org/) in the `community-extensions` channel.

Happy analyzing!

## DuckDB in Python in the Browser with Pyodide, PyScript, and JupyterLite

**Publication date:** 2024-10-02

**Author:** Alex Monahan

**TL;DR:** Run DuckDB in an in-browser Python environment to enable simple querying on remote files, interactive documentation, and easy-to-use training materials.




#### Time to “Hello World”

The first time that you are using a new library, the most important thing is how quickly you can get to “Hello World”.

> **Note.** Want to see “Hello World”?
> [Jump to the **fully interactive** examples!](https://duckdb.org/2024/10/02/pyodide#pyscript-editor)

Likewise, if someone is visiting any documentation you have written, you want them to quickly and easily get your tool up and running.
When you are giving a demo, you want to avoid “demo hell” and have it work on the first try!

If you want to try “expert mode”, try leading an entire conference room of people through those setup steps!
The classroom or conference workshop environment makes it far more critical that installation be bulletproof.

Python is one of our favorite ways to use DuckDB, but Python is notoriously difficult to set up – doubly so for a novice programmer.
What the heck is a virtual environment?
Are you on Windows, Linux, or Mac?
Pip or Conda?
The new kid on the block uv?

Experienced Pythonistas are not immune either!
Many, like me, have been forced to celebrate the time-honored and [xkcd-chronicled](https://xkcd.com/1987/) tradition of just wiping everything related to Python and starting from scratch.

How can we make it as easy and as fast as possible to test out DuckDB in Python?

#### Difficulties of Server-Side Python

One response to this challenge is to host a Python environment on a server for each of your users.
This has a number of issues.

Hosting Python on a server yourself is not free.
If you have many users, it can be far from free.

If you want to use a free solution like Google Colab, each visitor will need a Google account, and you'll need to be comfortable with Google accessing your data.
Plus, it is hard to embed within an existing web page for a seamless experience.

#### Enter Pyodide

[Pyodide](https://pyodide.org/) is a way to run Python directly in your browser with no installation and no setup, thanks to the power of WebAssembly.
That makes it the easiest and fastest way to get a Python environment up and running – just load a web page!
All computation happens locally, so it can be served like any static website with tools like GitHub Pages.
No server-side Python required!

Another benefit is that Pyodide is nicely sandboxed in the browser environment.
Each user gets their own workspace, and since it is all local, it is nice and secure.

Part of what sets Pyodide apart from other in-browser-Python approaches is that it can even run libraries that are written in C, C++, or even Fortran, including much of the Python data science stack.
This means that now you can use DuckDB in Pyodide as well!
You can even combine it with NumPy, SciPy, and Pandas (in addition to many pure-Python libraries).
PyArrow and Ibis have experimental support also.

#### Use Cases for Pyodide DuckDB

**Want to quickly analyze some remote data using either Python or DuckDB?**

Pyodide is the fastest way to get your questions answered using Python.

**Want to quickly analyze some local data?**

Pyodide can also [query local files](https://pyodide.org/en/stable/usage/accessing-files.html)!

**Want to make your documentation interactive?**

Let your users test out your DuckDB-powered library with ease.
We will see an example below that demonstrates the [`magic-duckdb` Jupyter plugin](https://github.com/iqmo-org/magic_duckdb) to enable SQL cells.

**Leading a training session with DuckDB and Python?**

Skip the hassles of local installation.
There is no need to work 1:1 with the 15% of folks in the audience with some quirky setup!
Everyone will get this to work on the first try, in seconds, so you can get to the content you want to teach.
Plus, it is free, with no signup required of any kind!

#### Pyodide Examples

We will cover multiple ways to embed Pyodide-powered-Python directly into your site, so your users can try out your new DuckDB-backed tool with a single click!

* **PyScript Editor:** An editor with nice syntax highlighting
* **JupyterLite Notebook:** A classic notebook environment
* **JupyterLite Lab IDE:** A full development environment

##### PyScript Editor

This HTML snippet will embed a runnable PyScript editor into any page!

```html
<script type="module" src="https://pyscript.net/releases/2024.8.2/core.js"></script>
<script type="py-editor" config='{"packages":["duckdb"]}'>
    import duckdb
    print(duckdb.sql("SELECT '42 in an editor' AS s").fetchall())
</script>
```

Just click the play button and you can execute a DuckDB query directly in the browser.
You can edit the code, add new lines, etc.
Try it out!




##### JupyterLite Notebook

Here is an example of using an `iframe` that points to a JupyterLite environment that was deployed to GitHub Pages!


This is a fully interactive Python notebook environment, with DuckDB running inside. 
Feel free to give it a run!






Configuring a full JupyterLite environment is only a few steps!
The JupyterLite folks have built a demo page that serves as a template and have some [great documentation](https://jupyterlite.readthedocs.io/en/latest/quickstart/deploy.html).
The main steps are to:

1. Use the JupyterLite Demo Template to create your own repo
2. Enable GitHub Pages for that repo
3. Add and commit a `.ipynb` file in the `content` folder
4. Visit `https://⟨your_github_username⟩.github.io/⟨YOUR_REPOSITORY_NAME⟩/notebooks/index.html?path=⟨your_notebook_name⟩.ipynb`

Note that it can take a couple of minutes for GitHub Pages to deploy.
You can monitor the progress on GitHub's Actions tab.

##### JupyterLite Lab IDE

After following the steps in the JupyterLite Notebook setup, if you change your URL from `/notebooks/` to `/lab/`, you can have a full IDE experience instead!
This form factor is a bit harder to embed in another page, but great for interactive use.

This example uses the [`magic-duckdb` Jupyter extension](https://github.com/iqmo-org/magic_duckdb) that allows us to create SQL cells using `%%dql`.

[Follow this link to see the Lab IDE interface](https://alex-monahan.github.io/jupyterlite_duckdb_demo/lab/index.html?path=magic_duckdb.ipynb), or experiment with the Notebook-style version below.







#### Architecture of DuckDB in Pyodide

So how does DuckDB work in Pyodide exactly?
The DuckDB Python client is compiled to WebAssembly (Wasm) in its entirety.
This differs from the existing [DuckDB Wasm](https://github.com/duckdb/duckdb-wasm) approach, which compiles only the C++ side of the library and wraps it with a JavaScript API.
Both approaches use the Emscripten toolchain to do the Wasm compilation.
It is DuckDB's design decision to avoid dependencies and the prior investments in DuckDB-Wasm that made this feasible to build in such a short period of time!

The Pyodide team has added DuckDB to their hosted repository of libraries, and even set up DuckDB to run as a part of their CI/CD workflow.
That is what enables JupyterLite to simply run `%pip install duckdb`, and PyScript to specify DuckDB as a package in the `py-editor config` parameter or in the `<py-config>` tag.
Pyodide then downloads the Wasm-compiled version of the DuckDB library from Pyodide's repository.
We want to send a big thank you to the Pyodide team, including [Hood Chatham](https://github.com/hoodmane) and [Gyeongjae Choi](https://github.com/ryanking13), as well as the Voltron Data team including [Phillip Cloud](https://github.com/cpcloud) for leading the effort to get this to work.

##### Limitations

Running in the browser is a more restrictive environment (for security purposes), so there are some limitations when using DuckDB in Pyodide.
There is no free lunch!

* Single-threaded
    * Pyodide currently limits execution to a single thread
* A few extra steps to query remote files
    * Remote files can't be accessed by DuckDB directly
    * Instead, pull the files locally with Pyodide first
    * DuckDB-Wasm has custom enhancements to make this possible, but these are not present in DuckDB's Python client
* No runtime-loaded extensions
    * Several extensions are automatically included: `parquet`, `json`, `icu`, `tpcds`, and `tpch`.
* Release cadence aligned with Pyodide
    * At the time of writing, duckdb-pyodide is at 1.0.0 rather than 1.1.1
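
As a hedged sketch of the “pull the files locally first” workaround: inside Pyodide, `pyodide.http.pyfetch` can download a file through the browser, and the bytes can be written to Pyodide's in-browser file system, after which DuckDB reads the local copy. The helper names `download_to_local` and `query_remote_csv`, the local file name, and the URL are placeholders for illustration and are not part of DuckDB or Pyodide.

```python
async def download_to_local(url: str, local_path: str) -> str:
    """Download `url` via the browser and store it in Pyodide's virtual file system."""
    # pyodide.http is only importable inside a Pyodide runtime (PyScript, JupyterLite).
    from pyodide.http import pyfetch

    response = await pyfetch(url)
    with open(local_path, "wb") as f:
        f.write(await response.bytes())
    return local_path


async def query_remote_csv(url: str):
    """Fetch a remote CSV first, then let DuckDB query the local copy."""
    import duckdb

    local_path = await download_to_local(url, "remote_data.csv")
    # DuckDB only sees the local file, so it needs no remote access at query time.
    return duckdb.sql(f"SELECT count(*) FROM read_csv('{local_path}')").fetchall()


# In a JupyterLite cell, where top-level await is available:
# rows = await query_remote_csv("https://example.com/data.csv")  # placeholder URL
```

Since the download goes through the browser's `fetch`, the remote server has to permit cross-origin requests from the page that hosts the notebook.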

#### Conclusion

Pyodide is now the fastest way to use Python and DuckDB together!
It is also an approach that scales to an arbitrary number of users because Pyodide's computations happen entirely locally.

We have seen how to embed Pyodide in a static site in multiple ways, as well as how to read remote files.

If you are excited about DuckDB in Pyodide, feel free to join us on Discord.
We have a `#show-and-tell` channel where you can share what you build with the community.
You are also welcome to explore the [duckdb-pyodide repo](https://github.com/duckdb/duckdb-pyodide) and report any issues you find.
We would also really love some help with enabling runtime-loaded extensions – please reach out if you can help!

Happy quacking about!

## DuckDB User Survey Analysis

**Publication date:** 2024-10-04

**Author:** Gábor Szárnyas

**TL;DR:** We share the findings from a survey of 500+ DuckDB users.

Earlier this year, we conducted a survey in the DuckDB community.
We were mostly curious about the following topics:

1. How do people use DuckDB?
2. Where do people use DuckDB?
3. What do they like about DuckDB?
4. What improvements would they like to see in future releases?

The survey was open for about three weeks. More than 500 people submitted their answers, and we raffled 20 t-shirts and hoodies among the participants.

#### Summary

We summarize the key findings of the survey below:

* Users run DuckDB most often on laptops but servers are also very popular.
* The most popular clients are the Python API and the standalone CLI client.
* Most users don't have huge data sets but they appreciate high performance very much.
* Users would like to see performance optimizations related to time series and partitioned data.
* DuckDB is popular among data engineers, analysts and scientists, and also among software engineers.

Let's dive into the details!

#### Using DuckDB

##### Environments

We asked users about the environment where DuckDB is deployed and found that most of them, 87%, run DuckDB on their laptops.
This is in line with the vision that originally drove the creation of DuckDB: creating a system that harnesses the power of hardware available in modern end-user devices.
29% run DuckDB on desktop workstations, and 58% run it on servers (see the breakdown later in the [“Server Types” section](#::server-types)).

![DuckDB environments](../images/blog/survey/environments.svg)

##### Clients

[Unsurprisingly](https://www.tiobe.com/tiobe-index/python/), DuckDB is most often used from Python (73%), followed by the [standalone command-line application](#docs:lts:clients:cli:overview) (47%).
The third spot is hotly contested with R, WebAssembly (!) and Java all achieving around 14%, followed by Node.js (JavaScript) at 9%.

![DuckDB clients](../images/blog/survey/clients.svg)

The next few places, with 6-7% each, are occupied by ODBC, Rust, and Go.
Finally, Arrow (ADBC) rounds off the top 10 with 5%.

##### Operating Systems

We found that most users, 61%, run DuckDB on Linux servers.
These deployments include cloud instances, on-premises installations, and continuous integration (CI) runners.
Windows desktop and macOS have a similar share of users, 41–45%.
A further 9% run DuckDB on Windows servers.

![DuckDB platforms](../images/blog/survey/platforms.svg)

We found the number of Linux desktop users quite striking.
While the overall [market share of Linux desktop is around 4.5%](https://gs.statcounter.com/os-market-share/desktop/worldwide/2024),
_29% of respondents indicated that they run DuckDB on Linux desktop!_
We suspect that this is thanks to DuckDB's [popularity among data engineers](#::user-roles),
who often use Linux desktop due to its customizability and similarity to the Linux server-based deployment environments.

##### Server Types

As we discussed in the [“Environments” section](#::environments), DuckDB is often run on servers.
But how big are these servers, and where are they operated?
Both small servers (less than 16 GB of memory) and medium-sized servers (16-512 GB of memory) are popular, with 56% and 61% of users reporting that they run DuckDB on these.
About 14% of respondents run DuckDB on servers with more than 0.5 TB of memory.

![Server size](../images/blog/survey/server-sizes.svg)

Regarding _where_ the servers run, on-premises deployments and AWS are neck-and-neck with 27%.
They are followed by two other clouds, Microsoft Azure and the Google Cloud Platform.
Finally, about 4% of users run DuckDB on Hetzner servers.

![Server premises](../images/blog/survey/server-premises.svg)

#### Data

##### Data Formats

We inquired about the data formats used when working with DuckDB.
Parquet is the most popular format: 79% of users reported using it.
CSV is a close second with 73%.
JSON is also popular, with vanilla JSON achieving 42% and NDJSON 11%.
About ⅓ reported using Arrow.

![Data formats](../images/blog/survey/data-formats.svg)

##### Dataset Sizes

We asked users about the size of the largest dataset they processed with DuckDB. We defined _dataset size_ as the size of the data when stored in uncompressed CSV format.
For Parquet files and DuckDB database files, we asked users to approximate the CSV size by multiplying their file sizes by 5.
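
The conversion rule is simple arithmetic. A minimal sketch follows: the function name and the 30 GB input are made up for illustration, and only the factor of 5 comes from the survey.

```python
CSV_EXPANSION_FACTOR = 5  # multiplier given in the survey instructions


def approx_csv_size_gb(file_size_gb: float) -> float:
    """Approximate the uncompressed CSV size of a Parquet or DuckDB database file."""
    return file_size_gb * CSV_EXPANSION_FACTOR


# A hypothetical 30 GB Parquet file would be reported as roughly 150 GB of CSV.
print(approx_csv_size_gb(30))
```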

The responses showed that only a few respondents use DuckDB to process [Big Data](https://motherduck.com/blog/big-data-is-dead/).
For ¾ of users, the largest dataset was smaller than 100 GB.
20% of users processed a dataset between 100 GB and 1 TB, and approximately 5% of users ventured into the 1 TB+ territory.
About 1% processed 10 TB+ datasets.
These findings are in line with [statistics derived from a recent Redshift usage dataset](https://motherduck.com/blog/redshift-files-hunt-for-big-data/#whos-got-big-data) by [Jordan Tigani of MotherDuck](https://motherduck.com/authors/jordan-tigani/), and the recent analysis of the [Snowflake and Redshift datasets](https://www.fivetran.com/blog/how-do-people-use-snowflake-and-redshift) by [George Fraser of Fivetran](https://www.fivetran.com/people/george-fraser).

![Dataset sizes](../images/blog/survey/dataset_sizes.svg)

These results are obviously somewhat biased, since users who need to crunch through huge datasets may not work with DuckDB (yet!). Still, the skew towards smaller datasets is quite significant and shows that many real-world use cases can be tackled using small to medium-sized datasets. The results also show that DuckDB *can* solve many problems on datasets larger than 1 TB.

#### Features

##### Most Liked Features

We were curious: what do users like most about DuckDB? The plot shows the most frequent responses:

![Most liked DuckDB features](../images/blog/survey/most_liked_features.svg)

The most liked feature is **high performance**.
Users also enjoy **file format support** (CSV, Parquet, JSON, etc.),
**ease of use**,
**extensive SQL support** (including [friendly SQL](#docs:lts:sql:dialect:friendly_sql))
and **in-memory integrations** such as support for Pandas, Arrow and NumPy.
Finally, users mentioned low memory usage, protocol support (e.g., HTTPS, S3), database integrations, and portability.

##### Feature Requests

We asked users about the features that they'd most like to see in future DuckDB versions. The most popular requests are listed in the table below:

| Feature                                                                 | Percentage |
| :---------------------------------------------------------------------- | ---------: |
| Improved partitioning and optimizations related to partitioning         |        39% |
| Improved support for time series and optimizations for pre-sorted data  |        35% |
| Support for materialized views                                          |        28% |
| Support for vector search                                               |        24% |
| Support for attaching to database systems via ODBC                      |        24% |
| Support for time travel queries (query the database as of a given time) |        23% |
| Support for the Delta Lake format                                       |        22% |
| Improved support for Iceberg (including writes)                         |        17% |

We are happy to report that, since the survey was conducted pre-v1.0.0 and DuckDB is now at version 1.1.1, some of these requests are already a reality:

* Reading Delta Lake is now possible via the [`delta` extension](https://duckdb.org/2024/06/10/delta).
* Vector search is now supported via the [`vss` extension](https://duckdb.org/2024/05/03/vector-similarity-search-vss).

As for the rest of the requested features, several are in the making at DuckDB Labs. Stay tuned!

#### User Roles

We asked respondents to indicate their main roles in their organization. The top-5 answers were as follows:

![User roles](../images/blog/survey/roles.svg)

It's no surprise that DuckDB is popular in the “data” roles: 26% of the respondents are data engineers, 14% are data scientists, and 9% are data analysts.
The form had a surprisingly high share of software engineers, 23%.
Finally, about 2% of respondents indicated that their primary role is DBA.

#### Conclusion

We would like to thank all participants for taking the time to complete the survey.
We will use the answers to guide the future development of DuckDB, and we hope that readers of this analysis find it informative as well.

## Analyzing Open Government Data with duckplyr

**Publication date:** 2024-10-09

**Author:** Hannes Mühleisen

**TL;DR:** We use the duckplyr R library to clean and analyze an Open Data set published by the government of New Zealand.

> For the duckplyr documentation, visit [`duckplyr.tidyverse.org`](https://duckplyr.tidyverse.org/).

![](../images/blog/duckplyr/duckplyr-logo-light.svg)



Wrangling data by throwing SQL strings at it is not the most ergonomic way to perform interactive data analysis in R. For a while now, we have been working with the dplyr project team at [Posit](https://posit.co/) (formerly RStudio) and Kirill Müller to develop _duckplyr_. [duckplyr](https://duckplyr.tidyverse.org) is a high-performance drop-in replacement for dplyr, powered by DuckDB. You can read more about duckplyr in the [announcement blog post](https://duckdb.org/2024/04/02/duckplyr). In this post, we are going to walk through a challenging real-world use case with duckplyr. For those of you wishing to follow along, we have prepared a [Google Colab notebook](https://colab.research.google.com/drive/1PxvkZ4FpMNtP-CpKpz5hvH-xKgaYC3-S) with all the code snippets in this post. Timings reported below are also from Colab.

Like many government statistics agencies, New Zealand's “Stats NZ Tatauranga Aotearoa” thankfully provides some of the datasets they maintain as [Open Data for download](https://www.stats.govt.nz/large-datasets/csv-files-for-download/). The largest file available for download on that page contains “Age and sex by ethnic group (grouped total responses), for census usually resident population counts, 2006, 2013, and 2018 Censuses”, [CSV zipped file](https://www3.stats.govt.nz/2018census/Age-sex-by-ethnic-group-grouped-total-responses-census-usually-resident-population-counts-2006-2013-2018-Censuses-RC-TA-SA2-DHB.zip).

We can download that file (mirrored on our CDN, as we don't want to DDoS poor Stats NZ) and unzip it like so:

```R
download.file("https://blobs.duckdb.org/nzcensus.zip", "nzcensus.zip")
unzip("nzcensus.zip")
```

Let's explore the CSV files in the zip and what their sizes are:

```R
file.info(Sys.glob("*.csv"))["size"]
```

```text
                               size
Data8277.csv              857672667
DimenLookupAge8277.csv         2720
DimenLookupArea8277.csv       65400
DimenLookupEthnic8277.csv       272
DimenLookupSex8277.csv           74
DimenLookupYear8277.csv          67
```

As we can see, there is one large (~800 MB) `Data` file and a bunch of `Dimen...` dimension files. This is a fairly common data layout, sometimes called a [“star schema”](https://en.wikipedia.org/wiki/Star_schema). From this, it's clear there are some joins in our future. But first let's focus on the main file, `Data8277.csv`. Reading sizeable CSV files is not trivial and can be very frustrating. But enough whinging, as the Kiwis would say.

To start with, let's just have a quick look at what the file looks like:

```R
cat(paste(readLines("Data8277.csv", n=10), collapse="\n"))
```

```text
Year,Age,Ethnic,Sex,Area,count
2018,000,1,1,01,795
2018,000,1,1,02,5067
2018,000,1,1,03,2229
2018,000,1,1,04,1356
2018,000,1,1,05,180
2018,000,1,1,06,738
2018,000,1,1,07,630
2018,000,1,1,08,1188
2018,000,1,1,09,2157
```

So far this looks rather tame: there seem to be six columns. Thankfully, they have names. From just eyeballing the column values, it looks like they are all numeric and even integer values. However, looks can be deceiving, and the columns `Age`, `Area`, and `count` contain character values somewhere down the line. Fun fact: we have to wait until line 431,741 for the `Area` column to contain a non-integer value. Clearly we need a good CSV parser. R has no shortage of CSV readers; for example, the `readr` package contains a flexible CSV parser. Reading this file with `readr` takes about a minute (on Colab).

But let's now start using DuckDB and duckplyr. First, we install duckplyr (and DuckDB, which is a dependency):

```R
install.packages("duckplyr")
duckdb:::sql("SELECT version()")
```

This command prints out the installed DuckDB version. As of this writing, the latest version on [CRAN](https://cran.r-project.org/web/packages/duckdb/index.html) is 1.1.0. We can now use DuckDB's advanced data wrangling capabilities. First off, DuckDB contains probably the [world's most advanced CSV parser](https://duckdb.org/2023/10/27/csv-sniffer.html). For the extra curious, [here is a presentation on DuckDB's CSV parser](https://www.youtube.com/watch?v=YrqSp8m7fmk). We use DuckDB's CSV reader to only read the first 10 rows from the CSV file:

```R
duckdb:::sql("FROM Data8277.csv LIMIT 10")
```

```text
   Year Age Ethnic Sex Area count
1  2018 000      1   1   01   795
2  2018 000      1   1   02  5067
3  2018 000      1   1   03  2229
4  2018 000      1   1   04  1356
5  2018 000      1   1   05   180
6  2018 000      1   1   06   738
7  2018 000      1   1   07   630
8  2018 000      1   1   08  1188
9  2018 000      1   1   09  2157
10 2018 000      1   1   12   177
```

This only takes a few milliseconds because DuckDB's CSV reader produces results in a streaming fashion, and because we have only requested 10 rows we are done fairly quickly.

DuckDB can also print out the schema it detected from the CSV file using the `DESCRIBE` keyword:

```R
duckdb:::sql("DESCRIBE FROM Data8277.csv")
```

```text
  column_name column_type ...
1        Year      BIGINT ...
2         Age     VARCHAR ...
3      Ethnic      BIGINT ...
4         Sex      BIGINT ...
5        Area     VARCHAR ...
6       count     VARCHAR ...
```

We can see that we have correctly detected the various data types for the columns. We can use the `SUMMARIZE` keyword to compute various summary statistics for all the columns in the file:

```R
duckdb:::sql("SUMMARIZE FROM Data8277.csv")
```

This will take a little bit longer, but the results are very interesting:

```text
# A tibble: 6 × 12
  column_name column_type min   max     approx_unique avg      std   q25   q50
  <chr>       <chr>       <chr> <chr>           <dbl> <chr>    <chr> <chr> <chr>
1 Year        BIGINT      2006  2018                3 2012.33… 4.92… 2006  2013
2 Age         VARCHAR     000   999999            149 NA       NA    NA    NA
3 Ethnic      BIGINT      1     9999               11 930.545… 2867… 3     6
4 Sex         BIGINT      1     9                   3 4.0      3.55… 1     2
5 Area        VARCHAR     001   DHB9999          2048 NA       NA    NA    NA
6 count       VARCHAR     ..C   9999            16825 NA       NA    NA    NA
# ℹ 3 more variables: q75 <chr>, count <dbl>, null_percentage <dbl>
```

This again shows the column names and their types, but also summary statistics: the minimum and maximum values, the approximate count of unique values, the average, the standard deviation, the 25th, 50th, and 75th percentiles, and the percentage of NULL/NA values. So one gets a pretty good overview of what the data is like.

But we're not here to ogle summary statistics; we want to do actual analysis of the data. In this use case, we would like to compute the number of non-Europeans between 20 and 40 who live in the Auckland area using the 2018 census data, with the results grouped by sex. To do so, we need to join the dimension CSV files with the main data file in order to properly filter the dimension values. In SQL, the lingua franca of large-scale data analysis, this takes a few steps.

We first join everything together:

```sql
FROM 'Data8277.csv' data
JOIN 'DimenLookupAge8277.csv' age ON data.Age = age.Code
JOIN 'DimenLookupArea8277.csv' area ON data.Area = area.Code
JOIN 'DimenLookupEthnic8277.csv' ethnic ON data.Ethnic = ethnic.Code
JOIN 'DimenLookupSex8277.csv' sex ON data.Sex = sex.Code
JOIN 'DimenLookupYear8277.csv' year ON data.Year = year.Code
```

Next, we use the `SELECT` projection to perform some basic renames and data cleaning:

```sql
SELECT
    year.Description AS year_,
    area.Description AS area_,
    ethnic.Description AS ethnic_,
    sex.Description AS sex_,
    TRY_CAST(replace(age.Description, ' years', '') AS INTEGER) AS age_,
    TRY_CAST(data.count AS INTEGER) AS count_
```

The data set contains various totals, so we remove them before proceeding:

```sql
WHERE count_ > 0
  AND age_ IS NOT NULL
  AND area_ NOT LIKE 'Total%'
  AND ethnic_ NOT LIKE 'Total%'
  AND sex_ NOT LIKE 'Total%'
```
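
The cleaning step leans on `TRY_CAST` producing `NULL` instead of an error when a value cannot be converted, so that markers such as `..C` in the `count` column drop out in the `WHERE` clause. A rough Python analogue could look like the sketch below. The helper `try_cast_int` is hypothetical, and Python's `int` accepts a few spellings that DuckDB may treat differently, so this illustrates the idea rather than reimplementing it exactly.

```python
from typing import Optional


def try_cast_int(value: Optional[str]) -> Optional[int]:
    """Return the integer value, or None (standing in for SQL NULL) if the cast fails."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


raw_counts = ["795", "5067", "..C"]  # "..C" appears in the count column of the source data
cleaned = [try_cast_int(v) for v in raw_counts]
# Mirrors `WHERE count_ > 0`: a comparison with NULL is not true, so failed casts drop out.
kept = [c for c in cleaned if c is not None and c > 0]

# Mirrors TRY_CAST(replace(age.Description, ' years', '') AS INTEGER).
age = try_cast_int("20 years".replace(" years", ""))
print(cleaned, kept, age)
```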

We wrap the previous statements in a common table expression (CTE) named `expanded_cleaned_data` and then compute the actual aggregation using DuckDB:

```sql
SELECT sex_, sum(count_) AS group_count
FROM expanded_cleaned_data
WHERE age_ BETWEEN 20 AND 40
  AND area_ LIKE 'Auckland%'
  AND ethnic_ <> 'European'
  AND year_ = 2018
GROUP BY sex_
ORDER BY sex_
```

This takes ca. 20 s on the limited Colab free tier compute. The result is:

```text
    sex_ group_count
1 Female      398556
2   Male      397326
```

So far, so good. However, writing SQL queries is not for everyone. The ergonomics of creating SQL strings in an interactive data analysis environment like R are questionable, to say the least. Frameworks like `dplyr` have shown how data wrangling ergonomics can be massively improved. Let's express our analysis using dplyr, after first reading the data from CSV into RAM:

```R
library(dplyr)

data   <- readr::read_csv("Data8277.csv")
age    <- readr::read_csv("DimenLookupAge8277.csv")
area   <- readr::read_csv("DimenLookupArea8277.csv")
ethnic <- readr::read_csv("DimenLookupEthnic8277.csv")
sex    <- readr::read_csv("DimenLookupSex8277.csv")
year   <- readr::read_csv("DimenLookupYear8277.csv")

expanded_cleaned_data <- data |>
  filter(grepl("^\\d+$", count)) |>
  mutate(count_ = as.integer(count)) |>
  filter(count_ > 0) |>
  inner_join(
    age |>
      filter(grepl("^\\d+ years$", Description)) |>
      mutate(age_ = as.integer(Code)),
    join_by(Age == Code)
  ) |>
  inner_join(area |>
    mutate(area_ = Description) |>
    filter(!grepl("^Total", area_)), join_by(Area == Code)) |>
  inner_join(ethnic |>
    mutate(ethnic_ = Description) |>
    filter(!grepl("^Total", ethnic_)), join_by(Ethnic == Code)) |>
  inner_join(sex |>
    mutate(sex_ = Description) |>
    filter(!grepl("^Total", sex_)), join_by(Sex == Code)) |>
  inner_join(year |> mutate(year_ = Description), join_by(Year == Code))

# create the final aggregation (eager with dplyr; lazy once the inputs come from duckplyr)
twenty_till_forty_non_european_in_auckland_area <-
  expanded_cleaned_data |>
  filter(
    age_ >= 20, age_ <= 40,
    grepl("^Auckland", area_),
    year_ == "2018",
    ethnic_ != "European"
  ) |>
  summarise(group_count = sum(count_), .by = sex_) |> arrange(sex_)

print(twenty_till_forty_non_european_in_auckland_area)
```

This looks nicer and completes in about one minute, but there are several hidden issues. First, we read the _entire_ dataset into RAM. While this is likely possible for this dataset because most computers have more than 1 GB of RAM, it will of course not work for larger datasets. Then, we execute a series of dplyr verbs. However, dplyr executes those eagerly, meaning it does not holistically optimize the sequence of verbs. For example, it cannot see that we only keep non-European ethnicities in the last step, and happily computes the European rows for the intermediate result as well. The same happens with survey years other than 2018: we only filter those out in the last step, so we have computed an expensive join on all other years for nothing. Depending on data distributions, this can be extremely wasteful. And yes, it is possible to manually move the filters around, but this is tedious and error-prone. At least the result is exactly the same as the SQL version above:

```text
# A tibble: 2 × 2
  sex_   group_count
  <chr>        <int>
1 Female      398556
2 Male        397326
```
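
To make the cost of eager evaluation concrete, here is a toy illustration in plain Python. It uses neither dplyr nor DuckDB, and the tiny tables and helper names are made up. It compares joining first and filtering last with filtering before the join, which is the kind of reordering a holistic optimizer can apply automatically.

```python
# Toy fact table: (year_code, count); toy dimension table: year_code -> description.
facts = [(2006, 10), (2006, 20), (2013, 30), (2013, 40), (2018, 50), (2018, 60)]
year_dim = {2006: "2006", 2013: "2013", 2018: "2018"}


def join_then_filter(facts, year_dim):
    """Eager order: join every row, then discard most of the joined rows."""
    joined = [(year_dim[code], count) for code, count in facts if code in year_dim]
    result = sum(count for year, count in joined if year == "2018")
    return result, len(joined)


def filter_then_join(facts, year_dim):
    """Pushed-down order: restrict the dimension first, so fewer rows are joined."""
    wanted = {code for code, year in year_dim.items() if year == "2018"}
    joined = [(year_dim[code], count) for code, count in facts if code in wanted]
    result = sum(count for _, count in joined)
    return result, len(joined)


eager_total, eager_rows = join_then_filter(facts, year_dim)
pushed_total, pushed_rows = filter_then_join(facts, year_dim)
print(eager_total == pushed_total)  # both orders give the same answer
print(eager_rows, pushed_rows)      # but the join handles fewer rows after pushdown
```

Both orders return the same total; only the size of the intermediate join differs, which is where the wasted work in the eager pipeline comes from.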

Now we switch the exact same script over to duckplyr. Instead of reading the CSV files into RAM entirely using `readr`, we instead use the `duckplyr_df_from_csv` function from `duckplyr`:

```R
library("duckplyr")

data   <- duckplyr_df_from_csv("Data8277.csv")
age    <- duckplyr_df_from_csv("DimenLookupAge8277.csv")
area   <- duckplyr_df_from_csv("DimenLookupArea8277.csv")
ethnic <- duckplyr_df_from_csv("DimenLookupEthnic8277.csv")
sex    <- duckplyr_df_from_csv("DimenLookupSex8277.csv")
year   <- duckplyr_df_from_csv("DimenLookupYear8277.csv")
```

This takes exactly 0 seconds, because duckplyr is not actually doing much. We detect the schema of the CSV files using our award-winning “sniffer”, and create a placeholder object for each of the six files. Part of the unique design of duckplyr is that those objects are “Heisenbergian”: they behave like completely normal R `data.frame`s once they are treated as such, but they can _also_ act as lazy evaluation placeholders when they are passed to downstream analysis steps. This is made possible by a little-known R feature known as `ALTREP`, which allows R vectors to be computed on demand, among other things.

Now we re-run the exact same dplyr pipeline as above. Only this time we are “done” in less than a second. This is because all we have done is _lazily_ construct a so-called relation tree, which encapsulates the entirety of the transformations. This allows _holistic_ optimization, for example pushing the year and ethnicity filters all the way down to the reading of the CSV file _before_ joining. We can also eliminate the reading of columns that are not used in the query at all.

Only when we finally print the result

```R
print(twenty_till_forty_non_european_in_auckland_area)
```

actual computation is triggered. This finishes in the same time as the hand-rolled SQL query above, only that this time we had a much more pleasant experience from using the dplyr syntax. And, thankfully, the result is still exactly the same.

This use case was also presented as part of [my keynote at this year's posit::conf](https://www.youtube.com/watch?v=GELhdezYmP0).

Finally, we should note that duckplyr is still being developed. We have taken great care in not breaking anything and will fall back on the existing dplyr implementation if anything cannot be run in DuckDB (yet). But we would love to [hear from you](https://github.com/tidyverse/duckplyr/issues) if anything does not work as expected.

## DuckDB Tricks – Part 2

**Publication date:** 2024-10-11

**Author:** Gábor Szárnyas

**TL;DR:** We continue our “DuckDB tricks” series, focusing on queries that clean, transform and summarize data.

#### Overview

This post is the latest installment of the [DuckDB Tricks series](https://duckdb.org/2024/08/19/duckdb-tricks-part-1), where we show you nifty SQL tricks in DuckDB.
Here’s a summary of what we’re going to cover:

| Operation                                                                   | SQL instructions                               |
| --------------------------------------------------------------------------- | ---------------------------------------------- |
| [Fixing timestamps in CSV files](#::fixing-timestamps-in-csv-files)         | `regexp_replace()` and `strptime()`            |
| [Filling in missing values](#::filling-in-missing-values)                   | `CROSS JOIN`, `LEFT JOIN` and `coalesce()`     |
| [Repeated transformation steps](#::repeated-data-transformation-steps)      | `CREATE OR REPLACE TABLE t AS ... FROM t ...`  |
| [Computing checksums for columns](#::computing-checksums-for-columns)       | `bit_xor(md5_number(COLUMNS(*)::VARCHAR))`     |
| [Creating a macro for checksum](#::creating-a-macro-for-the-checksum-query) | `CREATE MACRO checksum(tbl) AS TABLE ...`      |

#### Dataset

For our example dataset, we’ll use `schedule.csv`, a hand-written CSV file that encodes a conference schedule. The schedule contains the timeslots, the locations and the events scheduled.

```csv
timeslot,location,event
2024-10-10 9am,room Mallard,Keynote
2024-10-10 10.30am,room Mallard,Customer stories
2024-10-10 10.30am,room Fusca,Deep dive 1
2024-10-10 12.30pm,main hall,Lunch
2024-10-10 2pm,room Fusca,Deep dive 2
```

#### Fixing Timestamps in CSV Files

As is usual in real-world use cases, the input CSV is messy, with irregular timestamps such as `2024-10-10 9am`.
Therefore, if we load the `schedule.csv` file using DuckDB’s CSV reader, the CSV sniffer will detect the first column as a `VARCHAR` field:

```sql
CREATE TABLE schedule_raw AS
    SELECT * FROM 'https://duckdb.org/data/schedule.csv';

SELECT * FROM schedule_raw;
```

```text
┌────────────────────┬──────────────┬──────────────────┐
│      timeslot      │   location   │      event       │
│      varchar       │   varchar    │     varchar      │
├────────────────────┼──────────────┼──────────────────┤
│ 2024-10-10 9am     │ room Mallard │ Keynote          │
│ 2024-10-10 10.30am │ room Mallard │ Customer stories │
│ 2024-10-10 10.30am │ room Fusca   │ Deep dive 1      │
│ 2024-10-10 12.30pm │ main hall    │ Lunch            │
│ 2024-10-10 2pm     │ room Fusca   │ Deep dive 2      │
└────────────────────┴──────────────┴──────────────────┘
```

Ideally, we would like the `timeslot` column to have the type `TIMESTAMP` so we can treat it as a timestamp in later queries. To achieve this, we can use the table we just loaded and fix the problematic entries with a regular expression-based search and replace operation, which unifies the format to `hours.minutes` followed by `am` or `pm`. Then, we convert the strings to timestamps using [`strptime`](#docs:lts:sql:functions:dateformat::strptime-examples), with the `%p` format specifier capturing the `am`/`pm` part of the string.

```sql
CREATE TABLE schedule_cleaned AS
    SELECT
        timeslot
            .regexp_replace(' (\d+)(am|pm)$', ' \1.00\2')
            .strptime('%Y-%m-%d %H.%M%p') AS timeslot,
        location,
        event
    FROM schedule_raw;
```

Note that we use the [dot operator for function chaining](#docs:lts:sql:functions:overview::function-chaining-via-the-dot-operator) to improve readability. For example, `regexp_replace(string, pattern, replacement)` is formulated as `string.regexp_replace(pattern, replacement)`. The result is the following table:

```text
┌─────────────────────┬──────────────┬──────────────────┐
│      timeslot       │   location   │      event       │
│      timestamp      │   varchar    │     varchar      │
├─────────────────────┼──────────────┼──────────────────┤
│ 2024-10-10 09:00:00 │ room Mallard │ Keynote          │
│ 2024-10-10 10:30:00 │ room Mallard │ Customer stories │
│ 2024-10-10 10:30:00 │ room Fusca   │ Deep dive 1      │
│ 2024-10-10 12:30:00 │ main hall    │ Lunch            │
│ 2024-10-10 14:00:00 │ room Fusca   │ Deep dive 2      │
└─────────────────────┴──────────────┴──────────────────┘
```

#### Filling in Missing Values

Next, we would like to derive a schedule that includes the full picture: *every timeslot* for *every location* should have its line in the table. For the timeslot-location combinations where there is no event specified, we would like to explicitly add a string that says `<empty>`.

To achieve this, we first create a table `timeslot_location_combinations` containing all possible combinations using a `CROSS JOIN`. Then, we join the original table to the combinations using a `LEFT JOIN`. Finally, we replace `NULL` values with the `<empty>` string using the [`coalesce` function](#docs:lts:sql:functions:utility::coalesceexpr-).

> The `CROSS JOIN` clause is equivalent to simply listing the tables in the `FROM` clause without specifying join conditions. By explicitly spelling out `CROSS JOIN`, we communicate that we intend to compute a Cartesian product – which is an expensive operation on large tables and should be avoided in most use cases.

```sql
CREATE TABLE timeslot_location_combinations AS 
    SELECT timeslot, location
    FROM (SELECT DISTINCT timeslot FROM schedule_cleaned)
    CROSS JOIN (SELECT DISTINCT location FROM schedule_cleaned);

CREATE TABLE schedule_filled AS
    SELECT timeslot, location, coalesce(event, '<empty>') AS event
    FROM timeslot_location_combinations
    LEFT JOIN schedule_cleaned
        USING (timeslot, location)
    ORDER BY ALL;

SELECT * FROM schedule_filled;
```

```text
┌─────────────────────┬──────────────┬──────────────────┐
│      timeslot       │   location   │      event       │
│      timestamp      │   varchar    │     varchar      │
├─────────────────────┼──────────────┼──────────────────┤
│ 2024-10-10 09:00:00 │ main hall    │ <empty>          │
│ 2024-10-10 09:00:00 │ room Fusca   │ <empty>          │
│ 2024-10-10 09:00:00 │ room Mallard │ Keynote          │
│ 2024-10-10 10:30:00 │ main hall    │ <empty>          │
│ 2024-10-10 10:30:00 │ room Fusca   │ Deep dive 1      │
│ 2024-10-10 10:30:00 │ room Mallard │ Customer stories │
│ 2024-10-10 12:30:00 │ main hall    │ Lunch            │
│ 2024-10-10 12:30:00 │ room Fusca   │ <empty>          │
│ 2024-10-10 12:30:00 │ room Mallard │ <empty>          │
│ 2024-10-10 14:00:00 │ main hall    │ <empty>          │
│ 2024-10-10 14:00:00 │ room Fusca   │ Deep dive 2      │
│ 2024-10-10 14:00:00 │ room Mallard │ <empty>          │
├─────────────────────┴──────────────┴──────────────────┤
│ 12 rows                                     3 columns │
└───────────────────────────────────────────────────────┘
```

We can also put everything together in a single query using a [`WITH` clause](#docs:lts:sql:query_syntax:with):

```sql
WITH timeslot_location_combinations AS (
    SELECT timeslot, location
    FROM (SELECT DISTINCT timeslot FROM schedule_cleaned)
    CROSS JOIN (SELECT DISTINCT location FROM schedule_cleaned)
)
SELECT timeslot, location, coalesce(event, '<empty>') AS event
FROM timeslot_location_combinations
LEFT JOIN schedule_cleaned
    USING (timeslot, location)
ORDER BY ALL;
```
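The same logic can be sketched in plain Python, which may help to see what each SQL clause contributes: `itertools.product` plays the role of the `CROSS JOIN`, a dictionary lookup stands in for the `LEFT JOIN`, and the default value of `dict.get` mimics `coalesce`. The function name `fill_schedule` is hypothetical, and the sketch assumes at most one event per timeslot-location pair (a real `LEFT JOIN` would return one row per matching event).

```python
from itertools import product

schedule_cleaned = [
    ("2024-10-10 09:00:00", "room Mallard", "Keynote"),
    ("2024-10-10 10:30:00", "room Mallard", "Customer stories"),
    ("2024-10-10 10:30:00", "room Fusca", "Deep dive 1"),
    ("2024-10-10 12:30:00", "main hall", "Lunch"),
    ("2024-10-10 14:00:00", "room Fusca", "Deep dive 2"),
]


def fill_schedule(rows, placeholder="<empty>"):
    """Return one row per (timeslot, location) pair, filling gaps with a placeholder."""
    timeslots = sorted({timeslot for timeslot, _, _ in rows})
    locations = sorted({location for _, location, _ in rows})
    # Assumption: each (timeslot, location) pair has at most one event.
    events = {(timeslot, location): event for timeslot, location, event in rows}
    return [
        (timeslot, location, events.get((timeslot, location), placeholder))
        for timeslot, location in product(timeslots, locations)
    ]


schedule_filled = fill_schedule(schedule_cleaned)
for row in schedule_filled:
    print(row)
```

With four distinct timeslots and three distinct locations, the sketch yields 12 rows, matching the row count of the SQL result above.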

#### Repeated Data Transformation Steps

Data cleaning and transformation usually happens as a sequence of transformations that shape the data into a form that’s best fitted to later analysis.
These transformations are often done by defining newer and newer tables using [`CREATE TABLE ... AS SELECT` statements](#docs:lts:sql:statements:create_table::create-table--as-select-ctas).

For example, in the sections above, we created `schedule_raw`, `schedule_cleaned`, and `schedule_filled`. If, for some reason, we want to skip the cleaning steps for the timestamps, we have to reformulate the query computing `schedule_filled` to use `schedule_raw` instead of `schedule_cleaned`. This can be tedious and error-prone, and it results in a lot of unused temporary data – data that may accidentally get picked up by queries that we forgot to update!

In interactive analysis, it’s often better to use the same table name by running [`CREATE OR REPLACE` statements](#docs:lts:sql:statements:create_table::create-or-replace):

```sql
CREATE OR REPLACE TABLE ⟨table_name⟩ AS
    ...
    FROM ⟨table_name⟩
    ...;
```

Using this trick, we can run our analysis as follows:

```sql
CREATE OR REPLACE TABLE schedule AS
    SELECT * FROM 'https://duckdb.org/data/schedule.csv';

CREATE OR REPLACE TABLE schedule AS
    SELECT
        timeslot
            .regexp_replace(' (\d+)(am|pm)$', ' \1.00\2')
            .strptime('%Y-%m-%d %H.%M%p') AS timeslot,
        location,
        event
    FROM schedule;

CREATE OR REPLACE TABLE schedule AS
    WITH timeslot_location_combinations AS (
        SELECT timeslot, location
        FROM (SELECT DISTINCT timeslot FROM schedule)
        CROSS JOIN (SELECT DISTINCT location FROM schedule)
    )
    SELECT timeslot, location, coalesce(event, '<empty>') AS event
    FROM timeslot_location_combinations
    LEFT JOIN schedule
        USING (timeslot, location)
    ORDER BY ALL;

SELECT * FROM schedule;
```

Using this approach, we can skip any step and continue the analysis without adjusting the next one.

What’s more, our script can now be re-run from the beginning without explicitly deleting any tables: the `CREATE OR REPLACE` statements will automatically replace any existing tables.

#### Computing Checksums for Columns

It’s often beneficial to compute a checksum for each column in a table, e.g., to see whether a column’s content has changed between two operations.
We can compute a checksum for the `schedule` table as follows:

```sql
SELECT bit_xor(md5_number(COLUMNS(*)::VARCHAR))
FROM schedule;
```

What’s going on here?
We first list columns ([`COLUMNS(*)`](#docs:lts:sql:expressions:star::columns-expression)) and cast all of them to `VARCHAR` values.
Then, we compute the numeric MD5 hashes with the [`md5_number` function](#docs:lts:sql:functions:utility::md5_numberstring) and aggregate them using the [`bit_xor` aggregate function](#docs:lts:sql:functions:aggregates::bit_xorarg).
This produces a single `HUGEINT` (`INT128`) value per column that can be used to compare the content of tables.

If we run this query after each of the three steps in the script above, we get the following results (one table per step):

```text
┌──────────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────────┐
│                 timeslot                 │                location                │                  event                  │
│                  int128                  │                 int128                 │                 int128                  │
├──────────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┤
│ -134063647976146309049043791223896883700 │ 85181227364560750048971459330392988815 │ -65014404565339851967879683214612768044 │
└──────────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────────┘
```

```text
┌────────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────────┐
│                timeslot                │                location                │                  event                  │
│                 int128                 │                 int128                 │                 int128                  │
├────────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┤
│ 62901011016747318977469778517845645961 │ 85181227364560750048971459330392988815 │ -65014404565339851967879683214612768044 │
└────────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────────┘
```

```text
┌──────────────────────────────────────────┬──────────┬──────────────────────────────────────────┐
│                 timeslot                 │ location │                  event                   │
│                  int128                  │  int128  │                  int128                  │
├──────────────────────────────────────────┼──────────┼──────────────────────────────────────────┤
│ -162418013182718436871288818115274808663 │        0 │ -135609337521255080720676586176293337793 │
└──────────────────────────────────────────┴──────────┴──────────────────────────────────────────┘
```

#### Creating a Macro for the Checksum Query

We can turn the [checksum query](#::computing-checksums-for-columns) into a [table macro](#docs:lts:sql:statements:create_macro::table-macros) with the new [`query_table` function](#docs:lts:guides:sql_features:query_and_query_table_functions):

```sql
CREATE MACRO checksum(table_name) AS TABLE
    SELECT bit_xor(md5_number(COLUMNS(*)::VARCHAR))
    FROM query_table(table_name);
```

This way, we can simply invoke it on the `schedule` table as follows (also leveraging DuckDB’s [`FROM`-first syntax](#docs:lts:sql:query_syntax:from)):

```sql
FROM checksum('schedule');
```

```text
┌──────────────────────────────────────────┬────────────────────────────────────────┬─────────────────────────────────────────┐
│                 timeslot                 │                location                │                  event                  │
│                  int128                  │                 int128                 │                 int128                  │
├──────────────────────────────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┤
│ -134063647976146309049043791223896883700 │ 85181227364560750048971459330392988815 │ -65014404565339851967879683214612768044 │
└──────────────────────────────────────────┴────────────────────────────────────────┴─────────────────────────────────────────┘
```

#### Closing Thoughts

That’s it for today!
We’ll be back soon with more DuckDB tricks and case studies.
In the meantime, if you have a trick that you would like to share, please share it with the DuckDB team on our social media sites, or submit it to the [DuckDB Snippets site](https://duckdbsnippets.com/) (maintained by our friends at MotherDuck).

## Driving CSV Performance: Benchmarking DuckDB with the NYC Taxi Dataset

**Publication date:** 2024-10-16

**Author:** Pedro Holanda

**TL;DR:** DuckDB's benchmark suite now includes the NYC Taxi Benchmark. We explain how our CSV reader performs on the Taxi Dataset and provide steps to reproduce the benchmark.

The [NYC taxi dataset](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page) is a collection of many years of taxi rides that occurred in New York City. It is a very influential dataset, used for [database benchmarks](https://tech.marksblogg.com/benchmarks.html), [machine learning](https://www.r-bloggers.com/2018/01/new-york-city-taxi-limousine-commission-tlc-trip-data-analysis-using-sparklyr-and-google-bigquery-2/), [data visualization](https://www.kdnuggets.com/2017/02/data-science-nyc-taxi-trips.html), and more.

In 2022, the data provider decided to distribute the dataset as a series of Parquet files instead of CSV files. Performance-wise, this is a wise choice, as Parquet files are much smaller than CSV files, and their native columnar format allows for fast execution directly on them. However, this change reduces the number of systems that can natively load the files.

In the [“Billion Taxi Rides in Redshift”](https://tech.marksblogg.com/billion-nyc-taxi-rides-redshift.html) blog post, a new database benchmark is proposed to evaluate the performance of aggregations over the taxi dataset. The dataset is also joined and denormalized with other datasets that contain information about the weather, cab types, and pickup/dropoff locations. It is then stored as multiple compressed, gzipped CSV files, each containing 20 million rows.

#### The Taxi Data Set as CSV Files

Since DuckDB is well-known for its [CSV reader performance](https://x.com/jmduke/status/1820593783005667459), we were intrigued to explore whether the loading process of this benchmark could help us identify new performance bottlenecks in our CSV loader. This curiosity led us on a journey to generate these datasets and analyze their performance in DuckDB. According to a recent study conducted on the Amazon Redshift fleet, [CSV files are the most used external source data type in S3](https://assets.amazon.science/24/3b/04b31ef64c83acf98fe3fdca9107/why-tpc-is-not-enough-an-analysis-of-the-amazon-redshift-fleet.pdf), and 99% of them are gzipped. Therefore, the fact that the proposed benchmark also used split gzipped files caught my attention.

In this blog post, we'll guide you through how to run this benchmark in DuckDB and discuss some lessons learned and future ideas for our CSV Reader. The dataset used in this benchmark is [publicly available](https://github.com/pdet/taxi-benchmark/blob/0.1/files.txt). The dataset is partitioned and distributed as a collection of 65 gzipped CSV files, each containing 20 million rows and totaling up to 1.8 GB per file. The total dataset is 111 GB compressed and 518 GB uncompressed. We also provide more details on how we generated this dataset and highlight the differences between the dataset we distribute and the original one described in the [“Billion Taxi Rides in Redshift”](https://tech.marksblogg.com/billion-nyc-taxi-rides-redshift.html) blog post.

#### Reproducing the Benchmark

Doing fair benchmarking is a [difficult problem](https://pdet.github.io/assets/papers/benchmarking.pdf), especially when the data, queries, and results used for the benchmark are not easy to access and run. We have made the benchmark discussed in this blog post easy to run by providing scripts available in the [`taxi-benchmark` GitHub repository](https://github.com/pdet/taxi-benchmark).

This repository contains three main Python scripts:

1. `generate_prepare_data.py`: Downloads all necessary files and prepares them for the benchmark.
2. `benchmark.py`: Runs the benchmark and performs result verification.
3. `analyse.py`: Analyzes the benchmark results and produces some of the insights discussed in this blog post.

The benchmark is not intended to be flawless – no benchmark is. However, we believe that sharing these scripts is a positive step, and we welcome any contributions to make them cleaner and more efficient.

The repository also includes a README file with detailed instructions on how to use it.
This repository will serve as the foundation for the experiments conducted in this blog post.

##### Preparing the Dataset

To start, you first need to download and prepare the files by executing [`python generate_prepare_data.py`](https://github.com/pdet/taxi-benchmark/blob/0.1/generate_prepare_data.py). This will download all 65 files to the `./data` folder. Additionally, the files will be uncompressed and combined into a single large file.

As a result, the `./data` folder will have 65 gzipped CSV files (i.e., from `trips_xaa.csv.gz` to `trips_xcm.csv.gz`) and a single large uncompressed CSV file containing the full data (i.e., `decompressed.csv`).

Our benchmark then runs in two different settings:

1. Over 65 compressed files.
2. Over a single uncompressed file.

Once the files have been prepared, you can run the benchmark by running [`python benchmark.py`](https://github.com/pdet/taxi-benchmark/blob/0.1/benchmark.py).

##### Loading

The loading phase of the benchmark runs six times for each benchmark setting. From the first five runs, we take the median loading time. During the sixth run, we collect resource usage data (e.g., CPU usage and disk reads/writes).
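The measurement scheme described above (several timed runs, then the median) can be sketched as follows. This is not the code from `benchmark.py`; the function name `median_runtime` and the `dummy_load` stand-in task are made up for the illustration.

```python
import statistics
import time


def median_runtime(task, runs=5):
    """Run `task` several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def dummy_load():
    # Stand-in for the actual loading step (e.g., running the COPY statement).
    sum(range(100_000))


median_seconds = median_runtime(dummy_load)
print(f"median of 5 runs: {median_seconds:.6f} s")
```

Taking the median rather than the mean makes the reported number less sensitive to a single unusually slow or fast run.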

Loading is performed using an in-memory DuckDB instance, meaning the data is not persisted to DuckDB storage and only exists while the connection is active. This is important to note because the dataset does not fit in memory and is spilled to temporary space on disk. The decision to not persist the data has a substantial impact on performance: it makes loading the dataset significantly faster, while querying it will be somewhat slower as [DuckDB will use an uncompressed representation](#docs:lts:guides:performance:how_to_tune_workloads::persistent-vs-in-memory-tables). We made this choice for the benchmark since our primary focus is on testing the CSV loader rather than the queries.

Our table schema is defined in [`schema.sql`](https://github.com/pdet/taxi-benchmark/blob/0.1/sql/schema.sql).

<details markdown='1'>
<summary markdown='span'>
[`schema.sql`](https://github.com/pdet/taxi-benchmark/blob/0.1/sql/schema.sql).
</summary>

```sql
CREATE TABLE trips (
    trip_id                 BIGINT,
    vendor_id               VARCHAR,
    pickup_datetime         TIMESTAMP,
    dropoff_datetime        TIMESTAMP,
    store_and_fwd_flag      VARCHAR,
    rate_code_id            BIGINT,
    pickup_longitude        DOUBLE,
    pickup_latitude         DOUBLE,
    dropoff_longitude       DOUBLE,
    dropoff_latitude        DOUBLE,
    passenger_count         BIGINT,
    trip_distance           DOUBLE,
    fare_amount             DOUBLE,
    extra                   DOUBLE,
    mta_tax                 DOUBLE,
    tip_amount              DOUBLE,
    tolls_amount            DOUBLE,
    ehail_fee               DOUBLE,
    improvement_surcharge   DOUBLE,
    total_amount            DOUBLE,
    payment_type            VARCHAR,
    trip_type               VARCHAR,
    pickup                  VARCHAR,
    dropoff                 VARCHAR,
    cab_type                VARCHAR,
    precipitation           BIGINT,
    snow_depth              BIGINT,
    snowfall                BIGINT,
    max_temperature         BIGINT,
    min_temperature         BIGINT,
    average_wind_speed      BIGINT,
    pickup_nyct2010_gid     BIGINT,
    pickup_ctlabel          VARCHAR,
    pickup_borocode         BIGINT,
    pickup_boroname         VARCHAR,
    pickup_ct2010           VARCHAR,
    pickup_boroct2010       BIGINT,
    pickup_cdeligibil       VARCHAR,
    pickup_ntacode          VARCHAR,
    pickup_ntaname          VARCHAR,
    pickup_puma             VARCHAR,
    dropoff_nyct2010_gid    BIGINT,
    dropoff_ctlabel         VARCHAR,
    dropoff_borocode        BIGINT,
    dropoff_boroname        VARCHAR,
    dropoff_ct2010          VARCHAR,
    dropoff_boroct2010      BIGINT,
    dropoff_cdeligibil      VARCHAR,
    dropoff_ntacode         VARCHAR,
    dropoff_ntaname         VARCHAR,
    dropoff_puma            VARCHAR);
```
</details>

The loader for the 65 files uses the following query:

```sql
COPY trips FROM 'data/trips_*.csv.gz' (HEADER false);
```

The loader for the single uncompressed file uses this query:

```sql
COPY trips FROM 'data/decompressed.csv' (HEADER false);
```

##### Querying

After loading, the benchmark script will run each of the [benchmark queries](https://github.com/pdet/taxi-benchmark/tree/0.1/sql/queries) five times to measure their execution time. It is also important to note that the results of the queries are validated against their corresponding [answers](https://github.com/pdet/taxi-benchmark/tree/0.1/sql/answers). This allows us to verify the correctness of the benchmark. Additionally, the queries are identical to those used in the original [“Billion Taxi Rides”](https://tech.marksblogg.com/benchmarks.html) benchmark.

#### Results

##### Loading Time

Although we are talking about roughly 1.3 billion rows (65 files of 20 million rows each) of CSV data with 51 columns, DuckDB can ingest them rather fast.

Note that, by default, DuckDB preserves the insertion order of the data, which negatively impacts performance. In the following results, all datasets have been loaded with this option set to `false`.

```sql
SET preserve_insertion_order = false;
```

All experiments were run on my Apple M1 Max with 64 GB of RAM, and we compare the loading times for a single uncompressed CSV file, and the 65 compressed CSV files.

| Name                        | Time (mm:ss) | Avg deviation of CPU usage from 100% |
| --------------------------- | -----------: | -----------------------------------: |
| Single File – Uncompressed  |        11:52 |                                31.57 |
| Multiple Files – Compressed |        13:52 |                                27.13 |

Unsurprisingly, loading data from multiple compressed files is more CPU-efficient than loading from a single uncompressed file. This is evident from the lower average deviation in CPU usage for multiple compressed files, indicating fewer wasted CPU cycles. There are two main reasons for this: (1) The compressed files are approximately eight times smaller than the uncompressed file, drastically reducing the amount of data that needs to be loaded from disk and, consequently, minimizing CPU stalls while waiting for data to be processed. (2) It is much easier to parallelize the loading of multiple files than a single file, as each thread can handle a single file.

This higher CPU efficiency does not translate into shorter execution times, however: reading from a single uncompressed file is 2 minutes faster than reading from multiple compressed files. The reason for this lies in our decompression algorithm, which is admittedly not optimally designed. Reading a compressed file involves three tasks: (1) loading data from disk into a compressed buffer, (2) decompressing that data into a decompressed buffer, and (3) processing the decompressed buffer. In our current implementation, tasks 1 and 2 are combined into a single operation, meaning we cannot continue reading until the current buffer is fully decompressed, resulting in idle cycles.
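To make the distinction between tasks 1 and 2 more concrete, the following Python sketch separates them: a reader thread keeps filling a bounded queue with compressed chunks, while the consumer decompresses them incrementally. This only illustrates the general producer-consumer structure. It is not DuckDB's implementation or a planned design, and the names (`read_chunks`, `decompress_stream`) and the chunk size are assumptions made for the example.

```python
import gzip
import io
import queue
import threading
import zlib

CHUNK_SIZE = 1024  # hypothetical size of the compressed buffer, in bytes


def read_chunks(stream, chunks):
    """Task 1: read compressed bytes without waiting for decompression."""
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        chunks.put(chunk)
    chunks.put(None)  # sentinel: no more data


def decompress_stream(stream):
    """Task 2: decompress chunks as they arrive from the reader thread."""
    chunks = queue.Queue(maxsize=4)  # bounded, so the reader cannot run far ahead
    reader = threading.Thread(target=read_chunks, args=(stream, chunks))
    reader.start()
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    parts = []
    while (chunk := chunks.get()) is not None:
        parts.append(decompressor.decompress(chunk))
    parts.append(decompressor.flush())
    reader.join()
    return b"".join(parts)


payload = b"1,VTS,2012-08-31 22:00:00,2012-08-31 22:07:00\n" * 100_000
compressed = gzip.compress(payload)
restored = decompress_stream(io.BytesIO(compressed))
print(len(compressed), len(restored), restored == payload)
```

In a real CSV reader, task 3 (parsing the decompressed buffer) would form a third stage of the pipeline instead of collecting the output in memory.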

##### Under the Hood

We can also see what happens under the hood to verify our conclusion regarding the loading time.

In the figure below, you can see a snapshot of CPU and disk utilization for the “Single File – Uncompressed” run. We observe that achieving 100% CPU utilization is challenging, and we frequently experience stalls due to data writes to disk, as we are creating a table from a dataset that does not fit into our memory. Another key point is that CPU utilization is closely tied to disk reads, indicating that our threads often wait for data before processing it. Implementing async I/O for the CSV Reader/Writer could significantly improve performance for parallel processing, as a single thread could handle most of our disk I/O without negatively affecting CPU utilization.

<a href="/images/blog/taxi/utilization_uncompressed_unset.png" target="_blank">
![](../images/blog/taxi/utilization_uncompressed_unset.png)

</a>

Below, you can see a similar snapshot for loading the 65 compressed files. We frequently encounter stalls during data writes; however, CPU utilization is significantly better because we wait less time for the data to load (remember, the data is approximately 8 times smaller than in the uncompressed case). In this scenario, parallelization is also much easier. Like in the uncompressed case, these gaps in CPU utilization could be mitigated by async I/O, with the addition of a decomposed decompression algorithm.

<a href="/images/blog/taxi/utilization_compressed_unset.png" target="_blank">
![](../images/blog/taxi/utilization_compressed_unset.png)

</a>

##### Query Times

For completeness, we also provide the results of the four queries on a MacBook Pro with an M1 Pro CPU. This comparison demonstrates the time differences between querying a database that does not fit in memory using a purely in-memory connection (i.e., without storage) versus one where the data is first loaded and persisted in the database.

| Name | Time – without storage (s) | Time – with storage (s) |
| ---- | -------------------------: | ----------------------: |
| Q 01 |                       2.45 |                    1.45 |
| Q 02 |                       3.89 |                    0.80 |
| Q 03 |                       5.21 |                    2.20 |
| Q 04 |                       11.2 |                    3.12 |

The main difference between these times is that when DuckDB uses a storage file, the data is [highly compressed](https://duckdb.org/2022/10/28/lightweight-compression), resulting in [much faster access when querying the dataset](#docs:lts:guides:performance:how_to_tune_workloads::persistent-vs-in-memory-tables).
In contrast, when we do not use persistent storage, our in-memory database temporarily stores data in an uncompressed `.tmp` file to allow for memory overflow, which increases disk I/O and leads to slower query results. This observation raises a potential area for exploration: determining whether applying compression to temporary data would be beneficial.
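As a quick way to read the table above, the following Python snippet computes how much faster each query runs with persistent storage. The timings are copied from the table; the variable names are only for illustration.

```python
# (time without storage, time with storage) in seconds, taken from the table above
query_times = {
    "Q 01": (2.45, 1.45),
    "Q 02": (3.89, 0.80),
    "Q 03": (5.21, 2.20),
    "Q 04": (11.2, 3.12),
}

speedups = {
    name: without_storage / with_storage
    for name, (without_storage, with_storage) in query_times.items()
}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x faster with storage")
```

The ratio differs considerably between queries, with Q 02 benefiting the most and Q 01 the least from persistent storage.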

#### How This Dataset Was Generated

The original blog post generated the dataset using CSV files distributed by the NYC Taxi and Limousine Commission. Originally, these files included precise latitude and longitude coordinates for pickups and drop-offs. However, starting in mid-2016, these precise coordinates were anonymized using pickup and drop-off geometry objects to address privacy concerns. (There are even stories of broken marriages resulting from checking the actual destinations of taxis.) Furthermore, in recent years, the TLC decided to redistribute the data as Parquet files and to fully anonymize these data points, including data prior to mid-2016.

This is a problem, as the dataset from the “Billion Taxi Rides in Redshift” blog post relies on having this detailed information. Let's take the following snippet of the data:

```csv
649084905,VTS,2012-08-31 22:00:00,2012-08-31 22:07:00,0,1,-73.993908,40.741383000000006,-73.989915,40.75273800000001,1,1.32,6.1,0.5,0.5,0,0,0,0,7.1,CSH,0,0101000020E6100000E6CE4C309C7F52C0BA675DA3E55E4440,0101000020E610000078B471C45A7F52C06D3A02B859604440,yellow,0.00,0.0,0.0,91,69,4.70,142,54,1,Manhattan,005400,1005400,I,MN13,Hudson Yards-Chelsea-Flatiron-Union Square,3807,132,109,1,Manhattan,010900,1010900,I,MN17,Midtown-Midtown South,3807
```

We see precise longitude and latitude data points: `-73.993908, 40.741383000000006, -73.989915, 40.75273800000001`, along with a PostGIS Geometry hex blob created from this longitude and latitude information: `0101000020E6100000E6CE4C309C7F52C0BA675DA3E55E4440, 0101000020E610000078B471C45A7F52C06D3A02B859604440` (generated as `ST_SetSRID(ST_Point(longitude, latitude), 4326)`).
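
To illustrate how the hex blobs relate to the coordinates, here is a minimal Python sketch that decodes them. It assumes the blobs follow the little-endian EWKB point layout used by PostGIS (a byte-order flag, a 4-byte geometry type with an SRID flag bit, a 4-byte SRID, then two 8-byte doubles). The helper name `decode_ewkb_point` is ours and is not part of any library.

```python
import struct

def decode_ewkb_point(hex_blob):
    """Decode a hex-encoded EWKB point into (srid, x, y)."""
    raw = bytes.fromhex(hex_blob)
    # Byte 0: byte-order flag (1 = little-endian, 0 = big-endian)
    endian = "<" if raw[0] == 1 else ">"
    # Bytes 1-4: geometry type; the 0x20000000 bit signals an embedded SRID
    (geom_type,) = struct.unpack_from(endian + "I", raw, 1)
    if geom_type & 0xFF != 1:
        raise ValueError("not a point geometry")
    offset = 5
    srid = None
    if geom_type & 0x20000000:
        (srid,) = struct.unpack_from(endian + "I", raw, offset)
        offset += 4
    # Two doubles: x (longitude) and y (latitude)
    x, y = struct.unpack_from(endian + "dd", raw, offset)
    return srid, x, y

pickup = decode_ewkb_point("0101000020E6100000E6CE4C309C7F52C0BA675DA3E55E4440")
dropoff = decode_ewkb_point("0101000020E610000078B471C45A7F52C06D3A02B859604440")
print(pickup)
print(dropoff)
```

Under these assumptions, the decoded SRID and coordinates should match the plain longitude and latitude columns of the same row.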

Since this information is essential to the dataset, producing files as described in the “Billion Taxi Rides in Redshift” blog post is no longer feasible due to the missing detailed location data. However, the internet never forgets. Hence, we located instances of the original dataset distributed by various sources, such as [[1]](https://arrow.apache.org/docs/6.0/r/articles/dataset.html), [[2]](https://catalog.data.gov/dataset/?q=Yellow+Taxi+Trip+Data&sort=views_recent+desc&publisher=data.cityofnewyork.us&organization=city-of-new-york&ext_location=&ext_bbox=&ext_prev_extent=), and [[3]](https://datasets.clickhouse.com/trips_mergetree/partitions/trips_mergetree.tar). Using these sources, we combined the original CSV files with weather information from the [scripts](https://github.com/toddwschneider/nyc-taxi-data) referenced in the “Billion Taxi Rides in Redshift” blog post.

##### How Does This Dataset Differ from the Original One?

There are two significant differences between the dataset we distribute and the one from the “Billion Taxi Rides in Redshift” blog post:

1. Our dataset includes data up to the last date that longitude and latitude information was available (June 30, 2016), whereas the original post only included data up to the end of 2015 (understandable, as the post was written in February 2016).
2. We also included Uber trips, which were excluded from the original post.

If you wish to run the benchmark with a dataset as close to the original as possible, you can generate a new table by filtering out the additional data. For example:

```sql
CREATE TABLE trips_og AS
    FROM trips
    WHERE pickup_datetime < '2016-01-01'
      AND cab_type != 'uber';
```

#### Conclusion

In this blog post, we discussed how to run the taxi benchmark on DuckDB, and we've made all scripts available so you can benchmark your preferred system as well. We also demonstrated how this highly relevant benchmark can be used to evaluate our operators and gain insights into areas for further improvement.

## What's New in the Vector Similarity Search Extension?

**Publication date:** 2024-10-23

**Author:** Max Gabrielsson

**TL;DR:** DuckDB is another step closer to becoming a vector database! In this post, we show the new performance optimizations implemented in the vector search extension.

In the [previous blog post](https://duckdb.org/2024/05/03/vector-similarity-search-vss), we introduced the DuckDB [Vector Similarity Search (VSS) extension](#docs:lts:core_extensions:vss). While the extension is still quite experimental, we figured it would be interesting to dive into the details of some of the new features and improvements that we've been working on since the initial release.

#### Indexing Speed Improvements

As previously documented, creating an HNSW (Hierarchical Navigable Small Worlds) index over an already populated table is much more efficient than first creating the index and then inserting into the table. This is because it is much easier to predict how large the index will be if the total number of rows is known up front, which makes it possible to divide the work into chunks large enough to distribute over multiple threads. However, in the initial release this work distribution was a bit too coarse-grained, as we would only schedule an additional worker thread for each [_row group_](#docs:lts:internals:storage::row-groups) (about 120,000 rows by default) in the table.

We've now introduced an extra buffer step in the index creation pipeline which enables more fine-grained work distribution, smarter memory allocation and less contention between worker threads. This results in much higher CPU saturation and a significant speedup when building HNSW indexes in environments with many threads available, regardless of how big or small the underlying table is.

Another bonus of this change is that we can now emit a progress bar when building the index, which is a nice touch when you still need to wait a while for the index creation to finish (despite the now much better use of system resources!).

#### New Distance Functions

In the initial release of VSS we supported three different distance functions:
[`array_distance`](#docs:lts:sql:functions:array::array_distancearray1-array2),
[`array_cosine_similarity`](#docs:lts:sql:functions:array::array_cosine_similarityarray1-array2) and
[`array_inner_product`](#docs:lts:sql:functions:array::array_inner_productarray1-array2). However, only the `array_distance` function is actually a _distance_ function in that it returns values close to 0 when the vectors are similar and larger values when they are dissimilar, in contrast to, e.g., `array_cosine_similarity`, which returns 1 when the vectors are identical. Oops!

To remedy this we've introduced two new distances:

* `array_cosine_distance`, equivalent to `1 - array_cosine_similarity`
* `array_negative_inner_product`, equivalent to `-array_inner_product`

These will now be accelerated with the use of the HNSW index instead, making the query patterns and ordering consistent for all supported metrics regardless of whether you make use of the HNSW index or not. Additionally, if you have an HNSW index using, e.g., the `cosine` metric and write a top-k style query using `1 - array_cosine_similarity` as the ranking criterion, the optimizer should be able to normalize the expression to `array_cosine_distance` and use the index for this function as well.
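
As a rough illustration of why the two new functions behave like distances, the following Python sketch re-implements them from the definitions above. The Python function names are ours and only mirror the SQL names. With both metrics, sorting in ascending order puts the most similar vector first.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norms

def cosine_distance(a, b):
    # Mirrors array_cosine_distance: 1 - array_cosine_similarity
    return 1.0 - cosine_similarity(a, b)

def negative_inner_product(a, b):
    # Mirrors array_negative_inner_product: -array_inner_product
    return -inner_product(a, b)

query = [1.0, 0.0]
candidates = {
    "same": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Ascending order now means "most similar first" for both metrics
by_cosine = sorted(candidates, key=lambda n: cosine_distance(query, candidates[n]))
by_neg_ip = sorted(candidates, key=lambda n: negative_inner_product(query, candidates[n]))
print(by_cosine)
print(by_neg_ip)
```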

For completeness we've also added the equivalent distance functions for the dynamically-sized [`LIST` datatype](#docs:lts:sql:data_types:list) (prefixed with `list_` instead of `array_`) and changed the `<=>` binary operator to now be an alias of `array_cosine_distance`, matching the semantics of the [`pgvector` extension](https://github.com/pgvector/pgvector) for PostgreSQL.

#### Index Accelerated "Top-K" Aggregates

Another cool thing that's happened in core DuckDB since last time is that DuckDB now has extra overloads for the [`min_by`](#docs:lts:sql:functions:aggregates::min_byarg-val-n) and [`max_by`](#docs:lts:sql:functions:aggregates::max_byarg-val-n) aggregate functions (and their aliases `arg_min` and `arg_max`).
These new overloads take an optional third `n` argument that specifies the number of top-k (or top-`n`) elements to keep and outputs them into a sorted `LIST` value. Here's an example:

```sql
-- Create a table with some example data
CREATE OR REPLACE TABLE vecs AS 
    SELECT
        row_number() OVER () AS id, 
        [a, b, c]::FLOAT[3] AS vec 
    FROM
        range(1,4) AS x(a), range(1,4) AS y(b), range(1,4) AS z(c);

-- Find the top 3 rows with the vector closest to [2, 2, 2]
SELECT
    arg_min(vecs, array_distance(vec, [2, 2, 2]::FLOAT[3]), 3)
FROM
    vecs;
```

```text
[{'id': 14, 'vec': [2.0, 2.0, 2.0]}, {'id': 13, 'vec': [2.0, 1.0, 2.0]}, {'id': 11, 'vec': [1.0, 2.0, 2.0]}]
```
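
For readers who want to see the semantics outside of SQL, here is a small Python sketch of what a top-`n` `arg_min` computes for this example. It is not how DuckDB implements the aggregate. The helper `arg_min_n` is a hypothetical name, and the ids follow `itertools.product` order, which need not match the order DuckDB's `row_number()` assigns.

```python
import heapq
import itertools
import math

# Mirror the example table: every [a, b, c] with a, b, c in 1..3
vecs = [
    {"id": i, "vec": (float(a), float(b), float(c))}
    for i, (a, b, c) in enumerate(itertools.product(range(1, 4), repeat=3), start=1)
]

def arg_min_n(rows, key, n):
    """Return the n rows with the smallest key, in ascending key order."""
    return heapq.nsmallest(n, rows, key=key)

target = (2.0, 2.0, 2.0)
top3 = arg_min_n(vecs, lambda row: math.dist(row["vec"], target), 3)
print(top3)
```

The first element is the exact match `[2.0, 2.0, 2.0]`. Six vectors are tied at distance 1, so which two of them follow depends on tie-breaking.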

Of course, the VSS extension now includes optimizer rules to use the HNSW index to accelerate these top-k aggregates when the ordering input is a distance function that references an indexed vector column, similarly to the `SELECT a FROM b ORDER BY array_distance(a.vec, query_vec) LIMIT k` query pattern that we discussed in the previous blog post. These new overloads allow you to express the same query in a more concise and readable way, while still avoiding the need for a full scan and sort of the underlying table (as long as the table has a matching HNSW index).

#### Index Accelerated `LATERAL` Joins

After running some benchmarks on the initial version of VSS, we realized that even though index lookups on our HNSW index are really fast (thanks to the [USearch](https://github.com/unum-cloud/usearch) library that it is based on!), using DuckDB to search for individual vectors one at a time has a lot of latency compared to other solutions. The reasons for this are many and nuanced, but we want to be clear that our choice of HNSW implementation, USearch, is not the bottleneck here, as profiling revealed that only about 2% of the runtime is actually spent inside USearch.

Instead, most of the per-query overhead comes from the fact that DuckDB is just not optimized for _point queries,_ i.e., queries that only really fetch and process a single row. Because DuckDB is based on a vectorized execution engine, the smallest unit of work is not 1 row but 2,048, and because we expect to crunch through a ton of data, we generally favor spending a lot of time up front to optimize the query plan and pre-allocate large buffers and caches so that everything is as efficient as possible once we start executing. But a lot of this work becomes unnecessary when the actual working set is so small. For example, is it really worthwhile to inspect and hash every single element of a constant 768-long query vector to attempt to look for common subexpressions if you know there is only going to be a handful of rows in the result?

While we have some ideas on how to improve this scenario in the future, we decided to take another approach for now and instead try to focus not on our weaknesses, but on our strengths. That is, crunching through a ton of data! So instead of trying to optimize the “1:N” query, i.e., “given this one embedding, give me the closest N embeddings”, what if we instead focused on the “N:M” query, “given all these N embeddings, pair them up with the closest M embeddings each”? What would that look like? Well, that would be a [`LATERAL` join](#docs:lts:sql:query_syntax:from::lateral-joins) of course!

Basically, we are now able to make use of HNSW indexes to accelerate `LATERAL` joins where the “inner” query looks just like the top-k style queries we normally target, for example:

```sql
SELECT a
FROM b
ORDER BY array_distance(a.vec, query_vec)
LIMIT k;
```

The difference is that the `query_vec` array is now a reference to a column of an “outer” join table. The only requirement is for the inner table to have an HNSW index on the vector column matching the distance function. Here's an example:

```sql
-- Set the random seed for reproducibility
SELECT setseed(0.42);

-- Create some example tables
CREATE TABLE queries AS
    SELECT
        i AS id,
        [random(), random(), random()]::FLOAT[3] AS embedding 
    FROM generate_series(1, 10_000) r(i);

CREATE TABLE items AS
    SELECT
        i AS id,
        [random(), random(), random()]::FLOAT[3] AS embedding
    FROM generate_series(1, 10_000) r(i);

-- Collect the 5 closest items to each query embedding
SELECT queries.id AS id, list(inner_id) AS matches 
    FROM queries, LATERAL (
        SELECT
            items.id AS inner_id,
            array_distance(queries.embedding, items.embedding) AS dist
        FROM items 
        ORDER BY dist 
        LIMIT 5
    )
GROUP BY queries.id;
```

Executing this on my Apple M3 Pro-equipped MacBook with 36 GB memory takes about 10 seconds.

If we `EXPLAIN` this query plan, we'll see a lot of advanced operators:

```sql
PRAGMA explain_output = 'optimized_only';
EXPLAIN ...
```

<details markdown='1'>
<summary markdown='span'>
Vanilla query plan (operators and expected cardinalities)
</summary>

```text
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│       ~5000000 Rows       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       HASH_GROUP_BY       │
│    ────────────────────   │
│       ~5000000 Rows       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│       ~10000000 Rows      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│       ~10000000 Rows      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      RIGHT_DELIM_JOIN     │
│    ────────────────────   │
│       ~10000000 Rows      ├──────────────┐
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         HASH_JOIN         │
│    ────────────────────   ││    ────────────────────   │
│        ~10000 Rows        ││       ~10000000 Rows      ├──────────────┐
└───────────────────────────┘└─────────────┬─────────────┘              │
                             ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                             │         PROJECTION        ││         DUMMY_SCAN        │
                             │    ────────────────────   ││                           │
                             │       ~10000000 Rows      ││                           │
                             └─────────────┬─────────────┘└───────────────────────────┘
                             ┌─────────────┴─────────────┐
                             │           FILTER          │
                             │    ────────────────────   │
                             │       ~10000000 Rows      │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         PROJECTION        │
                             │    ────────────────────   │
                             │       ~50000000 Rows      │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │           WINDOW          │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         PROJECTION        │
                             │    ────────────────────   │
                             │       ~50000000 Rows      │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │       CROSS_PRODUCT       ├──────────────┐
                             └─────────────┬─────────────┘              │
                             ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                             │         SEQ_SCAN          ││         DELIM_SCAN        │
                             │    ────────────────────   ││    ────────────────────   │
                             │        ~10000 Rows        ││         ~5000 Rows        │
                             └───────────────────────────┘└───────────────────────────┘
```
</details>

While this plan looks very complicated, the most worrisome among these operators is the `CROSS_PRODUCT` towards the bottom of the plan, which blows up the expected cardinality and is a sign that we are doing a lot of work that we probably don't want to do. However, if we create an HNSW index on the `items` table using

```sql
CREATE INDEX my_hnsw_idx ON items USING HNSW(embedding);
```

and re-run `EXPLAIN`, we get this plan instead:

<details markdown='1'>
<summary markdown='span'>
Query plan with HNSW index (operators and expected cardinalities)
</summary>

```text
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│        ~50000 Rows        │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       HASH_GROUP_BY       │
│    ────────────────────   │
│        ~50000 Rows        │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│        ~50000 Rows        │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│        ~50000 Rows        │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│        ~50000 Rows        │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      HNSW_INDEX_JOIN      │
│    ────────────────────   │
│        ~50000 Rows        │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         SEQ_SCAN          │
│    ────────────────────   │
│        ~10000 Rows        │
└───────────────────────────┘
```
</details>

We can see that this plan is drastically simplified, but most importantly, the new `HNSW_INDEX_JOIN` operator replaces the `CROSS_PRODUCT` node that was there before and the estimated cardinality went from 5,000,000 to 50,000! Executing this query now takes about 0.15 seconds. That's an almost 66× speedup!
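
To make the N:M pattern concrete, here is a brute-force Python sketch of what the `LATERAL` query computes, scaled down to 100 queries and 100 items. It is only an illustration of the semantics, not of DuckDB's execution: the table sizes, the seed, and the helper names `random_table` and `closest_items` are our own choices. Every query embedding is compared against every item, which is the pairwise work an HNSW index lookup is meant to avoid.

```python
import heapq
import math
import random

rng = random.Random(42)  # fixed seed for reproducibility

def random_table(n_rows):
    """Build rows with an id and a random 3-dimensional embedding."""
    return [
        {"id": i, "embedding": (rng.random(), rng.random(), rng.random())}
        for i in range(1, n_rows + 1)
    ]

queries = random_table(100)
items = random_table(100)

def closest_items(query, items, k):
    """Brute-force 'inner query': ids of the k items nearest to one query."""
    nearest = heapq.nsmallest(
        k, items, key=lambda item: math.dist(query["embedding"], item["embedding"])
    )
    return [item["id"] for item in nearest]

# The outer loop plays the role of the LATERAL join: one inner query per outer row
matches = {query["id"]: closest_items(query, items, 5) for query in queries}
print(len(matches), "queries,", sum(len(m) for m in matches.values()), "matches")
```

The result has one entry per query with `k` matches each, which is consistent with the estimated cardinality of 10,000 × 5 = 50,000 rows in the indexed plan above.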

This optimization was just recently added to the VSS extension, so if you've already installed `vss` for DuckDB v1.1.2, run the following command to get the latest version:

```sql
UPDATE EXTENSIONS (vss);
```

#### Conclusion

That's all for this time folks! We hope you've enjoyed this update on the DuckDB Vector Similarity Search extension. While this update has focused a lot on new features and improvements such as faster indexing, additional distance functions and more optimizer rules, we're still working on improving some of the limitations mentioned in the previous blog post. We hope to have more to share related to custom indexes and index-based optimizations soon! If you have any questions or feedback, feel free to reach out to us on the [`duckdb-vss` GitHub repository](https://github.com/duckdb/duckdb-vss) or on the [DuckDB Discord](https://discord.duckdb.org/). Hope to see you around!

## Fast Top N Aggregation and Filtering with DuckDB

**Publication date:** 2024-10-25

**Author:** Alex Monahan

**TL;DR:** Find the top N values or filter to the latest N rows more quickly and easily with the `N` parameter in the `min`, `max`, `min_by`, and `max_by` aggregate functions.

#### Introduction to Top N

A common pattern when analyzing data is to look for the rows of data that are the highest or lowest in a particular metric.
When interested in the highest or lowest `N` rows in an entire dataset, SQL's standard `ORDER BY` and `LIMIT` clauses will sort by the metric of interest and only return `N` rows.
For example, using the scale factor 1 (SF1) data set of the [TPC-H benchmark](#docs:lts:core_extensions:tpch):

```sql
INSTALL tpch;
LOAD tpch;
-- Generate an example TPC-H dataset
CALL dbgen(sf = 1);

-- Return the most recent 3 rows by l_shipdate
FROM lineitem
ORDER BY
    l_shipdate DESC
LIMIT 3;
```

| l_orderkey | l_partkey | ... | l_shipmode | l_comment                           |
| ---------: | --------: | --- | ---------- | ----------------------------------- |
|     354528 |      6116 | ... | MAIL       | wake according to the u             |
|     413956 |     16402 | ... | SHIP       | usual patterns. carefull            |
|     484581 |     10970 | ... | TRUCK      | ccounts maintain. dogged accounts a |

This is useful to quickly get the oldest or newest values in a dataset or to find outliers in a particular metric.

Another common approach is to query the min/max summary statistics of one or more columns.
This can find outliers, but the row that contains the outlier can be different for each column, so it is answering a different question.
DuckDB's helpful `COLUMNS` expression allows us to calculate the maximum value for all columns.

```sql
FROM lineitem
SELECT
    max(COLUMNS(*));
```

> The queries in this post make extensive use of DuckDB's [`FROM`-first syntax](#docs:lts:sql:query_syntax:from::from-first-syntax).
> This allows the `FROM` and `SELECT` clauses to be swapped, and it even allows omitting the latter entirely.

| l_orderkey | l_partkey | ... | l_shipmode | l_comment   |
| ---------: | --------: | --- | ---------- | ----------- |
|     600000 |     20000 | ... | TRUCK      | zzle. slyly |

However, these two approaches can only answer certain kinds of questions.
There are many scenarios where the goal is to understand the top N values _within a group_.
In the first example above, how would we calculate the last 10 shipments from each supplier?
SQL's `LIMIT` clause is not able to handle that situation.
Let's call this type of analysis the top N by group.

This type of analysis is a common tool for exploring new datasets.
Use cases include pulling the most recent few rows for each group or finding the most extreme few values in a group.
Sticking with our shipment example, we could look at the last 10 shipments of each part number, or find the 5 highest priced orders per customer.

#### Traditional Top N by Group

In most databases, the way to filter to the top N within a group is to use a [window function](#docs:lts:sql:functions:window_functions) and a [common table expression (CTE)](#docs:lts:sql:query_syntax:with).
This approach also works in DuckDB.
For example, this query returns the 3 most recent shipments for each supplier:

```sql
WITH ranked_lineitem AS (
    FROM lineitem
    SELECT
        *,
        row_number() OVER
            (PARTITION BY l_suppkey ORDER BY l_shipdate DESC)
            AS my_ranking
)
FROM ranked_lineitem
WHERE
    my_ranking <= 3;
```

| l_orderkey | l_partkey | l_suppkey | ... | l_shipmode | l_comment                                 | my_ranking |
| ---------: | --------: | --------: | --- | ---------- | ----------------------------------------- | ---------: |
|    1310688 |    169532 |      7081 | ... | RAIL       | ully final exc                            |          1 |
|     910561 |    194561 |      7081 | ... | SHIP       | ly bold excuses caj                       |          2 |
|    4406883 |    179529 |      7081 | ... | RAIL       | tions. furious                            |          3 |
|    4792742 |     52095 |      7106 | ... | RAIL       | onic, ironic courts. final deposits sleep |          1 |
|    4010212 |    122081 |      7106 | ... | MAIL       | accounts cajole finally ironic instruc    |          2 |
|    1220871 |     94596 |      7106 | ... | TRUCK      | regular requests above t                  |          3 |
|        ... |       ... |       ... | ... | ...        | ...                                       |        ... |

In DuckDB, this can be simplified using the [`QUALIFY` clause](#docs:lts:sql:query_syntax:qualify).
`QUALIFY` acts like a `WHERE` clause, but specifically operates on the results of window functions.
By making this adjustment, the CTE can be avoided while returning the same results.

```sql
FROM lineitem
SELECT
    *,
    row_number() OVER
        (PARTITION BY l_suppkey ORDER BY l_shipdate DESC)
        AS my_ranking
QUALIFY
    my_ranking <= 3;
```

This is certainly a viable approach!
However, what are its weaknesses?
Even though the query is interested in only the 3 most recent shipments, it must sort every shipment just to retrieve those top 3.
Sorting in DuckDB has a complexity of `O(kn)` due to DuckDB's innovative [Radix sort implementation](https://duckdb.org/2021/08/27/external-sorting), but this is still higher than the `O(n)` of [DuckDB's hash aggregate](https://duckdb.org/2024/03/29/external-aggregation), for example.
Sorting is also a memory intensive operation when compared with aggregation.
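
To give a feel for the `k` in `O(kn)`, here is a textbook least-significant-byte radix sort in Python. It is a simplified sketch and not DuckDB's actual sorting code. It makes one full pass over the data per key byte, so `k` is 8 for a 64-bit key.

```python
def radix_sort_u64(values):
    """LSD radix sort for unsigned 64-bit integers; returns (sorted_list, passes)."""
    key_bytes = 8  # k: a 64-bit key has 8 bytes
    passes = 0
    for byte_index in range(key_bytes):
        shift = 8 * byte_index
        buckets = [[] for _ in range(256)]
        # One full pass over the data per key byte
        for value in values:
            buckets[(value >> shift) & 0xFF].append(value)
        # Concatenating the buckets in order keeps the sort stable
        values = [value for bucket in buckets for value in bucket]
        passes += 1
    return values, passes

data = [2**40 + 7, 3, 2**63, 255, 256, 3]
sorted_data, passes = radix_sort_u64(data)
print(sorted_data, passes)
```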

#### Top N in DuckDB

[DuckDB 1.1](https://duckdb.org/2024/09/09/announcing-duckdb-110) added a new capability to dramatically simplify and improve performance of top N calculations.
Namely, the functions `min`, `max`, `min_by`, and `max_by` all now accept an optional parameter `N`.
If `N` is greater than 1 (the default), they will return a `LIST` of the top `N` values.

As a simple example, let's query the most recent (top 3) shipment dates:

```sql
FROM lineitem
SELECT
    max(l_shipdate, 3) AS top_3_shipdates;
```

| top_3_shipdates                      |
| ------------------------------------ |
| [1998-12-01, 1998-12-01, 1998-12-01] |

#### Top N by Column in DuckDB

The top N selection can become even more useful thanks to the `COLUMNS` expression once again – we can retrieve the 3 top values in each column.
We can call this a _top N by column analysis._
It is particularly messy to try to do this analysis with ordinary SQL!
You would need a subquery or window function for every single column...
In DuckDB, simply:

```sql
FROM lineitem
SELECT
    max(COLUMNS(*), 3) AS "top_3_\0";
```

| top_3_l_orderkey         | top_3_l_partkey       | ... | top_3_l_shipmode      | top_3_l_comment                                                              |
| ------------------------ | --------------------- | --- | --------------------- | ---------------------------------------------------------------------------- |
| [600000, 600000, 599975] | [20000, 20000, 20000] | ... | [TRUCK, TRUCK, TRUCK] | [zzle. slyly, zzle. quickly bold a, zzle. pinto beans boost slyly slyly fin] |

#### Top N by Group in DuckDB

Armed with the new `N` parameter, how can we speed up a top N by group analysis?

Want to cut to the chase and see the final output?
[Feel free to skip ahead!](#the-final-top-n-by-group-query)

We will take advantage of three other DuckDB SQL features to make this possible:

- The [`max_by` function](#docs:lts:sql:functions:aggregates::max_byarg-val-n) (also known as `arg_max`)
- The [`unnest` function](#docs:lts:sql:query_syntax:unnest)
- Automatically packing an entire row into a [`STRUCT` column](#docs:lts:sql:data_types:struct::creating-structs)

The `max` function will return the max (or now the max N!) of a specific column.
In contrast, the `max_by` function will find the maximum value in a column, and then retrieve a value from the same row, but a different column.
For example, this query will return the ids of the 3 most recently shipped orders for each supplier:

```sql
FROM lineitem
SELECT
    l_suppkey,
    max_by(l_orderkey, l_shipdate, 3) AS recent_orders
GROUP BY
    l_suppkey;
```

| l_suppkey | recent_orders               |
| --------: | --------------------------- |
|      2992 | [233573, 3597639, 3060227]  |
|      8516 | [4675968, 5431174, 4626530] |
|      3205 | [3844610, 4396966, 3405255] |
|      2152 | [1672000, 4209601, 3831138] |
|      1880 | [4852999, 2863747, 1650084] |
|       ... | ...                         |

The `max_by` function is an aggregate function, so it takes advantage of DuckDB's fast hash aggregation rather than sorting.
Instead of sorting by `l_shipdate`, the `max_by` function scans through the dataset just once and keeps track of the `N` highest `l_shipdate` values.
It then returns the order id that corresponds with each of the most recent shipment dates.
The radix sort in DuckDB must scan through the dataset once per byte, so scanning only once provides a significant speedup.
For example, if sorting by a 64-bit integer, the sort algorithm must loop through the dataset 8 times vs. 1 with this approach!
A simple micro-benchmark is included in the [Performance Comparisons](#::performance-comparisons) section.
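
The single-pass idea described above can be sketched in Python with one bounded heap per group. This is an illustration of the general technique under our own assumptions, not DuckDB's implementation. The function name `max_by_n_per_group` and the sample rows are made up for the example.

```python
import heapq
from collections import defaultdict

def max_by_n_per_group(rows, group_key, arg, val, n):
    """For each group, return `arg` of the n rows with the largest `val`, largest first."""
    heaps = defaultdict(list)  # group -> min-heap of at most n (val, seq, arg) entries
    for seq, row in enumerate(rows):  # single pass over the input
        heap = heaps[row[group_key]]
        entry = (row[val], seq, row[arg])  # seq breaks ties between equal values
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the smallest of the current top n
    return {
        group: [arg_value for _, _, arg_value in sorted(heap, reverse=True)]
        for group, heap in heaps.items()
    }

shipments = [
    {"l_suppkey": 1, "l_orderkey": 10, "l_shipdate": "1998-01-01"},
    {"l_suppkey": 1, "l_orderkey": 11, "l_shipdate": "1998-03-01"},
    {"l_suppkey": 2, "l_orderkey": 20, "l_shipdate": "1997-05-05"},
    {"l_suppkey": 1, "l_orderkey": 12, "l_shipdate": "1998-02-01"},
    {"l_suppkey": 1, "l_orderkey": 13, "l_shipdate": "1997-12-31"},
    {"l_suppkey": 2, "l_orderkey": 21, "l_shipdate": "1997-06-06"},
]
recent_orders = max_by_n_per_group(shipments, "l_suppkey", "l_orderkey", "l_shipdate", 3)
print(recent_orders)
```

Each group only ever holds `N` entries, so the state grows with the number of groups rather than with the number of rows.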

However, this SQL query has a few gaps.
The query returns results as a `LIST` rather than as separate rows.
Thankfully the `unnest` function can split a `LIST` into separate rows:

```sql
FROM lineitem
SELECT
    l_suppkey,
    unnest(
        max_by(l_orderkey, l_shipdate, 3)
    ) AS recent_orders
GROUP BY
    l_suppkey;
```

| l_suppkey | recent_orders |
| --------: | ------------: |
|      2576 |        930468 |
|      2576 |       2248354 |
|      2576 |       3640711 |
|      5559 |       4022148 |
|      5559 |       1675680 |
|      5559 |       4976259 |
|       ... |           ... |

The next gap is that there is no way to easily see the `l_shipdate` associated with the returned `l_orderkey` values.
This query only returns a single column, while typically a top N by group analysis will require the entire row.

Fortunately, DuckDB allows us to refer to the entire contents of a row as if it were just a single column!
By referring to the name of the table itself (here, `lineitem`) instead of the name of a column, the `max_by` function can retrieve all columns.

```sql
FROM lineitem
SELECT
    l_suppkey,
    unnest(
        max_by(lineitem, l_shipdate, 3)
    ) AS recent_orders
GROUP BY
    l_suppkey;
```

| l_suppkey | recent_orders                                                       |
| --------: | ------------------------------------------------------------------- |
|      5411 | {'l_orderkey': 2543618, 'l_partkey': 105410, 'l_suppkey': 5411, ... |
|      5411 | {'l_orderkey': 580547, 'l_partkey': 130384, 'l_suppkey': 5411, ...  |
|      5411 | {'l_orderkey': 3908642, 'l_partkey': 132897, 'l_suppkey': 5411, ... |
|        90 | {'l_orderkey': 4529697, 'l_partkey': 122553, 'l_suppkey': 90, ...   |
|        90 | {'l_orderkey': 4473346, 'l_partkey': 160089, 'l_suppkey': 90, ...   |
|       ... | ...                                                                 |

Let's make that a bit friendlier looking by splitting the `STRUCT` out into separate columns to match our original dataset.

##### The Final Top N by Group Query

Passing in one more argument to `UNNEST` will split this out into separate columns by running recursively.
In this case, that means that `UNNEST` will run twice: once to convert each `LIST` into separate rows, and then again to convert each `STRUCT` into separate columns.
The `l_suppkey` column can also be omitted from the `SELECT` list, since it is already one of the unnested columns.

```sql
FROM lineitem
SELECT
    unnest(
        max_by(lineitem, l_shipdate, 3),
        recursive := 1
    ) AS recent_orders
GROUP BY
    l_suppkey;
```

| l_orderkey | l_partkey | l_suppkey | ... | l_shipinstruct    | l_shipmode | l_comment                             |
| ---------: | --------: | --------: | --- | ----------------- | ---------- | ------------------------------------- |
|    1234726 |      6875 |      6876 | ... | COLLECT COD       | FOB        | cajole carefully slyly fin            |
|    2584193 |     51865 |      6876 | ... | TAKE BACK RETURN  | TRUCK      | fully regular deposits at the q       |
|    2375524 |     26875 |      6876 | ... | DELIVER IN PERSON | AIR        | nusual ideas. busily bold deposi      |
|    5751559 |     95626 |      8136 | ... | NONE              | SHIP       | ers nag fluffily against the spe      |
|    3103457 |    103115 |      8136 | ... | TAKE BACK RETURN  | FOB        | y slyly express warthogs-- unusual, e |
|    5759105 |    178135 |      8136 | ... | COLLECT COD       | TRUCK      | es. regular pinto beans haggle.       |
|        ... |       ... |       ... | ... | ...               | ...        | ...                                   |

> This approach can also be useful for the common task of de-duplicating by finding the latest value within a group.
> One pattern is to find the current state of a dataset by returning the most recent event in an events table.
> Simply use an `N` of 1!
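
The two levels of unnesting described in this section can be mimicked in plain Python, which may help when reasoning about the shape of the result. The data below is a hypothetical stand-in for the aggregate output, trimmed to two fields per `STRUCT`.

```python
# Hypothetical aggregate output: one LIST of STRUCTs (dicts) per supplier
grouped = {
    6876: [
        {"l_orderkey": 1234726, "l_suppkey": 6876},
        {"l_orderkey": 2584193, "l_suppkey": 6876},
    ],
    8136: [
        {"l_orderkey": 5751559, "l_suppkey": 8136},
    ],
}

# First level: each LIST element becomes its own row
rows_of_structs = [record for records in grouped.values() for record in records]

# Second level: each STRUCT field becomes its own column
columns = list(rows_of_structs[0].keys())
table = [tuple(record[col] for col in columns) for record in rows_of_structs]

print(columns)
for row in table:
    print(row)
```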

We now have a way to use an aggregate function to calculate the top N rows per group!
So, how much more efficient is it?

#### Performance Comparisons

We will compare the `QUALIFY` approach with the `max_by` approach for solving the top N by group problem.
We have discussed both queries, but for reference they are repeated below.

<details markdown='1'>
<summary markdown='span'>
    `QUALIFY` query:
</summary>

```sql
FROM lineitem
SELECT
    *,
    row_number() OVER
        (PARTITION BY l_suppkey ORDER BY l_shipdate DESC)
        AS my_ranking
QUALIFY
    my_ranking <= 3;
```

</details>

<details markdown='1'>
<summary markdown='span'>
    `max_by` query:
</summary>

```sql
FROM lineitem
SELECT
    unnest(
        max_by(lineitem, l_shipdate, 3),
        recursive := 1
    )
GROUP BY
    l_suppkey;
```

</details>

While the main query is running, we will also kick off a background thread to periodically measure DuckDB's memory use.
This uses the built-in table function `duckdb_memory()` and includes information about memory usage as well as temporary disk usage.
The small Python script used for benchmarking is included below the results.
The machine used for benchmarking was an M1 MacBook Pro with 16 GB RAM.

|   SF | `max_memory` |       Metric | `QUALIFY` | `max_by` | Improvement |
| ---: | -----------: | -----------: | --------: | -------: | ----------: |
|    1 |      Default |   Total time |    0.58 s |   0.24 s |        2.4× |
|    5 |      Default |   Total time |    6.15 s |   1.26 s |        4.9× |
|   10 |        36 GB |   Total time |    36.8 s |   25.4 s |        1.4× |
|    1 |      Default | Memory usage |    1.7 GB |   0.2 GB |        8.5× |
|    5 |      Default | Memory usage |    7.9 GB |   1.5 GB |        5.3× |
|   10 |        36 GB | Memory usage |   15.7 GB |  17.1 GB |        0.9× |

We can see that in each of these situations, the `max_by` approach is faster, in some cases nearly 5× faster!
However, as the data grows larger, the `max_by` approach begins to weaken relative to `QUALIFY`.

In some cases, memory use is also significantly lower with `max_by`.
However, the memory use of the `max_by` approach becomes more significant as scale increases, because the number of distinct `l_suppkey` values increases linearly with the scale factor.
This increased memory use likely explains the performance decrease, as both algorithms approached the maximum amount of RAM on my machine and began to swap to disk.
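
To build intuition for why the aggregate's state grows with the number of groups, here is a minimal pure-Python sketch of a top-N-per-group aggregate that keeps a bounded heap per group. It is an illustrative mental model only, not a description of DuckDB's internals; the function name `top_n_per_group` and the sample rows are hypothetical.

```python
import heapq
from collections import defaultdict

def top_n_per_group(rows, n):
    """Keep only the n largest (order_key, payload) pairs per group.

    rows: iterable of (group_key, order_key, payload) tuples.
    The state never exceeds n entries per distinct group, however many rows arrive.
    """
    heaps = defaultdict(list)  # group_key -> min-heap of (order_key, seq, payload)
    for seq, (group_key, order_key, payload) in enumerate(rows):
        heap = heaps[group_key]
        entry = (order_key, seq, payload)  # seq breaks ties before payloads are compared
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the smallest of the current top n
    # Emit each group's survivors, largest order_key first
    return {
        group: [payload for _, _, payload in sorted(heap, reverse=True)]
        for group, heap in heaps.items()
    }

# Hypothetical stand-in for lineitem: (l_suppkey, l_shipdate, payload)
rows = [
    (1, "1998-01-05", "a"),
    (1, "1998-03-01", "b"),
    (2, "1997-07-19", "c"),
    (1, "1998-02-11", "d"),
    (1, "1996-12-30", "e"),
    (2, "1998-08-02", "f"),
]
result = top_n_per_group(rows, n=3)
print(result)
```

In this model the retained state is proportional to `number of groups × N`, which matches the observation that more distinct `l_suppkey` values mean more memory.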

In order to reduce the memory pressure, let's re-run the scale factor 10 (SF10) benchmark using fewer threads (4 threads and 1 thread).
We continue to use a `max_memory` setting of 36 GB.
The prior SF10 results with all 10 threads are included for reference.

|   SF | Threads |       Metric | `QUALIFY` | `max_by` | Improvement |
| ---: | ------: | -----------: | --------: | -------: | ----------: |
|   10 |      10 |   Total time |    36.8 s |   25.4 s |        1.4× |
|   10 |       4 |   Total time |    49.0 s |   21.0 s |        2.3× |
|   10 |       1 |   Total time |   115.7 s |   12.7 s |        9.1× |
|   10 |      10 | Memory usage |   15.7 GB |  17.1 GB |        0.9× |
|   10 |       4 | Memory usage |   15.9 GB |  17.3 GB |        0.9× |
|   10 |       1 | Memory usage |   14.5 GB |   1.8 GB |        8.1× |

The `max_by` approach is so computationally efficient that even with 1 thread it is dramatically faster than the `QUALIFY` approach that uses all 10 threads!
Reducing the thread count very effectively lowered the memory use as well (a nearly 10× reduction).
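
The Improvement column is consistent with dividing the `QUALIFY` figure by the `max_by` figure, so values above 1× favor `max_by` and values below 1× favor `QUALIFY`. A quick check using the SF10 numbers from the table above (the dictionary layout is only for illustration):

```python
# (threads, metric) -> (QUALIFY, max_by), copied from the SF10 table
sf10 = {
    (10, "time_s"): (36.8, 25.4),
    (4, "time_s"): (49.0, 21.0),
    (1, "time_s"): (115.7, 12.7),
    (10, "memory_gb"): (15.7, 17.1),
    (4, "memory_gb"): (15.9, 17.3),
    (1, "memory_gb"): (14.5, 1.8),
}

# A ratio above 1 means max_by needed less time or memory than QUALIFY
improvement = {key: round(q / m, 1) for key, (q, m) in sf10.items()}

for (threads, metric), ratio in improvement.items():
    print(f"{threads:>2} threads, {metric}: {ratio}x")
```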

So, when should we use each?
As with all database things, _it depends!_
If memory is constrained, `max_by` may also offer benefits, especially when the thread count is tuned to avoid spilling to disk.
However, if there are approximately as many groups as there are rows, consider `QUALIFY` since we lose some of the memory efficiency of the `max_by` approach.

<details markdown='1'>
<summary markdown='span'>
    Python Benchmarking Script
</summary>

```python
import duckdb
import pandas as pd
from threading import Thread
from time import sleep
from datetime import datetime
from os import remove

def check_memory(stop_function, filepath, sleep_seconds, results_dict):
    print("Starting background thread")
    background_con = duckdb.connect(filepath)
    max_memory = 0
    max_temporary_storage = 0
    while True:
        if stop_function():
            break
        # Profile the memory
        memory_profile = background_con.sql("""
            FROM duckdb_memory()
            SELECT
                tag,
                round(memory_usage_bytes / (1000000), 0)::bigint AS memory_usage_mb,
                round(temporary_storage_bytes / (1000000), 0)::bigint AS temporary_storage_mb;
            """).df()
        print(memory_profile)
        total_memory = background_con.sql("""
            FROM memory_profile
            SELECT
                sum(memory_usage_mb) AS total_memory_usage_mb,
                sum(temporary_storage_mb) AS total_temporary_storage_mb
            """).fetchall()
        print('Current memory:', total_memory[0][0])
        print('Current temporary_storage:', total_memory[0][1])

        if total_memory[0][0] > max_memory:
            max_memory = total_memory[0][0]
        if total_memory[0][1] > max_temporary_storage:
            max_temporary_storage = total_memory[0][1]

        print('Maximum memory:', max_memory)
        print('Maximum temporary_storage:', max_temporary_storage)

        sleep(sleep_seconds)

    results_dict["max_memory"] = max_memory
    results_dict["max_temporary_storage"] = max_temporary_storage
    background_con.close()

    return

def query_and_profile(filepath, sql):
    con = duckdb.connect(filepath)
    con.sql("set max_memory='36GB'")

    results_dict = {}
    stop_threads = False
    background_memory_thread = Thread(target=check_memory,
                                      args=(lambda : stop_threads, filepath, 0.1, results_dict, ))
    background_memory_thread.start()

    print("Starting query:")
    start_time = datetime.now()
    results_df = con.sql(sql).df()
    results_dict["total_time_seconds"] = (datetime.now() - start_time).total_seconds()
    print(results_df.head(10))

    stop_threads = True
    background_memory_thread.join()
    con.close()

    return results_dict

filepath = './arg_max_check_duckdb_memory_v3.duckdb'

con = duckdb.connect(filepath)
print("Begin initial tpch load")
con.sql("""call dbgen(sf=1);""")
con.close()

sql = """
    FROM lineitem
    SELECT
        UNNEST(
            max_by(lineitem, l_shipdate, 3),
            recursive := 1
        )
    GROUP BY
        l_suppkey
;"""

max_by_results = query_and_profile(filepath, sql)

sql = """
    FROM lineitem
    SELECT
        *,
        row_number() OVER
            (PARTITION BY l_suppkey ORDER BY l_shipdate DESC)
            AS my_ranking
    QUALIFY
        my_ranking <= 3
;"""

qualify_results = query_and_profile(filepath, sql)

print('max_by_results:', max_by_results)
print('qualify_results:', qualify_results)

remove(filepath)
```

</details>

#### Conclusion

DuckDB now offers a convenient way to calculate the top N values of both `min` and `max` aggregate functions, as well as their advanced cousins `min_by` and `max_by`.
They are easy to get started with, and also enable more complex analyses like calculating the top N for all columns or the top N by group.
There are also possible performance benefits when compared with a window function approach.

We would love to hear about the creative ways you are able to use this new feature!

Happy analyzing!

## Analytics-Optimized Concurrent Transactions

**Publication date:** 2024-10-30

**Authors:** Mark Raasveldt, Hannes Mühleisen

**TL;DR:** DuckDB employs unique analytics-optimized optimistic multi-version concurrency control techniques. These allow DuckDB to perform large-scale in-place updates efficiently.

> This is the second post on DuckDB's ACID support. If you have not read the first post, [Changing Data with Confidence and ACID](https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid), it may be a good idea to start there.

In our [previous post](https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid), we have discussed why changes to data are much saner if the formal “ACID” transaction properties hold. A data system should not allow importing “half” a CSV file into a table because of some unexpected [string in line 431,741](https://duckdb.org/2024/10/09/analyzing-open-government-data-with-duckplyr).

Ensuring the ACID properties of transactions [under concurrency](#docs:lts:connect:concurrency) is very challenging and one of the “holy grails” of databases. DuckDB implements advanced methods for concurrency control and logging. In this post, we describe DuckDB's Multi-Version Concurrency Control (MVCC) and Write-Ahead-Logging (WAL) schemes that are specifically designed for efficiently ensuring the transactional guarantees for analytical use cases under concurrent workloads.

#### Concurrency Control

**Pessimistic Concurrency Control**. Traditional database systems use locks to manage concurrency. A transaction obtains locks in order to ensure that (a) no other transaction can see its uncommitted changes, and (b) it does not see uncommitted changes of other transactions. Locks need to be obtained both when **reading** (shared locks) and when **writing** (exclusive locks). When a different transaction tries to read data that has been written to by another transaction – it must wait for the other transaction to complete and release its exclusive lock on the data. This type of concurrency control is called **pessimistic**, because locks are always obtained, even if there are no conflicts between transactions.

This strategy works well for transactional workloads. These workloads consist of small transactions that read or modify a few rows. A typical transaction only locks a few rows, and keeps those rows locked only for a short period of time. For analytical workloads, on the other hand, this strategy does not work well. These workloads consist of large transactions that read or modify large parts of the table. An analytical transaction executed in a system that uses pessimistic concurrency control will therefore lock many rows, and keep those rows locked for a long period of time, preventing other transactions from executing.

**Optimistic Concurrency Control**. DuckDB uses a different approach to manage concurrency conflicts. Transactions do not hold locks – they can always read and write to any row in any table. When a conflict occurs and multiple transactions try to write to the same row at the same time – one of the conflicting transactions is instead aborted. The aborted transaction can then be retried if desired. This type of concurrency control is called **optimistic**.

In case there are never any concurrency conflicts – this strategy is very efficient as we have not unnecessarily slowed down transactions by pessimistically grabbing locks. This strategy works well for analytical workloads – as read-only transactions can never conflict with one another, and multiple writers that modify the same rows are rare in these workloads.
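
The abort-and-retry behavior described above can be sketched in a few lines of pure Python. This is a toy model under simplifying assumptions, not DuckDB's implementation: the class names are hypothetical, and the `writers` map is only a conflict detector, so a second writer is aborted immediately rather than made to wait.

```python
class ConflictError(Exception):
    """Raised when a transaction loses a write-write conflict."""

class ToyDatabase:
    def __init__(self, rows):
        self.rows = dict(rows)   # committed data: row_id -> value
        self.writers = {}        # row_id -> id of the uncommitted transaction writing it
        self.next_txn_id = 1

    def begin(self):
        txn = Transaction(self, self.next_txn_id)
        self.next_txn_id += 1
        return txn

class Transaction:
    def __init__(self, db, txn_id):
        self.db, self.txn_id = db, txn_id
        self.pending = {}        # this transaction's uncommitted writes

    def read(self, row_id):
        # Reads never block: own writes first, otherwise committed data
        if row_id in self.pending:
            return self.pending[row_id]
        return self.db.rows[row_id]

    def write(self, row_id, value):
        owner = self.db.writers.setdefault(row_id, self.txn_id)
        if owner != self.txn_id:
            self.rollback()      # abort instead of waiting for the other writer
            raise ConflictError(f"row {row_id} is being written by transaction {owner}")
        self.pending[row_id] = value

    def commit(self):
        self.db.rows.update(self.pending)
        self._release()

    def rollback(self):
        self._release()

    def _release(self):
        for row_id in self.pending:
            del self.db.writers[row_id]
        self.pending = {}

db = ToyDatabase({"sally": 100, "bob": 50})
t1, t2 = db.begin(), db.begin()
t1.write("sally", 105)
try:
    t2.write("sally", 90)
    outcome = "t2 wrote"
except ConflictError:
    outcome = "t2 aborted"
t1.commit()

# The aborted work can simply be retried in a new transaction
t3 = db.begin()
t3.write("sally", t3.read("sally") - 15)
t3.commit()
print(outcome, db.rows)
```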

#### Multi-Version Concurrency Control

In an optimistic concurrency control model – multiple transactions can read and make changes to the same tables at the same time. We have to ensure these transactions cannot see each other's *half-done* changes in order to maintain ACID isolation. A well-known technique to achieve this is [Multi-Version Concurrency Control (MVCC)](https://en.wikipedia.org/wiki/Multiversion_concurrency_control). MVCC works by keeping **multiple versions** of modified rows. When a transaction modifies a row – we can create a copy of that row and modify that instead. This allows other transactions to keep reading the original version of the row, so each transaction sees its own consistent state of the database. Often that state is the "version" that existed when the transaction was started. MVCC is widely used in database systems, for example [PostgreSQL also uses MVCC](https://www.postgresql.org/docs/current/mvcc-intro.html).

DuckDB implements MVCC using a technique inspired by the paper [“Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems”](https://15721.courses.cs.cmu.edu/spring2019/papers/04-mvcc2/p677-neumann.pdf) by the one and only [Thomas Neumann](https://en.wikipedia.org/wiki/Thomas_Neumann). This MVCC implementation works by maintaining a list of previous versions for each row in a table. Transactions will update the table data in-place, but will save the previous version of the updated row in the undo buffers. Below is an illustrated example.

```sql
-- add 5 to Sally's balance
UPDATE Accounts SET Balance = Balance + 5 WHERE Name = 'Sally';
```

![](../images/blog/mvcc/rowbasedmvcc.png)


When reading a row, a transaction will first check if there is version information for that row. If there is none, which is the common case, the transaction can read the original data. If there is version information, the transaction has to compare the transaction number at the transaction's start time with those in the undo buffers and pick the right version to read.
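
The read path described above can be illustrated with a small pure-Python model. It is a sketch under strong simplifying assumptions (every update commits immediately, timestamps are a simple counter, and undo entries are never garbage-collected); `ToyTable` and the balances are hypothetical and do not mirror DuckDB's data structures.

```python
class ToyTable:
    def __init__(self, rows):
        self.data = dict(rows)   # row_id -> current value, updated in place
        self.undo = {}           # row_id -> [(commit_ts, previous_value), ...], oldest first
        self.clock = 0           # logical timestamp of the latest commit

    def begin(self):
        """Start a reader; it should see the state as of this timestamp."""
        return self.clock

    def update(self, row_id, new_value):
        """Auto-committed in-place update that saves the old value in the undo buffer."""
        self.clock += 1
        self.undo.setdefault(row_id, []).append((self.clock, self.data[row_id]))
        self.data[row_id] = new_value

    def read(self, row_id, start_ts):
        value = self.data[row_id]
        # Common case: no version information, so the table data is returned directly.
        # Otherwise, undo every change committed after this reader started, newest first.
        for commit_ts, previous_value in reversed(self.undo.get(row_id, [])):
            if commit_ts <= start_ts:
                break
            value = previous_value
        return value

accounts = ToyTable({"Sally": 10, "Bob": 20})
old_reader = accounts.begin()                          # starts before any update
accounts.update("Sally", accounts.data["Sally"] + 5)   # add 5 to Sally's balance
mid_reader = accounts.begin()
accounts.update("Sally", accounts.data["Sally"] + 5)   # and 5 more
new_reader = accounts.begin()

print(accounts.read("Sally", old_reader),
      accounts.read("Sally", mid_reader),
      accounts.read("Sally", new_reader))
```

Each reader reconstructs the balance that was current when it started, while rows without undo entries (such as Bob's) are read without any extra work.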

#### Efficient MVCC for Analytics

The above approach works well for transactional workloads where individual rows are changed frequently. For *analytical* use cases, we observe a very different usage pattern: changes are much more “bulky” and they often only affect a subset of columns. For example, we do not usually delete individual rows but instead delete all rows matching a pattern, e.g.:

```sql
DELETE FROM orders WHERE order_time < DATE '2010-01-01';
```

We also commonly bulk update columns, e.g., to fix the evergreen annoyance of people using nonsensical in-domain values to express `NULL`:

```sql
UPDATE people SET age = NULL WHERE age = -99;
```

If every row has version information, such bulk changes create a *huge* number of entries in the undo buffers, which consume a lot of memory and are inefficient to operate on and read from.

There is also an added complication – the original approach relies on performing *in-place updates*. While we can efficiently perform in-place updates on uncompressed data, this is not possible when data is compressed. As DuckDB [keeps data compressed, both on-disk and in-memory](https://duckdb.org/2022/10/28/lightweight-compression), in-place updates cannot be performed.

In order to address these issues – DuckDB instead stores **bulk version information** on a per-column basis. For every batch of `2048` rows, a single version information entry is stored. The version information stores the changes made to the data, instead of the old data, as we cannot modify the original data in-place. Instead, any changes made to the data are flushed to disk during a checkpoint. Below is an illustrated example.

```sql
-- add 20% interest to all accounts
UPDATE Accounts SET Balance = Balance + Balance / 5;
```

![](../images/blog/mvcc/columnbasedmvcc.png)
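
The idea of storing changes per column and per batch of 2048 rows can be sketched as follows. This is a deliberately simplified pure-Python model, not DuckDB's storage layer: the base data is an immutable tuple standing in for compressed column data, every bulk update commits immediately, and the names `ToyColumn`, `bulk_update`, and `scan` are hypothetical.

```python
VECTOR_SIZE = 2048  # batch size named in the text

class ToyColumn:
    def __init__(self, values):
        self.base = tuple(values)  # immutable stand-in for compressed column data
        self.versions = {}         # vector index -> [(commit_ts, {offset: new_value}), ...]
        self.clock = 0

    def begin(self):
        return self.clock

    def bulk_update(self, fn):
        """Apply fn to every value; record one version entry per vector, not per row."""
        current = self.scan(self.clock)
        self.clock += 1
        for start in range(0, len(current), VECTOR_SIZE):
            chunk = current[start:start + VECTOR_SIZE]
            changes = {offset: fn(value) for offset, value in enumerate(chunk)}
            self.versions.setdefault(start // VECTOR_SIZE, []).append((self.clock, changes))

    def scan(self, start_ts):
        """Return the column as visible to a transaction that started at start_ts."""
        out = list(self.base)
        for vector_index, entries in self.versions.items():
            for commit_ts, changes in entries:  # oldest first
                if commit_ts <= start_ts:
                    for offset, value in changes.items():
                        out[vector_index * VECTOR_SIZE + offset] = value
        return out

    def checkpoint(self):
        """Fold all committed changes into fresh base data and drop the version info."""
        self.base = tuple(self.scan(self.clock))
        self.versions.clear()

balance = ToyColumn([100] * 5000)
reader_ts = balance.begin()
balance.bulk_update(lambda b: b + b // 5)   # add 20% interest to all accounts

entries_after_update = len(balance.versions)     # one entry per 2048-row vector
old_view = balance.scan(reader_ts)[0]            # reader started before the update
new_view = balance.scan(balance.begin())[0]
balance.checkpoint()
print(entries_after_update, old_view, new_view, balance.base[0])
```

Updating 5,000 rows produces three version entries in this model rather than 5,000 per-row undo records, and the base data is only rewritten at checkpoint time.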


One beautiful aspect of this undo buffer scheme is that it is largely performance-transparent: if no changes are made, there is no extra computational cost associated with providing support for transactions. To the best of our knowledge, DuckDB is the *only transactional data management system that is optimized for bulk changes to data* that are common in analytical use cases. But even with changes present, our transaction scheme is very fast for the kind of transactions that we expect for analytical use cases.

##### Benchmarks

Here is a small experiment, comparing DuckDB 1.1.0, [HyPer](https://www.tableau.com/products/new-features/hyper) 9.1.0, SQLite 3.43.2, and PostgreSQL 14.13 on a recent MacBook Pro, showing some of the effects that an OLAP-optimized transaction scheme will have. We should note that HyPer implements the MVCC scheme from the Neumann paper mentioned above. SQLite does not actually implement MVCC; it is mostly included as a comparison point.

We create two tables with either 1 or 100 columns, each with 10 million rows, containing the integer values 1-100 repeating.

```sql
CREATE TABLE mvcc_test_1 (i INTEGER);
INSERT INTO mvcc_test_1
    SELECT s1
    FROM
        generate_series(1, 100) s1(s1),
        generate_series(1, 100_000) s2(s2);

CREATE TABLE mvcc_test_100 (i INTEGER,
    j1 INTEGER, j2 INTEGER, ..., j99 INTEGER);
INSERT INTO mvcc_test_100
    SELECT s1, s1, s1, ..., s1
    FROM
        generate_series(1, 100) s1(s1),
        generate_series(1, 100_000) s2(s2);
```

We then run three transactions on both tables that increment a single column, with an increasing number of affected rows, 1%, 10% and 100%:

```sql
UPDATE mvcc_test_... SET i = i + 1 WHERE i <= 1;
UPDATE mvcc_test_... SET i = i + 1 WHERE i <= 10;
UPDATE mvcc_test_... SET i = i + 1 WHERE i <= 100;
```

For the **single-column case**, there should not be huge differences between using a row-major or a column-major concurrency control scheme, and indeed the results show this:

| 1 Column   |   1% |  10% |  100% |
| ---------- | ---: | ---: | ----: |
| DuckDB     | 0.02 | 0.07 |  0.43 |
| SQLite     | 0.21 | 0.25 |  0.61 |
| HyPer      | 0.66 | 0.28 |  2.37 |
| PostgreSQL | 1.44 | 2.48 | 19.07 |

Changing more rows took more time. The rows are small: each row only contains a single value. DuckDB and HyPer, having a more modern MVCC scheme based on undo buffers as outlined above, are generally much faster than PostgreSQL.
SQLite is doing well, but of course it does not have any MVCC. Timings increase roughly 10× as the number of rows changed is increased tenfold. So far so good.

For the **100 column case**, results look drastically different:

| 100 Columns |   1% |  10% |  100% |
| ----------- | ---: | ---: | ----: |
| DuckDB      | 0.02 | 0.07 |  0.43 |
| SQLite      | 0.51 | 1.79 | 12.93 |
| HyPer       | 0.66 | 6.06 | 61.54 |
| PostgreSQL  | 1.42 | 5.45 | 50.05 |

Recall that here we are changing a single column out of 100, a common use case in wrangling analytical data sets. Because DuckDB's MVCC scheme is *designed* for those use cases, it shows exactly the same runtime as in the single-column experiment above. In SQLite, there is a clear impact of the larger row size on the time taken to complete the updates even without MVCC. HyPer and PostgreSQL also show much larger, up to 100× (!) slowdowns as the number of changed rows is increased.

This neatly brings us to checkpointing.

#### Write-Ahead Logging and Checkpointing

Any data that's not written to disk but instead still lingers in CPU caches or main memory will be lost in case the operating system crashes or if power is lost. To guarantee durability of changes in the presence of those adverse events, DuckDB needs to *ensure that any committed changes are written to persistent storage*. However, changes in a transaction can be scattered all over potentially large tables, and fully writing them to disk can be quite slow, especially if it has to happen before any transaction can commit. Also, we don't yet know if we actually want to persist a change, since we may encounter a failure in the very process of committing.

The traditional approach of transactional data management systems to balance the requirement of writing changes to persistent storage with the requirement of not taking forever is the [write-ahead log (WAL)](https://en.wikipedia.org/wiki/Write-ahead_logging). The WAL can be thought of as a log file of all changes to the database. On each transaction commit, its changes are written to the WAL. On restart, the database files are re-loaded from disk, the changes in the WAL are re-applied (if present), and things happily continue.
After some amount of changes, the changes in the WAL need to be physically applied to the table, a process known as “checkpointing”. Afterward, the WAL entries can be discarded, a process known as “truncating”. This scheme ensures that changes persist even if a crash occurs or power is lost immediately after a commit.
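
The commit, replay, checkpoint, and truncate cycle described above can be sketched with a toy key-value store. This is a minimal illustration using only the Python standard library, not DuckDB's WAL format: the class `ToyStore`, the file names, and the JSON encoding are all assumptions, and torn log records are not handled.

```python
import json
import os
import tempfile

class ToyStore:
    """Key-value store with a data file plus a write-ahead log (both JSON)."""

    def __init__(self, directory):
        self.data_path = os.path.join(directory, "toy.db")
        self.wal_path = self.data_path + ".wal"
        self.data = {}
        if os.path.exists(self.data_path):   # reload the last checkpoint
            with open(self.data_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.wal_path):    # re-apply committed changes from the WAL
            with open(self.wal_path) as f:
                for line in f:
                    self.data.update(json.loads(line))

    def commit(self, changes):
        # Append to the WAL and force it to storage before acknowledging the commit
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps(changes) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.data.update(changes)

    def checkpoint(self):
        # Write the full state to a temporary file, sync it, swap it in, then truncate the WAL
        tmp_path = self.data_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.data_path)
        open(self.wal_path, "w").close()

with tempfile.TemporaryDirectory() as directory:
    store = ToyStore(directory)
    store.commit({"sally": 105})
    store.commit({"bob": 50})
    del store                                 # simulate a crash before any checkpoint

    recovered = ToyStore(directory)           # restart: replays the WAL
    recovered_data = dict(recovered.data)
    recovered.checkpoint()
    wal_size_after_checkpoint = os.path.getsize(recovered.wal_path)
    reopened_data = dict(ToyStore(directory).data)

print(recovered_data, wal_size_after_checkpoint, reopened_data)
```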

DuckDB implements write-ahead logging and you may have seen a `.wal` file appearing here and there. Checkpointing normally happens *automatically* whenever the WAL file reaches a size limit, 16 MB by default, which can be adjusted with the `checkpoint_threshold` setting. Checkpoints also automatically happen at database shutdown. Checkpoints can also be [explicitly triggered](#docs:lts:sql:statements:checkpoint) with the `CHECKPOINT` and `FORCE CHECKPOINT` commands, the difference being that the latter will abort (roll back) any active transactions to ensure the checkpoint is happening *right now* while the former will wait.

DuckDB explicitly calls the [`fsync()` system call](https://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html) to make sure any WAL entries will be forced to be written to persistent storage, ignoring the many caches on the way. This is *necessary* because those caches may also be lost in the event of, e.g., power failure, so it's no use to only write log entries to the WAL if they end up not being actually written to storage because the operating system or the disk decided that it was better to wait for performance reasons. However, `fsync()` does take some time, and while it's generally considered bad practice, there are systems out there that don't do this at all or not by default in order to boast about more transactions per second.

In DuckDB, even bulk loads such as loading large files into tables (e.g., using the [`COPY` statement](#docs:lts:sql:statements:copy)) are fully transactional. This means you can do something like this:

```sql
BEGIN TRANSACTION;
CREATE TABLE people (age INTEGER, ...);
COPY people FROM 'many_people.csv';
UPDATE people SET age = NULL WHERE age = -99;
SELECT
    CASE
        WHEN (SELECT count(*) FROM people) = 1_000_000 THEN true
        ELSE error('expected 1m rows')
    END;
COMMIT;
```

This transaction creates a table, copies a large CSV file into the table, and then updates the table to replace a magic value. Finally, a check is performed to see if there is the expected number of rows in the table. All this is bound together into a *single transaction*. If anything goes wrong at any point in the process or the check fails, the transaction will be aborted and zero changes to the database will have happened, the table will not even exist. This is great because it allows implementing all-or-nothing semantics for complex loading tasks, possibly into many tables.

However, logging large changes is a problem. Imagine the `many_people.csv` file being large, say ten gigabytes. As discussed, all changes are written to the WAL and eventually checkpointed. The changes in the file are large enough to immediately trigger a checkpoint. So now we're first writing ten gigabytes to the WAL, and then reading them again, and then writing them again to the database file. Instead of reading ten and writing ten, we have read twenty and written twenty. This is not ideal, but rather than allowing to bypass transactions for bulk loads, DuckDB will instead *optimistically write large changes to new blocks in the database file directly*, and merely add a reference to the WAL. On commit, these new blocks are added to the table. On rollback, the blocks are marked as free space. So while this can lead to the database file pointlessly increasing in size if transactions are aborted, the common case will benefit greatly. Again, this means that users experience near-zero-cost transactionality.
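
A toy model may help picture the optimistic bulk-write path: data goes straight into new blocks, the log only records which blocks to attach, and a rollback returns the blocks to a free list. Everything here is hypothetical (the `ToyFile` class, the `attach_blocks` record name, in-memory dictionaries instead of a real file) and is meant only as a sketch of the idea, not of DuckDB's file format.

```python
class ToyFile:
    def __init__(self):
        self.blocks = {}      # block id -> payload written to the "database file"
        self.free = set()     # block ids available for reuse
        self.table = []       # block ids that belong to the committed table
        self.wal = []         # log records
        self.next_block = 0   # number of blocks ever allocated (the file size)

    def _new_block(self):
        self.next_block += 1
        return self.next_block - 1

    def bulk_load(self, chunks):
        """Write chunks straight into blocks; they are not visible until commit."""
        written = []
        for chunk in chunks:
            block_id = self.free.pop() if self.free else self._new_block()
            self.blocks[block_id] = chunk
            written.append(block_id)
        return written

    def commit(self, written):
        # The log holds a reference to the blocks, not a second copy of the data
        self.wal.append({"op": "attach_blocks", "blocks": written})
        self.table.extend(written)

    def rollback(self, written):
        # The file does not shrink, but the blocks become reusable free space
        self.free.update(written)

f = ToyFile()
first = f.bulk_load(["chunk-a", "chunk-b"])
f.rollback(first)                 # aborted load: two blocks become free space
second = f.bulk_load(["chunk-c"])
f.commit(second)                  # committed load reuses one of the free blocks
print(f.table, f.wal, f.free)
```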

#### More Experiments

Making concurrency control and write-ahead logging work correctly in the face of failure is very challenging. Software engineers are biased towards the “happy path”, where everything works as intended. The well-known [TPC-H benchmark](https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf) actually contains tests that stress concurrency and logging schemes (Section 3.5.4, “Durability Tests”). Our previous blog post also [implemented this test and DuckDB passed](https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid#acid-tests).

In addition, we also defined our own, even more challenging [test for durability](https://github.com/hannes/duckdb-tpch-power-test/blob/main/check-invariant.py): we run the TPC-H refresh sets one-by-one, in a sub-process. The sub-process reports the last committed refresh. As they are run, after a random (short) time interval, that sub-process is killed (using `SIGKILL`). Then, DuckDB is restarted, it will likely start recovering from WAL and then continue with the refresh sets. Because of the random time interval, it is likely that DuckDB also gets killed during WAL recovery. This of course should not have any impact on the contents of the database. Finally, we have pre-computed the correct result after running 4000 refresh sets using DuckDB, and after all is said and done we check if there are any differences. There were none, luckily.

To stress our implementation further, we have repeated this experiment on a special file system, [LazyFS](https://github.com/dsrhaslab/lazyfs). This [FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) file system is [specifically designed](https://dl.acm.org/doi/10.14778/3681954.3681980) to help uncover bugs in database systems by – among other things – not properly flushing changes to disk using `fsync()`. In our LazyFS configuration, any change that is written to a file is discarded *unless* sync-ed, which also happens if a file is closed. So in our experiment where we kill the database any un-sync-ed entries in the WAL would be lost. We've re-run our durability tests described above on LazyFS and are also **happy to report that no issues were found**.

#### Conclusion

In this post, we described DuckDB's approaches to concurrency control and write-ahead logging. Of course, we are constantly working on improving them. One nasty failure mode that can appear in real-world systems is partial (“torn”) writes to files, where only parts of write requests actually make it to the file. Luckily, LazyFS can be configured to be even more hostile, for example failing read and write system calls entirely, returning partial or wrong data, or only partially writing the data to disk. We plan to expand our experimentation on this, to make sure DuckDB's transaction handling is as bullet-proof as it can be.

And who knows, maybe we even dare to unleash the famous [Kyle](https://aphyr.com/about) of [Jepsen](https://jepsen.io) on DuckDB at some point.

## Optimizers: The Low-Key MVP

**Publication date:** 2024-11-14

**Author:** Tom Ebergen

**TL;DR:** The query optimizer is an important part of any analytical database system as it provides considerable performance improvements compared to hand-optimized queries, even as the state of your data changes.

Optimizers don't often give "main character" energy in the database community. Databases are usually popular because of their performance, ease of integration, or reliability. As someone who mostly works on the optimizer in DuckDB, I have been wanting to write a blog post about how important optimizers are and why they merit more recognition. In this blog post we will analyze queries that fall into one of three categories: unoptimized, hand-optimized, and optimized by the DuckDB query optimizer. I will also explain why built-in optimizers are almost always better than any hand optimizations. Hopefully, by the end of this blog post, you will agree that optimizers play a silent, but vital role when using a database. Let's first start by understanding where in the execution pipeline query optimization happens.

Before any data is read from the database, the given SQL text must be parsed and validated. If this process finishes successfully, a tree-based query plan is created. The query plan produced by the parser is naïve, and can be extremely inefficient depending on the query. This is where the optimizer comes in: the inefficient query plan is passed to the optimizer for modification and, you guessed it, optimization. The optimizer is made up of many optimization rules. Each rule has the ability to reorder, insert, and delete query operations to create a slightly more efficient query plan that is also logically equivalent. Once all the optimization rules are applied, the optimized plan can be much more efficient than the plan produced by the parser.

> In practice an optimization rule can also be called an optimizer. For the rest of this blog post, _optimizer rule_ will refer to a specific optimization, and _optimizer_ will refer to the database optimizer, unless the word optimizer names a specific optimization rule (e.g., _Join Order Optimizer_).
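
To make the idea of a rule that rewrites a plan into a logically equivalent one more concrete, here is a minimal pure-Python sketch of a single rule that moves a filter below a cross product when only one side needs it. The plan node classes, the function `push_down_filters`, and the table aliases are hypothetical and chosen for illustration; this is not how DuckDB represents plans or implements its rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class CrossProduct:
    left: object
    right: object

@dataclass(frozen=True)
class Filter:
    child: object
    predicate: str      # human-readable condition
    tables: frozenset   # tables the predicate refers to

def tables_of(plan):
    """Return the set of table names produced by a plan node."""
    if isinstance(plan, Scan):
        return frozenset({plan.table})
    if isinstance(plan, Filter):
        return tables_of(plan.child)
    return tables_of(plan.left) | tables_of(plan.right)

def push_down_filters(plan):
    """One rewrite rule: move a filter below a cross product when one side suffices."""
    if isinstance(plan, Scan):
        return plan
    if isinstance(plan, CrossProduct):
        return CrossProduct(push_down_filters(plan.left), push_down_filters(plan.right))
    child = push_down_filters(plan.child)
    if isinstance(child, CrossProduct):
        if plan.tables <= tables_of(child.left):
            pushed = push_down_filters(Filter(child.left, plan.predicate, plan.tables))
            return CrossProduct(pushed, child.right)
        if plan.tables <= tables_of(child.right):
            pushed = push_down_filters(Filter(child.right, plan.predicate, plan.tables))
            return CrossProduct(child.left, pushed)
    return Filter(child, plan.predicate, plan.tables)  # cannot be pushed any further

predicate = "pickup.Borough = 'Manhattan'"
naive = Filter(
    CrossProduct(CrossProduct(Scan("pickup"), Scan("dropoff")), Scan("data")),
    predicate,
    frozenset({"pickup"}),
)
optimized = push_down_filters(naive)
print(optimized)
```

The rewritten plan filters `pickup` before any cross product is formed, so far fewer rows flow into the expensive operators while the result stays the same.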

#### Normal Queries vs. Optimized Queries

To examine the effect of the DuckDB query optimizer, let's use a subset of the NYC taxi dataset. You can create native DuckDB tables with the following commands (note that [`taxi-data-2019.parquet`](https://blobs.duckdb.org/data/taxi-data-2019.parquet) is approximately 1.3 GB):

```sql
CREATE TABLE taxi_data_2019 AS
    FROM 'https://blobs.duckdb.org/data/taxi-data-2019.parquet';
CREATE TABLE zone_lookups AS
    FROM 'https://blobs.duckdb.org/data/zone-lookups.parquet';
```

Now that we have all 2019 data, let's look at the unoptimized vs. optimized plans for a simple query. The following SQL query gets us the most common pickup and drop-off pairs in the Manhattan borough.

```sql
PRAGMA disable_optimizer;
PRAGMA explain_output = 'optimized_only';
EXPLAIN SELECT
    pickup.zone AS pickup_zone,
    dropoff.zone AS dropoff_zone,
    count(*) AS num_trips
FROM
    zone_lookups AS pickup, 
    zone_lookups AS dropoff,
    taxi_data_2019 AS data
WHERE pickup.LocationID = data.pickup_location_id
  AND dropoff.LocationID = data.dropoff_location_id
  AND pickup.Borough = 'Manhattan'
  AND dropoff.Borough = 'Manhattan'
GROUP BY pickup_zone, dropoff_zone
ORDER BY num_trips DESC
LIMIT 5;
```

Running this `EXPLAIN` query gives us the following plan.


```text
┌───────────────────────────┐
│           LIMIT           │
│    ────────────────────   │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│          ORDER_BY         │
│    ────────────────────   │
│        count_star()       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│        Expressions:       │
│             0             │
│             1             │
│         num_trips         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         AGGREGATE         │
│    ────────────────────   │
│          Groups:          │
│        pickup_zone        │
│        dropoff_zone       │
│                           │
│        Expressions:       │
│        count_star()       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│        Expressions:       │
│       (LocationID =       │
│     pickup_location_id)   │
│       (LocationID =       │
│    dropoff_location_id)   │
│ (Borough = CAST('Manhattan│
│       ' AS VARCHAR))      │
│ (Borough = CAST('Manhattan│
│       ' AS VARCHAR))      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       CROSS_PRODUCT       │
│    ────────────────────   ├───────────────────────────────────────────┐
└─────────────┬─────────────┘                                           │
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│       CROSS_PRODUCT       │                             │          SEQ_SCAN         │
│    ────────────────────   ├──────────────┐              │    ────────────────────   │
│                           │              │              │       taxi_data_2019      │
└─────────────┬─────────────┘              │              └───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││          SEQ_SCAN         │
│    ────────────────────   ││    ────────────────────   │
│        zone_lookups       ││        zone_lookups       │
└───────────────────────────┘└───────────────────────────┘
```

The cross products alone make this query extremely inefficient. The cross-products produce `256 * 256 * |taxi_data_2019|` rows of data, which is 5 trillion rows of data. The filter only matches 71 million rows, which is only 0.001% of the data. The aggregate produces 4,373 rows of data, which need to be sorted by the `ORDER BY` operation, which runs in `O(N * log N)`. Producing 5 trillion tuples alone is an enormous amount of data processing, which becomes clear when you try to run the query and notice it doesn't complete. With the optimizer enabled, the query plan produced is much more efficient because the operations are re-ordered to avoid many trillions of rows of intermediate data. Below is the query plan with the optimizer enabled:

```sql
PRAGMA enable_optimizer;
EXPLAIN ...
```


```text
┌───────────────────────────┐
│           TOP_N           │
│    ────────────────────   │
│          ~5 Rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│        Expressions:       │
│             0             │
│             1             │
│         num_trips         │
│                           │
│         ~265 Rows         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         AGGREGATE         │
│    ────────────────────   │
│          Groups:          │
│        pickup_zone        │
│        dropoff_zone       │
│                           │
│        Expressions:       │
│        count_star()       │
│                           │
│         ~265 Rows         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      COMPARISON_JOIN      │
│    ────────────────────   │
│      Join Type: INNER     │
│                           │
│        Conditions:        ├───────────────────────────────────────────┐
│   (pickup_location_id =   │                                           │
│         LocationID)       │                                           │
│                           │                                           │
│       ~1977517 Rows       │                                           │
└─────────────┬─────────────┘                                           │
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│      COMPARISON_JOIN      │                             │          SEQ_SCAN         │
│    ────────────────────   │                             │    ────────────────────   │
│      Join Type: INNER     │                             │          Filters:         │
│                           │                             │  Borough='Manhattan' AND  │
│        Conditions:        ├──────────────┐              │     Borough IS NOT NULL   │
│   (dropoff_location_id =  │              │              │                           │
│         LocationID)       │              │              │        zone_lookups       │
│                           │              │              │                           │
│       ~12744000 Rows      │              │              │          ~45 Rows         │
└─────────────┬─────────────┘              │              └───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││          SEQ_SCAN         │
│    ────────────────────   ││    ────────────────────   │
│       taxi_data_2019      ││          Filters:         │
│                           ││  Borough='Manhattan' AND  │
│                           ││     Borough IS NOT NULL   │
│                           ││                           │
│                           ││        zone_lookups       │
│                           ││                           │
│       ~84393604 Rows      ││          ~45 Rows         │
└───────────────────────────┘└───────────────────────────┘
```

Let's first look at the difference in execution times on my MacBook with an M1 Max and 32 GB of memory before talking about the optimizations that have taken place.

|         | Unoptimized | Optimized |
| ------- | ----------- | --------- |
| Runtime | >24 hours   | 0.769 s   |

Hopefully this performance benefit illustrates how powerful the DuckDB Optimizer is. So what optimization rules are responsible for these drastic performance improvements? For the query above, there are three powerful rules that are applied when optimizing the query: _Filter Pushdown,_ _Join Order Optimization,_ and _TopN Optimization_.

The _Filter Pushdown Optimizer_ is very useful since it reduces the amount of intermediate data being processed. It is an optimization rule that is sometimes easy to miss for humans and will always result in faster execution times if the filter is selective in any way. It takes a filter, like `Borough = 'Manhattan'` and pushes it down to the operator that first introduces the filtered column, in this case the table scan. In addition, it will also detect when a filtered column like `col1` is used in an equality condition (i.e., `WHERE col1 = col2`). In these cases, the filter is duplicated and applied to the other column, `col2`, further reducing the amount of intermediate data being processed.

The _Join Order Optimizer_ recognizes that the filters `pickup.LocationID = data.pickup_location_id` and `dropoff.LocationID = data.dropoff_location_id` can be used as join conditions and rearranges the scans and joins accordingly. This optimizer rule does a lot of heavy lifting to reduce the amount of intermediate data being processed since it is responsible for removing the cross products.

The _TopN Optimizer_ is very useful when aggregate data needs to be sorted. If a query has an `ORDER BY` and a `LIMIT` operator, a TopN operator can replace these two operators. The TopN operator orders only the highest/lowest `N` values, instead of all values. If `N` is 5, then DuckDB only needs to keep 5 rows with the minimum/maximum values in memory and can throw away the rest. So if you are only interested in the top `N` values out of `M`, where `N << M`, the TopN operator can run in `O(M + N * log N)` instead of `O(M * log M)`.
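To make the idea concrete, here is a minimal Python sketch of a TopN operator that keeps at most `n` rows in a bounded heap. It is an illustration only, not DuckDB's implementation; the function name `top_n` and the sample rows are made up. Note that this heap-based variant runs in `O(M * log N)`, which is a different bound from the one quoted above, but it still avoids sorting all `M` rows.

```python
import heapq

def top_n(rows, n, key):
    """Return the n rows with the largest key, in descending key order.

    At most n rows are held in memory at any time, mimicking how a TopN
    operator can throw away rows that cannot be part of the result.
    """
    if n <= 0:
        return []
    heap = []  # min-heap of (key, sequence number, row); smallest key on top
    for seq, row in enumerate(rows):
        entry = (key(row), seq, row)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new row beats the smallest of the current top n: swap it in.
            heapq.heapreplace(heap, entry)
    return [row for _, _, row in sorted(heap, key=lambda e: e[0], reverse=True)]

# Hypothetical aggregate output: (pickup_zone, dropoff_zone, num_trips)
trips = [("A", "B", 10), ("A", "C", 42), ("B", "C", 7),
         ("C", "A", 99), ("B", "A", 15), ("C", "B", 3)]

# Equivalent in spirit to ORDER BY num_trips DESC LIMIT 2
top2 = top_n(trips, 2, key=lambda r: r[2])
print(top2)  # [('C', 'A', 99), ('A', 'C', 42)]
```

The sequence number in each heap entry only serves as a tie-breaker so that rows themselves are never compared.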

These are just a few of the optimizations DuckDB has. More optimizations are explained in the section [Summary of All Optimizers](#::summary-of-all-optimizers).

#### Hand-Optimized Queries

For the query above, it is possible to achieve almost the same plan by carefully writing the SQL query by hand. To achieve a similar plan as the one generated by DuckDB, you can write the following.

```sql
SELECT 
    pickup.zone AS pickup_zone,
    dropoff.zone AS dropoff_zone,
    count(*) AS num_trips
FROM
    taxi_data_2019 data
INNER JOIN
    (SELECT * FROM zone_lookups WHERE Borough = 'Manhattan') pickup
    ON pickup.LocationID = data.pickup_location_id
INNER JOIN
    (SELECT * FROM zone_lookups WHERE Borough = 'Manhattan') dropoff
    ON dropoff.LocationID = data.dropoff_location_id
GROUP BY pickup_zone, dropoff_zone
ORDER BY num_trips DESC
LIMIT 5;
```

Inspecting the runtimes again we get:

|         | Unoptimized | Hand-optimized | Optimized |
| ------- | ----------- | -------------- | --------- |
| Runtime | >24 hours   | 0.926 s        | 0.769 s   |

The SQL above results in a plan similar to the DuckDB optimized plan, but it is wordier and more error-prone to write. In very rare cases, it is possible to hand-write a query that produces a more efficient plan than an optimizer. These cases are extreme outliers, and in all other cases the optimizer will produce a better plan. Moreover, a hand-optimized query is optimized for the current state of the data, which can change with many updates over time. Once enough changes have been applied to the data, the assumptions of a hand-optimized query may no longer hold, leading to bad performance. Let's take a look at the following example.

Suppose an upstart company has an `orders` and a `parts` table, and every time some dashboard loads, the most popular ordered parts need to be calculated. Since the company is still relatively new, they only have a small number of orders, but their catalog of parts is already quite large. A hand-optimized query would look like this:

```sql
CREATE OR REPLACE TABLE orders AS
    SELECT range AS order_id, range % 10_000 AS pid FROM range(1_000);
CREATE OR REPLACE TABLE parts AS
    SELECT range AS p_id, range::VARCHAR AS part_name FROM range(10_000);
SELECT
    parts.p_id,
    parts.part_name,
    count(*) AS ordered_amount
FROM parts
INNER JOIN orders 
    ON orders.pid = parts.p_id
GROUP BY ALL;
```

Naturally, the number of orders will increase as this company gains customers and grows in popularity. If the query above continues to run without the use of an optimizer, the performance will slowly decline. This is because the execution engine will build the hash table on the orders table, which potentially will have 100 million rows. If the optimizer is enabled, the [Join Order Optimizer](#::join-order-optimizer) will be able to inspect the statistics of the table during the optimization process and produce a new plan according to the new state of the data.

Here is a breakdown of running the queries with and without the optimizer as the orders table increases.

|                   | Unoptimized | Optimized |
| ----------------- | ----------: | --------: |
| \|orders\| = 1K   |     0.004 s |   0.003 s |
| \|orders\| = 10K  |     0.005 s |   0.005 s |
| \|orders\| = 100K |     0.013 s |   0.008 s |
| \|orders\| = 1M   |     0.055 s |   0.014 s |
| \|orders\| = 10M  |     0.240 s |   0.044 s |
| \|orders\| = 100M |     2.266 s |   0.259 s |

At first the difference in execution time is not really noticeable, so no one would think a query rewrite would be the solution. But once enough orders are reached, waiting 2 seconds every time the dashboard loads becomes tedious. If the optimizer is enabled, the query runs almost 10× faster at 100 million orders (0.259 s instead of 2.266 s). So if you ever think you have identified a scenario where you are smarter than the optimizer, make sure you have also thought about all possible updates to the data and have hand-optimized for those as well.

#### Optimizations That Are Impossible by Hand

Some optimization rules are also impossible to write by hand. For example, the TopN optimization cannot be expressed by hand.

Another good example is the Join Filter Pushdown optimization. The Join Filter Pushdown optimization works in scenarios where the build side of a hash join has a subset of the join keys. In its current state the join filter pushdown optimization keeps track of the minimum value key and maximum value key and pushes a table filter into the probe side to filter out keys greater than the maximum join value and smaller than the minimum join value.

With a small change, we can use the query from above to demonstrate this. Suppose we first filter our `parts` table to only include parts with a specific prefix in the `part_name`. When the `orders` table has 100 million rows and the `parts` table only has ~20,000 after filtering, then the `orders` table will be the probe side and the `parts` table will be the hash/build side. When the hash table is built, the min and max `p_id` values in the `parts` table are recorded; in this case they could be 20,000 and 80,000. These min and max values get pushed as a filter into the `orders` table scan, filtering out all orders with `pid > 80,000` or `pid < 20,000`. 40% of the `orders` table has a `pid` greater than 80,000 or less than 20,000, so this optimization does a lot of heavy lifting in join queries.
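The min/max mechanism can be sketched in a few lines of Python. This is an illustration only and not how DuckDB implements it; the function name `hash_join_with_min_max_filter` and the toy data (100 distinct `pid` values instead of the numbers used in the text) are assumptions.

```python
def hash_join_with_min_max_filter(build_rows, probe_rows, build_key, probe_key):
    """Hash join that derives a min/max range filter from the build side.

    Returns (joined_rows, probes_skipped), where probes_skipped counts the
    probe rows discarded by the range check before any hash table lookup.
    """
    hash_table = {}
    for row in build_rows:
        hash_table.setdefault(build_key(row), []).append(row)
    if not hash_table:
        return [], len(probe_rows)

    # Smallest and largest join key seen while building the hash table.
    lo, hi = min(hash_table), max(hash_table)

    joined, skipped = [], 0
    for row in probe_rows:
        k = probe_key(row)
        if k < lo or k > hi:
            skipped += 1  # removed by the pushed-down range filter
            continue
        for match in hash_table.get(k, []):
            joined.append((match, row))
    return joined, skipped

# Toy data: the filtered parts keep p_id values between 20 and 77,
# while orders reference pid values from 0 to 99.
parts = [(p_id, f"part_{p_id}") for p_id in range(20, 80, 3)]
orders = [(order_id, order_id % 100) for order_id in range(1_000)]

joined, skipped = hash_join_with_min_max_filter(
    parts, orders, build_key=lambda p: p[0], probe_key=lambda o: o[1])
print(len(joined), skipped)  # 200 420
```

In this toy run, 420 of the 1,000 probe rows never reach the hash table. The keys inside the range that have no matching part are still rejected by the regular hash lookup.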

Imagine trying to express this logic in your favorite data frame API; it would be extremely difficult and error-prone. The library would need to implement this optimization automatically for all hash joins. The Join Filter Pushdown optimization can improve query performance by 10×, so it should be a key factor when deciding what analytical system to use.

If you use a data frame library like [collapse](https://github.com/SebKrantz/collapse), [pandas](https://github.com/pandas-dev/pandas), [data.table](https://github.com/Rdatatable/data.table), or [modin](https://github.com/modin-project/modin), then you are most likely not enjoying the benefits of query optimization techniques. This means your optimizations need to be applied by hand, which is not sustainable if your data starts changing. Moreover, you are most likely writing imperatively, using a syntax specific to the data frame library. This means the scripts responsible for analyzing data are not very portable. SQL, on the other hand, can be much more intuitive to write since it is a declarative language, and can be ported to practically any other database system.

#### Summary of All Optimizers

Below is a non-exhaustive list of the optimization rules that DuckDB applies.

##### Expression Rewriter

The _Expression Rewriter_ simplifies expressions within each operator. Sometimes queries are written with expressions that are not completely evaluated or they can be rewritten in a way that takes advantage of features within the execution engine. Below is a table of common expression rewrites and the optimization rules that are responsible for them. Many of these rules rewrite expressions to use specialized DuckDB functions so expression evaluation is much faster during execution. If an expression can be evaluated to `true` in the optimizer phase, there is no need to pass the original expression to the execution engine. In addition, the optimized expressions are more likely to allow DuckDB to make further improvements to the query plan. For example, the "Move constants" rule could enable filter pushdown to occur.

| Rewriter rule                  | Original expression                   | Optimized expression       |
| ------------------------------ | ------------------------------------- | -------------------------- |
| Move constants                 | `x + 1 = 6`                           | `x = 5`                    |
| Constant folding               | `2 + 2 = 4`                           | `true`                     |
| Conjunction simplification     | `(1 = 2 AND b)`                       | `false`                    |
| Arithmetic simplification      | `x * 1`                               | `x`                        |
| Case simplification            | `CASE WHEN true THEN x ELSE y END`    | `x`                        |
| Equal or `NULL` simplification | `a = b OR (a IS NULL AND b IS NULL)`  | `a IS NOT DISTINCT FROM b` |
| Distributivity                 | `(x AND b) OR (x AND c) OR (x AND d)` | `x AND (b OR c OR d)`      |
| Like optimization              | `regexp_matches(c, '^Prefix')`        | `c LIKE 'Prefix%'`         |

##### Filter Pull-Up & Filter Pushdown

_Filter Pushdown_ was explained briefly above. _Filter Pull-Up_ is also important because it identifies cases where a filter on one table can also be applied to columns in other tables. The query below scans column `a` from both `t1` and `t2`. `t1.a` has a filter, but because of the equality condition, the same filter can be applied to `t2.a`:

```sql
SELECT *
FROM t1, t2
WHERE t1.a = t2.a
  AND t1.a = 50;
```

This can be optimized to:

```sql
SELECT *
FROM t1, t2
WHERE t1.a = t2.a
  AND t1.a = 50
  AND t2.a = 50;
```

_Filter Pull-Up_ pulls up the filter `t1.a = 50` above the join, and when the filter is pushed down again, the optimizer rule recognizes the filter can be applied to both columns `t1.a` and `t2.a`.

##### IN Clause Rewriter

If there is a filter with an `IN` clause, sometimes it can be re-written so execution is more efficient. Some examples are below:

| Original          | Optimized             |
| ----------------- | --------------------- |
| `c1 IN (1)`       | `c1 = 1`              |
| `c1 IN (3, 4, 5)` | `c1 >= 3 AND c1 <= 5` |

In addition, the _IN Clause Rewriter_ will transform expensive `IN` expressions into `MARK` joins. If a query has an expression like `c1 IN (x1, ..., xm)` where `m` is quite large, it can be expensive to evaluate this expression for every row in the table. The runtime would be `O(n * m)` where `n` is the number of rows and `m` is the length of the list. The `IN` clause rewriter will conceptually transform the expression into `SELECT c1 FROM t1, (VALUES (x1), ..., (xm)) t(c0) WHERE c1 = c0`, turning the expression into a `HASH` join that can complete in `O(n + m)` time!
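The difference between the two strategies can be illustrated with a small Python sketch that counts the work done by each. The function names and data are made up for this illustration, and the "operations" counted here are a rough proxy for runtime, not a measurement of DuckDB.

```python
def in_filter_linear(column, in_list):
    """Evaluate `c1 IN (...)` by scanning the list for every row: O(n * m)."""
    comparisons = 0
    result = []
    for value in column:
        for candidate in in_list:
            comparisons += 1
            if value == candidate:
                result.append(value)
                break
    return result, comparisons

def in_filter_hashed(column, in_list):
    """Build a hash set once (m inserts), then probe it per row (n lookups)."""
    lookup = set(in_list)
    operations = len(in_list)
    result = []
    for value in column:
        operations += 1
        if value in lookup:
            result.append(value)
    return result, operations

column = list(range(1_000))         # n = 1,000 rows
in_list = list(range(0, 2_000, 4))  # m = 500 list entries

linear_rows, linear_cost = in_filter_linear(column, in_list)
hashed_rows, hashed_cost = in_filter_hashed(column, in_list)
print(linear_cost, hashed_cost)  # 406375 1500
```

Both variants return the same rows; only the amount of work differs.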

##### Join Order Optimizer

The _Join Order Optimizer_ can provide an enormous performance benefit by limiting the number of intermediate tuples that are processed between joins. By processing fewer intermediate tuples, the query can execute faster.

##### Statistics Propagation

_Statistics Propagation_ is another optimization that works even when the state of the data changes. By traversing the query plan and keeping note of all equality join conditions, the Statistics Propagation optimizer can create new filters by inspecting the statistics of the columns that are eventually joined. For example, suppose `t1.a` and `t2.a` will be joined with the equality condition `t1.a = t2.a`. If our internal statistics tell us `t1.a` has a maximum value of `50` and a minimum value of `25`, the optimizer can create a new filter when scanning table `t2`. The filter would be `t2.a >= 25 AND t2.a <= 50`.
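As a rough sketch of this reasoning in Python: the `ColumnStats` class and the `propagate_equality` function below are hypothetical names, and intersecting the ranges of both sides is a simplifying assumption of this example rather than a description of DuckDB's internals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnStats:
    """Min/max statistics for one column (hypothetical structure)."""
    min_value: int
    max_value: int

def propagate_equality(left, right):
    """Intersect the value ranges of two columns joined with `left = right`.

    Returns the range both columns must fall into for a row to find a join
    partner, or None if the ranges do not overlap (the join is empty).
    """
    lo = max(left.min_value, right.min_value)
    hi = min(left.max_value, right.max_value)
    return ColumnStats(lo, hi) if lo <= hi else None

t1_a = ColumnStats(min_value=25, max_value=50)
t2_a = ColumnStats(min_value=0, max_value=1_000)

derived = propagate_equality(t1_a, t2_a)
print(derived)  # ColumnStats(min_value=25, max_value=50)

# The derived range acts like the filter t2.a >= 25 AND t2.a <= 50.
t2_rows = [10, 25, 37, 50, 51, 900]
t2_filtered = [v for v in t2_rows if derived.min_value <= v <= derived.max_value]
print(t2_filtered)  # [25, 37, 50]
```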

##### Reorder Filters

If there are multiple filters on a table, the order in which these filters are executed also becomes important. It's best to execute the most efficient filters first, saving execution of expensive filters for later. For example, DuckDB can evaluate equality very quickly. So for a query like `... WHERE a = 50 AND md5(b) LIKE '%d77%'`, the optimizer will tell DuckDB to evaluate `a = 50` on every row first. Only if the value in column `a` passes the check `a = 50` will DuckDB evaluate the `md5` hash for the value in column `b`.
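The effect of filter order can be shown with a small Python sketch that counts how often each filter is evaluated. The helper `apply_filters` and the generated rows are assumptions made for this illustration, and the row-at-a-time loop is a simplification of how an engine actually evaluates filters.

```python
import hashlib

def apply_filters(rows, filters):
    """Apply (name, predicate) filters in the given order, stopping at the
    first one that rejects a row.

    Returns (matching_rows, calls), where calls maps each filter name to the
    number of times its predicate was evaluated.
    """
    calls = {name: 0 for name, _ in filters}
    matching = []
    for row in rows:
        for name, predicate in filters:
            calls[name] += 1
            if not predicate(row):
                break
        else:
            matching.append(row)
    return matching, calls

# Stand-ins for `a = 50` and `md5(b) LIKE '%d77%'`
cheap = ("a = 50", lambda r: r[0] == 50)
expensive = ("md5", lambda r: "d77" in hashlib.md5(r[1].encode()).hexdigest())

rows = [(i % 100, f"value_{i}") for i in range(10_000)]

rows_cheap_first, calls_cheap_first = apply_filters(rows, [cheap, expensive])
rows_expensive_first, calls_expensive_first = apply_filters(rows, [expensive, cheap])

print(calls_cheap_first["md5"], calls_expensive_first["md5"])  # 100 10000
```

Both orders return the same rows, but with the cheap filter first the hash is computed for only 100 of the 10,000 rows.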

#### Conclusion

A well-written optimizer can provide significant performance improvements when allowed to optimize freely. Not only can the optimizer apply the many optimization rules a human might naturally miss, an optimizer can respond to changes in the data. Some optimizations can result in a performance improvement of 100×, which might be the difference when deciding to use analytical system _A_ vs. analytical system _B_. With DuckDB, all optimization rules are applied automatically to every query, so you can continually enjoy the benefits. Hopefully this blog post has convinced you to consider the optimizer next time you hear about the next database that has everyone's ears burning.

## Runtime-Extensible SQL Parsers Using PEG

**Publication date:** 2024-11-22

**Authors:** Hannes Mühleisen, Mark Raasveldt

**TL;DR:** Despite their central role in processing queries, parsers have not received any noticeable attention in the data systems space. State-of-the-art systems are content with ancient parser generators. These generators create monolithic, inflexible and unforgiving parsers that hinder innovation in query languages and frustrate users. Instead, parsers should be rewritten using modern abstractions like Parsing Expression Grammars (PEG), which allow dynamic changes to the accepted query syntax and better error recovery. In this post, we discuss how parsers could be re-designed using PEG, and validate our recommendations using experiments for both effectiveness and efficiency.

> **Update.** In March 2026, DuckDB v1.5 [shipped an experimental parser](https://duckdb.org/2026/03/09/announcing-duckdb-150#peg-parser). You can opt-in to use it via:
>
> ```sql
> CALL enable_peg_parser();
> ```

> This post is a shortened version of our peer-reviewed research paper "Runtime-Extensible Parsers" that was accepted for publication and presentation at the [2025 Conference on Innovative Data Systems Research](https://www.cidrdb.org/cidr2025/index.html) (CIDR) that is going to be held in Amsterdam between January 19 and 22, 2025. You can [read the full paper](https://duckdb.org/pdf/CIDR2025-muehleisen-raasveldt-extensible-parsers.pdf) if you prefer.

The parser is the DBMS component that is responsible for turning a query in string format into an internal representation which is usually tree-shaped. The parser defines which queries are going to be accepted at all. Every single SQL query starts its journey in a parser. Despite its prominent position in the stack, very little research has been published on parsing queries for data management systems. There seems to have been very little movement on the topic in the past decades, and parser implementations are largely stuck in sixty-year-old abstractions and technologies.

The constant growth of the SQL specification with niche features (e.g., support for graph queries in SQL/PGQ or XML support) as well as the desire to support alternative query notations like dplyr, [piped SQL](https://cloud.google.com/blog/products/data-analytics/simplify-your-sql-with-pipe-syntax-in-bigquery-and-cloud-logging), [PRQL](https://prql-lang.org) or [SaneQL](https://www.cidrdb.org/cidr2024/papers/p48-neumann.pdf) makes monolithic parsers less and less practical: in their traditional design, parser construction is a *compile-time* activity where enormous grammar files are translated into state machine transition lookup tables which are then baked into a system binary. Having those *always* be present in the parser might be wasteful, especially for size-conscious binary distributions like WebAssembly (Wasm).

Many if not most SQL systems use a static parser created using a [YACC-style](http://www.nylxs.com/docs/lexandyacc.pdf) parser toolkit: we are able to easily confirm this for open-source systems like PostgreSQL and MySQL/MariaDB. From analyzing their binaries' symbol names, we also found indications that Oracle, SQL Server and IBM Db2 use YACC. Internally, YACC and its slightly more recent variant GNU Bison as well as the "Lemon" parser generator used by SQLite all use a "single look-ahead left-to-right rightmost derivation" LALR(1) parser generator. This generator translates a formal context-free set of grammar rules in Extended Backus-Naur Form (EBNF) to a parser state machine. [LALR parsers](https://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-065.pdf) are a more space-efficient specialization of LR(k) parsers as first described by [Knuth](https://harrymoreno.com/assets/greatPapersInCompSci/2.5_-_On_the_translation_of_languages_from_left_to_right-Donald_E._Knuth.pdf). But in effect, **the most advanced SQL systems of 2024 use parser technology from the 1960s**. Given that the rest of data management systems have been greatly overhauled since, this raises the question of why the parser did not receive any serious engineering attention.

Database systems are moving towards becoming *ecosystems* instead of pre-built monoliths. Much of the innovation in the PostgreSQL, SQLite, and DuckDB communities now comes from [extensions](https://www.pdl.cmu.edu/PDL-FTP/Database/CMU-CS-23-144.pdf), which are shared libraries that are loaded into the database system at run-time to extend the database system with features like vector similarity search, geospatial support, file systems, or graph processing. Bundling all those features upfront would be difficult due to the additional binary size and external dependencies. In addition, they are often maintained independently by their communities. Thus far, at least in part due to the ubiquity of YACC-style parsers, those community extensions have been restricted from extending syntax. While this is also true in other ecosystems like Python, the design of SQL with its heavy focus on syntax and not function calls makes the extensions second-class citizens that have to somehow work around the restrictions imposed by the original parser, e.g., by embedding custom expressions in strings.

We propose to *re-think data management system parser design* to create modern, *extensible* parsers, which allow a dynamic configuration of the accepted syntax *at run-time*, for example to allow syntax extensions, new statements, or to add entirely new query languages. This would make it possible to break up the monolithic grammars currently in use and enable more creativity and flexibility in what syntax a data management system can accept, both for industrial and research use. Extensible parsers allow for new grammar features to be easily integrated and tested, and can also help bridge the gap between different SQL dialects by adding support for the dialect of one system to the parser of another. Conversely, it might also be desirable in some use cases to *restrict* the acceptable grammar, e.g., to restrict the complexity of queries, or to enforce strict compliance with the SQL standard.

Modernizing parser infrastructure also has additional benefits: one of the most-reported support issues with data management systems is unhelpful syntax errors. Some systems go to great lengths to try to provide a meaningful error message, e.g., `this column does not exist, did you mean ...`, but this is typically limited to resolving identifiers following the actual parsing. YACC-style parsers exhibit "all-or-nothing" behavior: the *entire* query or set of queries is either accepted entirely or not at all. This is why queries with actual syntactical errors (e.g., `SELEXT` instead of `SELECT`) are usually harshly rejected by a DBMS. MySQL, for example, is notorious for its unhelpful error messages:

```console
You have an error in your SQL syntax; check the manual that corresponds
to your MySQL server version for the right syntax to use near 'SELEXT'
at line 1.
```

#### Parsing Expression Grammar

[Parsing Expression Grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar) (PEG) parsers represent a more modern approach to parsing. PEG parsers are top-down parsers that effectively generate a recursive-descent style parser from a grammar. Through the "packrat" memoization technique, PEG parsers exhibit linear time complexity in parsing at the expense of a grammar-dependent amount of extra memory. The biggest difference from a grammar author's perspective is the choice operator, where multiple syntax options can be matched. In LALR parsers, options with similar syntax can create ambiguity in the form of shift-reduce or reduce-reduce conflicts. In PEG parsers, the *first* matching option is always selected. Because of this, PEG parsers cannot be ambiguous by design.

As their name suggests, parsing expression grammar consists of a set of *parsing expressions*. Expressions can contain references to other rules, or literal token references, both as actual strings or character classes similar to regular expressions. Expressions can be combined through sequences, quantifiers, optionals, groupings and both positive and negative look-ahead. Each expression can either match or not, but it is required to consume a part of the input if it matches. Expressions are able to look ahead and consider the remaining input but are not required to consume it. Lexical analysis is typically part of the PEG parser itself, which removes the need for a separate step.
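The building blocks described above can be sketched as a handful of Python functions. This is a toy illustration, unrelated to `cpp-peglib`'s API: the helper names `lit`, `seq`, `choice` and `opt` are made up, there is no memoization, and literals are matched without word-boundary checks to keep the example short.

```python
import re

# Each parsing expression is a function (text, pos) -> new position or None.

def lit(s):
    """Match a literal (case-insensitive), skipping leading whitespace."""
    pattern = re.compile(r"\s*" + re.escape(s), re.IGNORECASE)
    def parse(text, pos):
        m = pattern.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*exprs):
    """Sequence: every expression must match, one after the other."""
    def parse(text, pos):
        for e in exprs:
            pos = e(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*exprs):
    """Ordered choice: the first alternative that matches wins."""
    def parse(text, pos):
        for e in exprs:
            end = e(text, pos)
            if end is not None:
                return end
        return None
    return parse

def opt(e):
    """Optional: never fails, consumes input only if e matches."""
    def parse(text, pos):
        end = e(text, pos)
        return pos if end is None else end
    return parse

# SetopClause <- ('UNION' / 'EXCEPT' / 'INTERSECT') 'ALL'?
setop_clause = seq(choice(lit("UNION"), lit("EXCEPT"), lit("INTERSECT")),
                   opt(lit("ALL")))
print(setop_clause("UNION ALL", 0))  # 9
print(setop_clause("SELECT", 0))     # None

# Order matters: with 'IN' listed first, the choice commits to 'IN'
# and never tries the longer 'INTERSECT' alternative.
in_first = choice(lit("IN"), lit("INTERSECT"))
intersect_first = choice(lit("INTERSECT"), lit("IN"))
print(in_first("INTERSECT", 0), intersect_first("INTERSECT", 0))  # 2 9
```

The last two lines show why a PEG cannot be ambiguous: the order of the alternatives alone decides which one is taken, so grammar authors typically list longer alternatives first.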

One big advantage is that PEG parsers *do not require a compilation step* where the grammar is converted to for example a finite state automaton based on lookup tables. PEG can be executed directly on the input with minimal grammar transformation, making it feasible to re-create a parser at runtime. PEG parsers are gaining popularity, for example, the Python programming language has [recently switched to a PEG parser](https://peps.python.org/pep-0617/).

Another big advantage of PEG parsers is *error handling*: the paper ["Syntax Error Recovery in Parsing Expression Grammars"](https://arxiv.org/pdf/1806.11150.pdf) describes a practical technique where parser rules are annotated with "recovery" actions, which can (1) show more than a single error and (2) annotate errors with a more meaningful error message.

A possible disadvantage of memoized packrat parsing is the memory required for memoization: the amount required is *proportional to the input size*, not the stack size. Of course, memory limitations have relaxed significantly since the invention of LALR parsers sixty years ago and queries typically are not "Big Data" themselves.

#### Proof-of-Concept Experiments

To perform experiments on parser extensibility, we have implemented an – admittedly simplistic – experimental prototype PEG parser for enough of SQL to parse *all* the TPC-H and TPC-DS queries. This grammar is compatible with the `cpp-peglib` [single-header C++17 PEG execution engine](https://github.com/yhirose/cpp-peglib).

`cpp-peglib` uses a slightly different grammar syntax, where `/` is used to denote choices. The symbol `?` shows an optional element, and `*` defines arbitrary repetition. The special rules `Parens()` and `List()` are grammar macros that simplify the grammar for common elements. The special `%whitespace` rule is used to describe tokenization.

Below is an abridged version of our experimental SQL grammar, with the `Expression` and `Identifier` syntax parsing rules omitted for brevity:

```text
Statements <- SingleStmt (';' SingleStmt )* ';'*
SingleStmt <- SelectStmt
SelectStmt <- SimpleSelect (SetopClause SimpleSelect)*
SetopClause <-
    ('UNION' / 'EXCEPT' / 'INTERSECT') 'ALL'?
SimpleSelect <- WithClause? SelectClause FromClause?
    WhereClause? GroupByClause? HavingClause?
    OrderByClause? LimitClause?
WithStatement <- Identifier 'AS' SubqueryReference
WithClause <- 'WITH' List(WithStatement)
SelectClause <- 'SELECT' ('*' / List(AliasExpression))
ColumnsAlias <- Parens(List(Identifier))
TableReference <-
    (SubqueryReference 'AS'? Identifier ColumnsAlias?) /
    (Identifier ('AS'? Identifier)?)
ExplicitJoin <- ('LEFT' / 'FULL')? 'OUTER'?
    'JOIN' TableReference 'ON' Expression
FromClause <- 'FROM' TableReference
    ((',' TableReference) / ExplicitJoin)*
WhereClause <- 'WHERE' Expression
GroupByClause <- 'GROUP' 'BY' List(Expression)
HavingClause <- 'HAVING' Expression
SubqueryReference <- Parens(SelectStmt)
OrderByExpression <- Expression ('DESC' / 'ASC')?
    ('NULLS' ('FIRST' / 'LAST'))?
OrderByClause <- 'ORDER' 'BY' List(OrderByExpression)
LimitClause <- 'LIMIT' NumberLiteral
AliasExpression <- Expression ('AS'? Identifier)?
%whitespace <- [ \t\n\r]*
List(D) <- D (',' D)*
Parens(D) <- '(' D ')'
```

All experiments were run on a 2021 MacBook Pro with the M1 Max CPU and 64 GB of RAM. The experimental grammar and the code for experiments are [available on GitHub](https://github.com/hannes/peg-parser-experiments).

Loading the base grammar from its text representation into the `cpp-peglib` grammar dictionary with symbolic rule representations takes 3 ms. In case that delay should become an issue, the library also allows defining rules programmatically instead of as strings. It would be straightforward to pre-compile the grammar file into source code for compilation, YACC-style. While somewhat counter-intuitive, this would reduce the time required to initialize the initial, unmodified parser. This difference matters for some applications of, e.g., DuckDB, where the database instance only lives for a few short milliseconds.

For the actual parsing, YACC parses TPC-H Query 1 in ca. 0.03 ms, whereas `cpp-peglib` takes ca. 0.3 ms, a ca. 10-fold increase. To further stress parsing performance, we repeated all TPC-H and TPC-DS queries six times to create a 36,840-line SQL script weighing in at ca. 1 MB. Note that a [recent study](https://www.amazon.science/publications/why-tpc-is-not-enough-an-analysis-of-the-amazon-redshift-fleet) has found that 99% of read queries in the Amazon Redshift cloud data warehouse are smaller than 16.5 kB.

Postgres takes on average 24 ms to parse this file using YACC. Note that this time includes the execution of grammar actions that create Postgres' parse tree. `cpp-peglib` takes on average 266 ms to parse the test file. However, our experimental parser does not have grammar actions defined yet. When simulating actions by generating default AST actions for every rule, parsing time increases to 339 ms. Note that the AST generation is more expensive than required, because a node is created for each matching rule, even if there is no semantic meaning in the grammar at hand.

Overall, we can observe a ca. 10 times slowdown in parsing performance when using the `cpp-peglib` parser. However, it should be noted that the *absolute duration* of those two processes is still tiny; at least for analytical queries, sub-millisecond parsing time is more than acceptable as parsing still only accounts for a tiny fraction of overall query processing time. Furthermore, there are still ample optimization opportunities in the experimental parsers we created using an off-the-shelf PEG library. For example, the library makes heavy use of recursive function calls, which can be optimized e.g., by using a loop abstraction.

In the following, we present some experiments in extending the prototype parser with support for new statements, entirely new syntax and with improvements in error messages.

> It is already possible to replace DuckDB's parser by providing an alternative parser.
> Several community extensions such as [`duckpgq`](#community_extensions:extensions:duckpgq), [`prql`](#community_extensions:extensions:prql) and [`psql`](#community_extensions:extensions:psql) use this approach.
> When trying to parse a query string, DuckDB first attempts to use the default parser.
> If this fails, it switches to the extension parsers as failover.
> Therefore, these extensions cannot simply extend the parser with a few extra rules – instead, they implement the complete grammar of their target language.

##### Adding the `UNPIVOT` Statement

Let's assume we want to add a new top-level `UNPIVOT` statement, which turns columns into rows, to a SQL dialect. `UNPIVOT` should work on the same level as e.g., `SELECT`. For example, to unpivot a table `t1` on a specific list of columns or on all columns (`*`), we would like to be able to write:

```sql
UNPIVOT t1 ON (c1, c2, c3);
UNPIVOT t1 ON (*);
```

It is clear that we would have to somehow modify the parser to allow this new syntax. With a YACC parser, this would require modifying the grammar, re-running the parser generator, hoping for the absence of shift-reduce conflicts, and then recompiling the actual database system. This is not practical at run-time, which is when extensions are loaded, ideally within milliseconds.

In order to add `UNPIVOT`, we have to define a grammar rule and then modify `SingleStmt` to allow the statement in a global sequence of SQL statements. This is shown below. We define the new `UnpivotStatement` grammar rule by adding it to the dictionary, and we then modify the `SingleStmt` rule entry in the dictionary to also allow the new statement.

```text
UnpivotStatement <- 'UNPIVOT' Identifier
    'ON' Parens(List(Identifier) / '*')

SingleStmt <- SelectStatement / UnpivotStatement
```

Note that we re-use other machinery from the grammar like the `Identifier` rule as well as the `Parens()` and `List()` macros to define the `ON` clause. The rest of the grammar dictionary remains unchanged. After modification, the parser can be re-initialized in another 3 ms. Parser execution time was unaffected.
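The idea of a grammar dictionary that is modified at run-time can be illustrated with a toy PEG interpreter. The sketch below is written in Python and is not the `cpp-peglib` API: all helper names (`tok`, `seq`, `choice`, `list_of`, `parens`, `accepts`) are hypothetical, the `SelectStatement` rule is drastically simplified, and the parser only recognizes input without building a parse tree.

```python
import re

# A parsing expression is a function (grammar, tokens, pos) -> new pos or None.

def tok(word):
    """Match a single literal token, ignoring case."""
    def parse(g, toks, i):
        return i + 1 if i < len(toks) and toks[i].upper() == word.upper() else None
    return parse

def identifier(g, toks, i):
    """Match one identifier token (keywords are not excluded in this sketch)."""
    return i + 1 if i < len(toks) and re.fullmatch(r"[A-Za-z_]\w*", toks[i]) else None

def ref(name):
    """Look a rule up by name at parse time, so edits to the dict take effect."""
    return lambda g, toks, i: g[name](g, toks, i)

def seq(*parts):
    """Match all parts one after the other."""
    def parse(g, toks, i):
        for part in parts:
            i = part(g, toks, i)
            if i is None:
                return None
        return i
    return parse

def choice(*alternatives):
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(g, toks, i):
        for alternative in alternatives:
            j = alternative(g, toks, i)
            if j is not None:
                return j
        return None
    return parse

def list_of(item):
    """List(item): one item followed by any number of ',' item pairs."""
    comma_item = seq(tok(","), item)
    def parse(g, toks, i):
        i = item(g, toks, i)
        while i is not None:
            j = comma_item(g, toks, i)
            if j is None:
                return i
            i = j
        return None
    return parse

def parens(inner):
    """Parens(inner): inner wrapped in parentheses."""
    return seq(tok("("), inner, tok(")"))

def accepts(g, sql):
    """Return True if SingleStmt matches the whole statement."""
    toks = re.findall(r"\w+|[^\s\w]", sql.rstrip("; \n"))
    return g["SingleStmt"](g, toks, 0) == len(toks)

# The base grammar: a dictionary from rule names to parsing expressions.
grammar = {
    "Identifier": identifier,
    "SelectStatement": seq(tok("SELECT"), list_of(ref("Identifier")),
                           tok("FROM"), ref("Identifier")),
    "SingleStmt": ref("SelectStatement"),
}
before_extension = accepts(grammar, "UNPIVOT t1 ON (c1, c2, c3);")

# The run-time extension: add one rule and replace the SingleStmt entry.
grammar["UnpivotStatement"] = seq(
    tok("UNPIVOT"), ref("Identifier"), tok("ON"),
    parens(choice(list_of(ref("Identifier")), tok("*"))))
grammar["SingleStmt"] = choice(ref("SelectStatement"), ref("UnpivotStatement"))

print(before_extension, accepts(grammar, "UNPIVOT t1 ON (*);"))
```

Because rules refer to each other by name and are resolved only while parsing, replacing two dictionary entries is enough to extend the accepted language; no generator step or recompilation is involved.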

##### Extending `SELECT` with `GRAPH_TABLE`

Let's now assume we would want to modify the `SELECT` syntax to add support for [SQL/PGQ graph matching patterns](https://arxiv.org/pdf/2112.06217.pdf). Below is an example query in SQL/PGQ that finds the university name and year for all students called Bob:

```sql
SELECT study.classYear, study.name
FROM GRAPH_TABLE (pg,
    MATCH
        (a:Person WHERE a.firstName = 'Bob')-[s:studyAt]->(u:University)
        COLUMNS (s.classYear, u.name)
) study;
```

We can see that this new syntax adds the `GRAPH_TABLE` clause and the pattern matching domain-specific language (DSL) within. To add support for this syntax to a SQL parser at runtime, we need to modify the grammar for the `SELECT` statement itself. This is fairly straightforward when using a PEG. We replace the rule that describes the `FROM` clause to also accept a sub-grammar starting at the `GRAPH_TABLE` keyword followed by parentheses. Because the parser does not need to generate a state machine, we are immediately able to accept the new syntax.

Below we show a small set of grammar rules that are sufficient to extend our experimental parser with support for the SQL/PGQ `GRAPH_TABLE` clause and the containing property graph patterns. With this addition, the parser can parse the query above. Parser construction and parser execution timings were unaffected.

```text
Name <- (Identifier? ':' Identifier) / Identifier
Edge <- ('-' / '<-') '[' Name ']' ('->' / '-')
Pattern <- Parens(Name WhereClause?) Edge
   Parens(Name WhereClause?)
PropertyGraphReference <- 'GRAPH_TABLE'i '('
        Identifier ','
        'MATCH'i List(Pattern)
        'COLUMNS'i Parens(List(ColumnReference))
    ')' Identifier?

TableReference <-
    PropertyGraphReference / ...
```

`dplyr`, the ["Grammar of Data Manipulation"](https://dplyr.tidyverse.org), is the de facto standard data transformation language in the R Environment for Statistical Computing. The language uses function calls and a special chaining operator (`%>%`) to combine operators. Below is an example dplyr query:

```R
df %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1, mass > 50)
```

For those unfamiliar with dplyr, the query is equivalent to this SQL query:

```sql
SELECT * FROM (
    SELECT species, count(*) AS n, AVG(mass) AS mass
    FROM df
    GROUP BY species)
WHERE n > 1 AND mass > 50;
```

With an extensible parser, it is feasible to add support for completely new query languages like `dplyr` to a SQL parser. Below is a simplified grammar snippet that enables our SQL parser to accept the `dplyr` example from above.

```text
DplyrStatement <- Identifier Pipe Verb (Pipe Verb)*
Verb <- VerbName Parens(List(Argument))
VerbName <- 'group_by' / 'summarise' / 'filter'
Argument <- Expression / (Identifier '=' Expression)
Pipe <- '%>%'

SingleStmt <- SelectStatement /
    UnpivotStatement / DplyrStatement
```

It is important to note that the rest of the experimental SQL parser *still works*, i.e., the `dplyr` syntax now *also* works. Parser construction and parser execution timings were again unaffected.

##### Better Error Messages

As mentioned above, PEG parsers are able to generate better error messages elegantly. A common novice SQL user mistake is to mix up the order of keywords in a query, for example, the `ORDER BY` must come after the `GROUP BY`. Assume an inexperienced user types the following query:

```sql
SELECT customer, SUM(sales)
FROM revenue
ORDER BY customer
GROUP BY customer;
```

By default, both the YACC and the PEG parsers will report a similar error message about an `unexpected 'GROUP' keyword` with a byte position. However, with a PEG parser we can define a "recovery" syntax rule that will create a useful error message. We modify the `OrderByClause` from our experimental grammar like so:

```text
OrderByClause <- 'ORDER'i 'BY'i List(OrderByExpression)
    %recover(WrongGroupBy)?
WrongGroupBy <- GroupByClause
    { error_message "GROUP BY must precede ORDER BY" }
```

Here, we use the `%recover` construct to match a misplaced `GROUP BY` clause, re-using the original definition, and then trigger a custom error message that advises the user on how to fix their query. And indeed, when we parse the wrong SQL example, the parser will output the custom message.

#### Conclusion and Future Work

In this post, we have proposed to modernize the ancient art of SQL parsing using more modern parser generators like PEG. We have shown how by using PEG, a parser can be extended at run-time at minimal cost without re-compilation. In our experiments we have demonstrated how minor grammar adjustments can fundamentally extend and change the accepted syntax.

An obvious next step is to address the performance drawback observed in our prototype. Using more efficient implementation techniques, it should be possible to narrow the gap in parsing performance between YACC-based LALR parsers and a dynamic PEG parser. Another next step is to address some detailed implementation questions: for example, parser extension load order should ideally not influence the final grammar. Furthermore, while parser actions can in principle execute arbitrary code, there may have to be restrictions on return types and input handling.

We plan to switch DuckDB's parser, which started as a fork of the Postgres YACC parser, to a PEG parser in the near future. As an initial step, we have performed an experiment where we found that it is possible to interpret the current Postgres YACC grammar with PEG. This should greatly simplify the transitioning process, since it ensures that the same grammar will be accepted in both parsing frameworks.

#### Acknowledgments

We would like to thank [**Torsten Grust**](https://db.cs.uni-tuebingen.de/team/members/torsten-grust/), [**Gábor Szárnyas**](https://szarnyasg.org/) and [**Daniël ten Wolde**](https://www.cwi.nl/en/people/daniel-ten-wolde/) for their valuable suggestions. We would also like to thank [**Carlo Piovesan**](https://github.com/carlopi) for his translation of the Postgres YACC grammar to PEG.

## DuckDB Tricks – Part 3

**Publication date:** 2024-11-29

**Authors:** Andra Ionescu, Gábor Szárnyas

**TL;DR:** In this new installment of the DuckDB Tricks series, we present features for convenient handling of tables and performance optimization tips for Parquet and CSV files.

#### Overview

We continue our DuckDB [Tricks](https://duckdb.org/2024/08/19/duckdb-tricks-part-1) [series](https://duckdb.org/2024/10/11/duckdb-tricks-part-2) with a third part,
where we showcase [friendly SQL features](#docs:lts:sql:dialect:friendly_sql) and performance optimizations.

| Operation                                                                           | SQL instructions                           |
| ----------------------------------------------------------------------------------- | ------------------------------------------ |
| [Excluding columns from a table](#::excluding-columns-from-a-table)                 | `EXCLUDE`/`COLUMNS(...)`, `NOT SIMILAR TO` |
| [Renaming columns with pattern matching](#::renaming-columns-with-pattern-matching) | `COLUMNS(...) AS ...`                      |
| [Loading with globbing](#::loading-with-globbing)                                   | `FROM '*.csv'`                             |
| [Reordering Parquet files](#::reordering-parquet-files)                             | `COPY (FROM ... ORDER BY ...) TO ...`      |
| [Hive partitioning](#::hive-partitioning)                                           | `hive_partitioning = true`                 |

#### Dataset

We'll use a subset of the [Dutch railway services dataset](https://www.rijdendetreinen.nl/en/open-data/train-archive), which was already featured in a [blog post earlier this year](https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands).
This time, we'll use the CSV files between January and October 2024: [`services-2024-01-to-10.zip`](https://blobs.duckdb.org/data/services-2024-01-to-10.zip).
If you would like to follow the examples, download and decompress the data set before proceeding.

#### Excluding Columns from a Table

First, let's look at the data in the CSV files.
We pick the CSV file for August and inspect it with the [`DESCRIBE` statement](#docs:lts:guides:meta:describe).

```sql
DESCRIBE FROM 'services-2024-08.csv';
```

The result is a table with the column names and the column types.



| column_name          | column_type | null | key  | default | extra |
| -------------------- | ----------- | ---- | ---- | ------- | ----- |
| Service:RDT-ID       | BIGINT      | YES  | NULL | NULL    | NULL  |
| Service:Date         | DATE        | YES  | NULL | NULL    | NULL  |
| Service:Type         | VARCHAR     | YES  | NULL | NULL    | NULL  |
| Service:Company      | VARCHAR     | YES  | NULL | NULL    | NULL  |
| Service:Train number | BIGINT      | YES  | NULL | NULL    | NULL  |
| ...                  | ...         | ...  | ...  | ...     | ...   |

Now, let's use [`SUMMARIZE`](#docs:lts:guides:meta:summarize) to inspect some statistics about the columns.

```sql
SUMMARIZE FROM 'services-2024-08.csv';
```

With `SUMMARIZE`, we get 10 statistics about our data (`min`, `max`, `approx_unique`, etc.).
If we want to remove a few of them from the result, we can use the [`EXCLUDE` modifier](#docs:lts:sql:expressions:star::exclude-modifier).
For example, to exclude `min`, `max` and the quantiles `q25`, `q50`, `q75`, we can issue the following command:

```sql
SELECT * EXCLUDE(min, max, q25, q50, q75) 
FROM (SUMMARIZE FROM 'services-2024-08.csv');
```

Alternatively, we can use the [`COLUMNS`](#docs:lts:sql:expressions:star::columns) expression with the [`NOT SIMILAR TO` operator](#docs:lts:sql:functions:pattern_matching::similar-to).
This works with a regular expression:

```sql
SELECT COLUMNS(lambda c: c NOT SIMILAR TO 'min|max|q.*') 
FROM (SUMMARIZE FROM 'services-2024-08.csv');
```

In both cases, the resulting table will contain the 5 remaining statistical columns:

| column_name          | column_type | approx_unique | avg               | std                |   count | null_percentage |
| -------------------- | ----------- | ------------: | ----------------- | ------------------ | ------: | --------------: |
| Service:RDT-ID       | BIGINT      |        259022 | 14200071.03736433 | 59022.836209662266 | 1846574 |            0.00 |
| Service:Date         | DATE        |            32 | NULL              | NULL               | 1846574 |            0.00 |
| Service:Type         | VARCHAR     |            20 | NULL              | NULL               | 1846574 |            0.00 |
| Service:Company      | VARCHAR     |            12 | NULL              | NULL               | 1846574 |            0.00 |
| Service:Train number | BIGINT      |         17264 | 57781.81688196628 | 186353.76365744913 | 1846574 |            0.00 |
| ...                  | ...         |           ... | ...               | ...                |     ... |             ... |

#### Renaming Columns with Pattern Matching

Upon inspecting the columns, we see that their names contain spaces and colons (`:`).
These special characters make writing queries a bit tedious as they necessitate quoting column names with double quotes.
For example, we have to write `"Service:Company"` in the following query:

```sql
SELECT DISTINCT "Service:Company" AS company,
FROM 'services-2024-08.csv'
ORDER BY company;
```

Let's see how we can rename the columns using the `COLUMNS` expression.
To replace the special characters (up to 2), we can write the following query:

```sql
SELECT COLUMNS('(.*?)_*$') AS "\1"
FROM (
    SELECT COLUMNS('(\w*)\W*(\w*)\W*(\w*)') AS "\1_\2_\3"
    FROM 'services-2024-08.csv'
);
```

Add `DESCRIBE` at the beginning of the query and we can see the renamed columns:

| column_name          | column_type | null | key  | default | extra |
| -------------------- | ----------- | ---- | ---- | ------- | ----- |
| Service_RDT_ID       | BIGINT      | YES  | NULL | NULL    | NULL  |
| Service_Date         | DATE        | YES  | NULL | NULL    | NULL  |
| Service_Type         | VARCHAR     | YES  | NULL | NULL    | NULL  |
| Service_Company      | VARCHAR     | YES  | NULL | NULL    | NULL  |
| Service_Train_number | BIGINT      | YES  | NULL | NULL    | NULL  |
| ...                  | ...         | ...  | ...  | ...     | ...   |

Let's break down the query starting with the first `COLUMNS` expression:

```sql
SELECT COLUMNS('(\w*)\W*(\w*)\W*(\w*)') AS "\1_\2_\3"
```

Here, we use a regular expression with `(\w*)` groups that capture 0...n word characters (`[0-9A-Za-z_]`).
Meanwhile, the expression `\W*` matches 0...n non-word characters (`[^0-9A-Za-z_]`).
In the alias part, we refer to capture group `i` with `\i`, so `"\1_\2_\3"` means that we only keep the word characters and separate their groups with underscores (`_`).
However, because some column names contain words separated by a space while others don't, after this `SELECT` statement we get column names with a trailing underscore (`_`),
e.g., `Service_Date_`.
Thus, we need an additional processing step:

```sql
SELECT COLUMNS('(.*?)_*$') AS "\1"
```

Here, we capture the group of characters without the trailing underscore(s) and rename the columns to `\1`, which removes the trailing underscores.
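To see the two regular expressions at work outside of SQL, the sketch below replays both renaming steps with Python's `re` module. This only mimics the pattern logic: DuckDB uses its own regular expression engine, and the helper names `split_words` and `trim_underscores` are ours. We assume that both engines behave identically on these simple ASCII column names.

```python
import re

# Step 1: capture up to three runs of word characters, skipping what lies between.
SPLIT = re.compile(r"(\w*)\W*(\w*)\W*(\w*)", re.ASCII)
# Step 2: capture everything except the trailing underscore(s).
TRIM = re.compile(r"(.*?)_*$")

def split_words(column):
    """Mimic the alias "\\1_\\2_\\3": join the three groups with underscores."""
    return "_".join(SPLIT.match(column).groups())

def trim_underscores(column):
    """Mimic the alias "\\1": drop any trailing underscores."""
    return TRIM.match(column).group(1)

for original in ["Service:RDT-ID", "Service:Date", "Service:Train number"]:
    intermediate = split_words(original)
    print(original, "->", intermediate, "->", trim_underscores(intermediate))
```

For `Service:Date`, the third group is empty, which is where the trailing underscore in the intermediate name `Service_Date_` comes from.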

To make writing queries even more convenient, we can rely on the [case-insensitivity of identifiers](#docs:lts:sql:dialect:keywords_and_identifiers::case-sensitivity-of-identifiers) to query the column names in lowercase:

```sql
SELECT DISTINCT service_company
FROM (
    SELECT COLUMNS('(.*?)_*$') AS "\1"
    FROM (
       SELECT COLUMNS('(\w*)\W*(\w*)\W*(\w*)') AS "\1_\2_\3"
       FROM 'services-2024-08.csv'
    )
)
ORDER BY service_company;
```

| Service_Company |
| --------------- |
| Arriva          |
| Blauwnet        |
| Breng           |
| DB              |
| Eu Sleeper      |
| ...             |

> The returned column name preserves its original case even though we used lowercase letters in the query.

#### Loading with Globbing

Now that we can simplify the column names, let's ingest all ten months of data into a table:

```sql
CREATE OR REPLACE TABLE services AS
    SELECT COLUMNS('(.*?)_*$') AS "\1" 
    FROM (
        SELECT COLUMNS('(\w*)\W*(\w*)\W*(\w*)') AS "\1_\2_\3" 
        FROM 'services-2024-*.csv'
    );
```

In the inner `FROM` clause, we use the [`*` glob syntax](#docs:lts:sql:functions:pattern_matching::globbing) to match all files.
DuckDB automatically detects that all files have the same schema and unions them together.
We now have a table with all the data from January to October, amounting to almost 20 million rows.
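For readers unfamiliar with glob patterns, the sketch below shows which file names a pattern like `services-2024-*.csv` selects, using Python's `fnmatch` module. The directory listing is hypothetical: we assume one CSV file per month named like `services-2024-08.csv`, plus a few unrelated files that we made up for contrast.

```python
from fnmatch import fnmatchcase

# Hypothetical directory content: ten monthly CSV files plus unrelated files.
files = [f"services-2024-{month:02d}.csv" for month in range(1, 11)]
files += ["services-2024-01-to-10.zip", "services-2023-12.csv", "notes.txt"]

pattern = "services-2024-*.csv"   # '*' matches any run of characters
matched = sorted(name for name in files if fnmatchcase(name, pattern))

print(len(matched), "files match, from", matched[0], "to", matched[-1])
```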

#### Reordering Parquet Files

Suppose we want to analyze the average delay of the [Intercity Direct trains](https://en.wikipedia.org/wiki/Intercity_Direct) operated by the [Nederlandse Spoorwegen (NS)](https://en.wikipedia.org/wiki/Nederlandse_Spoorwegen), measured at the final destination of the train service.
While we can run this analysis directly on the `.csv` files, the lack of metadata (such as schema and min-max indexes) will limit the performance.
Let's measure this in the CLI client by turning on the [timer](#docs:lts:clients:cli:dot_commands):

```sql
.timer on
```

```sql
SELECT avg("Stop:Arrival delay")
FROM 'services-*.csv'
WHERE "Service:Company" = 'NS'
  AND "Service:Type" = 'Intercity direct'
  AND "Stop:Departure time" IS NULL;
```

This query takes about 1.8 seconds. Now, if we run the same query on the `services` table that's already loaded into DuckDB, the query is much faster:

```sql
SELECT avg(Stop_Arrival_delay)
FROM services
WHERE Service_Company = 'NS'
  AND Service_Type = 'Intercity direct'
  AND Stop_Departure_time IS NULL;
```

The run time is about 35 milliseconds.

If we would like to use an external binary file format, we can also export the database to a directory in which the `services` table is stored as a single Parquet file:

```sql
EXPORT DATABASE 'railway' (FORMAT parquet);
```

We can then directly query it as follows:

```sql
SELECT avg(Stop_Arrival_delay)
FROM 'railway/services.parquet'
WHERE Service_Company = 'NS'
  AND Service_Type = 'Intercity direct'
  AND Stop_Departure_time IS NULL;
```

The runtime for this format is about 90 milliseconds – somewhat slower than DuckDB's own file format but about 20× faster than reading the raw CSV files.

If we have a priori knowledge of the fields a query filters on, we can reorder the Parquet file to improve query performance.

```sql
COPY
(FROM 'railway/services.parquet' ORDER BY Service_Company, Service_Type)
TO 'railway/services.parquet';
```

If we run the query again, it's noticeably faster, taking only 35 milliseconds.
This is thanks to [partial reading](#docs:lts:data:parquet:overview::partial-reading), which uses the zonemaps (min-max indexes) to limit the amount of data that has to be scanned.
Reordering the file allows DuckDB to skip more data, leading to faster query times.
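The effect of reordering on min-max indexes can be simulated in a few lines. The sketch below is a toy model rather than DuckDB's Parquet reader: the company column is made up, row groups hold only four values, and the helper names are ours. It counts how many row groups a filter on `'NS'` cannot skip before and after sorting.

```python
from itertools import cycle, islice

ROW_GROUP_SIZE = 4  # tiny on purpose; real row groups hold far more rows

def row_group_stats(values, size=ROW_GROUP_SIZE):
    """Split values into consecutive row groups and keep (min, max) per group."""
    groups = [values[i:i + size] for i in range(0, len(values), size)]
    return [(min(group), max(group)) for group in groups]

def groups_to_scan(stats, wanted):
    """Count the row groups whose [min, max] range may contain the value."""
    return sum(1 for low, high in stats if low <= wanted <= high)

# Made-up column in which the companies are interleaved, as in unsorted data.
companies = list(islice(
    cycle(["Arriva", "NS", "Blauwnet", "DB", "NS", "Breng"]), 24))

unsorted_scans = groups_to_scan(row_group_stats(companies), "NS")
sorted_scans = groups_to_scan(row_group_stats(sorted(companies)), "NS")

print(f"row groups scanned: {unsorted_scans} unsorted, {sorted_scans} sorted")
```

In the unsorted layout, every row group's range covers `'NS'`, so nothing can be skipped. After sorting, only the row groups at the end of the file qualify.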

#### Hive Partitioning

To speed up queries even further, we can use [Hive partitioning](#docs:lts:data:partitioning:hive_partitioning) to create a directory layout on disk that matches the filtering used in the queries.

```sql
COPY services
TO 'services-parquet-hive'
(FORMAT parquet, PARTITION_BY (Service_Company, Service_Type));
```

Let's peek into the directory from DuckDB's CLI using the [`.sh` dot command](#docs:lts:clients:cli:dot_commands):

```sql
.sh tree services-parquet-hive
```

```text
services-parquet-hive
├── Service_Company=Arriva
│   ├── Service_Type=Extra%20trein
│   │   └── data_0.parquet
│   ├── Service_Type=Nachttrein
│   │   └── data_0.parquet
│   ├── Service_Type=Snelbus%20ipv%20trein
│   │   └── data_0.parquet
│   ├── Service_Type=Sneltrein
│   │   └── data_0.parquet
│   ├── Service_Type=Stopbus%20ipv%20trein
│   │   └── data_0.parquet
│   ├── Service_Type=Stoptrein
│   │   └── data_0.parquet
│   └── Service_Type=Taxibus%20ipv%20trein
│       └── data_0.parquet
├── Service_Company=Blauwnet
│   ├── Service_Type=Intercity
│   │   └── data_0.parquet
...
```
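The directory names in this listing encode the partition values as `key=value` pairs, with special characters such as spaces percent-encoded. The sketch below illustrates how a reader could use such paths to discard whole directories before opening any file. It is not DuckDB's implementation, and the helper names `partition_values` and `prune` are ours; the paths are taken from the listing above.

```python
from urllib.parse import unquote

def partition_values(path):
    """Extract the key=value pairs from the directory components of a path."""
    values = {}
    for part in path.split("/")[:-1]:        # the last component is the file
        if "=" in part:
            key, _, raw = part.partition("=")
            values[key] = unquote(raw)       # e.g., %20 becomes a space
    return values

def prune(paths, **filters):
    """Keep only the paths whose partition values satisfy all filters."""
    return [
        path for path in paths
        if all(partition_values(path).get(key) == value
               for key, value in filters.items())
    ]

paths = [
    "services-parquet-hive/Service_Company=Arriva/Service_Type=Extra%20trein/data_0.parquet",
    "services-parquet-hive/Service_Company=Arriva/Service_Type=Stoptrein/data_0.parquet",
    "services-parquet-hive/Service_Company=Blauwnet/Service_Type=Intercity/data_0.parquet",
]

selected = prune(paths, Service_Company="Arriva", Service_Type="Extra trein")
print(selected)
```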

We can now run the query on the Hive partitioned data set by passing the `hive_partitioning = true` flag:

```sql
SELECT avg(Stop_Arrival_delay)
FROM read_parquet(
         'services-parquet-hive/**/*.parquet',
         hive_partitioning = true
     )
WHERE Service_Company = 'NS'
  AND Service_Type = 'Intercity direct'
  AND Stop_Departure_time IS NULL;
```

This query now takes about 20 milliseconds as DuckDB can use the directory structure to limit the reads even further.
And the neat thing about Hive partitioning is that it even works with CSV files!

```sql
COPY services
TO 'services-csv-hive'
(FORMAT csv, PARTITION_BY (Service_Company, Service_Type));

SELECT avg(Stop_Arrival_delay)
FROM read_csv('services-csv-hive/**/*.csv', hive_partitioning = true)
WHERE Service_Company = 'NS'
  AND Service_Type = 'Intercity direct'
  AND Stop_Departure_time IS NULL;
```

While the CSV files lack any sort of metadata, DuckDB can rely on the directory structure to limit the scans to the relevant directories,
resulting in execution times around 150 milliseconds, more than 10× faster compared to reading all CSV files.

If all these formats and results got your head spinning, no worries.
We've got you covered with this summary table:

| Format                     | Query runtime (ms) |
| -------------------------- | -----------------: |
| DuckDB file format         |                 35 |
| CSV (vanilla)              |               1800 |
| CSV (Hive-partitioned)     |                150 |
| Parquet (vanilla)          |                 90 |
| Parquet (reordered)        |                 35 |
| Parquet (Hive-partitioned) |                 20 |
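To put these numbers in relation to each other, the short sketch below computes the speedup of each format over the vanilla CSV files. It only uses the runtimes from the table; the variable names are ours.

```python
# Query runtimes from the summary table, in milliseconds.
runtimes_ms = {
    "DuckDB file format": 35,
    "CSV (vanilla)": 1800,
    "CSV (Hive-partitioned)": 150,
    "Parquet (vanilla)": 90,
    "Parquet (reordered)": 35,
    "Parquet (Hive-partitioned)": 20,
}

baseline = runtimes_ms["CSV (vanilla)"]
speedups = {name: baseline / ms for name, ms in runtimes_ms.items()}

for name, factor in sorted(speedups.items(), key=lambda item: item[1]):
    print(f"{name:<28} {factor:5.1f}x")
```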

Oh, and we forgot to report the result. The average delay of Intercity Direct trains is 3 minutes!

#### Closing Thoughts

That's it for part three of DuckDB tricks. If you have a trick that you would like to share, please share it with the DuckDB team on our social media sites, or submit it to the [DuckDB Snippets site](https://duckdbsnippets.com/) (maintained by our friends at MotherDuck).

## CSV Files: Dethroning Parquet as the Ultimate Storage File Format — or Not?

**Publication date:** 2024-12-05

**Author:** Pedro Holanda

**TL;DR:** Data analytics primarily uses two types of storage format files: human-readable text files like CSV and performance-driven binary files like Parquet. This blog post compares these two formats in an ultimate showdown of performance and flexibility, where there can be only one winner.

#### File Formats

##### CSV Files

Data is most [commonly stored](https://www.vldb.org/pvldb/vol17/p3694-saxena.pdf) in human-readable file formats, like JSON or CSV files. These file formats are easy to operate on, since anyone with a text editor can simply open, alter, and understand them.

For many years, CSV files have had a bad reputation for being slow and cumbersome to work with. In practice, if you want to operate on a CSV file using your favorite database system, you must follow this recipe:

1. Manually discover its schema by opening the file in a text editor.
2. Create a table with the given schema.
3. Manually figure out the dialect of the file (e.g., which character is used for a quote?)
4. Load the file into the table using a `COPY` statement and with the dialect set.
5. Start querying it.

Not only is this process tedious, but parallelizing a CSV file reader is [far from trivial](https://www.microsoft.com/en-us/research/uploads/prod/2019/04/chunker-sigmod19.pdf). This means most systems either process it single-threaded or use a two-pass approach.

Additionally, [CSV files are wild](https://youtu.be/YrqSp8m7fmk?si=v5rmFWGJtpiU5_PX&t=624): although [RFC-4180](https://www.ietf.org/rfc/rfc4180.txt) exists as a CSV standard, it is [commonly ignored](https://aic.ai.wu.ac.at/~polleres/publications/mitl-etal-2016OBD.pdf). Systems must therefore be sufficiently robust to handle these files as if they come straight from the wild west.

Last but not least, CSV files are wasteful: data is always laid out as strings. For example, numeric values like `1000000000` take 10 bytes instead of 4 bytes if stored as an `int32`. Additionally, since the data layout is row-wise, opportunities to apply [lightweight columnar compression](https://duckdb.org/2022/10/28/lightweight-compression) are lost.
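The size difference is easy to verify. The sketch below compares the text encoding of the value with a 4-byte binary encoding using Python's `struct` module; it only illustrates the byte counts and says nothing about how any particular file format lays out its data.

```python
import struct

value = 1_000_000_000

as_text = str(value).encode("ascii")   # how a CSV file stores the number
as_int32 = struct.pack("<i", value)    # little-endian signed 32-bit integer

print(len(as_text), "bytes as text,", len(as_int32), "bytes as int32")
```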

##### Parquet Files

Due to these shortcomings, performance-driven file formats like Parquet have gained significant popularity in recent years. Parquet files cannot be opened by general text editors, cannot be easily edited, and have a rigid schema. However, they store data in columns, apply various compression techniques, partition the data into row groups, maintain statistics about these row groups, and define their schema directly in the file.

These features make Parquet a monolith of a file format — highly inflexible but efficient and fast. It is easy to read data from a Parquet file since the schema is well-defined. Parallelizing a scanner is straightforward, as each thread can independently process a row group. Filter pushdown is also simple to implement, as each row group contains statistical metadata, and the file sizes are very small.

The conclusion should be simple: if you have small files and need flexibility, CSV files are fine. However, for data analysis, one should pivot to Parquet files, right? Well, this pivot may not be a hard requirement anymore – read on to find out why!

#### Reading CSV Files in DuckDB

For the past few releases, DuckDB has doubled down on delivering not only an easy-to-use CSV scanner but also an extremely performant one. This scanner features its own custom [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer), parallelization algorithm, buffer manager, casting mechanisms, and state machine-based parser.

For usability, the previous paradigm of manual schema discovery and table creation has been changed. Instead, DuckDB now utilizes a CSV Sniffer, similar to those found in dataframe libraries like Pandas.
This allows for querying CSV files as easily as:

```sql
FROM 'path/to/file.csv';
```

Or tables to be created from CSV files, without any prior schema definition with:

```sql
CREATE TABLE t AS FROM 'path/to/file.csv';
```

Furthermore, the reader became one of the fastest CSV readers in analytical systems, as can be seen by the load times of the [latest iteration](https://github.com/ClickHouse/ClickBench/commit/0aba4247ce227b3058d22846ca39826d27262fe0) of [ClickBench](https://benchmark.clickhouse.com/). In this benchmark, the data is loaded from an [82 GB uncompressed CSV file](https://datasets.clickhouse.com/hits_compatible/hits.csv.gz) into a database table.

![](../images/blog/csv-vs-parquet-clickbench.png)

ClickBench CSV loading times (2024-12-05)

#### Comparing CSV and Parquet

With the large boost in usability and performance for the CSV reader, one might ask: what is the actual difference in performance when loading a CSV file compared to a Parquet file into a table? Additionally, how do these formats differ when running queries directly on them?

To find out, we will run a few examples using both CSV and Parquet files containing TPC-H data to shed light on their differences. All scripts used to generate the benchmarks of this blogpost can be found in a [repository](https://github.com/pdet/csv_vs_parquet).

##### Usability

In terms of usability, scanning CSV files and Parquet files can differ significantly.

In simple cases, where all options are correctly detected by DuckDB, running queries on either CSV or Parquet files can be done directly.

```sql
FROM 'path/to/file.csv';
FROM 'path/to/file.parquet';
```

Things can differ drastically for wild, rule-breaking [Arthur Morgan](https://reddead.fandom.com/wiki/Arthur_Morgan)-like CSV files. This is evident from the number of parameters that can be set for each scanner. The [Parquet](#docs:lts:data:parquet:overview) scanner has a total of six parameters that can alter how the file is read. For the majority of cases, the user will never need to manually adjust any of them.

The CSV reader, on the other hand, depends on the sniffer being able to automatically detect many different configuration options. For example: What is the delimiter? How many rows should it skip from the top of the file? Are there any comments? And so on. This results in over [30 configuration options](#docs:lts:data:csv:overview) that the user might have to manually adjust to properly parse their CSV file. Again, this number of options is necessary due to the lack of a widely adopted standard. However, in most scenarios, users can rely on the sniffer or, at most, change one or two options.

The CSV reader also has an extensive error-handling system and will always provide suggestions for options to review if something goes wrong.

To give you an example of how the DuckDB error-reporting system works, consider the following CSV file:

```csv
Clint Eastwood;94
Samuel L. Jackson
```

In this file, the second line is missing the value for the second column. Reading it with DuckDB produces the following error:

```console
Invalid Input Error: CSV Error on Line: 2
Original Line: Samuel L. Jackson
Expected Number of Columns: 2 Found: 1
Possible fixes:
* Enable null padding (null_padding=true) to replace missing values with NULL
* Enable ignore errors (ignore_errors=true) to skip this row

  file = western_actors.csv
  delimiter = , (Auto-Detected)
  quote = " (Auto-Detected)
  escape = " (Auto-Detected)
  new_line = \n (Auto-Detected)
  header = false (Auto-Detected)
  skip_rows = 0 (Auto-Detected)
  comment = \0 (Auto-Detected)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding = 0
  sample_size = 20480
  ignore_errors = false
  all_varchar = 0
```

DuckDB provides detailed information about any errors encountered. It highlights the line of the CSV file where the issue occurred, presents the original line, and suggests possible fixes for the error, such as ignoring the problematic line or filling missing values with `NULL`. It also displays the full configuration used to scan the file and indicates whether the options were auto-detected or manually set.
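
To make the two suggested fixes concrete, here is a minimal Python sketch of what padding and skipping mean for the example file. It is a toy model and not DuckDB's CSV reader: the `parse_rows` helper, its parameters, and the precedence it gives to padding over skipping are assumptions made for illustration, and it ignores quoting, escaping and type detection.

```python
# Toy model of the two fixes suggested in the error message.
# parse_rows is a hypothetical helper, not part of DuckDB.
def parse_rows(text, delimiter=";", expected_columns=2,
               null_padding=False, ignore_errors=False):
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split(delimiter)
        if len(fields) < expected_columns:
            if null_padding:
                # Fill the missing trailing columns with None (SQL NULL).
                fields += [None] * (expected_columns - len(fields))
            elif ignore_errors:
                # Drop the malformed row and keep going.
                continue
            else:
                raise ValueError(
                    f"CSV Error on Line: {line_number}\n"
                    f"Original Line: {line}\n"
                    f"Expected Number of Columns: {expected_columns} "
                    f"Found: {len(fields)}"
                )
        rows.append(fields)
    return rows


DATA = "Clint Eastwood;94\nSamuel L. Jackson\n"

padded = parse_rows(DATA, null_padding=True)
skipped = parse_rows(DATA, ignore_errors=True)
print(padded)
print(skipped)
```

With padding, the second row is kept with a missing value in the second column. With skipping, only the first row survives. With neither option, the sketch raises an error for line 2, mirroring the report above.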

The bottom line here is that, even with the advancements in CSV usage, the strictness of Parquet files makes them much easier to operate on.

Of course, if you need to open your file in a text editor or Excel, you will need to have your data in CSV format. Note that Parquet files do have some visualizers, like [TAD](https://www.tadviewer.com/).

##### Performance

There are primarily two ways to operate on files using DuckDB:

1. The user creates a DuckDB table from the file and uses the table in future queries. This is a loading process, commonly used if you want to store your data as DuckDB tables or if you will run many queries on them. Also, note that these are the only possible scenarios for most database systems (e.g., Oracle, SQL Server, PostgreSQL, SQLite, ...).

2. One might run a query directly on the file scanner without creating a table. This is useful for scenarios where the user has limitations on memory and disk space, or if queries on these files are only executed once. Note that this scenario is typically not supported by database systems but is common for dataframe libraries (e.g., Pandas).

To fairly compare the scanners, we provide the table schemas upfront, ensuring that the scanners produce the exact same data types. We also set `preserve_insertion_order = false`, as this can impact the parallelization of both scanners, and set `max_temp_directory_size = '0GB'` to ensure no data is spilled to disk, with all experiments running fully in memory.

We use the default writers for both CSV files and Parquet (with the default Snappy compression), and also run a variation of Parquet with `CODEC 'zstd', COMPRESSION_LEVEL 1`, as this can speed up querying/loading times.

For all experiments, we use an Apple M1 Max, with 64 GB RAM. We use TPC-H scale factor 20 and report the median times from 5 runs.

###### Creating Tables

For creating the table, we focus on the `lineitem` table.

After defining the schema, both files can be loaded with a simple `COPY` statement, with no additional parameters set. Note that even with the schema defined, the CSV sniffer will still be executed to determine the dialect (e.g., quote character, delimiter character, etc.) and match types and names.

| Name           | Time (s) | Size (GB) |
| -------------- | -------: | --------: |
| CSV            |    11.76 |     15.95 |
| Parquet Snappy |     5.21 |      3.78 |
| Parquet Zstd   |     5.52 |      3.22 |

We can see that the Parquet files are definitely smaller, roughly 4–5× smaller than the CSV file, but the performance difference is not drastic.

The CSV scanner is only about 2× slower than the Parquet scanner. It's also important to note that some of the cost associated with these operations (~1-2 seconds) is related to the insertion into the DuckDB table, not the scanner itself.

However, it is still important to consider this in the comparison: once this insertion cost is set aside, the raw CSV scanner is in practice about 3× slower than the Parquet scanner, which is a considerable difference but much smaller than one might initially think.
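
The ratios quoted above can be re-derived from the table with a few lines of Python. The sketch below assumes that the same insertion cost (between 1 and 2 seconds) applies to both formats and can simply be subtracted from each load time. That is our reading of the text, not a measured breakdown.

```python
# Load times (s) and file sizes (GB) copied from the table above.
load = {
    "CSV": (11.76, 15.95),
    "Parquet Snappy": (5.21, 3.78),
    "Parquet Zstd": (5.52, 3.22),
}

csv_time, csv_size = load["CSV"]
size_ratio = {}
time_ratio = {}
for name in ("Parquet Snappy", "Parquet Zstd"):
    seconds, gigabytes = load[name]
    size_ratio[name] = csv_size / gigabytes
    time_ratio[name] = csv_time / seconds
    print(f"{name}: {size_ratio[name]:.1f}x smaller, "
          f"loads {time_ratio[name]:.1f}x faster than CSV")

# Assumption: the insertion cost is identical for both formats.
snappy_time = load["Parquet Snappy"][0]
raw_ratio = {}
for insert_cost in (1.0, 2.0):
    raw_ratio[insert_cost] = (csv_time - insert_cost) / (snappy_time - insert_cost)
    print(f"insertion cost {insert_cost:.0f} s: "
          f"raw CSV scan is {raw_ratio[insert_cost]:.1f}x slower")
```

Under this assumption, the end-to-end gap of a little over 2× widens to somewhere between 2.5× and 3× for the scanners alone.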

###### Directly Querying Files

We will run two different TPC-H queries on our files.

**Query 01.** First, we run TPC-H Q01. This query operates solely on the `lineitem` table, performing an aggregation and grouping with a filter. It filters on one column and projects 7 out of the 16 columns from `lineitem`.

Therefore, this query will stress the filter pushdown, which is [supported by the Parquet reader](#docs:lts:data:parquet:overview::partial-reading) but not the CSV reader, and the projection pushdown, which is supported by both.

```sql
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) AS sum_qty,
    sum(l_extendedprice) AS sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    avg(l_quantity) AS avg_qty,
    avg(l_extendedprice) AS avg_price,
    avg(l_discount) AS avg_disc,
    count(*) AS count_order
FROM
    lineitem
WHERE
    l_shipdate <= CAST('1996-09-02' AS date)
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
```

| Name           | Time (s) |
| -------------- | -------: |
| CSV            |     6.72 |
| Parquet Snappy |     0.88 |
| Parquet Zstd   |     0.95 |

We can see that running this query directly on our file presents a much larger performance gap of approximately 7× compared to simply loading the data into the table. In the Parquet file, we can directly skip row groups that do not match our filter `l_shipdate <= CAST('1996-09-02' AS date)`. Note that this filter eliminates approximately 30% of the data. Not only that, but we can also skip individual rows that do not match the filter. Additionally, since the Parquet format is column-oriented, we can completely skip any computation on columns that are not projected.

Unfortunately, the CSV reader does not benefit from these filters. Since it lacks partitions, it can't efficiently skip parts of the data. Theoretically, a CSV scanner could skip the computation of rows that do not match a filter, but this is not currently implemented.

Furthermore, the CSV projection skips much of the computation on a column (e.g., it does not cast or copy the value), but it still must parse the value to be able to skip it.
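
The following Python sketch illustrates the idea behind skipping row groups with per-group minimum and maximum statistics. It is a toy model with made-up values, not the Parquet reader: the `make_group` and `scan` helpers and the three tiny row groups are assumptions for illustration.

```python
from datetime import date


def make_group(rows):
    """Build a row group with min/max statistics on the ship date."""
    dates = [ship_date for ship_date, _ in rows]
    return {"min": min(dates), "max": max(dates), "rows": rows}


def scan(groups, cutoff):
    """Sum quantities for rows with ship_date <= cutoff, skipping groups via statistics."""
    groups_read = 0
    total = 0
    for group in groups:
        if group["min"] > cutoff:
            # No row in this group can match, so it is never decoded.
            continue
        groups_read += 1
        # Inside a group that may match, rows are still filtered one by one.
        total += sum(qty for ship_date, qty in group["rows"] if ship_date <= cutoff)
    return total, groups_read


# Made-up (ship date, quantity) pairs.
groups = [
    make_group([(date(1995, 1, 10), 5), (date(1996, 3, 1), 7)]),
    make_group([(date(1996, 8, 30), 2), (date(1996, 9, 15), 4)]),
    make_group([(date(1997, 2, 1), 9), (date(1998, 5, 5), 1)]),
]

total, groups_read = scan(groups, date(1996, 9, 2))
print(total, groups_read)
```

A CSV file carries no such statistics, so a scanner has to tokenize every line before it can decide whether the row matches.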

**Query 21.** Query 21 is a query that not only heavily depends on filter and projection pushdown but also relies significantly on join ordering based on statistics to achieve good performance. In this query, four different files are used and joined together.

```sql
SELECT
    s_name,
    count(*) AS numwait
FROM
    supplier,
    lineitem l1,
    orders,
    nation
WHERE
    s_suppkey = l1.l_suppkey
    AND o_orderkey = l1.l_orderkey
    AND o_orderstatus = 'F'
    AND l1.l_receiptdate > l1.l_commitdate
    AND EXISTS (
        SELECT
            *
        FROM
            lineitem l2
        WHERE
            l2.l_orderkey = l1.l_orderkey
            AND l2.l_suppkey <> l1.l_suppkey)
    AND NOT EXISTS (
        SELECT
            *
        FROM
            lineitem l3
        WHERE
            l3.l_orderkey = l1.l_orderkey
            AND l3.l_suppkey <> l1.l_suppkey
            AND l3.l_receiptdate > l3.l_commitdate)
    AND s_nationkey = n_nationkey
    AND n_name = 'SAUDI ARABIA'
GROUP BY
    s_name
ORDER BY
    numwait DESC,
    s_name
LIMIT 100;
```

| Name           | Time (s) |
| -------------- | -------: |
| CSV            |    19.95 |
| Parquet Snappy |     2.08 |
| Parquet Zstd   |     2.12 |

We can see that this query now has a performance difference of approximately 10×. We observe an effect similar to Query 01, but now we also incur the additional cost of performing join ordering with no statistical information for the CSV file.

#### Conclusion

There is no doubt that the performance of CSV file scanning has drastically increased over the years. If we were to take a guess at the performance difference in table creation a few years ago, the answer would probably have been at least one order of magnitude.

This is excellent, as it allows data to be exported from legacy systems that do not support performance-driven file formats.

But oh boy, don't let super-convenient and fast CSV readers fool you. Your data is still best kept in self-describing, column-binary compressed formats like Parquet — or the DuckDB file format, of course! They are much smaller and more consistent. Additionally, running queries directly on Parquet files is much more beneficial due to efficient projection/filter pushdown and available statistics.

One thing to note is that there exists an extensive body of work on [indexing CSV files](https://ir.cwi.nl/pub/19931/19931B.pdf) (i.e., building statistics in a way) to speed up future queries and enable filter pushdown. However, DuckDB does not perform these operations yet.

Bottom line: Parquet is still the undisputed champion for most scenarios, but we will continue working on closing this gap wherever possible.

## DuckDB: Running TPC-H SF100 on Mobile Phones

**Publication date:** 2024-12-06

**Authors:** Gábor Szárnyas, Laurens Kuiper, Hannes Mühleisen

**TL;DR:** DuckDB runs on mobile platforms such as iOS and Android, and completes the TPC-H benchmark faster than state-of-the-art research systems on big iron machines 20 years ago.

A few weeks ago, we set out to perform a series of experiments to answer two simple questions:

1. Can DuckDB complete the TPC-H queries on the SF100 data set when running on a new smartphone?
2. If so, can DuckDB complete a run in less than 400 seconds, i.e., faster than the system in the research paper that originally introduced vectorized query processing?

These questions took us on an interesting quest.
Along the way, we had a lot of fun and learned the difference between a cold run and a _really cold_ run.
Read on to find out more.

#### A Song of Dry Ice and Fire

Our first attempt was to use an iPhone, namely an [iPhone 16 Pro](https://www.gsmarena.com/apple_iphone_16_pro-13315.php).
This phone has 8 GB memory and a 6-core CPU with 2 performance cores (running at 4.05 GHz) and 4 efficiency cores (running at 2.42 GHz).

We implemented the application using the [DuckDB Swift client](#docs:lts:clients:swift) and loaded the benchmark on the phone, all 30 GB of it.
We quickly found that the iPhone can indeed run the workload without any problems – except that it heated up during the workload. This prompted the phone to perform thermal throttling, slowing down the CPU to reduce heat production. Due to this, DuckDB took 615.1 seconds. Not bad but not enough to reach our goal.

The results got us thinking: what if we improve the cooling of the phone? To this end, we purchased a box of dry ice, which has a temperature below -50 degrees Celsius, and put the phone in the box for the duration of the experiments.

![](../images/blog/tpch-mobile/ice-cooled-iphone-1.jpg)

iPhone in a box of dry ice, running TPC-H. Don't try this at home.

This helped a lot: DuckDB completed in 478.2 seconds. This is a more than 20% improvement – but we still didn't manage to be under 400 seconds.

![](../images/blog/tpch-mobile/ice-cooled-iphone-2.jpg)

The phone a few minutes after finishing the benchmark. It no longer booted because the battery was too cold!

#### Do Androids Dream of Electric Ducks?

In our next experiment, we picked up a [Samsung Galaxy S24 Ultra phone](https://www.gsmarena.com/samsung_galaxy_s24_ultra-12771.php), which runs Android 14. This phone is full of interesting hardware. First, it has an 8-core CPU with 4 different core types (1×3.39 GHz, 3×3.10 GHz, 2×2.90 GHz and 2×2.20 GHz). Second, it has a huge amount of RAM – 12 GB to be precise. Finally, its cooling system includes a [vapor chamber](https://www.sammobile.com/news/galaxy-s24-sustain-performance-bigger-vapor-chamber/) for improved heat dissipation.

We ran DuckDB in the [Termux terminal emulator](https://termux.dev/en/). We compiled DuckDB [CLI client](#docs:lts:clients:cli:overview) from source following the [Android build instructions](#docs:lts:dev:building:android) and ran the experiments from the command line.

![](../images/blog/tpch-mobile/duckdb-termux-android-emulator.png)

DuckDB in Termux, running in the Android emulator

In the end, it wasn't even close. The Android phone completed the benchmark in 235.0 seconds, outperforming our baseline by around 40%.

#### Never Was a Cloudy Day

The results got us thinking: how do the results stack up among cloud servers? We picked two x86-based cloud instances in AWS EC2 with instance-attached NVMe storage.

The details of these benchmarks are far less interesting than those of the previous ones. We booted up the instances with Ubuntu 24.04 and ran DuckDB from the command line. We found that an [`r6id.large` instance](https://instances.vantage.sh/aws/ec2/r6id.large) (2 vCPUs with 16 GB RAM) completes the queries in 570.8 seconds, which is roughly on par with an air-cooled iPhone. However, an [`r6id.xlarge`](https://instances.vantage.sh/aws/ec2/r6id.xlarge) (4 vCPUs with 32 GB RAM) completes the benchmark in 166.2 seconds, faster than any result we achieved on phones.

#### Summary of DuckDB Results

The table contains a summary of the DuckDB benchmark results.

| Setup                          | CPU cores | Memory | Runtime |
| ------------------------------ | --------: | -----: | ------: |
| iPhone 16 Pro (air-cooled)     |         6 |   8 GB | 615.1 s |
| iPhone 16 Pro (dry ice-cooled) |         6 |   8 GB | 478.2 s |
| Samsung Galaxy S24 Ultra       |         8 |  12 GB | 235.0 s |
| AWS EC2 `r6id.large`           |         2 |  16 GB | 570.8 s |
| AWS EC2 `r6id.xlarge`          |         4 |  32 GB | 166.2 s |

#### Historical Context

So why did we set out to run these experiments in the first place?

Just a few weeks ago, [CWI](https://cwi.nl/), the birthplace of DuckDB, held a ceremony for the [Dijkstra Fellowship](https://www.cwi.nl/en/events/dijkstra-awards/cwi-lectures-dijkstra-fellowship/).
The fellowship was awarded to Marcin Żukowski for his pioneering role in the development of database management systems and his successful entrepreneurial career that resulted in systems such as [VectorWise](https://en.wikipedia.org/wiki/Actian_Vector) and [Snowflake](https://en.wikipedia.org/wiki/Snowflake_Inc.).

A lot of ideas that originate in Marcin's research are used in DuckDB. Most importantly, _vectorized query processing_ allows DuckDB to be both fast and portable at the same time.
With his co-authors Peter Boncz and Niels Nes, he first described this paradigm in the CIDR 2005 paper [“MonetDB/X100: Hyper-Pipelining Query Execution”](https://www.cidrdb.org/cidr2005/papers/P19.pdf).

> The terms _vectorization,_ _hyper-pipelining,_ and _superscalar_ refer to the same idea: processing data in slices, which turns out to be a good compromise between row-at-a-time and column-at-a-time processing. DuckDB's query engine uses the same principle.

This paper was published in January 2005, so it's safe to assume that it was finalized in late 2004 – almost exactly 20 years ago!

If we read the paper, we learn that the experiments were carried out on an HP workstation equipped with 12 GB of memory (the same amount as the Samsung phone has today!).
It also had an Itanium CPU and looked like this:

![](../images/blog/tpch-mobile/hp-itanium-workstation.jpg)

The Itanium2 workstation used in the original experiments (source: <a href="https://commons.wikimedia.org/wiki/File:HP-HP9000-ZX6000-Itanium2-Workstation_11.jpg">Wikimedia</a>)

> Upon its release in 2001, the [Itanium](https://en.wikipedia.org/wiki/Itanium) was aimed at the high-end market with the goal of eventually replacing the then-dominant x86 architecture with a new instruction set that focused heavily on [SIMD (single instruction, multiple data)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data). While this ambition did not work out, the Itanium was the state-of-the-art architecture of its day. Due to the focus on the server market, the Itanium CPUs had a large amount of cache: the [1.3 GHz Itanium2 model used in the experiments](https://www.intel.com/content/www/us/en/products/sku/27982/intel-itanium-processor-1-30-ghz-3m-cache-400-mhz-fsb/specifications.html) had 3 MB of L2 cache, while Pentium 4 CPUs released around that time only had 0.5–1 MB.

The paper provides a detailed breakdown of the runtimes:

![](../images/blog/tpch-mobile/cidr2005-monetdb-x100-results.png)

Benchmark results from the paper “MonetDB/X100: Hyper-Pipelining Query Execution”

The total runtime of the TPC-H SF100 queries was 407.9 seconds – hence our baseline for the experiments.
Hannes presented the results at the event.

And here are all results visualized on a plot:

![](../images/blog/tpch-mobile/tpch-mobile-experiment-runtimes.svg)

TPC-H SF100 total query runtimes for MonetDB/X100 and DuckDB
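
The comparison against the 407.9-second MonetDB/X100 baseline can be re-derived from the numbers reported in this post. The short Python sketch below only repeats that arithmetic; the variable names are ours.

```python
BASELINE = 407.9  # MonetDB/X100 total runtime (s) from the CIDR 2005 paper

# DuckDB runtimes (s) from the summary table.
runtimes = {
    "iPhone 16 Pro (air-cooled)": 615.1,
    "iPhone 16 Pro (dry ice-cooled)": 478.2,
    "Samsung Galaxy S24 Ultra": 235.0,
    "AWS EC2 r6id.large": 570.8,
    "AWS EC2 r6id.xlarge": 166.2,
}

for setup, seconds in runtimes.items():
    relative = seconds / BASELINE
    verdict = "faster" if seconds < BASELINE else "slower"
    print(f"{setup}: {relative:.2f}x the baseline runtime ({verdict})")

beats_baseline = [setup for setup, seconds in runtimes.items() if seconds < BASELINE]

# Improvement from dry-ice cooling, relative to the air-cooled run.
dry_ice_gain = 1 - runtimes["iPhone 16 Pro (dry ice-cooled)"] / runtimes["iPhone 16 Pro (air-cooled)"]
# Improvement of the Android phone over the baseline.
android_gain = 1 - runtimes["Samsung Galaxy S24 Ultra"] / BASELINE
print(f"dry ice: {dry_ice_gain:.1%}, Android vs. baseline: {android_gain:.1%}")
```

Only the Samsung phone and the larger cloud instance finish below the baseline, which matches the narrative above.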

#### Conclusion

It was a long journey from the original vectorized execution paper to running an analytical database on a phone.
Many key innovations happened that allowed these results, and the big improvement in hardware is just one of them.
Another crucial component is that compiler optimizations became a lot more sophisticated.
Thanks to this, while the MonetDB/X100 system needed to use explicit SIMD, DuckDB can rely on the [auto-vectorization](https://en.wikipedia.org/wiki/Automatic_vectorization) of our (carefully constructed) loops.

All that's left is to answer the questions that we posed at the beginning of our journey.
Yes, DuckDB can run TPC-H SF100 on a mobile phone.
And yes, in some cases it can even outperform a research prototype running on a high-end machine of 2004 – on a modern smartphone that fits in your pocket.

And with newer hardware, smarter compilers and yet-to-be-discovered database optimizations, future versions are only going to be faster.

## The DuckDB Avro Extension

**Publication date:** 2024-12-09

**Author:** Hannes Mühleisen

**TL;DR:** DuckDB now supports reading Avro files.

> **Update.** Avro support is now available through the [`avro` core extension](#docs:lts:core_extensions:avro).

#### The Apache™ Avro™ Format

[Avro](https://avro.apache.org) is a binary format for record data. Like many innovations in the data space, Avro was [developed](https://vimeo.com/7362534) by [Doug Cutting](https://en.wikipedia.org/wiki/Doug_Cutting) as part of the Apache Hadoop project [in around 2009](https://github.com/apache/hadoop/commit/8296413d4988c08343014c6808a30e9d5e441bfc). Avro gets its name – somewhat obscurely – from a defunct [British aircraft manufacturer](https://en.wikipedia.org/wiki/Avro). The company famously built over 7,000 [Avro Lancaster heavy bombers](https://en.wikipedia.org/wiki/Avro_Lancaster) under the challenging conditions of World War 2. But we digress.

The Avro format is yet another attempt to solve the dimensionality reduction problem that occurs when transforming a complex *multi-dimensional data structure* like tables (possibly with nested types) to a *single-dimensional storage layout* like a flat file, which is just a sequence of bytes. The most fundamental question that arises here is whether to use a columnar or a row-major layout. Avro uses a row-major layout, which differentiates it from its famous cousin, the [Apache™ Parquet™](https://parquet.apache.org) format. There are valid use cases for a row-major format: for example, appending a few rows to a Parquet file is difficult and inefficient because of Parquet's columnar layout and due to the fact that the Parquet metadata is stored *at the back* of the file. In a row-major format like Avro with the metadata *up top*, we can “just” add those rows to the end of the files and we're done. This enables Avro to handle appends of a few rows somewhat efficiently.

Avro-encoded data can appear in several ways, e.g., in [RPC messages](https://en.wikipedia.org/wiki/Remote_procedure_call) but also in files. In the following, we focus on files since those survive long-term.

##### Header Block

Avro “object container” files are encoded using a comparatively simple binary [format](https://avro.apache.org/docs/++version++/specification/#object-container-files): each file starts with a **header block** that first has the [magic bytes](https://en.wikipedia.org/wiki/List_of_file_signatures) `Obj1`. Then, a metadata “map” (a list of string-bytearray key-value pairs) follows. The map is only strictly required to contain a single entry for the `avro.schema` key. This key contains the Avro file schema encoded as JSON. Here is an example for such a schema:

```json
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
```

The Avro schema defines a record structure. Records can contain scalar data fields (like `int`, `double`, `string`, etc.) but also more complex types like records (similar to [DuckDB `STRUCT`s](#docs:lts:sql:data_types:struct)), unions and lists. As a sidenote, it is quite strange that a data format for the definition of record structures would fall back to another format like JSON to describe itself, but such are the oddities of Avro.

##### Data Blocks

The header concludes with 16 randomly chosen bytes as a “sync marker”. The header is followed by an arbitrary number of **data blocks**: each data block starts with a record count, followed by a size and a byte array containing the actual records. Optionally, the bytes can be compressed with deflate (gzip), which will be known from the header metadata.
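
The framing described so far can be sketched in a few lines of Python. This is an illustration written against the Avro specification, not the code used by the extension. It assumes uncompressed blocks, and the helper names (`read_header`, `read_blocks`, `small`) are ours. Note that in the specification the fourth magic byte is the byte value 1, not the ASCII digit.

```python
import io
import json

MAGIC = b"Obj\x01"  # 'O', 'b', 'j', followed by the byte 1


def read_long(buf):
    """Read one zigzag-encoded variable-length integer."""
    raw, shift = 0, 0
    while True:
        byte = buf.read(1)[0]
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (raw >> 1) ^ -(raw & 1)
        shift += 7


def read_bytes(buf):
    """Read a length-prefixed byte array."""
    return buf.read(read_long(buf))


def read_header(buf):
    """Return (metadata dict, 16-byte sync marker)."""
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while (count := read_long(buf)) != 0:
        if count < 0:
            # A negative count is followed by the byte size of the map block.
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            metadata[key] = read_bytes(buf)
    return metadata, buf.read(16)


def read_blocks(buf, sync):
    """Yield (record_count, payload) for each data block until end of input."""
    while buf.read(1):
        buf.seek(-1, io.SEEK_CUR)  # undo the one-byte peek
        record_count = read_long(buf)
        payload = read_bytes(buf)
        if buf.read(16) != sync:
            raise ValueError("sync marker mismatch: truncated or corrupt block")
        yield record_count, payload


def small(n):
    """Encode 0 <= n < 64 as a one-byte zigzag varint (enough for this demo)."""
    return bytes([n << 1])


# Hand-built file: one metadata entry, one block holding two string records.
schema = b'{"type": "string"}'
sync = bytes(range(16))
key = b"avro.schema"
header = (MAGIC + small(1) + small(len(key)) + key
          + small(len(schema)) + schema + small(0) + sync)
payload = small(1) + b"a" + small(1) + b"b"
block = small(2) + small(len(payload)) + payload + sync

buf = io.BytesIO(header + block)
metadata, marker = read_header(buf)
blocks = list(read_blocks(buf, marker))
print(json.loads(metadata["avro.schema"]), blocks)
```

The payload of each block is still opaque at this point: turning it into records requires the schema found in the metadata.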

The data bytes can only be decoded using the schema. The [object file specification](https://avro.apache.org/docs/++version++/specification/#object-container-files) contains the details on how each type is encoded. For instance, in the example schema we know each value is a record of three fields. The root-level record will encode its entries in the order they are declared. No actual bytes are required for the record structure itself. First we will be reading the `name` field. Strings consist of a length followed by the string bytes. Like other formats (e.g., Thrift), Avro uses [variable-length integers with zigzag encoding](https://en.wikipedia.org/wiki/Variable-length_quantity#Zigzag_encoding) to store lengths and counts and the like. After reading the string, we can proceed to `favorite_number`. This field is a union type (declared with the `[]` syntax). This union can have values of two types, `int` and `null`. The `null` type is a bit odd: it can only be used to encode the fact that a value is missing. To decode the `favorite_number` field, we first read an `int` that encodes which choice of the union was used. Afterward, we use the “normal” decoders to read the value (e.g., `int` or `null`). The same can be done for `favorite_color`. Each data block again ends with the sync marker. The sync marker can be used to verify that the block was fully written and that there is no garbage in the file.
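
The decoding walk-through above can be sketched in Python for the example `User` schema. This is an illustration of the encoding rules in the Avro specification, not the code used by the extension. The function names and the sample record are ours, and the schema is hard-coded into `read_user` rather than interpreted generically.

```python
import io


def read_long(buf):
    """Read one variable-length integer and undo the zigzag encoding."""
    raw, shift = 0, 0
    while True:
        byte = buf.read(1)[0]
        raw |= (byte & 0x7F) << shift  # the low 7 bits carry data
        if not byte & 0x80:            # the high bit marks "more bytes follow"
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)     # zigzag: 0, 1, 2, 3 -> 0, -1, 1, -2


def read_string(buf):
    """Strings are a length followed by that many UTF-8 bytes."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


def read_user(buf):
    """Decode one record of the example schema, field by field in declared order."""
    name = read_string(buf)
    # Union ["int", "null"]: a branch index first, then the value of that branch.
    branch = read_long(buf)
    favorite_number = read_long(buf) if branch == 0 else None
    # Union ["string", "null"]: the null branch occupies zero bytes.
    branch = read_long(buf)
    favorite_color = read_string(buf) if branch == 0 else None
    return {
        "name": name,
        "favorite_number": favorite_number,
        "favorite_color": favorite_color,
    }


# Hand-encoded sample: name "Ada", favorite_number 256, favorite_color missing.
encoded = b"\x06Ada" + b"\x00\x80\x04" + b"\x02"
user = read_user(io.BytesIO(encoded))
print(user)
```

In the sample, the number 256 takes two bytes because its zigzag value 512 does not fit into 7 bits, while the missing color costs only the single byte of the branch index.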

#### The DuckDB `avro` Community Extension

We have developed a DuckDB community extension that enables DuckDB to *read* [Apache Avro™](https://avro.apache.org) files.

The extension does not contain Avro *write* functionality. This is on purpose: by not providing a writer, we hope to decrease the number of Avro files in the world over time.

##### Installation & Loading

Installation is simple through the DuckDB community extension repository, just type

```sql
INSTALL avro FROM community;
LOAD avro;
```

in a DuckDB instance near you.

> Since DuckDB v1.2.1, DuckDB's WebAssembly client is also supported.

##### The `read_avro` Function

The extension adds a single DuckDB function, `read_avro`. This function can be used like so:

```sql
FROM read_avro('some_example_file.avro');
```

This function will expose the contents of the Avro file as a DuckDB table. You can then use any arbitrary SQL constructs to further transform this table.

##### File IO

The `read_avro` function is integrated into DuckDB's file system abstraction, meaning you can read Avro files directly from e.g., HTTP or S3 sources. For example:

```sql
FROM read_avro('https://blobs.duckdb.org/data/userdata1.avro');
FROM read_avro('s3://⟨my-example-bucket⟩/some_example_file.avro');
```

should “just” work.

You can also [*glob* multiple files](#docs:lts:sql:functions:pattern_matching::globbing) in a single read call or pass a list of files to the function:

```sql
FROM read_avro('some-example-file-*.avro');
FROM read_avro(['some-example-file-1.avro', 'some-example-file-2.avro']);
```

If the filenames somehow contain valuable information (as is unfortunately all-too-common), you can pass the `filename` argument to `read_avro`:

```sql
FROM read_avro('some-example-file-*.avro', filename = true);
```

This will result in an additional column in the result set that contains the actual filename of the Avro file.

##### Schema Conversion

This extension automatically translates the Avro Schema to the DuckDB schema. *All* Avro types can be translated, except for *recursive type definitions*, which DuckDB does not support.

The type mapping is very straightforward except for Avro's “unique” way of handling `NULL`. Unlike other systems, Avro does not treat `NULL` as a possible value in a range of e.g., `INTEGER` but instead represents `NULL` as a union of the actual type with a special `NULL` type. This is different to DuckDB, where any value can be `NULL`. Of course DuckDB also supports `UNION` types, but this would be quite cumbersome to work with.

This extension *simplifies* the Avro schema where possible: an Avro union of any type and the special null type is simplified to just the non-null type. For example, an Avro record of the union type `["int", "null"]` (like `favorite_number` in the [example](#::header-block)) becomes a DuckDB `INTEGER`, which just happens to be `NULL` sometimes. Similarly, an Avro union that contains only a single type is converted to the type it contains. For example, an Avro record of the union type `["int"]` also becomes a DuckDB `INTEGER`.

The extension also “flattens” the Avro schema. Avro defines tables as root-level “record” fields, which are the same as DuckDB `STRUCT` fields. For more convenient handling, this extension turns the entries of a single top-level record into top-level columns.
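
The two rules above, simplification of unions and flattening of the top-level record, can be sketched in Python on the parsed JSON schema. This is a simplified model and not the extension's implementation: the `simplify` and `flatten` helpers are ours, the result keeps Avro type names rather than DuckDB types, and unions with more than one non-null type are left untouched.

```python
def simplify(avro_type):
    """Collapse unions with null, and single-member unions, to the inner type."""
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) == 1:
            return simplify(non_null[0])
        # Genuine multi-type unions are kept as they are in this sketch.
        return [simplify(t) for t in avro_type]
    return avro_type


def flatten(schema):
    """Turn the fields of a single top-level record into top-level columns."""
    if schema.get("type") != "record":
        raise ValueError("expected a top-level record")
    return {field["name"]: simplify(field["type"]) for field in schema["fields"]}


schema = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]},
    ],
}

columns = flatten(schema)
print(columns)
```

For the example schema, this yields three top-level columns whose types are plain `string`, `int` and `string`, with missing values represented as `NULL` instead of a union branch.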

##### Implementation

Internally, this extension uses the “official” [Apache Avro C API](https://avro.apache.org/docs/++version++/api/c/), albeit with some minor patching to allow reading Avro files from memory.

##### Limitations & Next Steps

In the following, we disclose the limitations of the `avro` DuckDB extension along with our plans to mitigate them in the future:

* The extension currently does not make use of **parallelism** when reading either a single (large) Avro file or when reading a list of files. Adding support for parallelism in the latter case is on the roadmap.

* There is currently no support for projection or filter **pushdown**, but this is also planned at a later stage.

* As mentioned above, DuckDB cannot express recursive type definitions that Avro has. This is unlikely to ever change.

* There is no support to allow users to provide a separate Avro schema file. This is unlikely to change, as all Avro files we have seen so far had their schema embedded.

* There is currently no support for the `union_by_name` flag that other readers in DuckDB support. This is planned for the future.

#### Conclusion

The new `avro` community extension for DuckDB enables DuckDB to read Avro files directly as if they were tables. If you have a bunch of Avro files, go ahead and try it out! We'd love to [hear from you](https://github.com/hannes/duckdb_avro/issues) if you run into any issues.

## 25 000 Stars on GitHub

**Publication date:** 2024-12-16

**Author:** The DuckDB team

**TL;DR:** We have recently reached 25 000 stars on GitHub. We would like to use this occasion to stop and reflect on DuckDB's past year and our future plans.

Our [GitHub repository](https://github.com/duckdb/duckdb) has just passed 25 000 stars. This is great news and since it is also the end of the year it is a good moment to reflect on DuckDB’s trajectory. There has been a lot of new and exciting adoption of DuckDB across the industry.

We would like to highlight two main events that have happened this year:

* We [released DuckDB 1.0.0](https://duckdb.org/2024/06/03/announcing-duckdb-100). This version introduced a stable storage format which guarantees [backwards compatibility and limited forward compatibility](#docs:lts:internals:storage::compatibility).
* We started the [DuckDB Community Extensions project](https://duckdb.org/2024/07/05/community-extensions). Community extensions allow developers to contribute packages to DuckDB and users to easily install these extensions using the simple command `INSTALL xyz FROM community`.

Besides the GitHub stars we have also observed a lot of growth in various metrics.

* Each month, our website handles over 1.5 million unique visitors. In addition, we see over 300 TB in traffic from ca. 30 million extension downloads. Thanks again to Cloudflare for [sponsoring the project](https://duckdb.org/foundation/index.html#technical-sponsors) with free content delivery services!
* In one year, we rose in the [DB Engines ranking](https://db-engines.com/en/ranking) from position 91 to 55 on the general board and from position 47 to 33 in the [relational board](https://db-engines.com/en/ranking/relational+dbms), which makes DuckDB the fastest growing relational system in the top-50.
* We count [7.5M+ monthly downloads in PyPI](https://pypistats.org/packages/duckdb).
* Maven Central downloads for the JDBC driver have also shot up: we now see over 500k downloads per month.

We should note that we’re not glorifying those numbers, and, in accordance with [Goodhart’s law](https://en.wikipedia.org/wiki/Goodhart%27s_law), they are not a target per se for our much-beloved optimization. Still, it is motivating to see them grow.

As an aside, we have recently opened a [Bluesky account](https://bsky.app/profile/duckdb.org) and are seeing great discussions happening over there. The account has already exceeded 4 thousand followers!

Following our ancient two-year tradition, we hosted two DuckCon events, one in [Amsterdam](#_events:2024-02-02-duckcon4) and another in [Seattle](#_events:2024-08-15-duckcon5). We also organized the first [DuckDB Amsterdam Meetup](#_events:2024-10-17-duckdb-amsterdam-meetup-1).

Early next year, we are going to host [DuckCon in Amsterdam](#_events:2025-01-31-duckcon6), which is going to be the first event that we live stream in order to be more accessible to the growing number of DuckDB users in, e.g., Asia.
But for now, let’s sit around the syntax tree and be merry thinking about what’s to come.

![](../images/blog/duckdb-syntax-tree.jpg)


## DuckDB Node Neo Client

**Publication date:** 2024-12-18

**Author:** Jeff Raymakers

**TL;DR:** The new DuckDB Node client, “Neo”, provides a powerful and friendly way to use your favorite database.

Meet the newest DuckDB client API: [DuckDB Node “Neo”](#docs:lts:clients:node_neo:overview)!

You may be familiar with DuckDB’s [old Node client](https://www.npmjs.com/package/duckdb). While it has served the community well over the years, “Neo” aims to learn from and improve upon its predecessor. It presents a friendlier API, supports more features, and uses a more robust and maintainable architecture. It provides both high-level conveniences and low-level access. Let’s take a tour!

#### What Does It Offer?

##### Friendly, Modern API

The old Node client’s API is based on [SQLite’s](https://www.npmjs.com/package/sqlite3). While familiar to many, it uses an awkward, dated callback-based style. Neo uses [Promises](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise) natively.

```ts
const result = await connection.run(`SELECT 'Hello, Neo!'`);
```

Additionally, Neo is built from the ground up in [TypeScript](https://www.typescriptlang.org/). Carefully chosen names and types minimize the need to check documentation.

```ts
const columnNames = result.columnNames();
const columnTypes = result.columnTypes();
```

Neo also provides convenient helpers to read only as many rows as needed and return them in either column-major or row-major format.

```ts
const reader = await connection.runAndReadUntil('FROM range(5000)',
    1000);
const rows = reader.getRows();
// OR: const columns = reader.getColumns();
```

##### Full Data Type Support

DuckDB supports a [rich variety of data types](#docs:lts:sql:data_types:overview). Neo supports every built-in type as well as custom types such as [`JSON`](#docs:lts:data:json:json_type). For example, `ARRAY`:

```ts
if (columnType.typeId === DuckDBTypeId.ARRAY) {
  const arrayValueType = columnType.valueType;
  const arrayLength = columnType.length;
}
```

`DECIMAL`:

```ts
if (columnType.typeId === DuckDBTypeId.DECIMAL) {
  const decimalWidth = columnType.width;
  const decimalScale = columnType.scale;
}
```

And `JSON`:

```ts
if (columnType.alias === 'JSON') {
  const json = JSON.parse(columnValue);
}
```

Type-specific utilities ease common conversions such as producing human-readable strings from [`TIMESTAMP`](#docs:lts:sql:data_types:timestamp)s or [`DECIMAL`](#docs:lts:sql:data_types:numeric::fixed-point-decimals)s, while preserving access to the raw values for lossless processing.

```ts
if (columnType.typeId === DuckDBTypeId.TIMESTAMP) {
  const timestampMicros = columnValue.micros; // bigint
  const timestampString = columnValue.toString();
  const {
    date: { year, month, day },
    time: { hour, min, sec, micros },
  } = columnValue.toParts();
}
```
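
To illustrate why keeping the raw value matters, here is a minimal sketch of a lossless `DECIMAL`-to-string conversion. It assumes a decimal is held as an unscaled `bigint` plus a `scale`; the function name `formatDecimal` is hypothetical and this is not Neo's actual implementation.

```ts
// Sketch only: format an unscaled integer and a scale as a decimal string
// without going through floating point (which could lose digits).
function formatDecimal(unscaled: bigint, scale: number): string {
  const negative = unscaled < 0n;
  const digits = (negative ? -unscaled : unscaled).toString();
  const sign = negative ? '-' : '';
  if (scale === 0) {
    return sign + digits;
  }
  // Pad so that at least one digit remains in front of the decimal point.
  const padded = digits.padStart(scale + 1, '0');
  const integerPart = padded.slice(0, padded.length - scale);
  const fractionPart = padded.slice(padded.length - scale);
  return `${sign}${integerPart}.${fractionPart}`;
}

const price = formatDecimal(1234567n, 3); // "1234.567"
const small = formatDecimal(-5n, 2); // "-0.05"
```

Because the arithmetic stays in `bigint` and string space, values wider than JavaScript's 53-bit safe integer range keep every digit.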

##### Advanced Features

Need to bind specific types of values to [prepared statements](#docs:lts:sql:query_syntax:prepared_statements), or precisely [control SQL execution](#docs:lts:clients:c:api::pending-result-interface)? Perhaps you want to leverage DuckDB’s parser to [extract statements](#docs:lts:clients:c:api::extract-statements), or efficiently [append data to a table](#docs:lts:clients:c:appender). Neo has you covered, providing full access to these powerful features of DuckDB.

###### Binding Values to Prepared Statements

When binding values to parameters of [prepared statements](#docs:lts:sql:query_syntax:prepared_statements), you can select the SQL data type. This is useful for types that don’t have a natural equivalent in JavaScript.

```ts
const prepared = await connection.prepare('SELECT $1, $2');
prepared.bindTimestamp(1, new DuckDBTimestampValue(micros));
prepared.bindDecimal(2, new DuckDBDecimalValue(value, width, scale));
const result = await prepared.run();
```

###### Controlling Task Execution

Using [pending results](#docs:lts:clients:c:api::pending-result-interface) allows pausing or stopping SQL execution at any point, even before the result is ready.

```ts
import { DuckDBPendingResultState } from '@duckdb/node-api';

// Placeholder to demonstrate doing other work between tasks.
async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

const prepared = await connection.prepare('FROM range(10_000_000)');
const pending = prepared.start();
// Run tasks until the result is ready.
// This allows execution to be paused and resumed as needed.
// Other work can be done between tasks.
while (pending.runTask() !== DuckDBPendingResultState.RESULT_READY) {
  console.log('not ready');
  await sleep(1);
}
console.log('ready');
const result = await pending.getResult();
// ...
```

###### Extracting Statements and Running Them with Parameters

You can run multi-statement SQL containing parameters using the [extract statements API](#docs:lts:clients:c:api::extract-statements).

```ts
// Parse this multi-statement input into separate statements.
const extractedStatements = await connection.extractStatements(`
  CREATE OR REPLACE TABLE numbers AS FROM range(?);
  FROM numbers WHERE range < ?;
  DROP TABLE numbers;
`);
const parameterValues = [10, 7];
const stmtCount = extractedStatements.count;
// Run each statement, binding values as needed.
for (let stmtIndex = 0; stmtIndex < stmtCount; stmtIndex++) {
  const prepared = await extractedStatements.prepare(stmtIndex);
  const paramCount = prepared.parameterCount;
  for (let paramIndex = 1; paramIndex <= paramCount; paramIndex++) {
    prepared.bindInteger(paramIndex, parameterValues.shift());
  }
  const result = await prepared.run();
  // ...
}
```

###### Appending Data to a Table

The [appender API](#docs:lts:clients:c:appender) is the most efficient way to bulk insert data into a table.

```ts
await connection.run(
  `CREATE OR REPLACE TABLE target_table(i INTEGER, v VARCHAR)`
);

const appender = await connection.createAppender('main', 'target_table');

appender.appendInteger(100);
appender.appendVarchar('walk');
appender.endRow();

appender.appendInteger(200);
appender.appendVarchar('swim');
appender.endRow();

appender.appendInteger(300);
appender.appendVarchar('fly');
appender.endRow();

appender.close();
```

#### How Is It Built?

##### Dependencies

Neo uses a different implementation approach from most other DuckDB client APIs, including the old Node client. It binds to DuckDB’s [C API](#docs:lts:clients:c:overview) instead of the C++ API.

Why should you care? Using DuckDB’s C++ API means building all of DuckDB from scratch. Each client API using this approach ships with a slightly different build of DuckDB. This can create headaches for both library maintainers and consumers.

Maintainers need to pull in the entire DuckDB source code. This increases the cost and complexity of the build, and thus the cost of code changes and especially DuckDB version updates. These costs often lead to significant delays in fixing bugs or supporting new versions.

Consumers are impacted by these delays. There’s also the possibility of subtle behavioral differences between the builds in each client, perhaps introduced by different compile-time configuration.

> Some client APIs reside in the [main DuckDB repository](https://github.com/duckdb/duckdb/tree/main/tools). This addresses some of the problems above, but increases the cost and complexity of maintaining DuckDB itself.

To use DuckDB’s C API, on the other hand, one only needs to depend on [released binaries](https://github.com/duckdb/duckdb/releases). This significantly simplifies the maintenance required, speeds up builds, and minimizes the cost of updates. It removes the uncertainty and risk of rebuilding DuckDB.

##### Packages

DuckDB requires different binaries for each platform. Distributing platform-specific binaries in Node packages is notoriously challenging. It can often lead to inscrutable errors when installing, when the package manager attempts to rebuild some component from source, using whatever build and configuration tools happen to be around.

Neo uses a package design intended to avoid these problems. Inspired by [ESBuild](https://github.com/evanw/esbuild/pull/1621), Neo packages pre-built binaries for each supported platform in a separate package. Each of these packages declares the particular platform (e.g., `os` and `cpu`) it supports. Then, the main package depends on all these platform-specific packages using `optionalDependencies`.

When the main package is installed, the package manager will only install the `optionalDependencies` that match the current platform. So you get exactly the binaries you need, no more. If installed on an unsupported platform, no binaries will be installed. At no point will an attempt to build from source occur during install.
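
The selection rule can be sketched in a few lines. This is a simplified model of what a package manager does with the `os` and `cpu` fields, not its actual code; the package names and the `selectPackages` helper are hypothetical.

```ts
// Sketch only: each platform-specific package lists the platforms it supports,
// mirroring the `os` and `cpu` arrays in package.json.
interface PlatformPackage {
  name: string;
  os: string[];
  cpu: string[];
}

// Hypothetical package names, for illustration.
const platformPackages: PlatformPackage[] = [
  { name: 'example-bindings-linux-x64', os: ['linux'], cpu: ['x64'] },
  { name: 'example-bindings-darwin-arm64', os: ['darwin'], cpu: ['arm64'] },
  { name: 'example-bindings-win32-x64', os: ['win32'], cpu: ['x64'] },
];

// Keep only the optional dependencies whose declared platform matches the host.
function selectPackages(hostOs: string, hostCpu: string): string[] {
  return platformPackages
    .filter((pkg) => pkg.os.includes(hostOs) && pkg.cpu.includes(hostCpu))
    .map((pkg) => pkg.name);
}

const onLinux = selectPackages('linux', 'x64'); // ['example-bindings-linux-x64']
const onFreeBsd = selectPackages('freebsd', 'x64'); // []: nothing is installed
```

On an unsupported host the result is simply empty, which matches the behavior described above: nothing is installed and nothing is built from source.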

##### Layers

The DuckDB Node Neo client has multiple layers. Most people will want to use Neo’s main “api” package, [@duckdb/node-api](https://www.npmjs.com/package/@duckdb/node-api). This contains the friendly API with convenient helpers. But, for advanced use cases, Neo also exposes the lower-level “bindings” package, [@duckdb/node-bindings](https://www.npmjs.com/package/@duckdb/node-bindings), which implements a more direct translation of DuckDB’s C API into Node.

This lower-level API has TypeScript definitions, but, as it follows the conventions of C, it can be awkward to use from Node. However, it provides a relatively unopinionated way to access DuckDB, which supports building special-purpose applications or alternate higher-level APIs.

#### Where Is It Headed?

Neo is currently marked “alpha”. This is an indication of completeness and maturity, not robustness. Most of the functionality of DuckDB’s C API is exposed, and what is exposed has extensive tests. But it’s relatively new, so it may contain undiscovered bugs.

Additionally, some areas of functionality are not yet complete:

* Appending and binding advanced data types. These require additional functions in DuckDB’s C API. The goal is to add these for the next release of DuckDB, v1.2, [currently planned for January 2025](#release_calendar).

* Writing to data chunk [vectors](#docs:lts:internals:vector). Modifying binary buffers in a way that can be seen by a native layer presents special challenges in the Node environment. This is a high priority to work on in the near future.

* User-defined types & functions. The necessary functions and types were added to the DuckDB C API relatively recently, in v1.1.0. This is on the near-term roadmap.

* Profiling info. This was added in v1.1.0. It’s on the roadmap.

* Table descriptions. This was also added in v1.1.0. It’s on the roadmap.

New versions of DuckDB will include additions to the C API. Since Neo aims to cover all the functionality of the C API, these additions will be added to the roadmap as they are released.

If you have a feature request, or other feedback, [let us know](https://github.com/duckdb/duckdb-node-neo/issues)! [Pull requests](https://github.com/duckdb/duckdb-node-neo/pulls) are also welcome.

#### What Now?

DuckDB Node Neo provides a friendly and powerful way to use DuckDB with Node. By leveraging DuckDB’s C API, it exemplifies a new, more maintainable way to build on DuckDB, providing benefits to maintainers and consumers alike. It’s still young, but growing up fast. [Try it yourself](https://www.npmjs.com/package/@duckdb/node-api)!

## Vertical Stacking as the Relational Model Intended: UNION ALL BY NAME

**Publication date:** 2025-01-10

**Author:** Alex Monahan

**TL;DR:** DuckDB allows vertical stacking of datasets by column name rather than position. This allows DuckDB to read files with schemas that evolve over time and finally aligns SQL with Codd's relational model.

#### Overview

Ever heard of SQL's `CORRESPONDING` keyword?
Yeah, me neither!
Well, it has been in the [SQL standard since at least 1992](https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), and almost nobody implemented it!
`CORRESPONDING` was an attempt to fix a flaw in SQL – but it failed.
It's time for SQL to get back to the relational model's roots when stacking data.
Let's wind the clocks back to 1969...

You just picked up your own [Ford Mustang Boss 302](https://en.wikipedia.org/wiki/Boss_302_Mustang), drifting around the corner at every street to make it to the library to read the latest [research report out of IBM by Edgar Codd](https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf).
(Do we need a Netflix special about databases?)
Reading that report, wearing plenty of plaid, you gain a critical insight: data should be treated as unordered sets!
(Technically [multisets](https://en.wikipedia.org/wiki/Multiset) – duplicates are everywhere...)
Rows should be treated as unordered and so should columns.
The relational model is _the way_.
Any language built atop the relational model should absolutely follow those core principles.

A few years later, you learn about SQL, and it looks like a pretty cool idea.
Declarative, relational – none of this maintaining order business.
You don't want to be tied down by an ordering, after all.
What if you change your mind about how to query your data?
Sets are the best way to think about these things.

More time passes, and then, you have the need to stack some data in SQL.
Should be easy enough – I can just take two tables and stack them, and the corresponding attributes will map together.
No need to worry about ordering, and certainly no need to make sure that the relations are exactly the same width.

Wait.
This can't be right.

I have to get the order of my columns exactly right?
And I have to have the exact same number of columns in both relations?
Did these SQL folks forget about Codd??

Fast forward just a couple of decades, and DuckDB is making stacking in SQL totally groovy again.

#### Making Vertical Stacking Groovy Again

In addition to the traditional [`UNION`](#docs:lts:sql:query_syntax:setops::union) and [`UNION ALL`](#docs:lts:sql:query_syntax:setops::union-all-bag-semantics) operators, DuckDB adds both [`UNION BY NAME` and `UNION ALL BY NAME`](#docs:lts:sql:query_syntax:setops::union-all-by-name).
These will vertically stack multiple relations (e.g., `SELECT` statements) by matching on the names of columns independent of their order.
As an example, we provide columns `a` and `b` out of order, and even introduce the entirely new column `c` and stacking will still succeed:

```sql
SELECT
    42 AS a,
    'woot' AS b

UNION ALL BY NAME

SELECT
    'woot2' AS b,
    9001 AS a,
    'more wooting' AS c;
```

|    a | b     | c            |
| ---: | ----- | ------------ |
|   42 | woot  | NULL         |
| 9001 | woot2 | more wooting |

> Any column that is not present in all relations is filled in with `NULL` in the places where it is missing.
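
For readers who think in code, the matching rule can be modeled outside of SQL. The sketch below works under simplifying assumptions: relations are arrays of plain objects, column names are taken from the rows themselves, and types are not reconciled. The helper name `unionAllByName` is hypothetical and this is not how DuckDB implements the operator.

```ts
type NamedRow = Record<string, unknown>;

// Sketch only: stack relations by column name, filling gaps with null.
function unionAllByName(...relations: NamedRow[][]): NamedRow[] {
  // Collect every column name in first-seen order.
  const columns: string[] = [];
  for (const relation of relations) {
    for (const row of relation) {
      for (const columnName of Object.keys(row)) {
        if (!columns.includes(columnName)) {
          columns.push(columnName);
        }
      }
    }
  }
  // Re-emit each row with the full column list, in input order.
  const result: NamedRow[] = [];
  for (const relation of relations) {
    for (const row of relation) {
      const aligned: NamedRow = {};
      for (const columnName of columns) {
        aligned[columnName] = columnName in row ? row[columnName] : null;
      }
      result.push(aligned);
    }
  }
  return result;
}

const stacked = unionAllByName(
  [{ a: 42, b: 'woot' }],
  [{ b: 'woot2', a: 9001, c: 'more wooting' }],
);
// stacked[0] is { a: 42, b: 'woot', c: null }
// stacked[1] is { a: 9001, b: 'woot2', c: 'more wooting' }
```

The output mirrors the table above: columns `a` and `b` line up despite their differing order, and `c` is `null` for the row that never had it.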

This capability unlocks a variety of useful patterns that can add flexibility and save time.
Some examples include:

* Stacking datasets that have different column orders
* Adding new columns to an analysis, but only for a portion of the rows
* Combining completely unrelated datasets into a single resultset
    * This can be useful if your IDE, BI tool, or API can only return a single resultset at a time, but you need to view multiple datasets

> DuckDB has had this capability since August of 2022, but the performance and scalability of this feature have recently been greatly improved!
> See the end of the post for some micro-benchmarks.

##### `UNION` vs. `UNION ALL`

When only the keyword `UNION` is used, duplicates are removed when stacking.
With `UNION ALL`, duplicates are permitted and the stacking occurs without additional processing.

Unfortunately we have Codd to thank for this confusing bit!
If only `UNION ALL` were the default...
Typically, `UNION ALL` (and its new counterpart `UNION ALL BY NAME`!) is the desired behavior, as it faithfully reproduces the input relations, just stacked together.
This is higher performance as well, since the deduplication that occurs with `UNION` can be quite time intensive with large datasets.
And finally, `UNION ALL` [preserves the original row order](#docs:lts:sql:dialect:order_preservation).
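
The difference in cost is easy to see in a toy model. In this sketch, rows are positional tuples and the function names `unionAll` and `union` are hypothetical. A real engine deduplicates very differently, but the extra bookkeeping is the point.

```ts
type Tuple = Array<string | number | null>;

// UNION ALL: plain concatenation, input order preserved, no extra work.
function unionAll(left: Tuple[], right: Tuple[]): Tuple[] {
  return [...left, ...right];
}

// UNION: every row must be checked against the rows already seen.
// This sketch keeps the first occurrence; SQL makes no promise about
// the order of the result.
function union(left: Tuple[], right: Tuple[]): Tuple[] {
  const seen = new Set<string>();
  const result: Tuple[] = [];
  for (const row of unionAll(left, right)) {
    const key = JSON.stringify(row);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(row);
    }
  }
  return result;
}

const leftRows: Tuple[] = [[1, 'a'], [1, 'a']];
const rightRows: Tuple[] = [[1, 'a'], [2, 'b']];
const allRows = unionAll(leftRows, rightRows); // 4 rows, duplicates kept
const distinctRows = union(leftRows, rightRows); // 2 rows: [1, 'a'] and [2, 'b']
```

Note that `union` also removes the duplicate that was already present within `leftRows`, not just duplicates across the two inputs.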

##### Reading Multiple Files

This column matching functionality becomes particularly useful when querying data from multiple files with different schemas.
DuckDB provides a `union_by_name` boolean parameter in the table functions used to pull external flat files:

* [`read_csv`](#docs:lts:data:csv:overview::parameters)
* [`read_json`](#docs:lts:data:json:loading_json::parameters)
* [`read_parquet`](#docs:lts:data:parquet:overview::parameters)

To read multiple files, DuckDB can use glob patterns within the file path parameter (or a list of files, or a list of glob patterns!).
If those files could have different schemas, adding `union_by_name=True` will allow them to be read and stacked!
Any columns that do not appear in a particular file will be filled with `NULL` values.
For example:

```sql
COPY (SELECT 'Star' AS col1) TO 'star.parquet';
COPY (SELECT 'Wars' AS col2) TO 'wars.parquet';

FROM read_parquet(
    ['star.parquet', 'wars.parquet'],
    union_by_name = true);
```

| col1 | col2 |
| ---- | ---- |
| Star | NULL |
| NULL | Wars |

> If your files have different schemas and you did not expect it, DuckDB's friendly error messages will suggest the `union_by_name` parameter!
> There is no need for memorization:
>
> `If you are trying to read files with different schemas, try setting union_by_name=True`

##### Data Lakes

It is very common to have schema changes over time in data lakes, so this unlocks many additional uses for DuckDB in those environments.
The secondary effect of this feature is that you may now change your data lake schemas freely!
Now it is painless to add more attributes to your data lake over time – DuckDB will be ready to handle the analysis!

> DuckDB's extensions to read lakehouse table formats like [Delta](#docs:lts:core_extensions:delta) and [Iceberg](#docs:lts:core_extensions:iceberg:overview) handle schema evolution within the formats' own metadata, so `union_by_name` is not needed.

#### Inserting Data by Name

Another use case for vertically stacking data is when inserting into an existing table.
The DuckDB syntax of [`INSERT INTO ⟨my_table⟩ BY NAME`](#docs:lts:sql:statements:insert::insert-into--by-name) offers the same flexibility of referring to columns by name rather than by position.
This allows you to provide the data to insert with any column order and even including only a subset of columns.
For example:

```sql
CREATE TABLE year_info (year INTEGER, status VARCHAR);

INSERT INTO year_info BY NAME 
    SELECT 
        'The planet made it through' AS status,
        2024 AS year;

INSERT INTO year_info BY NAME 
    SELECT 
        2025 AS year;

FROM year_info;
```

| year | status                     |
| ---: | -------------------------- |
| 2024 | The planet made it through |
| 2025 | NULL                       |

The pre-existing alternative approach was to provide an additional clause that specified the list of columns to be added in the same order as the dataset.
However, this requires the ordering and number of columns to be known up front rather than determined dynamically.
In many cases it also requires specifying columns in two locations: the `INSERT` statement and the `SELECT` statement producing the data.
Ignoring the sage advice of [“Don't Repeat Yourself”](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) has led to more than a few unintended consequences in my own code...
It is always nicer to have a single location to edit rather than having to keep things in sync!
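
As a rough mental model, inserting by name means the table's own column list decides where each value lands. The sketch below assumes missing columns become `NULL` (it ignores column defaults) and treats an unknown column name as an error. The helper `alignRowToTable` is hypothetical and not part of DuckDB or any client.

```ts
type Cell = string | number | null;

// Sketch only: reorder a named row into the table's column order.
function alignRowToTable(
  tableColumns: string[],
  row: Record<string, Cell>,
): Cell[] {
  // Reject names that the target table does not have.
  for (const columnName of Object.keys(row)) {
    if (!tableColumns.includes(columnName)) {
      throw new Error(`Column "${columnName}" does not exist in the target table`);
    }
  }
  // Emit values in table order; columns the row omits become null.
  return tableColumns.map((columnName) => {
    const value = row[columnName];
    return value === undefined ? null : value;
  });
}

const yearInfoColumns = ['year', 'status'];
const firstRow = alignRowToTable(yearInfoColumns, {
  status: 'The planet made it through',
  year: 2024,
}); // [2024, 'The planet made it through']
const secondRow = alignRowToTable(yearInfoColumns, { year: 2025 }); // [2025, null]
```

The caller never states the column order, so there is only one place to edit when the query producing the data changes.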

#### The Inspirations for `UNION ALL BY NAME`

Other systems and communities have tackled the challenges of stacking messy data for many years.
DuckDB takes inspiration from them and brings their improvements back into SQL!

The most direct inspiration is the [Pandas `concat` function](https://pandas.p