Skip to main content
Skip to main content

url Table Function

url function creates a table from the URL with given format and structure.

url function may be used in SELECT and INSERT queries on data in URL tables.

Syntax

url(URL [,format] [,structure] [,headers])

Parameters

ParameterDescription
URLSingle quoted HTTP or HTTPS server address, which can accept GET or POST requests (for SELECT or INSERT queries correspondingly). Type: String.
formatFormat of the data. Type: String.
structureTable structure in 'UserID UInt64, Name String' format. Determines column names and types. Type: String.
headersHeaders in 'headers('key1'='value1', 'key2'='value2')' format. You can set headers for HTTP call.

Returned value

A table with the specified format and structure and with data from the defined URL.

Examples

Getting the first 3 lines of a table that contains columns of String and UInt32 type from HTTP-server which answers in CSV format.

SELECT * FROM url('http://127.0.0.1:12345/', CSV, 'column1 String, column2 UInt32', headers('Accept'='text/csv; charset=utf-8')) LIMIT 3;

Inserting data from a URL into a table:

CREATE TABLE test_table (column1 String, column2 UInt32) ENGINE=Memory;
INSERT INTO FUNCTION url('http://127.0.0.1:8123/?query=INSERT+INTO+test_table+FORMAT+CSV', 'CSV', 'column1 String, column2 UInt32') VALUES ('http interface', 42);
SELECT * FROM test_table;

Globs in URL

Patterns in { } are used to generate a set of shards or to specify failover addresses. Supported pattern types and examples see in the description of the remote function. Character | inside patterns is used to specify failover addresses. They are iterated in the same order as listed in the pattern. The number of generated addresses is limited by glob_expansion_max_elements setting.

Virtual Columns

  • _path — Path to the URL. Type: LowCardinality(String).
  • _file — Resource name of the URL. Type: LowCardinality(String).
  • _size — Size of the resource in bytes. Type: Nullable(UInt64). If the size is unknown, the value is NULL.
  • _time — Last modified time of the file. Type: Nullable(DateTime). If the time is unknown, the value is NULL.
  • _headers - HTTP response headers. Type: Map(LowCardinality(String), LowCardinality(String)).

use_hive_partitioning setting

When setting use_hive_partitioning is set to 1, ClickHouse will detect Hive-style partitioning in the path (/name=value/) and will allow to use partition columns as virtual columns in the query. These virtual columns will have the same names as in the partitioned path.

Example

Use virtual column, created with Hive-style partitioning

SELECT * FROM url('http://data/path/date=*/country=*/code=*/*.parquet') WHERE date > '2020-01-01' AND country = 'Netherlands' AND code = 42;

Resolving relative URLs

The url_base setting allows passing a relative URL to the url function. When url_base is set and the function argument is a relative reference, it is resolved against the base URL per RFC 3986.

The resolution rules are:

  • Path-relative (e.g. data.csv): merged with the base URL path — everything after the last / of the base path is replaced. The trailing slash matters: https://example.com/dir/ + data.csv gives https://example.com/dir/data.csv, but https://example.com/dir + data.csv gives https://example.com/data.csv. Dot segments (./ and ../) are normalized.
  • Host-relative (e.g. /test/data.csv): resolved using the scheme and host of the base URL.
  • Scheme-relative (e.g. //other.com/test/data.csv): resolved using the scheme of the base URL.
  • Query-only (e.g. ?x=1): appended to the full base path, replacing any existing query or fragment.
  • Fragment-only (e.g. #frag): appended to the base URL, preserving the query, replacing any existing fragment.
  • Empty: returns the base URL without fragment.
  • Absolute URL: passed through unchanged; url_base is ignored.

Example

SET url_base = 'https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/';
SELECT * FROM url('tests/queries/0_stateless/data_csv/data.csv', CSV) LIMIT 3;

Storage Settings

Permissions

url function requires CREATE TEMPORARY TABLE permission. As such - it'll not work for users with readonly = 1 setting. At least readonly = 2 is required.