Aug 25, 2022
Simplifying database connectivity with Arrow Flight SQL and ADBC
David Li, Tom Drabas, Alison Hill
Multi-language tools allow users the freedom to work where they want, how they want, unleashing greater productivity. We design and build composable data systems that enable users to write code once, in any language, on any device, and at any scale to develop pipelines faster.
TL;DR: The projects covered in this post are:
- Arrow Database Connectivity (we will use the ADBC acronym throughout the post)
- Arrow Flight SQL (we shorten this to Flight SQL in this post)
- Arrow Flight RPC (we shorten this to Flight RPC in this post)
From the start, Apache Arrow has aimed to bridge data ecosystems. With the ADBC and Flight SQL projects under active development, the Arrow community is now working to simplify database connectivity for both clients and vendors. In this post, we’ll explain whom each project aims to help and how the two fit together.
Flight SQL or ADBC? Well…
Why two projects? Because database clients and vendors face overlapping but distinct problems.
Database clients face a choice when deciding how to get Arrow data. They could start from a tried-and-true, generic API like JDBC or ODBC. This choice makes it easy to work with different databases. But neither API has native support for the Arrow columnar format, so the data has to be converted. Converting row-oriented data to columnar is costly: it adds development time and consumes expensive hardware resources.
Another option is to integrate with database-specific libraries that do offer columnar data, such as clickhouse-cpp or google-cloud-bigquery. This option requires extra development work for each database that clients want to use.
The ADBC project provides a simpler way for clients to get Arrow data. ADBC is a standard API for connecting Arrow-native clients with databases, engines, and storage. ADBC is modular, and it abstracts over different wire protocols and database-specific libraries.
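To make the row-to-columnar conversion step concrete, here is a minimal sketch in Python. It is only an illustration: the standard-library sqlite3 module stands in for a row-oriented API such as JDBC or ODBC, the trips table and its values are made up, and plain Python lists stand in for Arrow columns. It does not use ADBC itself.

```python
import sqlite3

# Stand-in for a row-oriented API such as JDBC or ODBC: Python's DB-API
# hands results back as one tuple per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, city TEXT, fare REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(1, "Oslo", 12.5), (2, "Lima", 7.0), (3, "Pune", 3.25)],
)

cursor = conn.execute("SELECT id, city, fare FROM trips ORDER BY id")
names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

# The conversion a columnar client has to perform itself: every value is
# visited once and copied from its row into the matching column.
columns = {name: [] for name in names}
for row in rows:
    for name, value in zip(names, row):
        columns[name].append(value)

print(columns)
```

The nested loop is the extra work that a client pays for on every query when the API is row-oriented. An API that returns columnar data directly, which is what ADBC aims to standardize, lets the client skip this step.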
Database vendors also face a difficult choice when deciding how to serve data to clients. If they implement existing generic APIs and protocols, they have to give up some of the benefits of columnar data. On the other hand, they could implement a custom API. But they would then have to build out support for every different client, now and into the future. Flight SQL offers vendors an alternative: a fully integrated, Arrow-native wire protocol. Flight SQL is flexible, and integrates not only with ADBC but also with JDBC and ODBC.