Apache DataFusion Java

Java bindings for Apache DataFusion. Queries run in native Rust and results return to the JVM as Apache Arrow batches via the Arrow C Data Interface.

Early development: no releases yet, API will change. Bug reports and contributions welcome.

Quickstart

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;

try (var allocator = new RootAllocator();
     var ctx = new SessionContext()) {

    ctx.registerParquet("orders", "/path/to/orders.parquet");

    try (DataFrame df = ctx.sql(
            "SELECT o_orderpriority, COUNT(*) AS n " +
            "FROM orders GROUP BY o_orderpriority");
         ArrowReader reader = df.collect(allocator)) {
        while (reader.loadNextBatch()) {
            var batch = reader.getVectorSchemaRoot();
            // ...
        }
    }
}

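SessionContext and DataFrame both implement AutoCloseable, so close them when you are done (try-with-resources, as in the quickstart, does this for you). Neither is thread-safe: do not share an instance across threads without external synchronization.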
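
The nesting in the quickstart determines the order in which resources are released. The sketch below illustrates standard Java try-with-resources behavior only: resources are closed in the reverse order of their declaration, so the reader and DataFrame are closed before the context and the allocator. `Tracked` and `CloseOrderDemo` are hypothetical stand-ins written for this illustration and are not part of the DataFusion Java API.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {

    // Hypothetical stand-in for any AutoCloseable resource; it records its name when closed.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    // Mirrors the nesting of the quickstart and returns the order in which resources were closed.
    public static List<String> run() {
        List<String> closed = new ArrayList<>();
        try (var allocator = new Tracked("allocator", closed);
             var ctx = new Tracked("ctx", closed)) {
            try (var df = new Tracked("df", closed);
                 var reader = new Tracked("reader", closed)) {
                // Work with the reader here; nothing has been closed yet.
            }
            // At this point reader and df are closed, in that order.
        }
        // Now ctx and allocator are closed as well, in that order.
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [reader, df, ctx, allocator]
    }
}
```

Declaring the allocator first means it is closed last, after everything that may still hold memory allocated from it. That ordering is presumably why the quickstart is laid out this way, though the README does not say so explicitly.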

Documentation

The full documentation lives under docs/source/ and is built with Sphinx (see docs/README.md for the build steps):

  • User guide: installation, the DataFrame and SQL APIs, Parquet ingestion.
  • Contributor guide: build, test, code style, and how to bump the DataFusion version.

Requirements

Requires JDK 17 or later. To build from source, see docs/source/contributor-guide/development.md.

Contributing

Open an issue to discuss non-trivial changes before sending a PR. See the contributor guide.

License

Apache License 2.0. See LICENSE.txt and NOTICE.txt.
