A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Fast SHAP value computation for interpreting tree-based models
DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing.
Apache Hive
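For reference, the SHAP value of a feature is its Shapley value from cooperative game theory. The standard definition is below, not a description of this project's algorithm. In it, $F$ is the full feature set, $x$ is the instance being explained, and $f_S(x_S)$ is the model's expected output when only the features in $S$ are known. A naive evaluation sums over all $2^{|F|-1}$ subsets that exclude feature $i$, which is why tree-specific algorithms are needed to make the computation fast.

```latex
\phi_i(f, x) = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \Bigl[ f_{S \cup \{i\}}\bigl(x_{S \cup \{i\}}\bigr) - f_S(x_S) \Bigr]
```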
The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and egress.
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive,...
A collection of utilities for writing Java code that works across a wide range of Avro versions.
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin f...
li-apache-kafka-clients is a wrapper library for the vanilla Apache Kafka clients. It provides additional features such as large message support an...
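One common way to support messages larger than a broker's size limit (for example, the broker's `message.max.bytes` setting) is to split each payload into ordered segments on the producer side and reassemble them on the consumer side. The Python below is a minimal, library-independent sketch of that idea under stated assumptions. `Segment`, `split_message`, and `Reassembler` are hypothetical names, not li-apache-kafka-clients' API. The sketch does not talk to Kafka, manage offsets, or expire incomplete messages.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical segment record: an illustration, not the library's wire format.
    message_id: str  # shared by all segments of one original message
    seq: int         # 0-based position of this segment
    total: int       # number of segments in the original message
    payload: bytes


def split_message(payload: bytes, max_segment_size: int) -> list[Segment]:
    """Split one payload into ordered segments of at most max_segment_size bytes."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    message_id = uuid.uuid4().hex
    chunks = [payload[i:i + max_segment_size]
              for i in range(0, len(payload), max_segment_size)] or [b""]
    return [Segment(message_id, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]


class Reassembler:
    """Buffers segments per message id and returns the payload once complete."""

    def __init__(self) -> None:
        self._pending: dict[str, dict[int, bytes]] = {}

    def add(self, segment: Segment) -> bytes | None:
        parts = self._pending.setdefault(segment.message_id, {})
        parts[segment.seq] = segment.payload  # a duplicate segment just overwrites
        if len(parts) < segment.total:
            return None  # still waiting for more segments
        del self._pending[segment.message_id]
        return b"".join(parts[i] for i in range(segment.total))


# Usage: segments may arrive out of order.
segments = split_message(b"x" * 2500, max_segment_size=1000)
reassembler = Reassembler()
results = [reassembler.add(s) for s in reversed(segments)]
```

The reassembler keys buffered segments by message id and sequence number, so it tolerates both out-of-order and duplicate delivery. A real consumer would also need to bound buffered memory and decide when to commit offsets for partially reassembled messages.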