Technology | Feature | Use Cases |
---|---|---|
Apache Flink | Streaming Dataflow Engine; Event-Time Semantics; Exactly-Once Semantics; Backpressure Control; APIs for Streaming and Batch Applications; Connectors for Third-Party Data Sources | Real-Time Stream Processing on High-Throughput Data Sources; Writing Both Streaming and Batch Applications |
Ganglia | Scalable, Distributed System for Monitoring Clusters and Grids; Generates Reports on and Displays the Performance of the Cluster and of Individual Node Instances; Ingests and Visualizes Hadoop and Spark Metrics | Monitoring Cluster Performance; Inspecting the Performance of Individual Node Instances |
Apache Hadoop | Supports Massive Data Processing across a Cluster of Instances; Processing Models such as MapReduce and Tez; Distributed File System (HDFS) | Increased Processing and Storage Capacity; High Availability |
HBase | Open-Source, Non-Relational, Distributed Database for the Hadoop Ecosystem; Runs on Top of HDFS; Integrates with Apache Hive; Backup to and Restore from Amazon S3 | Providing Non-Relational Database Capabilities; Direct Input and Output to the MapReduce Framework; SQL-Like Queries over HBase Tables (via Hive); Data Persistence and Disaster Recovery |
HCatalog | Allows Access to Hive Metastore Tables from Pig, Spark SQL, Custom MapReduce Applications, a REST Interface, and a Command-Line Client; Supports the AWS Glue Data Catalog as the Metastore for Hive | Accessing Hive Metastore Tables from Various Applications; Using the AWS Glue Data Catalog as the Metastore for Hive |
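The "Event-Time Semantics" entry in the Apache Flink row is easiest to see with a small example. The following is a minimal pure-Python sketch of the idea, not Flink's API. Records are bucketed into tumbling windows by the timestamp each record carries rather than by arrival order. A window is emitted only once a watermark (the highest timestamp seen minus an assumed out-of-orderness bound) passes the window's end. The function name, window size, and lateness bound are all hypothetical, and the exact firing rule differs slightly from Flink's.

```python
from collections import defaultdict


def event_time_windows(events, window_size, max_out_of_orderness):
    """Sum (timestamp, value) records per tumbling event-time window.

    Hypothetical illustration only: windows are keyed by the record's own
    timestamp, and a window [start, start + window_size) is emitted once the
    watermark (max timestamp seen - max_out_of_orderness) reaches its end.
    """
    open_windows = defaultdict(list)  # window start -> buffered values
    emitted = []                      # (window start, sum) in emission order
    watermark = float("-inf")

    for ts, value in events:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            # The record's window has already been emitted; this sketch
            # simply drops such late records.
            continue
        open_windows[start].append(value)
        watermark = max(watermark, ts - max_out_of_orderness)
        # Emit every window whose end the watermark has now reached.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                emitted.append((s, sum(open_windows.pop(s))))

    # End of input acts like a final watermark of +infinity.
    for s in sorted(open_windows):
        emitted.append((s, sum(open_windows.pop(s))))
    return emitted


if __name__ == "__main__":
    # (2, 4) arrives after (12, 5) but still lands in window 0.
    print(event_time_windows([(1, 1), (3, 2), (12, 5), (2, 4), (25, 1)], 10, 5))
```

In this run, the out-of-order record `(2, 4)` is still counted in the window starting at 0, because that window stays open until the watermark reaches 10. A record for that window arriving after the watermark passed 10 would be discarded in this sketch.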