| Technology | Features | Use Cases |
| --- | --- | --- |
| Apache Flink | Streaming Dataflow Engine; Event Time Semantics; Exactly-Once Semantics; Backpressure Control; APIs for Streaming and Batch Applications; Connectors for Third-Party Data Sources | Real-Time Stream Processing on High-Throughput Data Sources; Writing Both Streaming and Batch Applications |
| Ganglia | Scalable and Distributed System for Monitoring Clusters and Grids; Generates Reports and Views the Performance of Cluster and Individual Node Instances; Ingests and Visualizes Hadoop and Spark Metrics | Monitoring Cluster Performance; Inspecting Performance of Individual Node Instances |
| Apache Hadoop | Supports Massive Data Processing across Cluster of Instances; Processing Models such as MapReduce and Tez; Distributed File System called HDFS | Increased Processing and Storage Capacity; High Availability |
| HBase | Open Source & Non-Relational; Distributed Database for Hadoop Ecosystem; Runs on Top of HDFS; Integrates with Apache Hive; Backup and Restore from Amazon S3 | Providing Non-Relational Database Capabilities; Direct Input and Output to MapReduce Framework; SQL-Like Queries over HBase Tables; Data Persistence and Disaster Recovery |
| HCatalog | Allows Access to Hive Metastore Tables within Pig, Spark SQL, Custom MapReduce Applications, REST Interface, and Command Line Client; Supports AWS Glue Data Catalog as Metastore for Hive | Accessing Hive Metastore Tables within Various Applications; Using AWS Glue Data Catalog as Metastore for Hive |