We create, support, and manage a variety of clusters: Hadoop, Hive, Spark, Airflow, Presto, Trino, and other data lake engines. The basic way to configure such a cluster is through a Node Bootstrap.
Clusters, or compute engines, run on Linux machines or virtual machines (boxes) provisioned in a cloud from AWS, Azure, or GCP. All libraries the project requires can be listed in a bootstrap file, which is kept on the Linux machine or virtual box.
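As a rough illustration, the sketch below shows what a minimal bootstrap file might look like when the dependencies are pip-installable Python packages; the package names and versions are assumptions, and in practice the bootstrap is often a shell script executed on every node at start-up.

```python
# node_bootstrap.py - illustrative bootstrap run on each node when the cluster starts.
# Assumes Python 3 and pip are available on the node; the package list is an example.
import subprocess
import sys

REQUIRED_PACKAGES = ["pandas==2.2.2", "pyarrow==16.1.0", "boto3==1.34.162"]

def install(packages):
    # Install each package with pip so every node ends up with identical libraries.
    for pkg in packages:
        subprocess.run([sys.executable, "-m", "pip", "install", pkg], check=True)

if __name__ == "__main__":
    install(REQUIRED_PACKAGES)
```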
There are general-purpose UDFs that help provision cluster infrastructure and install prerequisite libraries and dependencies uniformly. Other general-purpose UDFs monitor and manage the cluster infrastructure and notify users when degradation is detected. Such UDFs reduce cluster setup and administration time.
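A minimal sketch of a monitoring helper of this kind is shown below; the disk-usage threshold and the notify() hook are assumptions and would be replaced by whatever alerting channel the team actually uses.

```python
# Hypothetical node-health check; threshold and notification channel are assumptions.
import shutil

DISK_USAGE_THRESHOLD = 0.85  # flag a node when a volume is more than 85% full

def disk_usage_fraction(path="/"):
    # Fraction of disk space used on the given mount point.
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def notify(message):
    # Placeholder notification hook; swap in email, Slack, or a pager integration.
    print(f"[cluster-alert] {message}")

def monitor_node(paths=("/",)):
    for path in paths:
        used = disk_usage_fraction(path)
        if used > DISK_USAGE_THRESHOLD:
            notify(f"{path} is {used:.0%} full - possible degradation")
```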
Various JAR files are available for the different Big Data services and use cases, and custom JARs are built and added as required.
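For Spark in particular, custom JARs are commonly attached when the session is created; the sketch below assumes PySpark and uses illustrative local paths.

```python
# Attaching custom JARs to a Spark session; the JAR paths are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("custom-jar-example")
    # spark.jars takes a comma-separated list of JARs to ship to the driver and executors
    .config("spark.jars", "/opt/jars/custom-udfs.jar,/opt/jars/extra-connector.jar")
    .getOrCreate()
)
```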
Cloud providers and Big Data platforms offer an "environment" feature in their UI. The environment supports installation of custom packages or libraries, most commonly Python- or R-based ones, and also accepts packages in .egg, .whl, and other formats.
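One small check that fits alongside such an environment is verifying that the expected packages are actually installed before jobs run; the required package names below are assumptions.

```python
# Hypothetical environment check; the required package names are examples only.
from importlib.metadata import version, PackageNotFoundError

def missing_packages(required=("numpy", "pandas", "scikit-learn")):
    # Return the required packages that are not installed in this environment.
    missing = []
    for name in required:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    print("missing:", missing_packages())
```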
Logging is configured for wider observability and for measuring cluster behaviour. Caching is required to keep running queries or jobs in runtime memory until the retention period defined for the cache expires.
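As a rough sketch under stated assumptions (a five-minute retention period and plain in-process memory as the cache store), logging and a small TTL cache might be wired up as follows.

```python
# Minimal sketch: basic logging plus an in-memory TTL cache; retention period is an assumption.
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
log = logging.getLogger("cluster.jobs")

class TTLCache:
    # Keep query or job results in memory until the retention period expires.
    def __init__(self, retention_seconds=300):
        self.retention = retention_seconds
        self._store = {}

    def put(self, key, value):
        self._store[key] = (value, time.time())
        log.info("cached %s", key)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.time() - stored_at > self.retention:
            del self._store[key]
            log.info("evicted expired entry %s", key)
            return None
        return value
```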