We create, support, and manage a variety of clusters: Hadoop, Hive, Spark, Airflow, Presto, Trino, and other data lake engines. The basic way to configure such a cluster is through a Node Bootstrap.
Clusters, or compute engines, run on Linux machines or virtual machines (boxes) provisioned in a cloud from AWS, Azure, or GCP. All libraries the project requires can be listed in a bootstrap file, which is kept on the Linux machine or virtual box.
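As a rough illustration, the sketch below shows what a minimal bootstrap file might look like when the dependencies are pip-installable Python packages; the package names and versions are assumptions, and in practice the bootstrap is often a shell script executed on every node at start-up.

```python
# node_bootstrap.py - illustrative bootstrap run on each node when the cluster starts.
# Assumes Python 3 and pip are available on the node; the package list is an example.
import subprocess
import sys

REQUIRED_PACKAGES = ["pandas==2.2.2", "pyarrow==16.1.0", "boto3==1.34.162"]

def install(packages):
    # Install each package with pip so every node ends up with identical libraries.
    for pkg in packages:
        subprocess.run([sys.executable, "-m", "pip", "install", pkg], check=True)

if __name__ == "__main__":
    install(REQUIRED_PACKAGES)
```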
There are general-purpose UDFs that help provision cluster infrastructure and install prerequisite libraries and dependencies uniformly. Other general-purpose UDFs monitor and manage the cluster infrastructure and notify users when degradation is detected. Such UDFs reduce cluster setup and administration time.
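A minimal sketch of a monitoring helper of this kind is shown below; the disk-usage threshold and the notify() hook are assumptions and would be replaced by whatever alerting channel the team actually uses.

```python
# Hypothetical node-health check; threshold and notification channel are assumptions.
import shutil

DISK_USAGE_THRESHOLD = 0.85  # flag a node when a volume is more than 85% full

def disk_usage_fraction(path="/"):
    # Fraction of disk space used on the given mount point.
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def notify(message):
    # Placeholder notification hook; swap in email, Slack, or a pager integration.
    print(f"[cluster-alert] {message}")

def monitor_node(paths=("/",)):
    for path in paths:
        used = disk_usage_fraction(path)
        if used > DISK_USAGE_THRESHOLD:
            notify(f"{path} is {used:.0%} full - possible degradation")
```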
Various JAR files are available for the different Big Data services and use cases, and custom JARs are built and added as required.
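For Spark in particular, custom JARs are commonly attached when the session is created; the sketch below assumes PySpark and uses illustrative local paths.

```python
# Attaching custom JARs to a Spark session; the JAR paths are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("custom-jar-example")
    # spark.jars takes a comma-separated list of JARs to ship to the driver and executors
    .config("spark.jars", "/opt/jars/custom-udfs.jar,/opt/jars/extra-connector.jar")
    .getOrCreate()
)
```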
Cloud providers and Big Data platforms offer an "environment" feature in their UI. The environment supports installation of custom packages or libraries, most commonly Python- or R-based ones, and also accepts packages in .egg, .whl, and other formats.
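One small check that fits alongside such an environment is verifying that the expected packages are actually installed before jobs run; the required package names below are assumptions.

```python
# Hypothetical environment check; the required package names are examples only.
from importlib.metadata import version, PackageNotFoundError

def missing_packages(required=("numpy", "pandas", "scikit-learn")):
    # Return the required packages that are not installed in this environment.
    missing = []
    for name in required:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    print("missing:", missing_packages())
```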
Logging is configured for wider observability and for measuring cluster behaviour. Caching is required to keep running queries or jobs in runtime memory until the retention period defined for the cache expires.
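As a rough sketch under stated assumptions (a five-minute retention period and plain in-process memory as the cache store), logging and a small TTL cache might be wired up as follows.

```python
# Minimal sketch: basic logging plus an in-memory TTL cache; retention period is an assumption.
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
log = logging.getLogger("cluster.jobs")

class TTLCache:
    # Keep query or job results in memory until the retention period expires.
    def __init__(self, retention_seconds=300):
        self.retention = retention_seconds
        self._store = {}

    def put(self, key, value):
        self._store[key] = (value, time.time())
        log.info("cached %s", key)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.time() - stored_at > self.retention:
            del self._store[key]
            log.info("evicted expired entry %s", key)
            return None
        return value
```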