Scheduler Management

Join our recurring workshops, held online as well as in person (at Gurugram)

Scheduler Workflow

Schedule a job with Platform Scheduler

The Big Data organization provides a scheduler through both a UI and a command line. Users can create scheduled jobs to execute business tasks, set to run once at a specified date and time or on a recurring basis. The scheduler makes this convenient.
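The one-off versus recurring distinction can be sketched with Python's standard-library scheduler. This is only an illustration of the scheduling model, not the platform scheduler itself; the names business_task, recurring_task, and results are invented for the example:

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
results = []

def business_task(name):
    # Stand-in for a real business task.
    results.append(name)

# One-off job: runs exactly once, 0.1 s from now.
scheduler.enter(0.1, 1, business_task, argument=("one-off",))

def recurring_task(interval, remaining):
    # Recurring job: re-schedules itself after each run
    # until the requested number of runs is exhausted.
    results.append("recurring")
    if remaining > 1:
        scheduler.enter(interval, 1, recurring_task,
                        argument=(interval, remaining - 1))

scheduler.enter(0.1, 1, recurring_task, argument=(0.1, 3))
scheduler.run()  # blocks until the queue is empty
```

After run() returns, results holds one "one-off" entry and three "recurring" entries.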

Create and Deploy DAG to an Airflow Cluster

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow models each workflow as a Directed Acyclic Graph (DAG) written in Python: the code for a specific workflow is placed in the DAG directory, and the same Python DAG can be deployed to other Airflow clusters.
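A minimal DAG file, as it might appear in the DAG directory, could look like the following. This is a sketch assuming the Airflow 2.x API (apache-airflow installed); the DAG id, task ids, and commands are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_daily_report",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # run once per day
    catchup=False,                   # do not backfill missed runs
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    report = BashOperator(task_id="report", bash_command="echo report")

    # extract must finish before report starts
    extract >> report
```

Deploying the workflow amounts to copying this file into the cluster's configured DAG directory, where the scheduler picks it up automatically.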

Create DAG to Schedule an External Job

A DAG can also run external jobs: its tasks can call out to systems such as Hadoop, Hive, or Spark and execute work inside them.
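As one way this can look, a task can submit a Spark job from a DAG. This sketch assumes the apache-airflow-providers-apache-spark package and an existing Spark connection; the DAG id, connection id, and application path are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import (
    SparkSubmitOperator,
)

with DAG(
    dag_id="external_spark_job",       # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_spark = SparkSubmitOperator(
        task_id="run_spark",
        conn_id="spark_default",        # Spark connection configured in Airflow
        application="/jobs/etl_job.py", # hypothetical Spark application
    )
```

An alternative is a plain BashOperator wrapping spark-submit, hive, or hadoop commands, which keeps the DAG free of provider dependencies at the cost of weaker integration.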

Troubleshoot Scheduler and DAG

Schedulers and DAGs do not always run as expected. The most common symptoms are: a DAG has X tasks but only Y of them are running; a particular DAG is difficult to trigger; tasks for a specific DAG get stuck; or a DAG cannot be run manually at all. Beyond these, there are challenges in creating, configuring, administering, and maintaining DAGs, and the underlying infrastructure must also be managed so that the DAGs, and hence the scheduler, run smoothly.

Migrate an Existing DAG to Another System

DAGs are Python programs or packages, so they can be exported from one Airflow system or application and imported into another. An imported DAG is made to work by building a compatible virtual environment and installing all prerequisite and dependent libraries.
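Missing dependencies are the usual failure when a migrated DAG lands in a new environment. A small, standard-library-only helper can import a DAG file ahead of deployment and report which libraries are absent; validate_dag_file is a hypothetical name, not an Airflow API:

```python
import importlib.util
from pathlib import Path

def validate_dag_file(path: str) -> bool:
    """Import a DAG file in isolation and report missing dependencies."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    try:
        # Executing the module surfaces any import that the new
        # environment cannot satisfy, before Airflow ever parses it.
        spec.loader.exec_module(module)
    except ModuleNotFoundError as exc:
        print(f"{path}: missing dependency '{exc.name}'")
        return False
    return True
```

Running this for every migrated file, and installing each reported package into the virtual environment, iterates toward a DAG directory the target Airflow can load cleanly.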

Integrate the Airflow Scheduler with External Systems

Tasks inside a DAG can trigger actions remotely, but the remote system or application must first be integrated with the Airflow scheduler through an appropriate connector.
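One common pattern is an HTTP connector: Airflow resolves connections from AIRFLOW_CONN_* environment variables, a secrets backend, or the metadata database (Admin > Connections in the UI), and a task references the connection by id. This sketch assumes the apache-airflow-providers-http package; the connection id "remote_app", the host, and the endpoint are hypothetical:

```python
import os
from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

# Hypothetical connection to the external application, supplied via
# environment variable rather than the UI for illustration only.
os.environ.setdefault("AIRFLOW_CONN_REMOTE_APP", "http://remote-host:8080")

with DAG(
    dag_id="trigger_remote_action",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,                    # triggered on demand
    catchup=False,
) as dag:
    call_remote = SimpleHttpOperator(
        task_id="call_remote",
        http_conn_id="remote_app",    # matches the connection above
        endpoint="/jobs/run",         # hypothetical remote endpoint
        method="POST",
    )
```

Other connector types (JDBC, SSH, Spark, cloud providers) follow the same shape: register a connection once, then let any number of tasks refer to it by id.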