In Airflow we can easily execute external scripts using the BashOperator and PythonOperator, which gives us flexibility in building scalable data pipelines. In this post we will discuss how to execute an external script written in Python to perform simple data generation and transformation. We will build on the previous post on creating your first DAG in Airflow installed on Ubuntu in Windows.


Executing External Scripts in Airflow

In this post we will create an external Python file containing a function that generates random numbers from 1 to 50 and pairs them with the timestamp they were generated at. We will call this script in Airflow using the BashOperator and monitor it in the Airflow web interface.

Step 1: Inside the dags folder, create a new folder and name it scripts. We will save the Python script generate_magic_numbers.py and the output of the function here.

Step 2: Create a Python file, add the code below, and save it as generate_magic_numbers.py (this matches the file name the DAG will call later). The script generates an array of random numbers and pairs it with the timestamp it was generated at, then saves the output to a file named magic_numbers.txt. If the output file does not exist the program creates it; if it already exists, the program opens it and appends the newly generated record.

import random
from datetime import datetime

def generate_magic_numbers():
    data = []

    # Record the timestamp the numbers are generated at
    data.append(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))

    # Generate 10 random numbers from 1 to 50 (randrange excludes the stop
    # value, so we pass 51 to include 50)
    numbers = [random.randrange(1, 51) for _ in range(10)]
    data.append(numbers)

    # Open in append mode: creates the file if missing, appends otherwise
    with open("/c/Users/soongaya/AirflowHome/dags/scripts/magic_numbers.txt", "a") as f:
        f.writelines(str(data) + "\n")

    print(data)
    return data

generate_magic_numbers()

Complete code is available here.

To test that the script runs correctly, navigate to the scripts folder inside the dags folder where you’ve saved the file and open the command line there. Type the below command to run the Python script from the command line.
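Assuming python resolves to your Python 3 interpreter in the Ubuntu terminal, the command is:

python generate_magic_numbers.py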

This will create a file at the output file path, save the data inside it, and print the generated data on the command line terminal.
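For illustration only, an appended record looks roughly like the line below; your timestamp and numbers will of course differ:

['2021-11-11 10:30:00', [3, 17, 42, 8, 25, 49, 11, 36, 20, 5]]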

Step 3: Create a new DAG file and save it in the dags folder. Name it airflow-execute-external-scripts.py and add the code below.

# step 1 -- Import needed libraries
from datetime import datetime
from airflow import DAG
# Note: in Airflow 2 this module is deprecated in favour of airflow.operators.bash
from airflow.operators.bash_operator import BashOperator

# step 2 -- Define default parameters
default_args = {
    'owner': 'Sam',
    'depends_on_past': False,
    'start_date': datetime(2021, 11, 11),
    'retries': 3
}

# step 3 -- Define the DAG object; the cron expression schedules it every minute
dag = DAG(dag_id='Magic_Number_Generator',
          default_args=default_args,
          schedule_interval='*/1 * * * *',
          description='DAG to execute external Python scripts',
          catchup=False)

# step 4 -- Add the task to the DAG, pointing at the full path of the script
generate_magic_numbers_operator = BashOperator(
    task_id='Magic-Number-BashOperator-Task',
    bash_command="python /c/Users/soongaya/AirflowHome/dags/scripts/generate_magic_numbers.py",
    dag=dag)

# step 5 -- Define dependency; with a single task there is nothing to chain,
# so referencing the task on its own is enough
generate_magic_numbers_operator

Complete code is available here.

First we import the required packages, including the BashOperator, which we will use to execute the external file. We then define the default parameters and create the DAG object. We add the task to the DAG with the below code:

generate_magic_numbers_operator = BashOperator(
    task_id='Magic-Number-BashOperator-Task',
    bash_command="python /c/Users/soongaya/AirflowHome/dags/scripts/generate_magic_numbers.py",
    dag=dag)

We finally add the below line to complete the DAG. Since this DAG has only one task, there is no upstream or downstream dependency to define, so referencing the task on its own is enough.

generate_magic_numbers_operator
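
If the DAG contained more tasks, we would define the dependencies between them with Airflow's bitshift operators. A minimal sketch, using a hypothetical second task named notify_operator purely for illustration:

# notify_operator is a hypothetical second task, shown only to illustrate dependencies
notify_operator = BashOperator(task_id='Notify-Task', bash_command='echo done', dag=dag)

# Run the notify task after the generator task completes
generate_magic_numbers_operator >> notify_operator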

Start two Linux terminals and run airflow scheduler in one and airflow webserver in the other. Navigate in your web browser to localhost:8080 or localhost:8081. When everything looks OK you should be able to see the DAG added to the DAGs view in the Airflow web interface.
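The two commands, one per terminal, are the standard Airflow CLI entry points (the webserver listens on port 8080 by default):

airflow scheduler
airflow webserver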

Screenshot: the Magic_Number_Generator DAG listed in the Airflow web interface.

Note:

This approach works for Airflow installed on Ubuntu in Windows through the Windows Subsystem for Linux, and it requires the scripts to be inside the dags folder. Other Airflow installations, such as managed services, standalone Linux, and virtual machines, allow you to execute scripts that are outside the dags folder.
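To avoid hardcoding the absolute path in bash_command, one option is to build the path relative to the DAG file itself. A minimal sketch, assuming the scripts folder sits next to the DAG file inside the dags folder:

import os

# Resolve the scripts folder relative to this DAG file
DAG_DIR = os.path.dirname(os.path.abspath(__file__))
SCRIPT_PATH = os.path.join(DAG_DIR, 'scripts', 'generate_magic_numbers.py')

# Use the resolved path in the BashOperator instead of a hardcoded one
generate_magic_numbers_operator = BashOperator(
    task_id='Magic-Number-BashOperator-Task',
    bash_command=f"python {SCRIPT_PATH}",
    dag=dag)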

Conclusion

In this post we have learned about executing external scripts with Airflow. We looked at the BashOperator and how we can use it to execute an external Python file whose code generates random numbers from 1 to 50 and saves the output to a file. The power of Airflow lies in its ability to execute external scripts that perform different tasks. In this post we ran our scripts in Airflow installed on Ubuntu in Windows; the same principles work in other Airflow deployments.
