Thursday 17 October 2024

How to Install Apache Airflow

Here's a step-by-step guide to installing Apache Airflow.

Introduction

Apache Airflow is a popular open-source platform for programmatically authoring, scheduling, and monitoring workflows. This guide walks you through installing Apache Airflow on a local machine with pip inside a Python virtual environment.

Prerequisites

  • Python 3.8 or later (Airflow 2.7.0, the version used below, supports Python 3.8 through 3.11).
  • pip, the Python package installer.
  • A virtual environment tool such as venv or virtualenv (optional but recommended).
  • At least 2 GB of free memory for a smooth installation.
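The Python and pip requirements can be checked from a terminal before you start. This is a sketch of a helper, not something Airflow ships: version_at_least and check_prerequisites are hypothetical names, and the 3.8 minimum is assumed to match Airflow 2.7.0, so adjust it if you install a different release.

```shell
# Hypothetical helper: succeed if version $1 (X.Y or X.Y.Z) is at least $2 (X.Y).
version_at_least() {
  local have_major have_minor min_major min_minor rest
  have_major=${1%%.*}
  rest=${1#*.}
  have_minor=${rest%%.*}
  min_major=${2%%.*}
  rest=${2#*.}
  min_minor=${rest%%.*}
  if [ "$have_major" -ne "$min_major" ]; then
    [ "$have_major" -gt "$min_major" ]
  else
    [ "$have_minor" -ge "$min_minor" ]
  fi
}

# Check for python3 (3.8 or newer is assumed here) and for pip.
check_prerequisites() {
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 was not found on PATH" >&2
    return 1
  fi
  py_version=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
  if version_at_least "$py_version" 3.8; then
    echo "Python $py_version: OK"
  else
    echo "Python $py_version: too old (3.8 or newer assumed)" >&2
    return 1
  fi
  if python3 -m pip --version >/dev/null 2>&1; then
    echo "pip: OK"
  else
    echo "pip is not available for python3" >&2
    return 1
  fi
}

check_prerequisites || echo "Fix the issues above before continuing." >&2
```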

Step 1: Set Up a Python Virtual Environment

It's recommended to install Airflow in a virtual environment to avoid conflicts with other Python packages on your system.

  1. Open a Terminal or Command Prompt.

  2. Navigate to your project directory, or create a new one:

    bash
    mkdir airflow_project
    cd airflow_project
  3. Create a Virtual Environment:

    bash
    python3 -m venv airflow_env
  4. Activate the Virtual Environment:

    • On macOS/Linux:
      bash
      source airflow_env/bin/activate
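    • On Windows (Command Prompt or PowerShell):
      cmd
      .\airflow_env\Scripts\activate

    Note that Airflow does not run natively on Windows. On a Windows machine, install it inside WSL2 (Windows Subsystem for Linux) and follow the macOS/Linux commands throughout this guide.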
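Before moving on, it helps to confirm that the environment is really active; otherwise pip installs Airflow into the system Python. The venv activate scripts export the VIRTUAL_ENV environment variable, so a minimal check can test for it (in_venv is a hypothetical helper name):

```shell
# Hypothetical helper: succeed when a virtual environment is active.
# The venv "activate" scripts export VIRTUAL_ENV; "deactivate" unsets it.
in_venv() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
  echo "Active virtual environment: $VIRTUAL_ENV"
else
  echo "No virtual environment is active; run: source airflow_env/bin/activate" >&2
fi
```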


Step 2: Install Apache Airflow

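  1. Set the Airflow version and constraints URL: Airflow publishes a constraints file for each Airflow release and Python version that pins tested dependency versions, and installing against it avoids dependency conflicts. Replace 2.7.0 with the version you want.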

    bash
    export AIRFLOW_VERSION=2.7.0
    export PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
    export CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
  2. Install Apache Airflow:

    bash
    pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
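The commands in this step can also be wrapped into a reusable script. This is a minimal sketch with hypothetical helper names (constraint_url, python_minor_version, install_airflow); it reads the Python version from sys.version_info instead of parsing python --version, and builds the same constraints URL as above.

```shell
# Airflow version to install; defaults to the 2.7.0 used in this guide.
AIRFLOW_VERSION="${AIRFLOW_VERSION:-2.7.0}"

# Hypothetical helper: print the constraints URL for an Airflow version ($1)
# and a Python MAJOR.MINOR version ($2).
constraint_url() {
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' "$1" "$2"
}

# Ask the active interpreter for its MAJOR.MINOR version, e.g. "3.11".
python_minor_version() {
  python -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Install the pinned Airflow version against its constraints file.
install_airflow() {
  local py_ver url
  py_ver="$(python_minor_version)" || return 1
  url="$(constraint_url "$AIRFLOW_VERSION" "$py_ver")"
  echo "Installing apache-airflow==${AIRFLOW_VERSION} with constraints ${url}"
  pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "$url"
}
```

Save it under a name of your choice (install_airflow.sh, say), then inside the activated virtual environment run: source install_airflow.sh && install_airflow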


Step 3: Initialize the Airflow Database

Airflow uses a metadata database to keep track of tasks and schedules. Initialize the database with the following command:

bash
airflow db init

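This creates a SQLite database file, airflow.db, in Airflow's home directory (not the current directory). The home directory is set by the AIRFLOW_HOME environment variable and defaults to ~/airflow; Airflow also writes its airflow.cfg configuration file there the first time it runs. You can configure other database backends, such as PostgreSQL or MySQL, if needed. Newer Airflow 2.x releases deprecate airflow db init in favor of airflow db migrate, which sets up the same database.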
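To keep Airflow's files inside airflow_project instead of ~/airflow, export AIRFLOW_HOME before running airflow db init, and again in every new terminal that runs an airflow command. This sketch assumes a subdirectory named airflow_home (a hypothetical name), and airflow_db_path is a hypothetical helper that shows where the default SQLite file ends up.

```shell
# Keep Airflow's config, logs, and SQLite DB inside the project directory.
# "airflow_home" is an illustrative directory name.
export AIRFLOW_HOME="${AIRFLOW_HOME:-$PWD/airflow_home}"

# Hypothetical helper: path of the default SQLite metadata database
# (Airflow falls back to ~/airflow when AIRFLOW_HOME is unset).
airflow_db_path() {
  echo "${AIRFLOW_HOME:-$HOME/airflow}/airflow.db"
}

echo "AIRFLOW_HOME=$AIRFLOW_HOME"
echo "After 'airflow db init', the metadata DB should be at: $(airflow_db_path)"
```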


Step 4: Create an Admin User

To access the Airflow web interface, you'll need an admin user:

bash
airflow users create \
    --username admin \
    --firstname YourFirstName \
    --lastname YourLastName \
    --role Admin \
    --email youremail@example.com

Replace YourFirstName, YourLastName, and youremail@example.com with your own details. Because no password is given on the command line, the command prompts you to enter one.
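For a scripted, non-interactive setup, airflow users create also accepts a --password option. The wrapper below is a hypothetical sketch: create_admin_user and the ADMIN_* variables are illustrative names, and it refuses to run without ADMIN_PASSWORD. Passing a password on the command line can expose it in shell history and process listings, so treat this as a convenience for local testing.

```shell
# Hypothetical wrapper around "airflow users create".
# ADMIN_* variable names are illustrative; defaults mirror the command above.
create_admin_user() {
  if [ -z "${ADMIN_PASSWORD:-}" ]; then
    echo "Set ADMIN_PASSWORD before calling create_admin_user" >&2
    return 1
  fi
  airflow users create \
    --username "${ADMIN_USERNAME:-admin}" \
    --firstname "${ADMIN_FIRSTNAME:-YourFirstName}" \
    --lastname "${ADMIN_LASTNAME:-YourLastName}" \
    --role Admin \
    --email "${ADMIN_EMAIL:-youremail@example.com}" \
    --password "$ADMIN_PASSWORD"
}
```

Usage: export ADMIN_PASSWORD='choose-a-password' and then run create_admin_user.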


Step 5: Start the Airflow Web Server and Scheduler

With the setup complete, you can now start the web server and scheduler.

  1. Start the Web Server (default port: 8080):

    bash
    airflow webserver --port 8080
  2. Start the Scheduler (in a new terminal):

    bash
    airflow scheduler

    You can now open http://localhost:8080 in a browser and log in with the admin user created in Step 4.
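If you would rather not juggle two terminals, both processes can run in the background of a single shell. This is a minimal sketch: start_airflow is a hypothetical helper, it assumes the virtual environment is active, and it relies on a trap so that Ctrl+C stops both processes.

```shell
# Hypothetical helper: run the webserver and scheduler from one terminal.
# Assumes the virtual environment is active and "airflow" is on PATH.
start_airflow() {
  port="${1:-8080}"
  airflow webserver --port "$port" &
  webserver_pid=$!
  airflow scheduler &
  scheduler_pid=$!
  # Ctrl+C (INT) or TERM interrupts "wait"; the trap then sends SIGTERM to both processes.
  trap 'kill "$webserver_pid" "$scheduler_pid" 2>/dev/null' INT TERM
  wait "$webserver_pid" "$scheduler_pid"
  # Remove the trap so later Ctrl+C presses do not signal stale PIDs.
  trap - INT TERM
}
```

Source the file, then run start_airflow (or start_airflow 8081 to use another port).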


Step 6: Test Your Installation

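Once the UI loads, log in with the admin user from Step 4. The DAGs page lists Airflow's bundled example DAGs, which are loaded by default. New DAGs start paused, so unpause one and trigger it to confirm that the scheduler runs its tasks. To create your own, add DAG files to the dags folder inside Airflow's home directory (~/airflow/dags by default).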
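You can also check the installation from the command line. In Airflow 2.x the webserver serves a /health endpoint that returns JSON describing the status of the metadata database and the scheduler. The helper below (wait_for_airflow is a hypothetical name) polls that endpoint with curl until it answers, which is handy right after starting the webserver.

```shell
# Hypothetical helper: poll the webserver's /health endpoint until it answers.
# Arguments (all optional): base URL, number of attempts, seconds between attempts.
wait_for_airflow() {
  url="${1:-http://localhost:8080}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes HTTP errors (4xx/5xx) count as failures; -s hides progress output.
    if body=$(curl -fs "$url/health" 2>/dev/null); then
      echo "$body"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "No response from $url/health after $attempts attempts" >&2
  return 1
}
```

For example: wait_for_airflow http://localhost:8080 30 2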


Conclusion

Congratulations! You've installed Apache Airflow and can now start building and managing workflows. For production environments, use a more robust database such as PostgreSQL or MySQL: with SQLite, Airflow is limited to the SequentialExecutor, which runs one task at a time. Airflow’s documentation covers these options and the rest of its capabilities in depth.
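As a starting point for the PostgreSQL option, Airflow can read settings from environment variables named AIRFLOW__{SECTION}__{KEY}. This minimal sketch assumes a PostgreSQL server is already running with a database named airflow, and that the Postgres extra is installed (pip install "apache-airflow[postgres]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"). The helper postgres_conn_uri and the user name and password are hypothetical placeholders.

```shell
# Hypothetical helper: build a SQLAlchemy connection URI for PostgreSQL
# using the psycopg2 driver. Special characters in the password must be URL-encoded.
postgres_conn_uri() {
  local user="$1" password="$2" host="${3:-localhost}" port="${4:-5432}" db="${5:-airflow}"
  printf 'postgresql+psycopg2://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$db"
}

# Placeholder credentials; replace them with your own.
# AIRFLOW__DATABASE__SQL_ALCHEMY_CONN overrides [database] sql_alchemy_conn in airflow.cfg.
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="$(postgres_conn_uri airflow_user airflow_pass)"
# LocalExecutor runs tasks in parallel; it needs a database other than SQLite.
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
```

With these variables exported, run airflow db init (or airflow db migrate on newer releases) again to create Airflow's tables in PostgreSQL.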
