2021年1月27日星期三

Airflow KubernetesExecutor, logs do not show up until after executor pods complete

I have started using KubernetesExecutor and I have set up a PV/PVC with an AWS EFS to store logs for my dags. I am also using s3 remote logging.

All the logging is working perfectly fine after a dag completes. However, I want to be able to see the logs of my jobs as they are running for long running ones.

When I exec into my scheduler pod while an executor pod is running, I am able to see the .log file of the currently running job because of the shared EFS. However when I cat the log file I do not see the logs as long as the executor is still running. Once the executor finishes however, I can see the full logs both when I cat the file and in the airflow UI.

Weirdly, on the other hand when I exec into the executor pod as it is running and I cat the exact same log file in the shared EFS I am able to see the correct logs up until that point in the job, and when I immediately cat from the scheduler or check the UI I can also see the logs up until that point.

So it seems that when I cat from within the executor pod, it is causing the logs to be flushed so that it is available everywhere. Why are the logs not flushing regularly?

Here are the config variables I am setting, note these env variables get set in my webserver/scheduler and executor pods:

# ----------------------  # For Main Airflow Pod (Webserver & Scheduler)  # ----------------------  export PYTHONPATH=$HOME  export AIRFLOW_HOME=$HOME  export PYTHONUNBUFFERED=1    # Core configs  export AIRFLOW__CORE__LOAD_EXAMPLES=False  export AIRFLOW__CORE__SQL_ALCHEMY_CONN=${AIRFLOW__CORE__SQL_ALCHEMY_CONN:-postgresql://$DB_USER:$DB_PASSWORD@$DB_HOST:5432/$DB_NAME}  export AIRFLOW__CORE__FERNET_KEY=$FERNET_KEY  export AIRFLOW__CORE__DAGS_FOLDER=$AIRFLOW_HOME/git/dags/$PROVIDER-$ENV/    # Logging configs  export AIRFLOW__LOGGING__BASE_LOG_FOLDER=$AIRFLOW_HOME/logs/  export AIRFLOW__LOGGING__REMOTE_LOGGING=True  export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=aws_default  export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://path-to-bucket/airflow_logs  export AIRFLOW__LOGGING__TASK_LOG_READER=s3.task  export AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS=config.logging_config.LOGGING_CONFIG    # Webserver configs  export AIRFLOW__WEBSERVER__COOKIE_SAMESITE=None  

and my logging config looks like the one in the question here

I thought this could be a python buffering issue so added PYTHONUNBUFFERED=1, but that didn't help. This is happening whether I use the PythonOperator or BashOperator

Is it the case that K8sExecutors logs just won't be available during their runtime? Only after? Or is there some configuration I must be missing?

https://stackoverflow.com/questions/65929595/airflow-kubernetesexecutor-logs-do-not-show-up-until-after-executor-pods-comple January 28, 2021 at 08:59AM

没有评论:

发表评论