I have started using KubernetesExecutor and I have set up a PV/PVC with an AWS EFS to store logs for my dags. I am also using s3 remote logging.
All the logging is working perfectly fine after a dag completes. However, I want to be able to see the logs of my jobs as they are running for long running ones.
When I exec into my scheduler pod while an executor pod is running, I am able to see the .log file of the currently running job because of the shared EFS. However when I cat the log file I do not see the logs as long as the executor is still running. Once the executor finishes however, I can see the full logs both when I cat the file and in the airflow UI.
Weirdly, on the other hand when I exec into the executor pod as it is running and I cat the exact same log file in the shared EFS I am able to see the correct logs up until that point in the job, and when I immediately cat from the scheduler or check the UI I can also see the logs up until that point.
So it seems that when I cat from within the executor pod, it is causing the logs to be flushed so that it is available everywhere. Why are the logs not flushing regularly?
Here are the config variables I am setting, note these env variables get set in my webserver/scheduler and executor pods:
# ---------------------- # For Main Airflow Pod (Webserver & Scheduler) # ---------------------- export PYTHONPATH=$HOME export AIRFLOW_HOME=$HOME export PYTHONUNBUFFERED=1 # Core configs export AIRFLOW__CORE__LOAD_EXAMPLES=False export AIRFLOW__CORE__SQL_ALCHEMY_CONN=${AIRFLOW__CORE__SQL_ALCHEMY_CONN:-postgresql://$DB_USER:$DB_PASSWORD@$DB_HOST:5432/$DB_NAME} export AIRFLOW__CORE__FERNET_KEY=$FERNET_KEY export AIRFLOW__CORE__DAGS_FOLDER=$AIRFLOW_HOME/git/dags/$PROVIDER-$ENV/ # Logging configs export AIRFLOW__LOGGING__BASE_LOG_FOLDER=$AIRFLOW_HOME/logs/ export AIRFLOW__LOGGING__REMOTE_LOGGING=True export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=aws_default export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://path-to-bucket/airflow_logs export AIRFLOW__LOGGING__TASK_LOG_READER=s3.task export AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS=config.logging_config.LOGGING_CONFIG # Webserver configs export AIRFLOW__WEBSERVER__COOKIE_SAMESITE=None and my logging config looks like the one in the question here
I thought this could be a python buffering issue so added PYTHONUNBUFFERED=1, but that didn't help. This is happening whether I use the PythonOperator or BashOperator
Is it the case that K8sExecutors logs just won't be available during their runtime? Only after? Or is there some configuration I must be missing?
https://stackoverflow.com/questions/65929595/airflow-kubernetesexecutor-logs-do-not-show-up-until-after-executor-pods-comple January 28, 2021 at 08:59AM
没有评论:
发表评论