Saving Certain Logs from MySQL (Cloud SQL) to BigQuery (with Airflow Scheduling)

JooYoung Lim
2 min read · Apr 17, 2020

Today’s post is about saving certain logs from MySQL (Cloud SQL) to BigQuery, with Airflow scheduling.

Basic Flow (ETL + Scheduling)

We follow the ETL (Extract, Transform, Load) process.

EXTRACT data from MySQL (Cloud SQL), TRANSFORM it with an appropriate query, and finally LOAD it into BigQuery!

With an Airflow DAG, we can do this job very easily.

Details about BigQueryOperator are in this blog post (https://zzsza.github.io/mlops/2019/04/17/airflow-bigquery-operator/) and in the Airflow documentation (https://airflow.apache.org/docs/stable/integration.html?highlight=bigquery#bigquery).

First of all, create a dataset and a table to store the extracted data.

Create Dataset and Table

The table’s field names should be the same as the transformed data’s field names.
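One way to create them is with the google-cloud-bigquery Python client. Here is a minimal sketch; the project, dataset, table, and field names (user_id, event, created_at) are just example placeholders, so use whatever fields your transformed data actually contains.

from google.cloud import bigquery

# Placeholder names: replace with your own project, dataset, and table.
client = bigquery.Client(project="your-gcp-project-id")

# Create the dataset (no error if it already exists).
client.create_dataset("your_dataset", exists_ok=True)

# Create the table with a schema matching the transformed data's field names.
schema = [
    bigquery.SchemaField("user_id", "INTEGER"),
    bigquery.SchemaField("event", "STRING"),
    bigquery.SchemaField("created_at", "TIMESTAMP"),
]
table = bigquery.Table("your-gcp-project-id.your_dataset.your_table", schema=schema)
client.create_table(table, exists_ok=True)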

Here’s my DAG file.

airflow_DAG.py
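A minimal sketch of the DAG’s structure looks like this. It assumes Airflow 1.10 with pymysql as the MySQL driver and BigQueryOperator for the load step; every host, account, dataset, table, and field name is a placeholder you should replace with your own.

from datetime import datetime, timedelta

import pymysql
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.operators.python_operator import PythonOperator

default_dag_args = {
    "owner": "your-name",                  # your name
    "start_date": datetime(2020, 4, 17),   # your start date
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "project_id": "your-gcp-project-id",   # your GCP project id
}

# Query used to EXTRACT the log rows you want from MySQL (Cloud SQL).
mysql_query = "SELECT user_id, event, created_at FROM logs WHERE level = 'ERROR'"


def mysql():
    """EXTRACT rows from MySQL (Cloud SQL) and TRANSFORM them into a
    BigQuery INSERT statement, which is pushed to XCom for the load task."""
    conn = pymysql.connect(
        host="your-cloud-sql-host",
        user="your-account-name",
        password="your-password",
        db="your-database-name",
    )
    try:
        with conn.cursor() as cursor:
            cursor.execute(mysql_query)
            rows = cursor.fetchall()
    finally:
        conn.close()

    if not rows:
        return "SELECT 1"  # nothing new this run; run a harmless no-op query

    # NOTE: real code should escape values properly; kept simple for the sketch.
    values = ", ".join(
        "({}, '{}', TIMESTAMP '{}')".format(r[0], r[1], r[2]) for r in rows
    )
    return (
        "INSERT INTO `your-gcp-project-id.your_dataset.your_table` "
        "(user_id, event, created_at) VALUES " + values
    )


with DAG(
    dag_id="mysql_to_bigquery",
    default_args=default_dag_args,
    schedule_interval="0 * * * *",  # every hour
    catchup=False,
) as dag:

    extract_mysql = PythonOperator(
        task_id="extract_mysql",
        python_callable=mysql,
    )

    # query is the BigQuery query that LOADs the extracted data; it is pulled
    # from the extract task's XCom at run time.
    query = "{{ ti.xcom_pull(task_ids='extract_mysql') }}"

    load_to_bigquery = BigQueryOperator(
        task_id="load_to_bigquery",
        sql=query,
        use_legacy_sql=False,
    )

    extract_mysql >> load_to_bigquery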

In default_dag_args, set your name and start date. Also, project_id should be your Google Cloud Platform project ID.

In the mysql function, enter your MySQL (Cloud SQL) account name, password, host name, and database name.

In mysql_query, write the query you want to use to EXTRACT data.

In the DAG part, set dag_id, schedule_interval, catchup, etc. for the other settings.

With the schedule_interval field, you can schedule your DAG. In my case, I scheduled it at a one-hour interval.

You can easily create a cron expression using this website: http://www.cronmaker.com/
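For reference, here are a few example cron-style values you could put in schedule_interval (the first one runs every hour; the others are just illustrations):

schedule_interval = "0 * * * *"     # every hour, on the hour
schedule_interval = "*/30 * * * *"  # every 30 minutes
schedule_interval = "0 9 * * 1"     # 09:00 every Monday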

Inside the DAG, the query variable is the BigQuery query. Write the query you want to use to LOAD the data.

Then save your DAG file.

When saving is done, upload the DAG file to your Airflow DAG folder and open your Airflow web server…

DAG is successfully saved!

As you can see, you can easily save your MySQL data to BigQuery when you use Airflow.

Thank you for reading this humble post! 😀
