Cloud Data Engineering
Main Technologies of the Project
GCP Compute Engine, Docker, BigQuery, Google Looker Studio, AWS VPC, AWS RDS, OpenVPN, Postgres, JavaScript
Accomplishments
- created and maintained a cross-cloud data pipeline
- developed, delivered, and deployed the solution as a Docker image
- created a new Google Looker Studio community visualization in JavaScript: a Sankey diagram
Project Description
The project centered on Google Looker Studio reports backed by a BigQuery data source. The first requirement was a flow diagram for displaying app usage, for which a Sankey diagram appeared to be the most suitable choice. However, the existing Sankey diagram implementations among Looker Studio's Community Visualizations were too limited (notably, edges could not be colored and recursive connections were not allowed), so a custom solution was developed in the form of a new Looker Studio Community Visualization.
The second requirement was to integrate a database hosted on an AWS RDS instance into BigQuery. The instance is located within a private VPC and is reachable only through an OpenVPN gateway. To keep the data in BigQuery synchronized with this private AWS database, a solution was developed that first connects to the private RDS instance via OpenVPN, then queries the latest data, then connects to BigQuery, and finally appends the data to the respective BigQuery tables. The solution was packaged as a Docker image, deployed on a Compute Engine instance, and triggered via Cloud Scheduler, which makes it easy to scale up if memory requirements increase.
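The query-then-append step of the synchronization described above can be illustrated with a minimal Python sketch. It is not the project's actual code. It assumes that "latest data" means rows whose numeric `id` is greater than the highest `id` already synced; that watermark column, the function `sync_increment`, and the in-memory stand-ins for the Postgres source and the BigQuery table are all hypothetical. In a real run, `fetch_since` would wrap a Postgres query issued over the OpenVPN connection, and `append_rows` would wrap a BigQuery append.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def sync_increment(
    fetch_since: Callable[[int], List[Row]],
    append_rows: Callable[[List[Row]], None],
    last_synced_id: int,
    key: str = "id",
) -> int:
    """Append source rows newer than the watermark and return the new watermark."""
    # Step 1: query only the rows the destination has not seen yet.
    new_rows = fetch_since(last_synced_id)
    if not new_rows:
        # Nothing new: leave the destination and the watermark unchanged.
        return last_synced_id
    # Step 2: append (never overwrite) the new rows to the destination.
    append_rows(new_rows)
    # Step 3: advance the watermark to the highest key just written.
    return max(row[key] for row in new_rows)


# In-memory stand-ins (hypothetical data) for the private database and the
# BigQuery table, used only to show the control flow.
source = [
    {"id": 1, "event": "open"},
    {"id": 2, "event": "click"},
    {"id": 3, "event": "close"},
]
warehouse = [{"id": 1, "event": "open"}]


def fetch_since(watermark: int) -> List[Row]:
    # Stand-in for a query such as "rows with id greater than the watermark".
    return [row for row in source if row["id"] > watermark]


watermark = sync_increment(fetch_since, warehouse.extend, last_synced_id=1)
print(watermark, len(warehouse))  # 3 3
```

Passing the source and destination in as callables keeps the watermark logic independent of the VPN and cloud clients. Running the function again with the returned watermark appends nothing, which matters for a job that Cloud Scheduler triggers repeatedly.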