9. Sprint 3 | Google Sheets files on Google Drive ingested into BigQuery as tables
Project Info
- Created By Diego Oroza
- Project X - Data Pipeline
- Date 2022
Project Description
Information gathered from PDF files and other sources is automatically sent to a Google Drive repository as Google Sheets files. Once the data in those sheets has been cleaned and structured, we load it into our BigQuery instance as tables. For that purpose we use a Python script that connects to Google Drive, searches for the corresponding folder, and reads each of the sheets into Python data structures. Once all the information is available in the script, we use pandas DataFrames and the BigQuery client library to bridge Colab and GCP.
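The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes the Drive API v3 and Sheets API v4 clients from google-api-python-client and the google-cloud-bigquery client (with pyarrow installed), and the names folder_query, to_table_name, rows_to_records, load_folder, folder_id and dataset are all hypothetical. It also assumes one BigQuery table per tab, with the first row of each tab used as the header.

```python
import re

SHEET_MIME = "application/vnd.google-apps.spreadsheet"


def folder_query(folder_id):
    """Drive v3 search string: Google Sheets files directly inside one folder."""
    return (f"'{folder_id}' in parents and mimeType = '{SHEET_MIME}' "
            "and trashed = false")


def to_table_name(title):
    """Reduce a title to lowercase ASCII letters, digits and underscores."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", title).strip("_").lower()
    return name or "sheet"


def rows_to_records(rows):
    """Use the first row as header; pad short rows with None.

    The Sheets API omits trailing empty cells, so rows can be ragged.
    """
    if not rows:
        return []
    header, *body = rows
    return [
        {col: (row[i] if i < len(row) else None) for i, col in enumerate(header)}
        for row in body
    ]


def load_folder(drive, sheets, bq, folder_id, dataset):
    """Load every tab of every Sheets file in a folder into BigQuery.

    drive / sheets are googleapiclient services, bq is a bigquery.Client,
    dataset is a hypothetical "project.dataset" string.
    """
    import pandas as pd  # imported here so the helpers above stay dependency-free

    page_token = None
    while True:
        resp = drive.files().list(
            q=folder_query(folder_id),
            fields="nextPageToken, files(id, name)",
            pageToken=page_token,
        ).execute()
        for f in resp.get("files", []):
            meta = sheets.spreadsheets().get(
                spreadsheetId=f["id"], fields="sheets.properties.title"
            ).execute()
            for tab in meta.get("sheets", []):
                title = tab["properties"]["title"]
                # A quoted sheet name as the range selects the whole tab.
                quoted = "'" + title.replace("'", "''") + "'"
                values = sheets.spreadsheets().values().get(
                    spreadsheetId=f["id"], range=quoted
                ).execute().get("values", [])
                records = rows_to_records(values)
                if not records:
                    continue  # empty tab or header only
                table = f"{dataset}.{to_table_name(f['name'] + '_' + title)}"
                # Values arrive as strings; column names are not sanitized here.
                bq.load_table_from_dataframe(pd.DataFrame(records), table).result()
        page_token = resp.get("nextPageToken")
        if not page_token:
            break
```

In Colab, the three clients would typically be created after google.colab.auth.authenticate_user(), using googleapiclient.discovery.build("drive", "v3"), build("sheets", "v4") and bigquery.Client(project=...). Type conversion and column-name cleanup are left out of this sketch and would need to match the real sheets.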