How to upload data to Google BigQuery

6 min


Translation of the article was prepared on the eve of the start of the course “Non-relational databases”


In this article, we will look at options for uploading data to the Google BigQuery cloud storage. This includes easy ways to load data from CSV / JSON files and ways to load via API or extension.

With Google BigQuery (GBQ), you can collect data from various sources and analyze it using SQL queries. Among the advantages of GBQ are high computation speed even for large amounts of data and low cost.

Why do you need to upload data to a single repository? If you want to use end-to-end analytics, generate reports from raw data, and measure the effectiveness of your marketing, then you need Google BigQuery.

If you need to analyze terabytes of data in seconds, Google BigQuery is the easiest and most affordable choice. You can find out more about this service by looking short video on YouTube channel Google Developers

Creating a dataset and table

Before uploading any data, you first need to create a dataset and table in Google BigQuery. For this on BigQuery home page select the resource where you want to create the dataset.


Images used in this article are provided by the author

Specify the dataset identifier in the “Create dataset” window, select the data processing location and set the default storage period for the table.
Note: If you choose “Never” as the expiration date for the table, no physical storage will be defined. For temporary tables, you can specify the number of days to keep them.

Then create a table in the dataset.

Done! Now you can start downloading data.

Loading data using Google Sheets (OWOX BI BigQuery Reports extension).

If you need to upload data from Google Sheets to Google BigQuery, the easiest way to do this is to install the free OWOX BI BigQuery Reports extension.

You can install this extension directly from Google Sheets or from Chrome Web Store

After installing it, a dialog box appears with prompts and asking for permissions.

Now it’s time to get back to Google Sheets. To upload data to BigQuery, simply select Upload data to BigQuery from Add-ons -> OWOX BI BigQuery Reports.

Specify the project, dataset, and table name to load the data into. And that’s all 🙂

The undeniable advantage of the OWOX BI BigQuery Reports extension is its ease of use. You can also use the extension to customize scheduled reports.

To generate reports based on accurate raw data from all sources and automatically upload them to the Google BigQuery repository, we recommend using the service OWOX BI Pipeline

With Pipeline, you can set up automatic data collection from advertising services, call tracking systems and CRM. This allows you to quickly and easily retrieve complete datasets from the sources of your choice.

Just select data sources and allow access; leave the rest OWOX BI

With OWOX BI you can create reports for any taste and color, from ROI, ROPO effect and cohort analysis to LTV and RFM analysis.

Loading data from CSV files

To upload data from a CSV file, select a data source in the Create table window and use the Upload option.

Then select the file and its format.

Next, you need to define the destination for the data by specifying the project name and dataset.

Note: In Google BigQuery, you can select two types of tables: native and external.

Google BigQuery will automatically detect the structure of the table, but if you want to add fields manually, you can use either the text edit function or the + Add field button.

Note: If you want to interfere with the parsing of data from a CSV file into Google BigQuery, you can use advanced options.

For more information on the CSV format, see the detailed documentation from the Internet Society.

Loading data from JSON files

To load data from a JSON file, repeat all the steps above: create or select a dataset and table that you are working with – only select JSON as the file format.
You can upload a JSON file from your computer, Google Cloud Storage, or Google Drive.

Note: For more information on the JSON format, see the Google Cloud documentation.

Downloading data from Google Cloud Storage.

Google Cloud Storage allows you to safely store and transmit data online.

Useful information about working with this service:

Getting Started with Google Cloud Storage
Cloud Storage Documentation
Quick Guides
Choosing storage and database on Google Cloud Platform

You can upload files from Google Cloud Storage to Google BigQuery in the following formats:

  • CSV
  • JSON (newline delimited)
  • Avro
  • Parquet
  • ORC
  • Cloud Datastore

You can read more about using Cloud Storage with big data in official documentation

You can also inquire about data upload restrictions and Cloud Storage Permissions in the Google Cloud Help Center.

Loading data from other Google services such as Google Ads and Google Ad Manager.

To download data from various Google services, you first need to set up the BigQuery Data Transfer Service. Before you can use it, you must select or create a data project and, in most cases, enable billing for it. For example, billing is required for the following services:

  • Campaign Manager
  • Google ad manager
  • Google Ads
  • Google Play (beta)
  • YouTube – Channel Reports
  • YouTube – Content Owner Reports

Note: More about setting up and changing payment you can find out in the Google Cloud Help Center.

To launch BigQuery Data Transfer Service, on home page BigQuery, select Transfers from the menu on the left.

Note: You will need administrator rights to create a Transfer.

In the next window, all you have to do is select the desired data source.

Note: BigQuery Data Transfer Service can be accessed not only from the platform console, but also from:

  • сlassic bq_ui
  • bq command-line tool
  • BigQuery Data Transfer Service API

Once set up, the service will automatically and regularly upload data to BigQuery. However, you cannot use it to download data from BigQuery.

Loading data using the API

With Cloud Client Libraries, you can use your favorite programming language to work with the Google BigQuery API.

Note: More information on loading data using the API can be found in Google Cloud documentation

First, you need to create or select a project with which you will work. Then on the main page go to the API section.

In the API overview window, you can connect APIs and services. You need to select the API you need from the library.

In the library, you can use field search or filter the API by category.

You can use a set of Python scripts from OWOX BI to automate the import of data into Google BigQuery.
There are scripts for automating the import of data into Google BigQuery from the following sources:

  • amoCRM
  • FTP
  • FTPS
  • HTTP (S)
  • Intercom
  • ExpertSender
  • MySQL
  • SFTP

These Python scripts can be downloaded from Github

Note: Learn how to use Python while working with the Google API from this video tutorial from Google Developers on YouTube

conclusions

In this article, we’ve covered the most popular ways to load data into Google BigQuery. From simple uploading of a data file to uploading data via API, any user can find the one that suits them.


0 Comments

Leave a Reply