How to visualize daily spending on GCP cloud solutions

Customers don’t like to pay more than they planned, so a detailed justification of costs is an integral and important part of adopting cloud technologies.

Google Cloud Platform offers different pricing plans for the resources you use. For example, the cost of Google Compute Engine (GCE) depends on the machine configuration (CPU, memory, network, disks). Charges for Google Kubernetes Engine (GKE) and Google Cloud Dataproc are based on the GCE nodes they run on, while the remaining costs follow their own, often intricate, formulas. Budget planning becomes increasingly difficult, especially when several cloud services are combined, and monitoring with timely reporting becomes all the more valuable as infrastructure spending grows.

Being able to check spending reports daily also makes it possible to adjust provisioned capacity in time, so the final bill at the end of the month comes as no surprise.

Background

Projects often need extensive reporting to improve efficiency and track costs. Although GCP lets you export billing data to BigQuery for further analysis, manually querying BigQuery is often not enough. Ideally, we want to visualize the data, filter it, and aggregate it by additional parameters.

Common filters:

  • Resource type;
  • Time period;
  • Department and team;
  • Comparison with previous periods.

The first option, following Google's recommendation, was Google Data Studio. It is very easy to set up: you only need to configure the data source, and sample reports and a dashboard come preinstalled. The solution itself, however, is not flexible enough: when creating charts and graphs in Google Data Studio you cannot enter a formula and have to select every parameter manually.

Grafana is a simple, well-established tool for monitoring and analyzing data, and its dashboards would be a great fit for visualizing billing data. The open question was how to connect it to BigQuery (BQ). The open-source BigQuery plugin turned out to be too buggy, so the BQ-Grafana combination ended up not being stable enough. On top of that, BigQuery queries took too long to return data.

A better solution was to load the data into PostgreSQL via Cloud SQL, since Grafana has an official PostgreSQL data source plugin. Subsequent tests showed that this approach has a clear speed advantage.

Cloud SQL is also worth choosing in other cases as the easiest way to run managed relational databases.

Solution Overview

The workflow can be described schematically as follows: a cron job in the Kubernetes cluster launches a Cloud Dataflow task every 4 hours. The task, in turn, loads data from BigQuery into PostgreSQL, taking only the records that appeared after the last export from BQ. The latest data can then be viewed in Grafana, which is connected to PostgreSQL.

Database Setup

First of all, you need to configure the export of billing data to BigQuery and create a PostgreSQL database.

You can read more about setting up the BigQuery export in the documentation, and the database can be created with Cloud SQL.

After this preparatory stage, we create several database objects, including a table that mirrors the structure of the BigQuery export:

CREATE DATABASE billing;
\c billing  -- connect to the new database in psql (PostgreSQL has no USE statement)
 
CREATE TABLE public.billing_export_2 (
	id serial NOT NULL,
	sku_id varchar NULL,
	labels varchar NULL,
	export_time varchar NULL,
	currency varchar NULL,
	sku_description varchar NULL,
	location_zone varchar NULL,
    currency_conversion_rate float8 NULL,
	project_labels varchar NULL,
	location_country varchar NULL,
	usage_start_time varchar NULL,
	billing_account_id varchar NULL,
	location_region varchar NULL,
	usage_pricing_unit varchar NULL,
	usage_amount_in_pricing_units float8 NULL,
	cost_type varchar NULL,
	project_id varchar NULL,
	system_labels varchar NULL,
	project_description varchar NULL,
	location_location varchar NULL,
	project_ancestry_numbers varchar NULL,
	credits varchar NULL,
	service_description varchar NULL,
	usage_amount float8 NULL,
	invoice_month varchar NULL,
	usage_unit varchar NULL,
	usage_end_time varchar NULL,
	"cost" float8 NULL,
	service_id varchar NULL,
	CONSTRAINT billing_export_2_pkey PRIMARY KEY (id)
);

We also create a materialized view with the values that Grafana will query over its PostgreSQL connection:


CREATE MATERIALIZED VIEW vw_billing_export AS
	SELECT
    	id,
        sku_id,
        labels,
        export_time::timestamp,
        currency,
        sku_description,
        location_zone,
        currency_conversion_rate,
        project_labels,
        location_country,
        usage_start_time::timestamp,
        billing_account_id,
        location_region,
        usage_pricing_unit,
        usage_amount_in_pricing_units,
        cost_type,
        project_id,
        system_labels,
        project_description,
        location_location,
        project_ancestry_numbers,
        credits,
        service_description,
        usage_amount,
        invoice_month,
        usage_unit,
        usage_end_time::timestamp,
        "cost",
        service_id,
    	l_label1 ->> 'value' as label1,
    	l_label2 ->> 'value' as label2,
   	...
    	FROM billing_export_2
        	LEFT JOIN jsonb_array_elements(labels::jsonb) AS l_label1
            	ON l_label1 ->> 'key' = 'label1'
        	LEFT JOIN jsonb_array_elements(labels::jsonb) AS l_label2
            	ON l_label2 ->> 'key' = 'label2'
        	...

All resources need to be tagged with the set of labels you are going to use; each label in this set becomes a separate column in the view. To speed up Grafana, it is worth creating indexes on these columns.

Indexes on the view can be added later; it is more important to first understand what the queries will look like.


CREATE INDEX vw_billing_export_label1
ON vw_billing_export (label1);
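
To make the purpose of these indexes concrete, here is a hypothetical label-filtered query (not from the original article) of the kind a Grafana panel would run, which the index on label1 is meant to speed up:

-- Hypothetical example: daily cost for one label value
SELECT
    date_trunc('day', usage_start_time) AS day,
    sum(cost) AS total_cost
FROM vw_billing_export
WHERE label1 = 'team-backend'
GROUP BY 1
ORDER BY 1;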

Dataflow

Create a service account with access to Dataflow and BigQuery, so that the Dataflow jobs can read data from BigQuery.

export project=myproject
gcloud iam service-accounts create "bq-to-sql-dataflow" --project ${project}
 
gcloud projects add-iam-policy-binding ${project} \
  --member serviceAccount:"bq-to-sql-dataflow@${project}.iam.gserviceaccount.com" \
  --role roles/dataflow.admin
 
gcloud projects add-iam-policy-binding ${project} \
  --member serviceAccount:"bq-to-sql-dataflow@${project}.iam.gserviceaccount.com" \
  --role roles/bigquery.dataViewer

Dataflow jobs also need two Cloud Storage buckets: one for temporary files and one for staging.

gsutil mb gs://some-bucket-staging
gsutil mb gs://some-bucket-temp

Script to load data

A script is required to load the data from BigQuery into Cloud SQL. Cloud Dataflow supports pipelines in Java and Python. You also need the Apache Beam library, a JSON key file (referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable, with the service account privileges configured earlier), and a requirements.txt file listing the packages to install (in our case only the beam-nuggets package).
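
Given that, the requirements.txt file shipped to the Dataflow workers would contain just a single line:

beam-nuggets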

The main part of the bq-to-sql Python script looks like this:

# Imports needed by this excerpt (the full script also builds the argparse
# parser and defines the define_start_date() helper used below)
import uuid
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    PipelineOptions, GoogleCloudOptions, WorkerOptions, StandardOptions)
from beam_nuggets.io import relational_db

args = parser.parse_args()
project = args.project
job_name = args.job_name + str(uuid.uuid4())
bigquery_source = args.bigquery_source
postgresql_user = args.postgresql_user
postgresql_password = args.postgresql_password
postgresql_host = args.postgresql_host
postgresql_port = args.postgresql_port
postgresql_db = args.postgresql_db
postgresql_table = args.postgresql_table
staging_location = args.staging_location
temp_location = args.temp_location
subnetwork = args.subnetwork
 
options = PipelineOptions(
        	flags=["--requirements_file", "/opt/python/requirements.txt"])
# For Cloud execution, set the Cloud Platform project, job_name,
# staging location, temp_location and specify DataflowRunner.
 
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = project
google_cloud_options.job_name = job_name
google_cloud_options.staging_location = staging_location
google_cloud_options.temp_location = temp_location
google_cloud_options.region = "us-west1"
worker_options = options.view_as(WorkerOptions)
worker_options.zone = "us-west1-a"
worker_options.subnetwork = subnetwork
worker_options.max_num_workers = 20
 
options.view_as(StandardOptions).runner="DataflowRunner"
 
start_date = define_start_date()
with beam.Pipeline(options=options) as p:
	rows = p | 'QueryTableStdSQL' >> beam.io.Read(beam.io.BigQuerySource(
                    	query='SELECT \
                        	billing_account_id, \
                        	service.id as service_id, \
                        	service.description as service_description, \
                        	sku.id as sku_id, \
                        	sku.description as sku_description, \
                        	usage_start_time, \
                        	usage_end_time, \
                        	project.id as project_id, \
                        	project.name as project_description, \
                        	TO_JSON_STRING(project.labels) as project_labels, \
                        	project.ancestry_numbers as project_ancestry_numbers, \
                        	TO_JSON_STRING(labels) as labels, \
                        	TO_JSON_STRING(system_labels) as system_labels, \
                        	location.location as location_location, \
                        	location.country as location_country, \
                        	location.region as location_region, \
                        	location.zone as location_zone, \
                        	export_time, \
                        	cost, \
                        	currency, \
                        	currency_conversion_rate, \
                        	usage.amount as usage_amount, \
                        	usage.unit as usage_unit, \
                        	usage.amount_in_pricing_units as usage_amount_in_pricing_units, \
                        	usage.pricing_unit as usage_pricing_unit, \
                        	TO_JSON_STRING(credits) as credits, \
                        	invoice.month as invoice_month, \
                        	cost_type \
                        	FROM `' + project + '.' + bigquery_source + '` \
                        	WHERE export_time >= "' + start_date + '"',
                    	use_standard_sql=True))
	source_config = relational_db.SourceConfiguration(
          	              drivername="postgresql+pg8000",
                        	host=postgresql_host,
                        	port=postgresql_port,
                        	username=postgresql_user,
                        	password=postgresql_password,
                        	database=postgresql_db,
                        	create_if_missing=True,
                        	)
	table_config = relational_db.TableConfiguration(
                        	name=postgresql_table,
            	            create_if_missing=True
                        	)
	rows | 'Writing to DB' >> relational_db.Write(
    	source_config=source_config,
    	table_config=table_config
	)

For data consistency, you must determine the maximum export_time already loaded into PostgreSQL and then load only the records from BigQuery that start after this cut-off.
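
A minimal sketch of that cut-off query, assuming the table created above (the define_start_date() helper in the script would run something equivalent and fall back to a default date when the table is empty):

-- Hypothetical cut-off query used to pick up only new BigQuery rows
SELECT coalesce(max(export_time::timestamp), '1970-01-01'::timestamp) AS last_export
FROM public.billing_export_2;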

After the data is loaded, the materialized view needs to be refreshed. Since this takes some time, it makes sense to create a new materialized view with an identical structure and indexes, drop the old one, and rename the new one.
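
A minimal sketch of that swap, assuming the view and index names used above (the _new suffix is just an illustrative convention):

-- Build the replacement under a temporary name, then swap it in
CREATE MATERIALIZED VIEW vw_billing_export_new AS
    SELECT ...;  -- same SELECT as in vw_billing_export above
CREATE INDEX vw_billing_export_new_label1 ON vw_billing_export_new (label1);

DROP MATERIALIZED VIEW vw_billing_export;
ALTER MATERIALIZED VIEW vw_billing_export_new RENAME TO vw_billing_export;
ALTER INDEX vw_billing_export_new_label1 RENAME TO vw_billing_export_label1;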

Creating a JSON file with SA permissions

The service account created earlier for the Cloud Dataflow workflow is also used by the cron job. The command below creates a private key for this account; the key is then uploaded to the Kubernetes cluster as a secret that the cron job uses for authentication.

gcloud iam service-accounts keys create ./cloud-sa.json \
  --iam-account "bq-to-sql-dataflow@${project}.iam.gserviceaccount.com" \
  --project ${project}

Deploying the secret in the K8s cluster:

kubectl create secret generic bq-to-sql-creds --from-file=./cloud-sa.json

Creating a Docker Image

Since the main goal is to run the Dataflow job automatically every day, we create a Docker image with all the necessary environment variables and the Python script:

Dockerfile:

FROM python:latest
RUN \
  /bin/bash -c " \
  apt-get update && \
  apt-get install python2.7-dev -y && \
  pip install virtualenv && \
  virtualenv -p /usr/bin/python2.7 --distribute temp-python && \
  source temp-python/bin/activate && \
  pip2 install --upgrade setuptools && \
  pip2 install pip==9.0.3 && \
  pip2 install requests && \
  pip2 install Cython && \
  pip2 install apache_beam && \
  pip2 install apache_beam[gcp] && \
  pip2 install beam-nuggets && \
  pip2 install psycopg2-binary && \
  pip2 install uuid"

COPY ./bq-to-sql.py /opt/python/bq-to-sql.py
COPY ./requirements.txt /opt/python/requirements.txt
COPY ./main.sh /opt/python/main.sh


values.yaml:

image: 
imageTag: latest
imagePullPolicy: IfNotPresent
project: 
job_name: "bq-to-sql"
bigquery_source: "[dataset].[table]"
postgresql:
  user: 
  password: 
  host: 
  port: "5432"
  db: "billing"
  table: "billing_export"
staging_location: "gs://my-bucket-stg"
temp_location: "gs://my-bucket-tmp"  
subnetwork: "regions/us-west1/subnetworks/default"

To simplify deployment and reuse of the cron job, we use Helm, the Kubernetes package manager:

cronjob.yaml:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: {{ template "bq-to-sql.fullname" . }}
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: {{ template "bq-to-sql.name" . }}
            image: "{{ .Values.image }}:{{ .Values.imageTag }}"
            imagePullPolicy: "{{ .Values.imagePullPolicy }}"
            command: [ "/bin/bash", "-c", "bash /opt/python/main.sh
              {{ .Values.project }}
              {{ .Values.job_name }}
              {{ .Values.bigquery_source }}
              {{ .Values.postgresql.user }}
              {{ .Values.postgresql.password }}
              {{ .Values.postgresql.host }}
              {{ .Values.postgresql.port }}
              {{ .Values.postgresql.db }}
              {{ .Values.postgresql.table }}
              {{ .Values.staging_location }}
              {{ .Values.temp_location }}
              {{ .Values.subnetwork }}" ]
            volumeMounts:
            - name: creds
              mountPath: /root/.config/gcloud
              readOnly: true
            env:
            # file name must match the key of the bq-to-sql-creds secret (cloud-sa.json)
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /root/.config/gcloud/cloud-sa.json
          volumes:
          - name: creds
            secret:
              secretName: bq-to-sql-creds
Data visualization with Grafana

The last step is to create the long-awaited dashboards in Grafana. There are no restrictions at this stage: you can use any display style you like, for example something similar to the Data Studio Billing Report Demo.

Of course, all SQL queries will have to be written from scratch 🙂
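
As an illustration only (this query is not from the original dashboards), a panel showing daily cost per service against the materialized view could look like this, using the $__timeFilter macro of Grafana's PostgreSQL data source:

-- Hypothetical panel query: daily cost per GCP service
SELECT
    date_trunc('day', usage_start_time) AS "time",
    service_description AS metric,
    sum(cost) AS value
FROM vw_billing_export
WHERE $__timeFilter(usage_start_time)
GROUP BY 1, 2
ORDER BY 1;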

One important point deserves extra attention: for security reasons, it is better to assign different rights to different users (a sketch of such grants follows the list):

  • A read-only user, for viewing the billing data;
  • A user for the Grafana connection.
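
A minimal sketch of such grants, with hypothetical role names (billing_viewer for read-only viewing and grafana_reader for the Grafana connection, neither taken from the original article):

-- Hypothetical read-only roles; both only get SELECT on the view
CREATE ROLE billing_viewer LOGIN PASSWORD 'change-me';
CREATE ROLE grafana_reader LOGIN PASSWORD 'change-me';

GRANT CONNECT ON DATABASE billing TO billing_viewer, grafana_reader;
GRANT USAGE ON SCHEMA public TO billing_viewer, grafana_reader;
GRANT SELECT ON vw_billing_export TO billing_viewer, grafana_reader;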

Conclusion

This guide gives an overview of a workflow whose purpose is to visualize billing data. The process readily supports adding new filters and metrics, and as a result the client gets a set of useful, fast dashboards that help control and optimize future Google Cloud costs.
