Execute BigQuery Processor

In a previous post, we worked with the Open Air Quality (OpenAQ) public dataset and fetched pollution measurements from Google BigQuery.

We showed how to fetch a whole BigQuery table into Apache NiFi. In most cases, however, we do not need every row of a table, and it is more reasonable to fetch the results of a query instead. This lets us retrieve only the data we actually need and move flow logic into SQL, which can speed up pipeline development.

Suppose we want to fetch Open Air Quality data only for a given region or country. This is easy to express in BigQuery; for example, to keep only measurements from Germany (country code DE):

select * 
from `bigquery-public-data.openaq.global_air_quality` 
where country = "DE"

To run such queries directly from NiFi, we have created a custom processor, executeBigQueryProcessor.


Its configuration requires only two properties:

  • Service Account JSON token
  • Query to execute
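
A processor like this would typically check its two configuration properties before running anything. The Python sketch below only illustrates such a check and is not the processor's actual code: the function name validate_config is hypothetical, and the required key names are a subset of the standard fields found in a Google Cloud service account JSON key file.

```python
import json

# Subset of the fields present in a Google Cloud service account JSON key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def validate_config(service_account_json: str, query: str) -> list[str]:
    """Return a list of problems with the two properties (hypothetical helper)."""
    errors = []
    try:
        key = json.loads(service_account_json)
    except json.JSONDecodeError as exc:
        errors.append(f"Service Account JSON token is not valid JSON: {exc.msg}")
    else:
        if not isinstance(key, dict):
            errors.append("Service Account JSON token must be a JSON object")
        else:
            missing = [f for f in REQUIRED_KEY_FIELDS if f not in key]
            if missing:
                errors.append("Service Account JSON token is missing: " + ", ".join(missing))
            elif key["type"] != "service_account":
                errors.append('Service Account JSON token must have "type": "service_account"')
    # An empty or whitespace-only query cannot be sent to BigQuery.
    if not query.strip():
        errors.append("Query to execute must not be empty")
    return errors


if __name__ == "__main__":
    sql = 'select * from `bigquery-public-data.openaq.global_air_quality` where country = "DE"'
    print(validate_config("{}", sql))
```

Collecting all problems in a list, rather than failing on the first one, mirrors how a configuration screen would typically report every invalid property at once.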


The processor automatically determines the result fields of the query and writes the results as JSON into FlowFiles:

[ {
  "location": "DEBY001",
  "city": "Bayern",
  "country": "DE",
  "pollutant": "so2",
  "value": 7.66,
  ...
}, ... ]
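
Conceptually, determining the result fields means pairing each value in a result row with the column name at the same position in the query's result schema. The Python sketch below illustrates that idea with plain lists; rows_to_json is a hypothetical name, and the field names and sample values are copied from the excerpt above rather than produced by a real query.

```python
import json
from typing import Any, Sequence


def rows_to_json(field_names: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Turn positional result rows into a JSON array of objects keyed by field name."""
    records = []
    for row in rows:
        if len(row) != len(field_names):
            raise ValueError(f"row has {len(row)} values, schema has {len(field_names)} fields")
        # zip pairs each value with the column name at the same position in the schema.
        records.append(dict(zip(field_names, row)))
    return json.dumps(records, indent=2)


fields = ["location", "city", "country", "pollutant", "value"]
rows = [("DEBY001", "Bayern", "DE", "so2", 7.66)]
print(rows_to_json(fields, rows))
```

With the official google-cloud-bigquery Python client, for instance, the column names could be read from the schema of the returned rows (each schema field has a name), so no field list has to be configured by hand.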

BigQuery provides a paging mechanism for fetching query results, with a maximum of 100,000 rows per page. executeBigQueryProcessor writes each result page into a separate FlowFile, so a large result set produces several FlowFiles.
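
The one-FlowFile-per-page behaviour can be pictured as a loop that keeps requesting pages until no page token is left. The sketch below simulates that loop in Python over an in-memory list and makes no real BigQuery calls: fetch_page and result_pages are hypothetical names, the integer page token stands in for BigQuery's opaque string token, and the page size defaults to the 100,000-row maximum mentioned above.

```python
import json
from typing import Iterator, Optional, Sequence

MAX_ROWS_PER_PAGE = 100_000


def fetch_page(rows: Sequence[dict], token: Optional[int], page_size: int) -> tuple[list[dict], Optional[int]]:
    """Simulated page request: return one page and the token of the next page (None at the end)."""
    start = token or 0
    end = start + page_size
    next_token = end if end < len(rows) else None
    return list(rows[start:end]), next_token


def result_pages(rows: Sequence[dict], page_size: int = MAX_ROWS_PER_PAGE) -> Iterator[str]:
    """Yield one JSON document per page, i.e. the content of one FlowFile per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    token: Optional[int] = None
    while True:
        page, token = fetch_page(rows, token, page_size)
        if page:
            yield json.dumps(page)
        if token is None:
            break


rows = [{"location": f"LOC{i}", "value": float(i)} for i in range(5)]
flowfiles = list(result_pages(rows, page_size=2))
print(len(flowfiles))  # → 3
```

Because result_pages is a generator, only one page has to be held in memory at a time, which is the main reason to emit a FlowFile per page instead of one FlowFile for the whole result.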

 
