Google BigQuery Public Datasets offers several datasets that can be accessed and integrated into your application or system. They are easy to use when your whole Big Data platform already runs on Google Cloud. If it does not, you need to build a pipeline to transfer the data. In this post we show how to do that with Apache Nifi.
We will use the Open Air Quality (OpenAQ) dataset, which collects real-time air quality data from around the world (5490 locations in 47 countries). The dataset includes only the current measurements, so we fetch the whole table on each run.
Apache Nifi does not have out-of-the-box BigQuery processors, so we created our own and called it GetBigQueryProcessor.
The processor fetches a whole BigQuery table into Apache Nifi. Its configuration is simple, with only 4 properties:
- JSON OAuth token to your project
- Project where source dataset is located
- Dataset within a project
- Table name within dataset
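As a rough illustration of how these four properties identify a table, the sketch below maps them onto the two BigQuery REST API v2 endpoints for table metadata (`tables.get`) and table rows (`tabledata.list`). It is not the source of GetBigQueryProcessor. The `TableSource` class, its field names, and the OpenAQ identifiers in the example are assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Base URL of the BigQuery REST API (v2).
BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


@dataclass(frozen=True)
class TableSource:
    """Hypothetical holder for the four processor properties."""

    credentials_json: str  # JSON credentials for the project (not used to build URLs)
    project: str
    dataset: str
    table: str

    def table_url(self) -> str:
        """URL of the tables.get endpoint, which returns the table schema."""
        return (
            f"{BIGQUERY_API}/projects/{quote(self.project, safe='')}"
            f"/datasets/{quote(self.dataset, safe='')}"
            f"/tables/{quote(self.table, safe='')}"
        )

    def rows_url(self) -> str:
        """URL of the tabledata.list endpoint, which returns the table rows."""
        return f"{self.table_url()}/data"


# Identifiers assumed for the OpenAQ public table; adjust to your own source.
source = TableSource(
    credentials_json="{}",  # placeholder, not a real credential
    project="bigquery-public-data",
    dataset="openaq",
    table="global_air_quality",
)
print(source.table_url())
print(source.rows_url())
```

Keeping the credentials apart from the three identifiers mirrors the property list: the project, dataset, and table locate the data, while the token only authorizes the requests.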
Interestingly, the processor automatically fetches the table schema in a separate request to BigQuery. It returns a FlowFile whose content is JSON, with the field names taken from the table definition.
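The separate schema request matters because the BigQuery REST API returns rows as positional cells without field names. The sketch below shows one way to pair a schema with rows and produce named JSON records. It assumes the response shapes of `tables.get` (`schema.fields[].name`) and `tabledata.list` (`rows[].f[].v`). The function name `rows_to_records` and the sample values are made up for illustration, and the processor's real implementation may differ.

```python
import json


def rows_to_records(schema: dict, rows: list) -> list:
    """Pair positional row cells with field names from the table schema.

    Handles flat tables only: nested RECORD and REPEATED fields would need
    recursive handling, which this sketch leaves out.
    """
    names = [field["name"] for field in schema["fields"]]
    return [
        {name: cell["v"] for name, cell in zip(names, row["f"])}
        for row in rows
    ]


# Illustrative schema and row, shaped like the REST API responses.
schema = {
    "fields": [
        {"name": "location", "type": "STRING"},
        {"name": "pollutant", "type": "STRING"},
        {"name": "value", "type": "FLOAT"},
    ]
}
rows = [{"f": [{"v": "Example Station"}, {"v": "pm25"}, {"v": "12.5"}]}]

records = rows_to_records(schema, rows)
print(json.dumps(records, indent=2))
```

The REST API encodes cell values as strings, so this sketch leaves `value` as `"12.5"`. Converting by the schema's `type` would be a natural next step.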
OpenAQ is one of many interesting public datasets available in BigQuery. The processor we created and described here works with any BigQuery table. Whatever your use case, Apache Nifi and GetBigQueryProcessor let you rapidly integrate your data without writing a single line of code.