MAINTENANCE MODE
================

THIS PROJECT IS IN MAINTENANCE MODE DUE TO THE FACT THAT IT'S NOT WIDELY USED WITHIN SPOTIFY. WE'LL PROVIDE BEST EFFORT SUPPORT FOR ISSUES AND PULL REQUESTS BUT DO EXPECT DELAY IN RESPONSES.

spark-bigquery
==============

[![Build Status](https://travis-ci.org/spotify/spark-bigquery.svg?branch=master)](https://travis-ci.org/spotify/spark-bigquery)
[![GitHub license](https://img.shields.io/github/license/spotify/spark-bigquery.svg)](./LICENSE)
[![Maven Central](https://img.shields.io/maven-central/v/com.spotify/spark-bigquery_2.11.svg)](https://maven-badges.herokuapp.com/maven-central/com.spotify/spark-bigquery_2.11)

Google BigQuery support for Spark, SQL, and DataFrames.


| spark-bigquery version | Spark version | Comment |
| :--------------------: | ------------- | ------- |
| 0.2.x | 2.x.y | Active development |
| 0.1.x | 1.x.y | Development halted |

To use the package in a Google [Cloud Dataproc](https://cloud.google.com/dataproc/) cluster:

install `org.apache.avro_avro-ipc-1.7.7.jar` to `~/.ivy2/jars`

`spark-shell --packages com.spotify:spark-bigquery_2.10:0.2.2`

To use it in a local SBT console:

```scala
import com.spotify.spark.bigquery._

// Set up GCP credentials
sqlContext.setGcpJsonKeyFile("")

// Set up BigQuery project and bucket
sqlContext.setBigQueryProjectId("")
sqlContext.setBigQueryGcsBucket("")

// Set up BigQuery dataset location, default is US
sqlContext.setBigQueryDatasetLocation("")
```

Usage:

```scala
// Load everything from a table
val table = sqlContext.bigQueryTable("bigquery-public-data:samples.shakespeare")

// Load results from a SQL query
// Only legacy SQL dialect is supported for now
val df = sqlContext.bigQuerySelect(
  "SELECT word, word_count FROM [bigquery-public-data:samples.shakespeare]")

// Save data to a table
df.saveAsBigQueryTable("my-project:my_dataset.my_table")
```

If you'd like to write nested records to BigQuery, be sure to specify an Avro Namespace. BigQuery is unable to load Avro Namespaces with a leading dot (`.nestedColumn`) on nested records.

```scala
// BigQuery is able to load fields with namespace 'myNamespace.nestedColumn'
df.saveAsBigQueryTable("my-project:my_dataset.my_table", tmpWriteOptions = Map("recordNamespace" -> "myNamespace"))
```

See also [Loading Avro Data from Google Cloud Storage](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro) for data type mappings and limitations. For example, loading arrays of arrays is not supported.

# License

Copyright 2016 Spotify AB.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0