Snowflake vs BigQuery: Structure of objects
The hierarchical structure of objects within Snowflake and BigQuery is essentially the same; however, the names differ. Here are the basic structures used in each platform:
- Snowflake: database > schema > tables/views
- BigQuery: project > dataset > tables/views
In Snowflake, at the top of the hierarchy, you have the database whereas in BigQuery it is called the project. This is the top level organisational container used to group related objects.
The next layer in Snowflake is called a schema whereas in BigQuery it’s a dataset. This is a container that holds tables and views.
Lastly are the tables and views. Tables are structured data storage objects that contain rows and columns of data. Here are the table types available in Snowflake and BigQuery as well as the slight differences between them (if any):
| Snowflake name | Snowflake definition | BigQuery name | BigQuery definition |
| --- | --- | --- | --- |
| Permanent | Standard default tables with Time Travel capabilities | Standard | Same as Snowflake |
| Transient | The same as permanent tables but with very limited Time Travel capabilities | N/A | Not available in BigQuery |
| Temporary | Only available within the session they were created in | Temporary | Only available for up to 24 hours |
| External | Allows you to query data stored in an external stage | External | Same as Snowflake |
| Views | Virtual tables defined by SQL queries which reference other tables or views | Views | Same as Snowflake |
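As a rough sketch of how the Snowflake table types differ at creation time (table and column names here are hypothetical), the type is simply a keyword on the `CREATE` statement:

```sql
-- Permanent table: the default, with full Time Travel and Fail-safe
CREATE TABLE permanent_events (id INT, payload VARIANT);

-- Transient table: no Fail-safe and limited Time Travel
CREATE TRANSIENT TABLE staging_events (id INT, payload VARIANT);

-- Temporary table: dropped automatically at the end of the session
CREATE TEMPORARY TABLE session_events (id INT, payload VARIANT);
```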
Note: Snowflake currently provides two different interfaces for interacting with the platform: Classic Console and Snowsight. For accounts created after May 30, 2023, the Classic Console is no longer accessible, so this blog refers to Snowsight.
Snowflake vs BigQuery: Architecture & Performance
Snowflake uses a multi-cluster, hybrid architecture where storage and compute are separated, and an account can be hosted on AWS, GCP, or Azure. So, what does this actually mean for you? Multi-cluster refers to the ability to create new clusters within a single account, each of which can be scaled independently to handle different workloads. Hybrid refers to it being a mix between shared-disk and shared-nothing architectures: the data is shared because it's stored in a central repository, but you can have multiple warehouses, each with its own compute power, so resources don't need to be shared by every query.
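As a small sketch of what "multi-cluster" looks like in practice (the warehouse name and settings are hypothetical, and multi-cluster warehouses require Enterprise edition or higher):

```sql
-- A warehouse that adds clusters under concurrent load, up to a cap
CREATE WAREHOUSE reporting_wh
  WAREHOUSE_SIZE   = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3     -- extra clusters spin up when queries queue
  AUTO_SUSPEND      = 300   -- suspend after 5 minutes of inactivity
  AUTO_RESUME       = TRUE; -- resume automatically when a query arrives
```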
The Cloud Data Warehouse Benchmark carried out by Fivetran in 2022 found that Snowflake came out on top for query performance, though only marginally.
Advantages: With this type of architecture, compute can be allocated and managed based on demand, making it more cost effective and scalable.
Disadvantages: A shared architecture can lead to performance issues when many users are using the same warehouse at the same time. However, this problem is mitigated by the fact that you can have multiple warehouses or clusters, reducing resource contention among users.
BigQuery uses a serverless, fully managed architecture. This means that users do not need to manage servers; this is handled solely by the provider.
Advantages: It will automatically scale based on demand without users having to worry about handling this. Google Cloud handles all the backend operations, including scaling, maintenance, and updates.
Disadvantages: Costs may require careful monitoring, as scaling up automatically based on demand can lead to large cost increases.
Snowflake vs BigQuery: Pricing
Snowflake uses a consumption-based pricing model where compute and storage are charged separately. For compute, users buy credits, which are consumed based on the number and size of the virtual warehouses running concurrently. For storage, there is a monthly fee based on the amount of data you have, typically on a per-terabyte-per-month basis. Pricing also varies slightly depending on which cloud provider and region you choose.
BigQuery's pricing model is also consumption-based. Compute is charged based on the amount of data processed by your queries. Storage is charged for the amount of data you load in; like Snowflake, it is typically billed on a per-terabyte-per-month basis.
Snowflake vs BigQuery: Query Language
Snowflake uses SQL; however, it also makes use of some unique features and extensions that go beyond the capabilities of standard SQL.
As well as SQL, Snowflake also supports Python, Java, and Scala through Snowpark, with Python code runnable directly in a Python worksheet. Snowpark provides a set of tools, libraries, and connectors that allow you to interact with Snowflake through your preferred programming language. This allows you to perform data transformations and carry out testing in Snowflake without needing to move data to a different system or install dependent libraries.
BigQuery also uses SQL. It supports the GoogleSQL dialect as well as a legacy SQL dialect; however, it doesn't currently support any other languages.
Snowflake vs BigQuery: Permissions
Snowflake combines Discretionary Access Control (DAC) and Role-based Access Control (RBAC), allowing for granular permissions on schemas, tables, views, procedures, and other objects. Each object has an owning role that can grant access to it (DAC), while privileges are granted to roles and roles are granted to users (RBAC).
Privileges specify the operations that users can perform on objects. There are fewer than 50 different privileges, and a list of them all and their descriptions can be found in the Snowflake documentation. There are a small number of system-defined roles that come with any Snowflake account; these roles cannot be dropped, nor can their privileges be edited. However, you can create your own roles and grant them specific privileges.
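A minimal sketch of the RBAC flow described above, using hypothetical role, object, and user names:

```sql
-- Create a custom role and grant it privileges on objects (RBAC)
CREATE ROLE analyst;
GRANT USAGE  ON DATABASE sales_db          TO ROLE analyst;
GRANT USAGE  ON SCHEMA   sales_db.marts    TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.marts TO ROLE analyst;

-- Finally, grant the role to a user
GRANT ROLE analyst TO USER jane;
```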
Column-level security is available in Snowflake with Enterprise edition or higher. It allows the application of a masking policy to a column and can be created using the CREATE MASKING POLICY command.
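A short sketch of a masking policy (the policy, table, and role names are hypothetical): the policy returns the real value only for an allowed role, and is then attached to a column.

```sql
-- Show the real email only to the ANALYST role; mask it for everyone else
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('ANALYST') THEN val ELSE '*****' END;

-- Apply the policy to a column
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
```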
In BigQuery, permissions can go down to the same granularity as in Snowflake, and users are granted access to objects through their roles. Access is granted in a very similar way; however, in BigQuery it is referred to as Identity and Access Management (IAM). This works by defining who has what access to which resource. There are around 150 BigQuery permissions, all listed in the BigQuery documentation along with the roles that grant them.
Creating roles is also very similar. There are lots of predefined roles and you can also create your own with a customised list of permissions.
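Besides the console, BigQuery also lets you grant IAM roles on a dataset using GoogleSQL DCL; as a small sketch (project, dataset, and user are hypothetical):

```sql
-- Grant a predefined IAM role on a dataset to a user
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `my_project.my_dataset`
TO "user:jane@example.com";
```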
It is also possible to have column-level security in BigQuery using policy tags. These policies will check whether a user has proper access when they run a query.
Note: I found Snowflake simpler and more intuitive in some ways. For example, in Snowflake you can create a role and assign it to users all in one place, whereas in BigQuery you have to navigate between different sections to do related tasks, making it feel a bit more disjointed.
Snowflake vs BigQuery: Query History
Snowflake's query history can be found under the Activity heading and shows a variety of information about each query, including timestamp, duration, error messages, and user, and lets you open any query as a new query. These details can also be used to filter the list of queries. Snowflake stores history for up to 14 days; however, due to the data retention policy, the user is not shown for queries run more than 7 days ago.
Snowflake also offers the ability to access historical data through Time Travel. This allows you to access and restore data that has been changed or deleted. With Standard edition you can time travel back 1 day; Enterprise edition or higher allows up to 90 days.
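As a brief sketch of Time Travel in action (the table name is hypothetical):

```sql
-- Query the table as it looked one hour ago (offset is in seconds)
SELECT * FROM orders AT(OFFSET => -3600);

-- Restore a dropped table, provided it's still within the retention period
UNDROP TABLE orders;
```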
BigQuery offers similar functionality, found under Personal History or Project History, allowing you to see and filter queries based on the same kinds of details. BigQuery stores history for up to 6 months, with a limit of 1,000 jobs.
BigQuery also supports time travel, allowing you to query or restore deleted tables up to a maximum of 7 days in the past.
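BigQuery's equivalent is expressed with a `FOR SYSTEM_TIME AS OF` clause; a minimal sketch with a hypothetical table:

```sql
-- Query the table as it looked one hour ago
SELECT *
FROM `my_project.my_dataset.orders`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```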
Snowflake vs BigQuery: Integrations & Features
Depending on which Snowflake edition you purchase, you can access different features; an overview of the editions can be found on Snowflake's website.
Integration with Transformation Tools
Snowflake works with a variety of data transformation tools including dbt, Matillion, and Talend. Focusing on dbt, connecting to Snowflake requires your account name, the database, and the warehouse.
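As a sketch of what that dbt connection looks like, here is a hypothetical `profiles.yml` (all values are placeholders):

```yaml
# profiles.yml -- hypothetical Snowflake connection for dbt
my_profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: abc12345.eu-west-1      # your account identifier
      user: jane
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: analyst
      database: SALES_DB
      warehouse: REPORTING_WH
      schema: MARTS
      threads: 4
```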
The Snowflake Utils package in dbt contains Snowflake specific macros providing support for Snowflake features like time travel and data sharing as well as simplifying common tasks like type conversion.
Integration with Other Tools
Snowflake’s Partner Connect feature allows you to easily set up trial accounts with their partners and integrate these accounts with Snowflake. This includes data ingestion, BI, and AI & machine learning tools.
Snowflake Marketplace allows you to access third party data and services as well as market your own data products. You can find data sets like historical data, traffic conditions, and exchange rates.
BigQuery doesn't offer different editions with access to different features; all features are available to all accounts.
Integration with Transformation Tools
BigQuery also integrates with a variety of data transformation tools. Connecting to BigQuery from dbt requires several pieces of information; however, these can easily be filled in for you if you choose to use a service account JSON file. The ways of connecting to Snowflake and BigQuery vary slightly, but once you've integrated with these tools there is practically no difference as a user, apart from any syntax that differs between the platforms.
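For comparison, a hypothetical `profiles.yml` for BigQuery using the service-account method (all values are placeholders):

```yaml
# profiles.yml -- hypothetical BigQuery connection for dbt
my_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: my-gcp-project
      dataset: marts
      keyfile: /path/to/service-account.json
      threads: 4
```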
Similarly to the Snowflake Utils package, there is a package to support BigQuery users in dbt called dbt-bigquery.
Integration with Other Tools
BigQuery doesn't offer partner trial accounts; however, it does integrate easily with the other services and tools provided by Google Cloud's vast ecosystem.
BigQuery also provides the ability to access data from other organisations, as well as share your own through the Analytics Hub.
Snowflake vs BigQuery: Loading in Data
There are 3 main steps to load data from your local machine into Snowflake:
- Depending on whether your file type is supported and whether you need to add any extra requirements, you may need to create a file format for the data you want to load in, using a SQL statement. Here you specify the type of file (e.g. CSV, JSON) and any requirements (e.g. how many rows to skip, what the separators are). If your format is supported, you can skip this step.
- Create the table that the data will be loaded into making sure it matches the structure of your data. This means specifying each column and its data type in a SQL statement.
- Load in the data by selecting your downloaded file.
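The three steps above can be sketched in SQL as follows (file, table, and format names are hypothetical; the `PUT` command runs from SnowSQL rather than the web UI):

```sql
-- 1. File format for a CSV with a header row
CREATE FILE FORMAT csv_format
  TYPE = 'CSV' SKIP_HEADER = 1 FIELD_DELIMITER = ',';

-- 2. Target table matching the structure of the file
CREATE TABLE sales (order_id INT, amount NUMBER(10,2), order_date DATE);

-- 3. Stage the local file and copy it into the table
PUT file:///tmp/sales.csv @%sales;
COPY INTO sales FROM @%sales FILE_FORMAT = (FORMAT_NAME = 'csv_format');
```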
Note: In Snowflake's Classic Console it is possible to upload a file through the UI without writing any SQL queries, but you still need to have created a table and (depending on whether yours is supported) a file format, and to work through any errors that arise manually.
This can be quite a laborious process for loading in a single CSV table; however, if you're integrating with dbt to carry out your transformations, it can be avoided with the use of dbt seeds.
When dealing with JSON files, Snowflake will load each row into a single column, which then needs to be flattened manually by writing a SQL query that uses the FLATTEN function.
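A minimal sketch of that flattening step, assuming a hypothetical landing table `raw_json` with a single VARIANT column `v` containing an `items` array:

```sql
-- Explode the items array into one row per element
SELECT f.value:name::STRING AS item_name
FROM raw_json,
     LATERAL FLATTEN(input => raw_json.v:items) AS f;
```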
The process of loading a table into BigQuery is a lot more straightforward. You simply click your dataset and then select the file you want to load in. There is no need to create the file format or table beforehand, and BigQuery can auto-detect the data types for you.
When dealing with JSON files in BigQuery, it will automatically place each key in its own column; however, there is still a need to flatten the data if your JSON file contains an array (i.e. multiple values within a nested structure).
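In BigQuery, flattening an array is done with UNNEST; a small sketch assuming a hypothetical table whose rows each contain an `items` array:

```sql
-- One output row per element of the items array
SELECT o.order_id, item
FROM `my_project.my_dataset.orders` AS o,
     UNNEST(o.items) AS item;
```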
Note: BigQuery only supports NDJSON (newline-delimited JSON), whereas Snowflake supports both NDJSON and standard JSON.
So, which is the better platform?
There is no clear-cut winner; it really depends on your specific use case. Snowflake leads in innovation and the rate at which new features come out, and it also seems more intuitive and simple to use; however, BigQuery is more user-friendly in the sense that it doesn't require as much SQL knowledge to do the basics. There are a lot of similarities between the two platforms: both have extensive documentation to help users navigate them, both have simple, easy-to-use query history, and permissions are configured in a similar way. They also both integrate smoothly with dbt, where, once you're hooked in, the user experience is as simple as the syntax of the tool itself.
Both make it easy to scale based on demand, but with Snowflake you have to manage this yourself. This, however, can be a benefit: with automatic scaling, usage must be carefully monitored if you want to avoid unexpected costs. The costs of these services can vary significantly depending on things like your workloads and query patterns. Loading in data is much easier and more straightforward on BigQuery; however, it's not too common to load in single tables anyway, and CSVs can easily be loaded if you're using dbt.
So, the good news: as an Analytics Engineer, the tools are pretty similar, with nuanced differences as explained above. This is great for you, as you can spend more time developing your SQL and transformation skills rather than worrying about things like syntax or learning how to use a new tool from scratch.