Counting unique values via the Explore API


When analyzing data, you might want to know how many unique values exist for a particular field—for example, how many different species are recorded in a dataset about animals in a given country.

You can do this using the count(distinct) function in the Explore API.

This article explains how to use count(distinct ...), provides examples, and, importantly, explains why the returned result is an approximation.

What is the count(distinct) function?

The count(distinct ...) function lets you calculate the number of unique (non-null) values for a specific field in your dataset. It can be used in both the select and order_by clauses of your query.

Syntax:

count(distinct <field>)

Or, if you want to include null values under a custom label:

count(distinct ifnull(<field>, <alternative_value>))

Examples

Here are some examples, based on a dataset about animals in a given country:

| Use case | Query | Description |
| --- | --- | --- |
| Count how many unique species exist | count(distinct species) | Returns the number of distinct non-null values in the species field |
| Count unique species, treating null as "unknown" | count(distinct ifnull(species, "unknown")) | Counts null values as "unknown" and includes them in the result |

You can also use it with group by:

select=count(distinct species)&group_by=region

This returns the number of unique species per region.
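As a sketch of how such a query could be sent over HTTP, the snippet below builds a request URL for the Explore API v2.1 records endpoint. The domain and dataset id (`animal-species`) are hypothetical placeholders; substitute your own portal and dataset.

```python
from urllib.parse import urlencode

# Hypothetical portal domain and dataset id -- replace with your own.
BASE = "https://my-portal.opendatasoft.com/api/explore/v2.1"
DATASET = "animal-species"  # assumed dataset id, for illustration only

# Count distinct species per region, as in the group_by example above.
params = {
    "select": "count(distinct species)",
    "group_by": "region",
}
url = f"{BASE}/catalog/datasets/{DATASET}/records?{urlencode(params)}"
print(url)
```

The resulting URL can then be fetched with any HTTP client; the response contains one record per region with the distinct count for that region.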

Why is the result only an approximation?

When you use count(distinct ...), the system estimates how many different values exist in the field. But instead of checking every single row—which could be billions of records—it uses a clever shortcut to save time and resources.

How it works

Our platform uses Elasticsearch under the hood. For counting unique values, it uses a fast estimation method called "cardinality aggregation," powered by an algorithm known as HyperLogLog++.

This algorithm is like a statistical shortcut:

Imagine you're trying to count how many different types of birds visit a huge forest.

Instead of watching every bird one by one, you observe a few areas and estimate based on what you see.

It’s fast, and usually very close to the true number—but it’s not perfect.

The results may not be perfectly accurate

  • When the dataset contains a very large number of unique values

  • When values in the field are very random or rare

So why use this method?

  • It works very quickly, even with large datasets

  • It saves memory and computing power

  • It gives a result that is almost always close enough for analysis

    • Exact if the number of distinct values in the field is lower than 200

    • Otherwise, the error is always less than 6%

The estimate produced by the function may therefore differ slightly from the true count, but for most practical uses this difference is not a problem.
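To make the "statistical shortcut" concrete, here is a minimal, simplified HyperLogLog-style estimator in Python. It is an illustration of the general technique only, not the implementation Elasticsearch uses (Elasticsearch's HyperLogLog++ adds further refinements), and the register count and hash function are arbitrary choices for the demo.

```python
import hashlib
import math

def hll_estimate(values, b=10):
    """Rough HyperLogLog sketch: estimate distinct count with 2**b registers."""
    m = 1 << b                 # number of registers (1024 for b=10)
    registers = [0] * m
    for v in values:
        # 64-bit hash of the value (sha1 truncated, for the demo)
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h & (m - 1)      # low b bits select a register
        w = h >> b             # remaining 64-b bits
        # rank = number of leading zero bits in w, plus one
        rank = (64 - b) - w.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    # small-range correction: fall back to linear counting when
    # the estimate is low and some registers are still empty
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        raw = m * math.log(m / zeros)
    return raw
```

With a small number of distinct values the estimate is near-exact (mirroring the "exact below a threshold" behavior described above), while for larger cardinalities it stays within a few percent of the truth while storing only a fixed-size array of registers instead of every value seen.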

In conclusion

  • count(distinct ...) is useful for counting unique values in a dataset

  • It works in select and order_by clauses

  • You can use ifnull(...) to include null values

  • Results are an approximation

  • In most use cases, the result is fast, efficient, and accurate enough