Cosmos DB joins are a no no a go go

Freeing myself from the shackles of standard T-SQL, I’ve been getting to know the Cosmos DB (Was Document DB) No-SQL platform as a service (PaaS) as part of a number of projects. Me being the data guy was asked to create some queries and just add any value I can to the data side of it. So I’ve been finding out the limits and the approach to doing things in to a bunch of stuff with the JSON files.

So for starters the SQL support on Cosmos DB doesn’t support all the cool things that SQL Server can do, partly since this is a service focused on a different approach. It has only been around for a few years, so the level of maturity isn’t quite there for some of the functions you would expect basic SQL to do. However you can create stored procedures and user defined functions that can fill some of that gap.

So what are the limitations that proved a bit of a headache for me? It was joins and aggregations.
One of the first queries I was asked to write was a the total value of sales for a venue. So I started with the base query of:
SELECT VALUE {   "TotalSales":SUM(s.SoldPrice)
             ,   "VenueName":(s.LocationDescription)
FROM sales s
GROUP BY s.LocationDescription

That did not work. Oh, why not. School boy error, I hadn’t read my homework correctly. I knew that Cosmos DB supported aggregations, but not grouping. So the above is not valid. So only the following is valid

SELECT VALUE {"TotalSales":SUM(s.SoldPrice)}
FROM sales s

and so is

SELECT VALUE {   "TotalSales":SUM(s.SoldPrice)
             ,   "NumberOfSales":COUNT(s.SoldPrice)
FROM sales s
So you can mix aggregations, but not group them up. That’s a bit pants.
So to joins, which didn’t work as I expected. As Cosmos DB is a nice no-sql schema less structure, you can dump in what ever JSON files you like in there, and declare the schema in the file. So you can have customer details and order in the same database, just use a field in the JSON to drive which one is which, then use a WHERE clause in the SQL query. But joins are an inter-document join with in the hierarchy of the structure of the JSON file. You can’t link schema’s together, maybe joining on Customer ID’s, keys and GUID’s etc.

"order": [
"customer": [
"products": [
{ "LineItem": "Filter Coffee"}

So doing something like the following:
FROM sales s
Will not return the LineItem of “Filter Coffee”, you have to navigate down the JSON, so it works like this:
FROM sales s
JOIN o IN s.order
JOIN c IN o.customer
JOIN p IN c.products

So that requires a slightly different approach to thinking than regular RDBMS. So you can’t join JSON file in the SQL like you can with tables/views. However that is coming from a de-normalised database view, where as this a more useful on an approach for normalised databases, where you don’t mind duplication of data, rather than my OLAP view of the world.

I have to admit that I have been enjoying discovering Cosmos DB and its functions, it is very fast, and I’ve thrown a lot of data and queries at it, and it has handled it well. It also allowed me to hang out with the cool applications development team, (Increasing the average age, and hopefully IQ too), however I now I feel the need to Instagram my lunch.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s