Pinot Error: LEFT JOIN, JSON_EXTRACT_SCALAR, And OR
Introduction
Hey guys! Ever run into a weird SQL error that just makes you scratch your head? We're diving deep into a tricky issue encountered while using LEFT JOIN with JSON_EXTRACT_SCALAR in Apache Pinot, specifically when dealing with OR conditions in the WHERE clause. This article breaks down the problem, the error messages, and provides a clear understanding of why this happens and how to potentially work around it. If you’re working with Pinot and complex queries, this is definitely something you want to understand.
The Problem: Pinot and OR Conditions with JSON_EXTRACT_SCALAR
So, here's the deal. Imagine you're querying data across two tables using a LEFT JOIN. You've got filters on both tables, including a JSON_EXTRACT_SCALAR function to pull data from a JSON column on the left table. Everything seems fine and dandy when you combine your filter conditions using AND. But, BAM! As soon as you group those same conditions using OR logic, or even nest them under a single AND, Pinot throws an error. This is the exact scenario we're tackling today.
When dealing with complex queries in Apache Pinot, the LEFT JOIN operation combined with the JSON_EXTRACT_SCALAR function can sometimes lead to unexpected issues, especially when OR conditions are involved. The core of the problem lies in how Pinot optimizes and executes queries with these specific combinations. The JSON_EXTRACT_SCALAR function, which is used to extract scalar values from JSON strings, adds a layer of complexity to the query execution plan. When multiple conditions, combined with OR, are applied to the results of this function, Pinot's query engine might struggle to create an efficient execution strategy. This is because OR conditions often require the engine to evaluate multiple possible execution paths, which can be computationally expensive, particularly when dealing with large datasets or complex JSON structures.
Furthermore, the issue might stem from how Pinot's query optimizer handles the combination of LEFT JOIN and JSON_EXTRACT_SCALAR within the context of OR conditions. A LEFT JOIN ensures that all rows from the left table are included in the result, along with the matching rows from the right table. When no match is found, the columns from the right table will contain NULL values. The JSON_EXTRACT_SCALAR function, when applied to a column that might contain NULL due to the LEFT JOIN, can introduce additional complexity in the filtering logic. The query engine needs to handle cases where the JSON extraction might fail due to NULL values, and this handling can become more intricate when OR conditions are involved, potentially leading to inefficiencies or errors. Therefore, it’s crucial to understand these underlying mechanisms to effectively troubleshoot and optimize such queries.
The Query That's Causing Trouble
Let's look at the specific query that's giving us grief. This query involves two tables: workitem
(wt) and workitemcustomfield
(wc). We're doing a LEFT JOIN between them, and the goal is to filter results based on certain conditions.
Here’s the problematic query:
SELECT wt.workItemIdentifier, wt.name
FROM workitem wt
LEFT JOIN workitemcustomfield wc ON wt.id = wc.workItemId
WHERE (
wt.projectId = '8aaa0bd3-7df0-4e76-bfa7-78a5654db9ee'
)
AND (
(
wc.customFieldId = '1a6df50e-a987-4d68-a7cf-3334f8c3ecb0' AND wc.value = '74d3d23a-bffa-4410-808a-0c80acf003f5'
)
OR (
json_extract_scalar(
customAttributes,
'$.d8b5bddd-e21b-4980-8fb9-d72ce74d07f7',
'STRING',
'null'
) != 'null'
AND json_extract_scalar(
customAttributes,
'$.d8b5bddd-e21b-4980-8fb9-d72ce74d07f7',
'STRING',
'null'
) = '06f19d05-b253-4483-ae1a-7842f138573b'
)
)
ORDER BY wt.creationTime DESC
LIMIT 50000;
Notice the OR condition? That's where the trouble starts. This OR condition combines a filter on the workitemcustomfield
table with a filter that uses JSON_EXTRACT_SCALAR on the workitem
table.
Breaking Down the Query:
- LEFT JOIN: We're joining
workitem
andworkitemcustomfield
tables onwt.id = wc.workItemId
. This means we want all rows fromworkitem
and matching rows fromworkitemcustomfield
. If there's no match, we still get theworkitem
row withNULL
values forworkitemcustomfield
columns. - WHERE Clause: This is where the filtering happens.
wt.projectId = '8aaa0bd3-7df0-4e76-bfa7-78a5654db9ee'
: We're filteringworkitem
rows byprojectId
.- The OR Condition:
(wc.customFieldId = '1a6df50e-a987-4d68-a7cf-3334f8c3ecb0' AND wc.value = '74d3d23a-bffa-4410-808a-0c80acf003f5')
: This part filtersworkitemcustomfield
rows based oncustomFieldId
andvalue
.json_extract_scalar(...) != 'null' AND json_extract_scalar(...) = '06f19d05-b253-4483-ae1a-7842f138573b'
: This is the tricky part. We're using JSON_EXTRACT_SCALAR to extract a value from thecustomAttributes
JSON column in theworkitem
table. We're checking if the extracted value is notnull
and equals a specific UUID.
- ORDER BY and LIMIT: We're ordering the results by
creationTime
in descending order and limiting the output to 50000 rows.
The Error Message: A Deep Dive
When we run the query with the OR condition, Pinot throws a rather verbose error. Let's break it down:
Error Code: 200
QueryExecutionError:
Unable to execute query plan for request: 529680001000000009 on server: 172.27.0.6@{37095,34963}, ERROR: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Caught exception while submitting request: 529680001000000009, stage: 2
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
at org.apache.pinot.query.service.server.QueryServer.forEachStage(QueryServer.java:307)
at org.apache.pinot.query.service.server.QueryServer.submit(QueryServer.java:147)
...
Caused by: java.lang.RuntimeException: Caught exception while submitting request: 529680001000000009, stage: 2
at org.apache.pinot.query.service.server.QueryServer.submitStage(QueryServer.java:270)
at org.apache.pinot.query.service.server.QueryServer.lambda$forEachStage$5(QueryServer.java:300)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
... 3 more
...
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Unsupported function: JSONEXTRACTSCALAR
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
at org.apache.pinot.query.service.server.QueryServer.submitStage(QueryServer.java:262)
... 5 more
...
Caused by: java.lang.IllegalArgumentException: Unsupported function: JSONEXTRACTSCALAR
at org.apache.pinot.query.runtime.operator.operands.FunctionOperand.<init>(FunctionOperand.java:74)
at org.apache.pinot.query.runtime.operator.operands.TransformOperandFactory.getTransformOperand(TransformOperandFactory.java:81)
at org.apache.pinot.query.runtime.operator.operands.TransformOperandFactory.getTransformOperand(TransformOperandFactory.java:33)
at org.apache.pinot.query.runtime.operator.operands.FilterOperand$Predicate.<init>(FilterOperand.java:196)
org.apache.pinot.query.service.dispatch.QueryDispatcher.lambda$submit$1(QueryDispatcher.java:188)
org.apache.pinot.query.service.dispatch.QueryDispatcher.execute(QueryDispatcher.java:248)
org.apache.pinot.query.service.dispatch.QueryDispatcher.submit(QueryDispatcher.java:185)
org.apache.pinot.query.service.dispatch.QueryDispatcher.submitAndReduce(QueryDispatcher.java:132)
The key part here is:
Caused by: java.lang.IllegalArgumentException: Unsupported function: JSONEXTRACTSCALAR
This error suggests that Pinot is having trouble with the JSON_EXTRACT_SCALAR function in the context of the OR condition. It's as if Pinot is saying, "Hey, I don't know how to handle this function when it's combined with an OR like that!"
Why is This Happening?
It's not entirely clear why Pinot behaves this way, but there are a few potential reasons:
- Query Optimization: Pinot's query optimizer might not be able to effectively plan the execution of a query with JSON_EXTRACT_SCALAR and OR conditions. The optimizer might struggle to find an efficient way to evaluate the OR condition when one part of it involves extracting data from a JSON column.
- Function Support: It's possible that the specific version of Pinot (1.3.0 in this case) has limitations in how it supports JSON_EXTRACT_SCALAR within complex WHERE clauses.
- Execution Engine: The underlying execution engine might not be fully optimized for this particular combination of operations.