Pinot Error: LEFT JOIN, JSON_EXTRACT_SCALAR, And OR

by ADMIN 52 views
Iklan Headers

Introduction

Hey guys! Ever run into a weird SQL error that just makes you scratch your head? We're diving deep into a tricky issue encountered while using LEFT JOIN with JSON_EXTRACT_SCALAR in Apache Pinot, specifically when dealing with OR conditions in the WHERE clause. This article breaks down the problem, the error messages, and provides a clear understanding of why this happens and how to potentially work around it. If you’re working with Pinot and complex queries, this is definitely something you want to understand.

The Problem: Pinot and OR Conditions with JSON_EXTRACT_SCALAR

So, here's the deal. Imagine you're querying data across two tables using a LEFT JOIN. You've got filters on both tables, including a JSON_EXTRACT_SCALAR function to pull data from a JSON column on the left table. Everything seems fine and dandy when you combine your filter conditions using AND. But, BAM! As soon as you group those same conditions using OR logic, or even nest them under a single AND, Pinot throws an error. This is the exact scenario we're tackling today.

When dealing with complex queries in Apache Pinot, the LEFT JOIN operation combined with the JSON_EXTRACT_SCALAR function can sometimes lead to unexpected issues, especially when OR conditions are involved. The core of the problem lies in how Pinot optimizes and executes queries with these specific combinations. The JSON_EXTRACT_SCALAR function, which is used to extract scalar values from JSON strings, adds a layer of complexity to the query execution plan. When multiple conditions, combined with OR, are applied to the results of this function, Pinot's query engine might struggle to create an efficient execution strategy. This is because OR conditions often require the engine to evaluate multiple possible execution paths, which can be computationally expensive, particularly when dealing with large datasets or complex JSON structures.

Furthermore, the issue might stem from how Pinot's query optimizer handles the combination of LEFT JOIN and JSON_EXTRACT_SCALAR within the context of OR conditions. A LEFT JOIN ensures that all rows from the left table are included in the result, along with the matching rows from the right table. When no match is found, the columns from the right table will contain NULL values. The JSON_EXTRACT_SCALAR function, when applied to a column that might contain NULL due to the LEFT JOIN, can introduce additional complexity in the filtering logic. The query engine needs to handle cases where the JSON extraction might fail due to NULL values, and this handling can become more intricate when OR conditions are involved, potentially leading to inefficiencies or errors. Therefore, it’s crucial to understand these underlying mechanisms to effectively troubleshoot and optimize such queries.

The Query That's Causing Trouble

Let's look at the specific query that's giving us grief. This query involves two tables: workitem (wt) and workitemcustomfield (wc). We're doing a LEFT JOIN between them, and the goal is to filter results based on certain conditions.

Here’s the problematic query:

SELECT wt.workItemIdentifier, wt.name
FROM workitem wt
LEFT JOIN workitemcustomfield wc ON wt.id = wc.workItemId
WHERE (
 wt.projectId = '8aaa0bd3-7df0-4e76-bfa7-78a5654db9ee'
 )
 AND (
 (
 wc.customFieldId = '1a6df50e-a987-4d68-a7cf-3334f8c3ecb0' AND wc.value = '74d3d23a-bffa-4410-808a-0c80acf003f5'
 )
 OR (
 json_extract_scalar(
 customAttributes,
 '$.d8b5bddd-e21b-4980-8fb9-d72ce74d07f7',
 'STRING',
 'null'
 ) != 'null'
 AND json_extract_scalar(
 customAttributes,
 '$.d8b5bddd-e21b-4980-8fb9-d72ce74d07f7',
 'STRING',
 'null'
 ) = '06f19d05-b253-4483-ae1a-7842f138573b'
 )
 )
ORDER BY wt.creationTime DESC
LIMIT 50000;

Notice the OR condition? That's where the trouble starts. This OR condition combines a filter on the workitemcustomfield table with a filter that uses JSON_EXTRACT_SCALAR on the workitem table.

Breaking Down the Query:

  • LEFT JOIN: We're joining workitem and workitemcustomfield tables on wt.id = wc.workItemId. This means we want all rows from workitem and matching rows from workitemcustomfield. If there's no match, we still get the workitem row with NULL values for workitemcustomfield columns.
  • WHERE Clause: This is where the filtering happens.
    • wt.projectId = '8aaa0bd3-7df0-4e76-bfa7-78a5654db9ee': We're filtering workitem rows by projectId.
    • The OR Condition:
      • (wc.customFieldId = '1a6df50e-a987-4d68-a7cf-3334f8c3ecb0' AND wc.value = '74d3d23a-bffa-4410-808a-0c80acf003f5'): This part filters workitemcustomfield rows based on customFieldId and value.
      • json_extract_scalar(...) != 'null' AND json_extract_scalar(...) = '06f19d05-b253-4483-ae1a-7842f138573b': This is the tricky part. We're using JSON_EXTRACT_SCALAR to extract a value from the customAttributes JSON column in the workitem table. We're checking if the extracted value is not null and equals a specific UUID.
  • ORDER BY and LIMIT: We're ordering the results by creationTime in descending order and limiting the output to 50000 rows.

The Error Message: A Deep Dive

When we run the query with the OR condition, Pinot throws a rather verbose error. Let's break it down:

Error Code: 200

QueryExecutionError:
Unable to execute query plan for request: 529680001000000009 on server: 172.27.0.6@{37095,34963}, ERROR: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Caught exception while submitting request: 529680001000000009, stage: 2
 at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
 at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
 at org.apache.pinot.query.service.server.QueryServer.forEachStage(QueryServer.java:307)
 at org.apache.pinot.query.service.server.QueryServer.submit(QueryServer.java:147)
...
Caused by: java.lang.RuntimeException: Caught exception while submitting request: 529680001000000009, stage: 2
 at org.apache.pinot.query.service.server.QueryServer.submitStage(QueryServer.java:270)
 at org.apache.pinot.query.service.server.QueryServer.lambda$forEachStage$5(QueryServer.java:300)
 at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
 ... 3 more
...
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Unsupported function: JSONEXTRACTSCALAR
 at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
 at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
 at org.apache.pinot.query.service.server.QueryServer.submitStage(QueryServer.java:262)
 ... 5 more
...
Caused by: java.lang.IllegalArgumentException: Unsupported function: JSONEXTRACTSCALAR
 at org.apache.pinot.query.runtime.operator.operands.FunctionOperand.<init>(FunctionOperand.java:74)
 at org.apache.pinot.query.runtime.operator.operands.TransformOperandFactory.getTransformOperand(TransformOperandFactory.java:81)
 at org.apache.pinot.query.runtime.operator.operands.TransformOperandFactory.getTransformOperand(TransformOperandFactory.java:33)
 at org.apache.pinot.query.runtime.operator.operands.FilterOperand$Predicate.<init>(FilterOperand.java:196)
org.apache.pinot.query.service.dispatch.QueryDispatcher.lambda$submit$1(QueryDispatcher.java:188)
org.apache.pinot.query.service.dispatch.QueryDispatcher.execute(QueryDispatcher.java:248)
org.apache.pinot.query.service.dispatch.QueryDispatcher.submit(QueryDispatcher.java:185)
org.apache.pinot.query.service.dispatch.QueryDispatcher.submitAndReduce(QueryDispatcher.java:132)

The key part here is:

Caused by: java.lang.IllegalArgumentException: Unsupported function: JSONEXTRACTSCALAR

This error suggests that Pinot is having trouble with the JSON_EXTRACT_SCALAR function in the context of the OR condition. It's as if Pinot is saying, "Hey, I don't know how to handle this function when it's combined with an OR like that!"

Why is This Happening?

It's not entirely clear why Pinot behaves this way, but there are a few potential reasons:

  1. Query Optimization: Pinot's query optimizer might not be able to effectively plan the execution of a query with JSON_EXTRACT_SCALAR and OR conditions. The optimizer might struggle to find an efficient way to evaluate the OR condition when one part of it involves extracting data from a JSON column.
  2. Function Support: It's possible that the specific version of Pinot (1.3.0 in this case) has limitations in how it supports JSON_EXTRACT_SCALAR within complex WHERE clauses.
  3. Execution Engine: The underlying execution engine might not be fully optimized for this particular combination of operations.

The