Ticket #1254 (closed defect: fixed)

Opened 12 months ago

Last modified 9 months ago

IN-Subquery with LIMIT and 1:n relation generates incorrect SQL

Reported by: Tanken Owned by: romanb
Priority: major Milestone: 1.0.3
Component: Query/Hydration Version: 1.0.0
Severity: Keywords:
Cc: Has Test: no
Status: Pending Core Response Has Patch: no

Description (last modified by Tanken) (diff)

I'm doing a somewhat complicated, but not unusual query for the most recent record of a relation (A) grouped by a certain category within this relation. One way to achieve this (according to  http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html), is by doing a subquery for the maximum in the category group:

SELECT cat, name, created_at
FROM   A a1
WHERE  created_at IN (SELECT MAX(a2.created_at)
              FROM A a2
              WHERE a1.cat = a2.cat);

This seems to work, but then I had to join another relation (B) on A and also apply a LIMIT to the outer query, as I needed to paginate. So the query should look like this:

SELECT cat, name, created_at
FROM   A a1
LEFT
JOIN   B b1 ON a1.id = b.a_id
WHERE  created_at IN (SELECT MAX(a2.created_at)
              FROM A a2
              WHERE a1.cat = a2.cat)
LIMIT 10;

But this combination results in a subquery which uses the wrong aliases, like this:

SELECT cat, name, created_at
FROM   A a1
LEFT
JOIN   B b1 ON a1.id = b.a_id
WHERE  created_at IN (SELECT MAX(a1.created_at)
              FROM A a1
              WHERE a1.cat = a1.cat)
LIMIT 10;

without the limit it appears to work and without the join, too.

test case committed:  http://trac.phpdoctrine.org/browser/branches/0.11/tests/Ticket/1254TestCase.php

Change History

Changed 12 months ago by Tanken

  • description modified (diff)

Changed 11 months ago by jwage

  • version changed from 0.11 to 1.0
  • milestone changed from Unknown to 1.0.0-RC1

Changed 11 months ago by romanb

What database system are you using? Are you aware of the following:

1) that the tests run on SQLite which produces different SQL from MySQL and other databases in this specific case (because its a x-many join with a limit, see 3)
2) that the snippets in the ticket are neither the SQL nor the DQL of your problematic case. Take a look at the real SQL Doctrine produces for your case and add it to the ticket. In the case of MySql? it might even be 2 separate queries because MySql? does not support LIMIT in subqueries. 3) to understand what is happening when you apply limit on a joined x-many relation read this  http://www.phpdoctrine.org/documentation/manual/0_11/?one-page#dql-doctrine-query-language:limit-and-offset-clauses:the-limit-subquery-algorithm

Changed 11 months ago by Tanken

the first two SQL statements in the ticket are just a rather abstract outline of what I'm trying to do, they correspond to the example found behind the link. I have rerun the test now on both sqlite and mysql (mysql is the platform I'm actually developing on). I have updated my working copy in between, so these results may be different from those I got before. Still the test case fails in both cases.

The DQL query from the test case:

$q = new Doctrine_Query();
$q->from('RelX x');
$q->leftJoin('x.y xy');
$q->where('x.created_at IN (SELECT MAX(x2.created_at) latestInCategory FROM RelX x2 WHERE x.category = x2.category)');
$q->limit(5);

sqlite: (extracted from Doctrine_Query before it has been executed via getSql() method)

SELECT
  r.id AS r__id, r.name AS r__name, r.category AS r__category, r.created_at AS r__created_at, r2.id AS r2__id, r2.name AS r2__name, r2.rel_x_id AS r2__rel_x_id
FROM
  rel_x r LEFT JOIN rel_y r2 ON r.id = r2.rel_x_id
WHERE
  r.id IN (
    SELECT DISTINCT
      r3.id
    FROM
      rel_x r3 LEFT JOIN rel_y r4 ON r3.id = r4.rel_x_id
    WHERE
      r3.created_at IN (
        SELECT
          MAX(r3.created_at) AS r3__0
        FROM
          rel_x r3
        WHERE
          r3.category = r3.category
      )
    LIMIT 5
  )
AND
  r.created_at IN (
    SELECT
      MAX(r3.created_at) AS r3__0
    FROM
      rel_x r3
    WHERE
      r.category = r3.category
  )

MySql: (recorded using the connection profiler)

SELECT DISTINCT
  r3.id
FROM
  rel_x r3
LEFT JOIN
  rel_y r4
ON
  r3.id = r4.rel_x_id
WHERE 
  r3.created_at IN (
    SELECT
      MAX(r3.created_at) AS r3__0
    FROM
      rel_x r3
    WHERE r3.category = r3.category
  )
LIMIT 5
SELECT
  r.id AS r__id,
  r.name AS r__name,
  r.category AS r__category,
  r.created_at AS r__created_at,
  r2.id AS r2__id,
  r2.name AS r2__name,
  r2.rel_x_id AS r2__rel_x_id
FROM
  rel_x r
LEFT JOIN
  rel_y r2
ON
  r.id = r2.rel_x_id
WHERE
  r.id IN ('5')
AND
  r.created_at IN (
    SELECT
      MAX(r3.created_at) AS r3__0
    FROM
      rel_x r3
    WHERE
      r.category = r3.category
  )

Both, the sqlite and the first mysql query appear to be wrong, as the innermost nested query uses the same alias r3 as its parent query, but the DQL query specifies an equality between the outer alias x and the inner alias x2.

MySql version:

Server version: 5.0.51a-3ubuntu5.1 (Ubuntu)

PHP version:

php -v
PHP 5.2.4-2ubuntu5.3 with Suhosin-Patch 0.9.6.2 (cli) (built: Jul 23 2008 06:46:18) 
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

Changed 11 months ago by romanb

Thanks for the detailled information. I assume that the reason for this issue is that the DQL parser can't deal with correlated subqueries (subqueries that refer to/use columns of the outer query).

Changed 11 months ago by guilhermeblanco

After a lot of research, I can assure DQL parser is able to deal with correlated subqueries.

What's the issue actualy is that it calls parseSubquery and passes the current tableAliases. At that stage, it only has 2. Later, without noticing about the other ones, it calls the limit-subquery algorithm. Since it doesn't know about the processed subqueries, all subsequent tableAliases are rebuild with same names.

So, it is only necessary to address the aliases creation/notification. I'm researching about how this could be addressed, but until now I have no suggestion. I'll keep my research on this ticket for some more time.

Changed 10 months ago by anonymous

  • milestone New deleted

Milestone New deleted

Changed 10 months ago by jwage

  • milestone set to 1.0.3

So the limit subquery needs to be aware of what aliases have already been used, correct?

Changed 9 months ago by jwage

  • status changed from new to closed
  • resolution set to fixed

In r5001 this issue was fixed. Thanks for the excellent ticket.

Note: See TracTickets for help on using tickets.