Ticket #917 (closed defect: fixed)

Opened 16 months ago

Last modified 14 months ago

Offset Oracle and Hydrator

Reported by: Garfield-fr Owned by: romanb
Priority: major Milestone: 0.11.0
Component: Attributes Version: 0.11.0
Severity: Keywords:
Cc: Has Test:
Status: Has Patch:

Description

If i use the offset > 1 with Oracle, the query has a new parameter DCTRN_ROWNUM and the hydator don't work.

Attachments

hydrator.patch (4.5 KB) - added by Garfield-fr 16 months ago.
hydrator_v2.patch (4.3 KB) - added by Garfield-fr 14 months ago.
New patch

Change History

Changed 16 months ago by Garfield-fr

Changed 15 months ago by Garfield-fr

Query example:

SELECT * FROM (SELECT a.*, ROWNUM dctrn_rownum FROM (SELECT u.user_id AS u__user_id, u.user_name AS u__user_name, u.first_name AS u__first_name, u.last_name AS u__last_name FROM users u ORDER BY u.user_name) a WHERE ROWNUM <= 40) WHERE dctrn_rownum >= 21

Changed 15 months ago by jwage

  • milestone changed from 0.10.4 to 0.10.5

Changed 15 months ago by romanb

  • owner changed from jwage to romanb

Changed 15 months ago by bruno.p.reis

Since in oracle the rownum is mandatory, I did the following to solve this: (i think it would be better changing the Connection::modifyLimitQuery method, but this explains what is needed...)

class Doctrine_Connection_Oracle extends Doctrine_Connection
{
    protected $pkNames;

// ....

    public function setPkNames($pkNames) {
      $this->pkNames = $pkNames;
    }

// ....

    public function modifyLimitQuery($query, $limit = false, $offset = false, $isManip = false)
    {
        $limit = (int) $limit;
        $offset = (int) $offset;
        if (preg_match('/^s*SELECT/i', $query)) {
            if ( ! preg_match('/sFROMs/i', $query)) {
                $query .= " FROM dual";
            }
            if ($limit > 0) {
                // taken from http://svn.ez.no/svn/ezcomponents/packages/Database
                $max = $offset + $limit;
                ### Have distinct behavior for distinct querys (joins, etc...) and normal pagination...
                $pk = $this->pkNames?
                      $this->quoteIdentifier($this->pkNames) : 
                      '*';
                if ($offset > 0) {
                    $min = $offset + 1;
                    $query = 'SELECT '.$pk.' FROM (SELECT a.*, ROWNUM dctrn_rownum FROM (' . $query
                           . ') a WHERE ROWNUM <= ' . $max . ') WHERE dctrn_rownum >= ' . $min;
                } else {
                    $query = 'SELECT '.$pk.' FROM (' . $query .') a WHERE ROWNUM <= ' . $max;
                }
            }
        }
        return $query;
    }

and...

class Doctrine_Query extends Doctrine_Query_Abstract implements Countable, Serializable
{
// .....
 public function getLimitSubquery()
    {
        $map    = reset($this->_queryComponents);
        $table  = $map['table'];
        $componentAlias = key($this->_queryComponents);
        // get short alias
        $alias      = $this->getTableAlias($componentAlias);
        // what about composite keys?
        $primaryKey = $alias . '.' . $table->getColumnName($table->getIdentifier());
        // initialize the base of the subquery
        $subquery   = 'SELECT DISTINCT ' . $this->_conn->quoteIdentifier($primaryKey);
        $driverName = $this->_conn->getAttribute(Doctrine::ATTR_DRIVER_NAME);

        #### added this so the oracle_connection knows what to look for in the select
        if ($driverName == 'oci') {
          if( count($this->_sqlParts['orderby']) > 0 ) {
            $subquery .= ','.implode(', ', $this->_sqlParts['orderby']);
          }
          $this->_conn->setPkNames($table->getColumnName($table->getIdentifier()));
        }

AND then I had to change the Hydrator because of the DCTRN_ROWNUM field when the select still as Select * from...

    protected function _gatherRowData(&$data, &$cache, &$id, &$nonemptyComponents)
    {
        $rowData = array();
        

        foreach ($data as $key => $value) {
            // Parse each column name only once. Cache the results.
            if ( ! isset($cache[$key])) {
                if($key=='DCTRN_ROWNUM') continue;

Changed 15 months ago by bruno.p.reis

This worked fine for most cases, but its not working for group by querys yet... I´ll see if I can figure something out.

Changed 14 months ago by romanb

  • version changed from 0.10 to 0.11
  • milestone changed from 0.11.0 to 0.11.1

Changed 14 months ago by Garfield-fr

New patch

Changed 14 months ago by Garfield-fr

I have changed the method to check the field into the hydrator.

Show into the patch v2

Changed 14 months ago by jwage

  • status changed from new to closed
  • resolution set to fixed

(In [4348]) fixes #917 - Fixes hydration issue

Changed 14 months ago by jwage

  • milestone changed from 0.11.1 to 0.11.0

Changed 14 months ago by romanb

  • status changed from closed to reopened
  • resolution fixed deleted

I did not yet address this ticket for 2 reasons:

1) The provided patch is very slow (numRows * numColumn) calls of explode + count 2) If i understood bruno.p.reis correctly then there is more to it than just the field this hydration problem

bruno.p.reis: do you have any new information? is it sufficient to address only the hydration problem? If so i will do that in a much more efficient way.

Changed 14 months ago by bruno.p.reis

I apply Garfield´s patch to my 0.10.4 installation today and see what happens. It will helps to refresh my memory about this issue either.

I remember that:

- In oracle, when you have a group by you have to specify all the select fields that are not aggregated in the group by clause. - You can´t do something like select * from x where x_id in (select id, rownoum from y.... because you have more than one field on the inner query.

but it might not have anything to do with it now... later today I will give more info on this.

Changed 14 months ago by bruno.p.reis

More info: The following will work:

                           SELECT DISTINCT b5.nu_seq_usuario, b5.nome
                               FROM bprieto.usuario b5 LEFT JOIN bprieto.usuario_grupo b6
                                    ON b5.nu_seq_usuario = b6.nu_seq_usuario
                                    LEFT JOIN bprieto.grupos b7
                                    ON b6.nu_seq_grupo = b7.nu_seq_grupo
                                    LEFT JOIN bprieto.equipe b8
                                    ON b5.nu_seq_equipe = b8.nu_seq_equipe
                              WHERE 1 = 1
                           ORDER BY b5.nome

but if I comment the b5.nome, it wont

SELECT DISTINCT b5.nu_seq_usuario--, b5.nome

in this case oracle says that it is not a select expresion, so: The ORDER BY fields have to be on the SELECT DISTINCT FIELDS

That will not have any problems if the order fields are in the main table or a inner join table. But if you have a order by clause using a field from a left join table it might return duplicated results. So I would suggest that, if you do not figure out a way to go aroud this, you should return an exception on this case.

Changed 14 months ago by bruno.p.reis

Well,

I´ve told you before that it was not working with GROUP BY but it seems it is working today... at least the following query is working here inside a paginator:

return Doctrine_Query::create()
      ->select('u.nome,u.num,u.dt_cadastro, count(g.nu_seq_grupo) as num_grupos')
      ->from('Usuario u')
      ->leftJoin('u.Grupos g')
      ->orderBy('u.nome')
      ->groupBy('u.nu_seq_usuario,u.nome,u.num,u.dt_cadastro')
      ->where($where);

It might be that I have changed something else not documented here, but I think I might have done a wrong test before.

The only thing here, with the group by, is that:

if you have your code to add the PK field to the SELECT fields, so I think you should add it to the groupBy either, since all the select fields that are not agregated have to be on the group by. On the query I have shown you can see that I had to add the nu_seq_usuario (id) since the frameworked added it to the SELECT fields.

Changed 14 months ago by bruno.p.reis

Well,

that is what I have to add for today. If I run into anything else I will sure tell you here. I hope that with this informations you are allready able to do a good job as allways. One more thing I can suggest is to test all your patchs with:

left joins, inner joins, group by, order by, AND where clauses (on the main and secondary tables). That is because a lot of times I thought I had fixed something, but I then I found that it was broken using one of these.

If you need any more information that I might know, please let me know.

Changed 14 months ago by bruno.p.reis

Hey,

I just realized that the ORDER BY does not needs to be in the most inner query. So if you just eliminate the order by from this query the outer select will works just fine. Sorry for the late perception. I could have tought about it before.

Bruno

Changed 14 months ago by bruno.p.reis

Ops,

is there any case where this limit subquery's order by is needed?

Changed 14 months ago by bruno.p.reis

As I see, only the main table ORDER BY is needed in the limit subquery.

Changed 14 months ago by bruno.p.reis

So, in the query class I should have something like:

        if ($driverName == 'oci') {
          if( count($this->_sqlParts['orderby']) > 0 ) {
            $sqlOrderBy = '';
            foreach($this->_sqlParts['orderby'] as $part) {
              $ar = explode('.',$part);
              if($ar[0] == $alias) {
                $sqlOrderBy = ($sqlOrderBy ? ', ' : '').$part;
              }
            }
            $subquery .= ','.$sqlOrderBy;
          }
          $this->_conn->setPkNames($table->getColumnName($table->getIdentifier()));
        }

and

        if ($driverName == 'oci') {
            $subquery .= $sqlOrderBy ? ' ORDER BY '.$sqlOrderBy : '';
        }
        else {
            $subquery .= ( ! empty($this->_sqlParts['orderby']))? ' ORDER BY ' . implode(', ', $this->_sqlParts['orderby'])   : '';          
        }

That is working here with order by clauses aplied to related tables.

Changed 14 months ago by bruno.p.reis

I think I might have missed something with the DESC and ASC aplied to these fields.

Changed 14 months ago by romanb

Sorry but i'm really confused now. Lets try it from the beginning. So, we're talking about the LIMIT/OFFSET emulation in Oracle here right? The following is taken directly from Zend_Db:

        $limit_sql = "SELECT z2.*
            FROM (
                SELECT ROWNUM AS zend_db_rownum, z1.*
                FROM (
                    " . $sql . "
                ) z1
            ) z2
            WHERE z2.zend_db_rownum BETWEEN " . ($offset+1) . " AND " . ($offset+$count);

And Doctrine uses exactly the same (i made it look exactly like that in a recent commit, just to be sure):

                    $min = $offset + 1;
                    $query = 'SELECT b.* FROM (
                                 SELECT a.*, ROWNUM AS doctrine_rownum FROM ('
                                  . $query . ') a
                              ) b
                              WHERE b.doctrine_rownum BETWEEN ' . $min .  ' AND ' . $max;

So, now the question again: Apart from the hydration issue, why and in which way is this not correct? Where is the problem with this (apart from hydration) ?

Sorry to bug you again, but I just could not follow along your explanations and I want to get this ticket fixed.

Changed 14 months ago by bruno.p.reis

Don´t worry. I´m here to help(or at least try to...). Sorry if I was not clear enought. English is not my first language and I´m still learning how Doctrine is structured.

I think we have two main things to work here:

The first thing to understand is that the query passed to Doctrine_Connection_Oracle::modifyLimitQuery is already not working when it gets there. Why? Because in Oracle all the fields that goes into an ORDER BY clause, have to go into the SELECT DISTINCT fields. For instance, take a look on the following query generated by today´s check out (the inner one):

            SELECT a.*
              FROM (SELECT DISTINCT b5.nu_seq_usuario
                               FROM bprieto.usuario b5 LEFT JOIN bprieto.usuario_grupo b7
                                    ON b5.nu_seq_usuario = b7.nu_seq_usuario
                                    LEFT JOIN bprieto.grupos b6
                                    ON b6.nu_seq_grupo = b7.nu_seq_grupo
                                    LEFT JOIN bprieto.equipe b8
                                    ON b5.nu_seq_equipe = b8.nu_seq_equipe
                              WHERE 1 = 1
                           ORDER BY b5.nome, b6.NAME) a
             WHERE ROWNUM <= 12)

This query is broken in oracle! Why? Because b5.nome and b6.name are not in the SELECT DISTINCT fields. The only field there is the b5.nu_seq_usuario. So the first thing to fix is this query. How? You must put the fields on the ORDER BY into the SELECT DISTINCT list. That does the job. BUT if you put all the fields on the SELECT DISTINCT, it will alter the query! You might have two diferent names on the b6.name field and it would cause a b5 row to be listed twice. You can´t take out all the order by fields either because you need them. So the solution I figured out is to eliminate from this inner query the order by fields that are not from the main table in the Query class. I did that putting the following code on the Query.php file (alround line 1274 from today´s svn check out):

        // initialize the base of the subquery
        $subquery   = 'SELECT DISTINCT ' . $this->_conn->quoteIdentifier($primaryKey);
        $driverName = $this->_conn->getAttribute(Doctrine::ATTR_DRIVER_NAME);

        #### THIS IS THE PART i ADDED
        if ($driverName == 'oci') {
          if( count($this->_sqlParts['orderby']) > 0 ) {
            $sqlOrderBy = '';
            foreach($this->_sqlParts['orderby'] as $part) {
              $ar = explode('.',$part);
              if($ar[0] == $alias) {## HERE I ADD THE ORDER BY FIELDS ONLY FROM THE MAIN TABLE (I HAVE NOT CONSIDERED IF IT CAMES WITH ASC  OR DESC, HAS TO BE DONE...)
                $sqlOrderBy = ($sqlOrderBy ? ', ' : '').$part;
              }
            }
            $subquery .= ','.$sqlOrderBy;
          }
        }
        
        // pgsql needs the order by fields to be preserved in select clause
        if ($driverName == 'pgsql') {

this above code does the SELECT DISTINCT fields, and the code below does the ORDER BY (aroun line 1344):

        $subquery .= ( ! empty($this->_sqlParts['having']))?  ' HAVING '   . implode(' AND ', $this->_sqlParts['having']) : '';
        
        if ($driverName == 'oci') {
            $subquery .= $sqlOrderBy ? ' ORDER BY '.$sqlOrderBy : '';
        }
        else {
            $subquery .= ( ! empty($this->_sqlParts['orderby']))? ' ORDER BY ' . implode(', ', $this->_sqlParts['orderby'])   : '';          
        }

see that the $sqlOrderBy is declared above on the first code and has only the fields from the main (b5) table. This is how we fix this order by issue. It has nothing to do with the modifyLimitQuery as you can see.

Changed 14 months ago by bruno.p.reis

Well, when we fix the first issue, then we get a second one. Take a look on the following query:

SELECT   b.nu_seq_usuario AS b__nu_seq_usuario, b.nome AS b__nome,
         b.senha AS b__senha, b.num AS b__num,
         b.dt_cadastro AS b__dt_cadastro, b.nu_seq_equipe AS b__nu_seq_equipe,
         b2.nu_seq_grupo AS b2__nu_seq_grupo, b2.NAME AS b2__name,
         b4.nu_seq_equipe AS b4__nu_seq_equipe, b4.ds_equipe AS b4__ds_equipe
    FROM bprieto.usuario b LEFT JOIN bprieto.usuario_grupo b3
         ON b.nu_seq_usuario = b3.nu_seq_usuario
         LEFT JOIN bprieto.grupos b2 ON b2.nu_seq_grupo = b3.nu_seq_grupo
         LEFT JOIN bprieto.equipe b4 ON b.nu_seq_equipe = b4.nu_seq_equipe
   WHERE b.nu_seq_usuario IN (
            SELECT a.*
              FROM (SELECT DISTINCT b5.nu_seq_usuario, b5.nome
                               FROM bprieto.usuario b5 LEFT JOIN bprieto.usuario_grupo b7
                                    ON b5.nu_seq_usuario = b7.nu_seq_usuario
                                    LEFT JOIN bprieto.grupos b6
                                    ON b6.nu_seq_grupo = b7.nu_seq_grupo
                                    LEFT JOIN bprieto.equipe b8
                                    ON b5.nu_seq_equipe = b8.nu_seq_equipe
                              WHERE 1 = 1
                           ORDER BY b5.nome) a
             WHERE ROWNUM <= 12)
     AND 1 = 1
ORDER BY b.nome, b2.NAME

lets focus on the following part:

WHERE b.nu_seq_usuario IN (
            SELECT a.*
              FROM (SELECT DISTINCT b5.nu_seq_usuario, b5.nome

Its easy to see that SELECT a.* won´t return only the nu_seq_usuario the most outer query is expecting. And this will not work in Oracle, as it will throw a 'too many values' error or something like that. So the only thing we need to do is:

WHERE b.nu_seq_usuario IN (
            SELECT nu_seq_usuario
              FROM (SELECT DISTINCT b5.nu_seq_usuario, b5.nome

That will work. To do that I had to pass the primary key names to the inside of the Doctrine_Connection_Oracle::modifyLimitSubquery function, so instead of:

                              $query = 'SELECT b.* FROM (
                                 SELECT a.*, ROWNUM AS doctrine_rownum FROM ('
                                  . $query . ') a
                              ) b
                              WHERE b.doctrine_rownum BETWEEN ' . $min .  ' AND ' . $max;

I have something like...

                              $query = 'SELECT '.$primaryKeys.' FROM (
                                 SELECT a.*, ROWNUM AS doctrine_rownum FROM ('
                                  . $query . ') a
                              ) b
                              WHERE b.doctrine_rownum BETWEEN ' . $min .  ' AND ' . $max;

and, with this change I had to do nothing with hydration, I think. Please fell free to ask anything else you need. We sure will benefit from this work here in our job.

Changed 14 months ago by romanb

Ok, i'm starting to get it, but this is really a problem of the limit-subquery algorithm on oracle. If you're not familar with the limit-subquery thing, read this:  http://www.phpdoctrine.org/documentation/manual/0_11?one-page#dql-doctrine-query-language:limit-and-offset-clauses:the-limit-subquery-algorithm I see that the modifyLimitQuery does not work well when used inside this algorithm and that's the problem.

Concerning the first problem:
This seems to be the case for postgres, too and there's already code in there that handles this. See the code that starts with:

// pgsql needs the order by fields to be preserved in select clause
        if ($driverName == 'pgsql') { ...

Why can't we just add $driverName == 'oracle' ? If not, why?

Concerning the second problem:
We can't just replace b.* with the primary keys, then it'll work fine inside the limit-subquery but not in other regular limit queries.

So, in my eyes, the current modifyLimitQuery of the Oracle connection is absolutely correct, we should not alter it. Instead we may need to override/replace this method with the one you propose only in the limit-subquery creation or do something similar.

All in all, these 2 are really separate issues from this ticket here. If you don't mind, create 2 new tickets and just copy & paste your last description and our last 2 comments so that we get this properly separated.

Changed 14 months ago by bruno.p.reis

Done.

Created #1038 and #1039 already with the answer to your qustion about the postgres order by code.

Changed 14 months ago by romanb

  • status changed from reopened to closed
  • resolution set to fixed

(In [4366]) Fixed #917.

Note: See TracTickets for help on using tickets.