Skip to content

Conversation

@Ocramius
Copy link
Member

@Ocramius Ocramius commented Dec 3, 2015

This PR simply attempts to get rid of EntityManager#merge() and EntityManager#detach() operations.

While merge and detach can be useful in contexts where entities are being serialized:

  • serializing entities is something we suggest NOT doing
  • the semantics really don't fit PHP
  • the merge operations are quite aggressive and cause a lot of issues
  • cascade merges not configured correctly = hell
  • EntityManager#clear() is a much better API for long-running and multi-threaded processes
  • the initial use-case around EntityManager#merge() was to merge data without further queries (when avoidable): the Second Level Cache now solves that problem in a more elegant way.

TODO:

  • update documentation
  • remove merge/detach mapping information
  • ~~~remove ObjectManager#detach() and ObjectManager#merge()~~~ (or provide empty implementation that throws, for now)

@doctrinebot
Copy link

Hello,

thank you for creating this pull request. I have automatically opened an issue
on our Jira Bug Tracker for you. See the issue link:

http://www.doctrine-project.org/jira/browse/DDC-4105

We use Jira to track the state of pull requests and the versions they got
included in.

@Ocramius
Copy link
Member Author

Ocramius commented Dec 3, 2015

As discussed with @beberlei, this path is viable, but a few things must be provided instead:

  • merge and detach operations are actually operations that are part of a serializer: we need to move the interface there
  • this obviously breaks the authentication system of symfony, and @nicolas-grekas is probably needed for feedback here
  • generally, we won't support serializing entities in future, which means that alternate approaches need to appear

@guilhermeblanco
Copy link
Member

I somehow agree we should get rid of merge, but disagree of detach.
We still manage detached entities internally and there's no way to avoid it.
As I've asked on IRC, we should wait a few weeks (beginning 2016) before actually focus our work on 3.0.

@Ocramius
Copy link
Member Author

Ocramius commented Dec 4, 2015

@guilhermeblanco the idea is to move detach to something similar to clear instead.

As per discussion with @beberlei yesterday, detach can lead to horribly broken object graphs, and should lead to cascaded detach in most cases with bi-directional associations.

We need something safer and more specific.

@DHager
Copy link
Contributor

DHager commented Dec 4, 2015

I find myself thinking of it as:

  1. "Detach-All", which is basically clear(). Useful, easy, worth keeping no matter what.
  2. "Detach-One", which requires finding all the collaborating entities/collections and replacing references to the target with a lazy-loading proxy. This requires accurate metadata and badly-written entities can throw a monkey-wrench into the whole process if they maintain a reference somewhere that Doctrine doesn't know about.
  3. "Detach-Subgraph", where you cascade and affect all referenced entities. Same difficulties as above for the most part. Better, in that you don't need to handle "replacing" references in still-managed entities. Worse, in that effects are less-predictable.

We need something safer and more specific.

If the main driver is (de)-serialization, what about avoiding detachment entirely, and instead creating "clean clones"? A new object object-graph which will not contain any lazy-proxies or managed-collections. That way any third-party serializer component ought to be able to handle it:

$plainEntity = $entityManager->cloneUnmanaged($managedEntity);
$str = $mySerializer->save($plainEntity);

Then, on the way back, do some privileged reflection-magic to overwrite managed entities with state from the un-managed ones:

$plainEntity = $mySerializer->load($str);
$entityManager->overwriteState($managedEntity, $plainEntity);

It may be slower and take more memory, but I think it could be saner in the long-run.

@Ocramius
Copy link
Member Author

"Detach-Subgraph", where you cascade and affect all referenced entities.

Same thoughts here - we would need some sort of detach mechanism that safely applies detach, without breaking portions of the graph, but entirely cutting it out when there is any reference.

@Ocramius Ocramius added this to the 3.0 milestone Dec 11, 2015
@DHager
Copy link
Contributor

DHager commented Dec 11, 2015

My feeling is that avoiding accidental data-bugs is a higher priority than saving memory/CPU-cycles in the PHP runtime. Big organizations can spin up additional PHP instances, and people with high-performance data-dumping needs can probably find other ways to get the data out.

If doctrine returns a copy of the subgraph for serialization, that's easier to control and should have cleaner failure-scenarios.

@Ocramius
Copy link
Member Author

If doctrine returns a copy of the subgraph for serialization, that's easier to control and should have cleaner failure-scenarios.

Agree 100% - that's why we'd need a "serializer", rather than a detach()

@DHager
Copy link
Contributor

DHager commented Dec 11, 2015

Doctrine could also avoid the "S-word" and assume that (de)-serialization is a problem to be solved by some other library depending on how/where the user wants to transfer or archive the data. Doctrine doesn't need to care what happens to the plain old PHP objects:

    /*
     * Get subgraph from the "root" object
     */
    $plainEntity = $entityManager->cloneUnmanaged($managedEntity); 

    /* 
     * Who knows what happens to $plainEntity here? Is it (de)-serialized? 
     * We don't really care as long as it isn't horribly disfigured.
     */
    $plainEntity = SomeSerializerRoundtrip($plainEntity)

    /* 
     * Try to load/overwrite state represented by subgraph, throwing
     * exceptions if things don't line up exactly.
     */
    $entityManager->overwriteState($plainEntity);


    /* 
     * Alternate version, where we also assert that the data coming in
     * needs to be applicable to a specific managed entity or reference:
     */
    $entityManager->overwriteState($plainEntity, $managedEntityOrRef);

@moroine
Copy link

moroine commented Dec 14, 2015

Hi,

I'm currently using a lot of clear and merge operations in a long running operation. I had seen that merge operation is not as robust as persist operation (I encounter 2 bugs for which I made a PR).

So I try the import from external API some entities, in order to keep my data synchronized. I totally agree that doctrine doesn't fit for that type of operation and a raw SQL requests will be awfully much faster. But I have not enough time to manage this kind of operation, and as my script took less than 10 minutes make me happy.

In order to do this I need to manage my memory usage and the only way to do this I :

  • flush all entities
  • clear doctrine in order to garbage collect
  • merge the entities that I need to keep for the next steps

@Ocramius
Copy link
Member Author

@moroine

merge the entities that I need to keep for the next steps

Since this operation would cause queries anyway: wouldn't it be simply safer to just re-fetch the objects?

@moroine
Copy link

moroine commented Dec 16, 2015

@Ocramius Not really.

I made a bundle dedicated to import data from any source such as API, Excel ... The aim of my bundle is to add new source as much simple as possible ! In order to do this, I create a factory which is composed by one builder by entity that I can import. I tried to be flexible as much as possible, so I needed to handle of the complexity in my builders.

The developer need to provider a parser which provide sequentially (by a function next) a pattern which is a representation of my entity (better than a raw array). The problem is that a pattern could be a graph (a big one) so when I traverse my graph I need to flush / clear to garbage collect useless memory. So when I acsend back my graph I need to recover my cleared entities which data are partially imported (so different from database, included relations) so merge operation fit really fine for that case.

@DHager
Copy link
Contributor

DHager commented Dec 17, 2015

when I traverse my graph I need to flush / clear to garbage collect useless memory

@Ocramius I wonder if the identity-map could optionally use weak-references... I'm not sure if the extension exists for PHP7, but I really wish that they were built-in.

@Ocramius
Copy link
Member Author

I'm not sure if the extension exists for PHP7, but I really wish that they were built-in.

That would open a completely different can of worms about how to keep transaction integrity.

Code like following would just break in a very unexpected way:

$em->transactional(function ($em) use ($userIds) {
    array_walk(
        $userIds,
        function ($userId) {
            $user = $em->find(User::class, $userId);

            $user->ban();
        }
    )
});

I suggest looking at #5550 instead (for this sort of operation)

@billschaller
Copy link
Member

I don't think serialization or partial detachment of subgraphs is really within the scope of what the ORM should do. I think within reason we can look at what needs to be possible with the ORM to enable those sorts of use cases, but IMO they're special cases that should be implemented in userland code.

@DHager
Copy link
Contributor

DHager commented Dec 17, 2015

@zeroedin-bill I definitely agree in terms of serialization and "detach-subgraph". Do you think "copy-subgraph" would be a reasonable replacement?

On the other side--with data flowing in--what about a "merge" like:

/* 
 * Try to load/overwrite state represented by subgraph, throwing
 * exceptions if things don't line up exactly. Fails if any entities 
 * in the subgraph do not have a @MergeMethod
 */
$entityManager->overwriteState($plainEntity);

And the ORM could require that any affected entities must designate a merging-method (e.g. @MergeMethod so that it is handled in user-space:

/*
 * @Entity
 */
class Foo{
    /** @Id @Column(type="integer") @GeneratedValue */
    protected $id = null;
    /** @Column(type="string") */
    protected $a = "hello";
    /** @Column(type="string") */
    protected $b = "world";
     /** @OneToMany(targetEntity="Bar", inversedBy="foo") */
    protected $bars;

    public function __construct(){
        $this->bars = new ArrayCollection();
    }

    /*
     * @MergeMethod
     */
    public function myMergeMethod(Foo $other){
        // Can we assume framework already checked that IDs match?
        $this->a = $other->a;
        $this->b = $other->b;
        // Not sure if there are any gotchas in terms of lazy-proxies here
        $this->bars = new ArrayCollection($other->bars->toArray()); 
    }
}

// Class Bar omitted

Alternate configuration mechanisms could be used for a static merge-method like Foo::Merge(Foo $managedRecipient,Foo $unmanagedSource)

@SenseException
Copy link
Member

If an entity object is kept immutable and changes for a data row has to be handled in a new object, can this change also be handled without a EntityManager#merge() method?

@Ocramius
Copy link
Member Author

If an entity object is kept immutable and changes for a data row has to be handled in a new object, can this change also be handled without a EntityManager#merge() method?

Immutability is not handled by the ORM: only one instance of an entity per identifier is allowed in a single UnitOfWork, and its spl_object_hash() is tracked. This is the wrong tool for that job.

Ocramius added a commit to Ocramius/doctrine2 that referenced this pull request Jun 24, 2017
@Ocramius Ocramius force-pushed the prototype/merge-detach-removal branch from 3828baa to 34612dd Compare June 24, 2017 08:36
@Ocramius Ocramius changed the base branch from master to develop June 24, 2017 08:36
@Ocramius Ocramius changed the title [RFC] [BC Break] [3.x] Prototype - merge/detach removal EntityManager#merge() and EntityManager#detach() removal Jun 24, 2017
lcobucci pushed a commit to lcobucci/doctrine2 that referenced this pull request Dec 31, 2017
lcobucci pushed a commit to lcobucci/doctrine2 that referenced this pull request Dec 31, 2017
lcobucci pushed a commit to lcobucci/doctrine2 that referenced this pull request Dec 31, 2017
lcobucci pushed a commit to lcobucci/doctrine2 that referenced this pull request Dec 31, 2017
…rations in newer metadata driver implementations
lcobucci pushed a commit to lcobucci/doctrine2 that referenced this pull request Dec 31, 2017
lcobucci pushed a commit to lcobucci/doctrine2 that referenced this pull request Dec 31, 2017
lcobucci pushed a commit to lcobucci/doctrine2 that referenced this pull request Dec 31, 2017
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
…doctrine/common is required to completely remove those methods from the `ObjectManager` interface first
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
…rs are discouraged from storing entities in sessions
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
…cessing docs, adding links to existing batch processing utilities
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
…rations in newer metadata driver implementations
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
maglnet pushed a commit to maglnet/doctrine2 that referenced this pull request Oct 10, 2018
@greg0ire greg0ire removed this from the 3.0.0 milestone Jun 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.