
Conversation

@ayushtkn (Member) commented Dec 6, 2022

Increase the partition field start to 10K to avoid collisions with columns.
#6368

@gaborkaszab (Collaborator) commented Dec 7, 2022

This seems like a reasonable change to me. Just a question for my understanding: the tables that have already been written will still have their partition field IDs starting from 1000, right? So tables with more than 1000 columns that were written prior to this change will still have the collision with the partition field IDs, and will only be fixed if they, or at least their metadata, are rewritten, right?

@ayushtkn (Member, Author) commented Dec 7, 2022

written prior to this change will still have the collision with the partition field IDs, and will only be fixed if they, or at least their metadata, are rewritten, right?

yep

@gaborkaszab (Collaborator) commented Dec 7, 2022

Thanks for the answer, @ayushtkn!
I wonder if it would make sense to make already-written tables work as expected even with more than 1000 columns. E.g., when reading their metadata we could convert their field IDs (starting from 1000) to the new scheme (starting from 10000). They would remain the same in the written metadata files, but internally we could keep track of the field IDs starting from 10000. And this would only be needed if a table has more than 1000 columns, so most tables would work as they do now.

Would it make sense? I can create a separate issue for this for further discussion, and if it does make sense I could give the implementation a try. @RussellSpitzer ?
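
As a rough sketch of that idea (a hypothetical helper, not an existing Iceberg API; it only illustrates shifting an old spec's field IDs into the new range in memory while leaving metadata files on disk untouched):

  import java.util.HashMap;
  import java.util.Map;

  class PartitionFieldIdRemapper {
    // Current and proposed start IDs for partition fields (assumed values).
    static final int OLD_START = 1_000;
    static final int NEW_START = 10_000;

    // Maps each on-disk partition field ID (1000-based) to an in-memory ID
    // (10000-based) so it no longer collides with data column IDs below 10000.
    static Map<Integer, Integer> remap(int[] onDiskFieldIds) {
      Map<Integer, Integer> mapping = new HashMap<>();
      for (int id : onDiskFieldIds) {
        mapping.put(id, id - OLD_START + NEW_START);
      }
      return mapping;
    }
  }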

@RussellSpitzer (Member)

@gaborkaszab I would probably just recommend dropping and recreating the table (via metadata), or having a separate utility for modifying existing tables. I really don't think many folks have 1000 columns, since we would have seen this before, so I don't think the upgrade procedure really needs to exist inside the Iceberg repo.

I think this change is pretty small and harmless though.

@RussellSpitzer (Member) left a comment


This looks good to me, but since this is a pretty core change I'd like at least one other committer to sign off. @szehon-ho or @aokolnychyi ?

@szehon-ho (Member) left a comment


lgtm

@aturoczy commented Dec 7, 2022

@szehon-ho https://www.youtube.com/watch?v=rR4n-0KYeKQ about the LGTM :) Just for fun.

@RussellSpitzer (Member)

Double-checking the relevant part of the spec: we never actually demand that partition IDs start at 1000. So I think we are in the clear here from a backwards-compatibility standpoint as well:

Table partitioning can be evolved by adding, removing, renaming, or reordering partition spec fields.

Changing a partition spec produces a new spec identified by a unique spec ID that is added to the table’s list of partition specs and may be set as the table’s default spec.

When evolving a spec, changes should not cause partition field IDs to change because the partition field IDs are used as the partition tuple field IDs in manifest files.

In v2, partition field IDs must be explicitly tracked for each partition field. New IDs are assigned based on the last assigned partition ID in table metadata.

In v1, partition field IDs were not tracked, but were assigned sequentially starting at 1000 in the reference implementation. This assignment caused problems when reading metadata tables based on manifest files from multiple specs because partition fields with the same ID may contain different data types. For compatibility with old versions, the following rules are recommended for partition evolution in v1 tables:

Do not reorder partition fields
Do not drop partition fields; instead replace the field’s transform with the void transform
Only add partition fields at the end of the previous partition spec

@aokolnychyi (Contributor)

I'd be really careful with this change. Even though the spec may not mention it directly, that was always our assumption. I will need to take a closer look in a bit.

@RussellSpitzer (Member)

The area I'm worried about now is:

  static boolean hasSequentialIds(PartitionSpec spec) {
    for (int i = 0; i < spec.fields.length; i += 1) {
      if (spec.fields[i].fieldId() != PARTITION_DATA_ID_START + i) {
        return false;
      }
    }
    return true;
  }

This is a check for v1 tables, and it is only used in

  public TableMetadata buildReplacement(
      Schema updatedSchema,
      PartitionSpec updatedPartitionSpec,
      SortOrder updatedSortOrder,
      String newLocation,
      Map<String, String> updatedProperties) {
    ValidationException.check(
        formatVersion > 1 || PartitionSpec.hasSequentialIds(updatedPartitionSpec),
        "Spec does not use sequential IDs that are required in v1: %s",
        updatedPartitionSpec);

and

  Schema schema = schemasById.get(currentSchemaId);
  PartitionSpec.checkCompatibility(spec, schema);
  ValidationException.check(
      formatVersion > 1 || PartitionSpec.hasSequentialIds(spec),
      "Spec does not use sequential IDs that are required in v1: %s",
      spec);

So I may have to think about this more

@RussellSpitzer (Member) left a comment


I was feeling nervous, so now I think we should hold on this. I think we need a test that starts a table at 1000 and then, with the new start ID set to 10000, continues to work. Specifically, I think this is an issue for v1 tables; v2 tables are probably fine with the IDs being non-sequential.

I think with the current code, a v1 table created unpartitioned or partitioned would become un-modifiable due to its spec no longer looking sequential ... I think.

@gaborkaszab (Collaborator)

I also checked whether PartitionSpec.hasSequentialIds() could cause any issues with existing tables. The first use that you linked seems to be the case where we re-write the table metadata and it checks the new partition field IDs. This should be fine in my opinion.
However, the other use-case here seems suspicious:

  Schema schema = schemasById.get(currentSchemaId);
  PartitionSpec.checkCompatibility(spec, schema);
  ValidationException.check(
      formatVersion > 1 || PartitionSpec.hasSequentialIds(spec),
      "Spec does not use sequential IDs that are required in v1: %s",
      spec);

First I thought this is the case where we load metadata from JSON, but it turns out TableMetadata.addPartitionSpecInternal() is used for adding new partition specs to a table, which also seems okay with the new start ID.

I can take another look tomorrow (it's kind of late now to think :) )

@ayushtkn (Member, Author) commented Dec 7, 2022

Thanks folks. Just thinking about the sequential-ID check: why does it need to rely on the start ID? Would changing the check like this help?

  static boolean hasSequentialIds(PartitionSpec spec) {
    for (int i = 1; i < spec.fields.length; i += 1) {
      if (spec.fields[i].fieldId() != spec.fields[i - 1].fieldId() + 1) {
        return false;
      }
    }
    return true;
  }
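
As a self-contained illustration of the difference (not Iceberg code; the constant and the example ID sequences below are assumed), compare the current check, which anchors at the start ID, with the relative check above, assuming the start ID moves to 10000:

  // Standalone demo: the anchored check rejects an old spec starting at 1000 once
  // PARTITION_DATA_ID_START becomes 10000, while the relative check still accepts it.
  class SequentialIdCheckDemo {
    static final int PARTITION_DATA_ID_START = 10_000;

    // Current behavior: IDs must start exactly at PARTITION_DATA_ID_START.
    static boolean anchored(int[] ids) {
      for (int i = 0; i < ids.length; i += 1) {
        if (ids[i] != PARTITION_DATA_ID_START + i) {
          return false;
        }
      }
      return true;
    }

    // Proposed behavior: IDs only need to be consecutive, wherever they start.
    static boolean relative(int[] ids) {
      for (int i = 1; i < ids.length; i += 1) {
        if (ids[i] != ids[i - 1] + 1) {
          return false;
        }
      }
      return true;
    }

    public static void main(String[] args) {
      int[] oldSpec = {1_000, 1_001};   // spec written before this change
      int[] newSpec = {10_000, 10_001}; // spec written after this change
      System.out.println(anchored(oldSpec)); // false -> old v1 spec would be rejected
      System.out.println(relative(oldSpec)); // true  -> old v1 spec still accepted
      System.out.println(anchored(newSpec)); // true
      System.out.println(relative(newSpec)); // true
    }
  }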

@RussellSpitzer (Member) commented Dec 7, 2022

Yeah, I think the hasSequentialIds code needs to be modified. Otherwise I can have a table whose first spec ID is 1000 before this patch; then after this patch, when I try to add another field, hasSequentialIds would see 1000 != 10000 and would throw an error.

I'm not sure if having 1000 and then 10000 would be OK in a v1 partition spec, so we need to check that.

@szehon-ho (Member) commented Dec 7, 2022

@TuroczyX lol, I did see that before :) Yep, this is my careless kitchen review of the day.

Nice catch, I didn't realize it would throw an exception if it's not sequential. Hm, I'm not 100% sure why we need to throw an exception in this case, versus assigning the start ID from the last assigned ID. Looks like it came from this comment: #845 (comment)

@szehon-ho (Member)

And some in-depth discussion: #280

@ayushtkn (Member, Author) commented Dec 7, 2022

Went through the discussion. One good thing is that we have lastPartitionId, and it is used for subsequent allocations, so that should prevent any old table from breaking due to this change.

I couldn't figure out where 1000 came from; it looks arbitrary. And even if sequential IDs are required, I don't see why they should specifically have to start from the 'startId', whether 1K or 10K.

The reason for requiring sequential IDs isn't mentioned over there; I need to explore that area a bit more. But there was also a test checking that the allocation should start from startId, not merely be sequential (why?).
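
To make the lastPartitionId point above concrete, a minimal sketch (illustrative only; the class and field names are assumptions, not the actual TableMetadata internals): new partition field IDs come from the highest ID the table has already assigned, so an existing table keeps counting up from where it left off regardless of the start constant.

  // Illustrative only: ID assignment driven by a tracked lastAssignedPartitionId.
  class PartitionIdAssigner {
    private int lastAssignedPartitionId;

    PartitionIdAssigner(int lastAssignedPartitionId) {
      this.lastAssignedPartitionId = lastAssignedPartitionId;
    }

    // The next field ID is always one past the highest ID the table has ever used.
    int nextFieldId() {
      lastAssignedPartitionId += 1;
      return lastAssignedPartitionId;
    }
  }

  // A table whose last partition field ID is 1001 keeps getting 1002, 1003, ...
  // even after the start constant for brand-new tables moves from 1000 to 10000.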

@rdblue (Contributor) commented Dec 8, 2022

Looks like @RussellSpitzer, @szehon-ho, and @aokolnychyi are looking at this and have noted the issues with v1 tables.

I think that this is risky because not all v1 readers will use partition field IDs, but we do write them into partition specs now. Currently, we are careful that those IDs are always the same, but this change would cause them to differ. It may be safe, but I'd test very thoroughly and possibly put this behind a flag.

I'd also like to understand why this is needed. Partition field IDs are stored in manifest files, not data files. Partition field IDs should generally not mix with data field IDs from the Iceberg schema.

The only case I can think of right now is projecting the _partition metadata field when reading a table... but in that case I think there needs to be a better solution. Running into a collision at 10,000 fields is still possible with this PR. We should just assign new field IDs to the _partition metadata fields.
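
A tiny illustration of that last point (standalone, with assumed numbers): if data column IDs run 1..N and partition field IDs start at some constant, projecting the _partition struct alongside the data columns clashes as soon as N reaches that constant, so raising the constant only moves the collision point.

  // Standalone illustration: the collision point moves from 1000 to 10000 columns,
  // but it does not go away.
  class PartitionIdCollision {
    static boolean collides(int numColumns, int partitionStart) {
      // Data column IDs 1..numColumns overlap partition field IDs >= partitionStart
      // once numColumns reaches partitionStart.
      return numColumns >= partitionStart;
    }

    public static void main(String[] args) {
      System.out.println(collides(1_200, 1_000));   // true  -> collision today
      System.out.println(collides(1_200, 10_000));  // false -> avoided by this PR
      System.out.println(collides(12_000, 10_000)); // true  -> still possible at 10K+ columns
    }
  }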

@rdblue (Contributor) commented Dec 18, 2022

Any update on this? Should we keep it open or are we pursuing other solutions?

@ShubhamSharmaCSE

Any update on this?

@ayushtkn (Member, Author)

I think you created #7221, similar to my previous one here: #6368.
My first take was to bump this number up to 10K, which I felt was a safe number and roughly in line with Hive's limits, but there were concerns that the approach might be risky. I did validate a couple of things, but I'm not in a position to say 'I guarantee it can't cause any issue' for folks migrating/upgrading from a version with the older limits.

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions bot added the stale label Aug 24, 2024
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot closed this Sep 12, 2024