fix upsert with null values#1861
Conversation
|
Thanks for looking into this @kevinjqliu, pretty annoying null-handling here :/ |
|
now im wondering if nulls are properly handled when we convert iceberg expressions to pyarrow expressions iceberg-python/pyiceberg/io/pyarrow.py Lines 721 to 725 in 1a5e32a https://arrow.apache.org/docs/python/generated/pyarrow.compute.equal.html |
I think we need to check the Edit: I think that situation is different since we're comparing two columns in the Upsert logic. For the visitor that you linked, we compare against a literal. |
<!--
Thanks for opening a pull request!
-->
<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->
# Rationale for this change
Closes #1835
Original implementation, `!=`
([not_equal](https://arrow.apache.org/docs/python/generated/pyarrow.compute.not_equal.html#pyarrow.compute.not_equal))
does not account for `null` values. It emits `null` when either sides
are `null`
The new implementation, first checks for `not_equal`. And on null
values, returns `true` only if both sides are `null`
Similar to apache/iceberg-rust#1045
# Are these changes tested?
Yes
# Are there any user-facing changes?
No
<!-- In the case of user-facing changes, please add the changelog label.
-->
<!--
Thanks for opening a pull request!
-->
<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->
# Rationale for this change
Closes apache#1835
Original implementation, `!=`
([not_equal](https://arrow.apache.org/docs/python/generated/pyarrow.compute.not_equal.html#pyarrow.compute.not_equal))
does not account for `null` values. It emits `null` when either sides
are `null`
The new implementation, first checks for `not_equal`. And on null
values, returns `true` only if both sides are `null`
Similar to apache/iceberg-rust#1045
# Are these changes tested?
Yes
# Are there any user-facing changes?
No
<!-- In the case of user-facing changes, please add the changelog label.
-->
Rationale for this change
Closes #1835
Original implementation,
!=(not_equal) does not account fornullvalues. It emitsnullwhen either sides arenullThe new implementation, first checks for
not_equal. And on null values, returnstrueonly if both sides arenullSimilar to apache/iceberg-rust#1045
Are these changes tested?
Yes
Are there any user-facing changes?
No