Avoid taking vm barrier in heap_prepare()#14425
Merged
Merged
Conversation
Comment on lines
+2015
to
+2018
| bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace)); | ||
| if (needs_gc) { | ||
| gc_enter(objspace, gc_enter_event_continue, &lock_lev); // takes vm barrier, try to avoid | ||
| } |
Contributor
There was a problem hiding this comment.
What about an early return instead?
Suggested change
| bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace)); | |
| if (needs_gc) { | |
| gc_enter(objspace, gc_enter_event_continue, &lock_lev); // takes vm barrier, try to avoid | |
| } | |
| bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace)); | |
| if (!needs_gc) { | |
| return; | |
| } |
910514f to
4552e5c
Compare
nobu
reviewed
Sep 14, 2025
| { | ||
| unsigned int lock_lev; | ||
| gc_enter(objspace, gc_enter_event_continue, &lock_lev); | ||
| bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace)); |
Member
There was a problem hiding this comment.
How about extracting a macro:
/* In lazy sweeping or the previous incremental marking finished and
* did not yield a free page. */
#define needs_continue_sweeping(objspace, heap) \
((heap)->free_pages == NULL && is_lazy_sweeping(objspace))f179f13 to
b3a2555
Compare
This comment has been minimized.
This comment has been minimized.
We can avoid taking this barrier if we're not incremental marking or lazy sweeping. I found this was taking a significant amount of samples when profiling `Psych.load` in multiple ractors due to the vm barrier. With this change, we get significant improvements in ractor benchmarks that allocate lots of objects. -- Psych.load benchmark -- ``` Before: After: r: itr: time r: itr: time 0 ruby#1: 960ms 0 ruby#1: 943ms 0 ruby#2: 979ms 0 ruby#2: 939ms 0 ruby#3: 968ms 0 ruby#3: 948ms 0 ruby#4: 963ms 0 ruby#4: 946ms 0 ruby#5: 964ms 0 ruby#5: 944ms 1 ruby#1: 947ms 1 ruby#1: 940ms 1 ruby#2: 950ms 1 ruby#2: 947ms 1 ruby#3: 962ms 1 ruby#3: 950ms 1 ruby#4: 947ms 1 ruby#4: 945ms 1 ruby#5: 947ms 1 ruby#5: 943ms 2 ruby#1: 1131ms 2 ruby#1: 1005ms 2 ruby#2: 1153ms 2 ruby#2: 996ms 2 ruby#3: 1155ms 2 ruby#3: 1003ms 2 ruby#4: 1205ms 2 ruby#4: 1012ms 2 ruby#5: 1179ms 2 ruby#5: 1012ms 4 ruby#1: 1555ms 4 ruby#1: 1209ms 4 ruby#2: 1509ms 4 ruby#2: 1244ms 4 ruby#3: 1529ms 4 ruby#3: 1254ms 4 ruby#4: 1512ms 4 ruby#4: 1267ms 4 ruby#5: 1513ms 4 ruby#5: 1245ms 6 ruby#1: 2122ms 6 ruby#1: 1584ms 6 ruby#2: 2080ms 6 ruby#2: 1532ms 6 ruby#3: 2079ms 6 ruby#3: 1476ms 6 ruby#4: 2021ms 6 ruby#4: 1463ms 6 ruby#5: 1999ms 6 ruby#5: 1461ms 8 ruby#1: 2741ms 8 ruby#1: 1630ms 8 ruby#2: 2711ms 8 ruby#2: 1632ms 8 ruby#3: 2688ms 8 ruby#3: 1654ms 8 ruby#4: 2641ms 8 ruby#4: 1684ms 8 ruby#5: 2656ms 8 ruby#5: 1752ms ```
b3a2555 to
004d042
Compare
tenderlove
added a commit
to tenderlove/ruby
that referenced
this pull request
Nov 3, 2025
* master: (381 commits) ZJIT: Implement include_p for opt_(new|dup)array_send YARV insns (ruby#14885) Avoid taking vm barrier in heap_prepare() (ruby#14425) [ruby/json] ext/json/ext/json.h: Add missing newline at end of file [ruby/json] Fix duplicate 'inline' declaration specifier [ruby/json] Fix check_dependency [ruby/json] parser.c: Always inline `json_eat_whitespace` [ruby/json] parser.c: use `rb_str_to_interned_str` over `rb_funcall` [ruby/json] parser.c: Extract `json_string_cacheable_p` [ruby/json] parser.c: simplify sorted insert loop in rstring_cache_fetch [ruby/json] parser.c: Skip checking for escape sequences in `rstring_cache_fetch` [ruby/json] Centralize macro definitions [DOC] Tweaks for String#to_f pend on Windows for timeouts Fix incorrect RUBY_DEBUG range Make Namespace.root visible not only for debugging Use CFUNC namespace only for IFUNC frames, its behavior should be unchanged Fix use of inappropriate debug flag Add flag to ignore EXPERIMENTAL warnings No need to call rb_define_class/module_under_id Add basic namespace tests ...
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
We can avoid taking this barrier if we're not incremental marking or lazy sweeping. I found this was taking a significant amount of samples when profiling
Psych.loadin multiple ractors due to the vm barrier. With this change, we get significant improvements in ractor benchmarks that allocate lots of objects.-- Psych.load benchmark --