Talk:Random binary tree

GA Review

This review is transcluded from Talk:Random binary tree/GA1. The edit link for this section can be used to add comments to the review.

Nominator: David Eppstein (talk · contribs)

Reviewer: Czarking0 (talk · contribs) 02:28, 28 March 2024 (UTC)[reply]

Hello and thanks for your contribution. At a glance this seems like a cool article. I will attempt to provide a though and well paced review. I am placing this table below too keep track of the criteria but feel free to suggest another format.

Some replies:

Re 1a: "Derivation": you mean a formal proof? Really? Or am I misunderstanding this point? "Almost surely": should have been with high probability (that is, probability tending to 1 in the limit of large $n$ rather than probability 1 even for fixed $n$ ; fixed. Re "(or extended...)": removed.

After reading it again I changed my mind and I think it is fine the way it is. Done Czarking0 (talk) 01:51, 29 March 2024 (UTC)[reply]

Re 1b: There is only one sentence about treaps in the lead. The paragraph it is in is mostly not about treaps. It is intended as a brief summary of the long "from random permutations" section, most of which is motivated by the average-case analysis of insertion-only binary search trees without any rebalancing. Treaps are a trick to get that same analysis to work for worst-case inputs with both insertions and deletions; I think they are worth mentioning in the lead, but not for more than a sentence. Re critical Galton-Watson trees: it's difficult to split out the p=1/2 cases from the p<1/2 and p>1/2 cases in the analysis, but I added a paragraph break before the algorithmic application as suggested, and added subsection headers to the whole section.

Done Czarking0 (talk) 01:51, 29 March 2024 (UTC)[reply]

Re 2a and "only the weaker upper bound": removed "only" and "actually" to avoid the implication that this is the last word on the subject (although I think it is).

I like this change and I think it is fine the way it is. The problem I am trying to avoid is when an article claims something that was true at the time of writing but a new proof is found later then years go by with no change. That is probably outside the context of a GAN.

Re 3a: "Is it important that the distribution is uniform?" (for treap priorities) No. It is important that they be independent and (likely) distinct, but uniformity doesn't matter. That's why I omitted it from this part. Some of the later works cited in the last paragraph of this section address the issue that getting computer representations of uniform reals to enough precision to make sure they're all distinct needs a lot of bits of randomness per number (and then all these bits need to be stored in the treap), leading to unnecessary inefficiencies. Instead they use more complicated schemes to save bits, but then they don't get trees whose distribution perfectly matches the uniform insertion order distribution.

Just to be clear, and because I find this surprising, I would still be correct if I replaced random with exponentially distributed or beta distributed (to maintain the unit interval) such that it read: "By choosing the priorities to be independent exponentially distributed real numbers in the unit interval, and by maintaining the Cartesian tree structure using tree rotations after any insertion or deletion of a node, it is possible to maintain a data structure that behaves like a random binary search tree." I wanted to make sure I am on the same page with you in that it is not just that it does not need to be uniform or even approximately uniform. Czarking0 (talk) 01:51, 29 March 2024 (UTC)[reply]

Ok after reading more I get it. Since "behaves like" is talking about the structure of the tree and not making claims about the priorities themselves it does not really matter how the priorities are assigned. The treap article covers this well. I think we are

Done here. Czarking0 (talk) 03:13, 29 March 2024 (UTC)[reply]

Re: "Why does the reader want to know about radix trees?": If the reader cares about probability distributions over random trees, the radix tree gives you a predetermined number of internal nodes, unlike the trie. If the reader cares about data structures, they need to know about both tries and radix trees as both are standard ideas in this area. If the reader cares about algorithms, both tries and radix trees model the behavior of certain binary-representation-based divide-and-conquer sorting algorithms on random inputs, but different algorithms. The trie models a sorting algorithm that divide on the bits of the binary numbers given, in strict succession from most significant to least significant. The radix tree models a sorting algorithm that finds and then divides on the next bit that separates some inputs. My personal interests are in data structures and algorithms, but actually my own motivation for including the section on tries and radix trees in this article was less about algorithms and more from the point of view of probability distributions over random trees: they give a quite different distribution than the other ones described (more strongly balanced, especially in the radix tree case) and I thought that made an interesting contrast to the other distributions.

Great reply, I think the article would be served a little bit by saying some of this. There is already radix tree but an uninformed reader may not know that they want to click on that page just from reading this. Something like, "Radix trees provide a baseline for comparing data structures. In divide-and-conquer algorithms, radix trees model the behavior on random inputs" I am not sure if those statements are 100% accurate I am just compressing your reply. Citations will likely be needed unless your sources, which I will review shortly, have some info on that. Czarking0 (talk) 01:51, 29 March 2024 (UTC)[reply]

Yes, I suspect citations motivating this as a subtopic of random trees more generally (rather than just talking about the data structures and their analysis) are likely to be sparse. —David Eppstein (talk) 06:04, 29 March 2024 (UTC)[reply]