Monitoring of public RPC/WSS Endpoints
Rejected
Dear community,
EDIT: The proposal now seeks $32,640.00 for RPC monitoring only.
I am resubmitting a proposal for Monitoring of public RPC/WSS Endpoints & Node Database Analytics. This proposal was previously submitted as referendum 175 but failed due to HANC's interference.
Unable to obtain his feedback, I have made the following revisions:
- I am now funding the provision of servers for the DB Analytics/Restoration work myself, which reduces the overall cost of the proposal by ~25%
- I have reduced the slippage allowance from 10% to 5% in light of a recent market correction
Do note that the slippage, as well as any excess KSM beyond the requested USD amount, is refundable to the treasury.
The goals of the proposal remain the same:
- To become an independent monitor of all WSS providers, displaying key metrics in a visual manner (Dashboard); see the illustrative sketch below this list
- To monitor DB growth, sizes, bloat and restoration times for various DB configurations.
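To make the first goal concrete, here is a minimal, illustrative sketch of the kind of probe such a dashboard could aggregate; it is not the proposal's actual implementation. It assumes the Python `websockets` package is available, and the endpoint URL is a placeholder. The probe measures WSS connection time, the latency of a single `chain_getHeader` call, and the endpoint's reported best block.

```python
import asyncio
import json
import time

import websockets  # assumed dependency: pip install websockets


async def probe(url: str) -> dict:
    """Measure connection latency, RPC latency, and best block for one WSS endpoint."""
    t0 = time.monotonic()
    async with websockets.connect(url) as ws:
        connect_ms = (time.monotonic() - t0) * 1000

        t1 = time.monotonic()
        await ws.send(json.dumps({
            "jsonrpc": "2.0", "id": 1,
            "method": "chain_getHeader", "params": [],
        }))
        header = json.loads(await ws.recv())["result"]
        rpc_ms = (time.monotonic() - t1) * 1000

    return {
        "endpoint": url,
        "connect_ms": round(connect_ms, 1),
        "header_rpc_ms": round(rpc_ms, 1),
        "best_block": int(header["number"], 16),  # header number is a hex string
    }


if __name__ == "__main__":
    # Placeholder URL; any public Kusama WSS endpoint could be substituted.
    print(asyncio.run(probe("wss://kusama-rpc.example.com")))
```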
Kind Regards,
Votes: Aye 165, Nay 114 (proposal failed)

Comments (10)
You got my Aye support, good luck, Will
We voted Nay on ref. 175, and I told Will (Paradox) in a personal discussion on Element that I would provide my reasoning on the next referendum re-post, which will again be a Nay, despite the overwhelming community support.
There is the IBP network monitor that checks whether nodes are healthy, are on the right version, and what their ping times are. It is pretty much a complete dashboard, even though it checks only the IBP nodes.
This passed Kusama governance in ref 35 and funded the most performant nodes, which serve as the community's go-to servers for all relay chains and system chains. My point being: I don't believe there is a need to monitor all public WSS nodes. I believe this is a waste of KSM treasury funds, and I think it won't get any traction except among RPC/WSS speed-fetish geeks, and even that would last for a minute of fun.
The other main point is to monitor the DB sizes and restoration times of various DB configurations, which don't even cover the default setting of the node software in the proposal (256 blocks). Let's say I'm a validator and I require the last 256 blocks of state pruning. On our nodes this is currently ~250GB. If I'm looking to be a 1KV member, I'm not going to rent a server with exactly 251GB of storage; you rent/buy something that can last for 1-2 years minimum if you're serious about it. And the state doesn't jump +10GB a day. These values can easily be written down somewhere and remain valid numbers for 1-2 years, as is the case now in the Polkadot Wiki. This can always be expanded by community members, of course.
Regarding the restoration times, these can vary from 5-15 minutes depending on your hardware and connection. I don't think exact numbers are needed here; they would just be statistics that I don't believe are worth 2700 KSM.
If anyone wants to dig deep into these numbers, as many KSM/DOT community members do on a daily basis, we can update the wiki or create a new section that takes community input. But running 24 different database sizing options would, in my opinion, again be a waste of public KSM funds relative to its value add.
Hey StakerSpace,
Thank you for taking the time to respond in light of voting against what you phrased as "overwhelming community support".
Point 1:
The scope of this project covers not only ping times but also other metrics associated with performance and compliance. Examples would be execution times, connection saturation, metadata and block-depth checks. There is more to it than just being able to connect; it is how well the service performs and whether it meets the requirements of the "unwritten" SLA.
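To illustrate what a block-depth and metadata-conformity check involves (a sketch only, not the proposal's actual implementation), the following compares one endpoint against a reference endpoint over plain Substrate JSON-RPC. The URLs are placeholders and the Python `requests` package is an assumed dependency.

```python
import hashlib

import requests  # assumed dependency: pip install requests


def rpc(url: str, method: str, params=None):
    """Single JSON-RPC call over HTTP to a Substrate node."""
    r = requests.post(
        url,
        json={"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []},
        timeout=10,
    )
    r.raise_for_status()
    return r.json()["result"]


def check_endpoint(url: str, reference_url: str) -> dict:
    """Compare block depth and metadata hash against a trusted reference endpoint."""
    best = int(rpc(url, "chain_getHeader")["number"], 16)
    ref_best = int(rpc(reference_url, "chain_getHeader")["number"], 16)
    meta = hashlib.sha256(rpc(url, "state_getMetadata").encode()).hexdigest()
    ref_meta = hashlib.sha256(rpc(reference_url, "state_getMetadata").encode()).hexdigest()
    return {
        "endpoint": url,
        "blocks_behind_reference": ref_best - best,
        "metadata_matches_reference": meta == ref_meta,
    }


# Placeholder URLs; real public HTTP RPC endpoints would be substituted here.
# print(check_endpoint("https://rpc-a.example.com", "https://rpc-b.example.com"))
```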
I stand to be corrected, but please also note that the allocation granted to the IBP for this task is $75,000, with another $254,000 for other development. The scope of the project I am presenting is greater than basic monitoring; it is, imo, more holistic and comes in at a lower cost. This is in no way showing a lack of appreciation for what the IBP is trying to do, and many of its members support this proposal as an alternative check.
The consensus of one is never a good thing, as we've seen with ref 175.
I am presenting a solution based on my experience evaluating, using and hosting said services. From a development standpoint we need to rely on a standard for these nodes: we can't have pruned nodes in operation, and we need some conformity when it comes to metadata. Quality of service matters, and it is not only ping-related.
Point 2
To my knowledge, and as was the default for snapshots, validators utilize pruning of the last 1000 states. In an effort to reduce the number of configurations, I omitted 256-block state pruning as it is rarely used in practice; I am actually shocked that you use it. From my previous experience, and as an example of how this analysis helps, the size difference between state pruning of 1000 and 256 is negligible.
I also challenge you with regard to the growth of even pruned databases: as we have seen in the past, this growth can and has increased to almost 450GB+ per instance. This 'issue' was transitory but caused problems for some, resulting in chilling. This is another one of those 'from experience' points that I hoped you, as a validator, would understand.
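For context on how the size figures discussed here could be tracked continuously rather than written down once, here is a minimal sketch that sums the on-disk footprint of a node's database directory. The base path shown is only an assumed typical default and would need to be adjusted per deployment.

```python
import os


def db_size_gib(path: str) -> float:
    """Sum the on-disk size of a node's database directory, in GiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total / (1024 ** 3)


if __name__ == "__main__":
    # Assumed default Kusama base path; adjust to the actual deployment.
    db_path = os.path.expanduser("~/.local/share/polkadot/chains/ksmcc3/db")
    print(f"{db_size_gib(db_path):.1f} GiB")
```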
I am a contributor to the Wiki myself and someone who lends support to new entrants. The more metrics that can be extracted from real-time data, the more relevant the wiki stays. We need to think smart, and we have seen the benefits of this in other areas, like extraction of chain constants.
Point 3
This somewhat misrepresents the figures. 2700 KSM is not allocated to restoration times alone; it encompasses the entire scope, including the points you've raised in 1 and 2. I have also used one of your own references to demonstrate that larger budgets have been allocated to smaller-scoped projects.
Speaking directly to the point, monitoring of restoration times is useful, especially when your window for reacting without consequence is < 1hr. We cannot run on the assumption that the restoration time is constant and only identify a deviation at the last moment. That is not good IT.
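As a sketch of what catching that deviation early could look like (the restore command, baseline and threshold below are hypothetical placeholders, not part of the proposal): time each snapshot restore and alert when it drifts well beyond the expected duration.

```python
import subprocess
import time

# Hypothetical restore job and baseline; both are placeholders for illustration.
RESTORE_CMD = ["/usr/local/bin/restore-kusama-snapshot.sh"]
BASELINE_MINUTES = 12.0
ALERT_FACTOR = 1.5  # alert if a restore takes 50% longer than the baseline


def timed_restore() -> float:
    """Run the restore job and return its duration in minutes."""
    start = time.monotonic()
    subprocess.run(RESTORE_CMD, check=True)
    return (time.monotonic() - start) / 60


if __name__ == "__main__":
    minutes = timed_restore()
    if minutes > BASELINE_MINUTES * ALERT_FACTOR:
        print(f"ALERT: restore took {minutes:.1f} min (baseline {BASELINE_MINUTES:.1f} min)")
    else:
        print(f"OK: restore took {minutes:.1f} min")
```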
Last point:
This focuses on a part of the proposal for which I am now funding a significant portion of the cost myself. I have already expressed the value of having real-time updates to the Wiki, and pointed to previous cases in which things did not go right and DB sizes increased significantly beyond the 'norm'.
Overall, your feedback has shown that I need to explain the technical aspects more, even to those whom I may have assumed to have technical knowledge. As I mentioned before, I am a regular user, developer, provider and curator, so I do have an appreciation for the need from different facets. I'll try to improve on this.
Thank you for your time, I appreciate the opportunity to respond.