Treasury Proposal - Polkadot Search Indexer and RPC Endpoints
Good day,
We are writing this proposal based on our own experience as users and validators within the ecosystem. We understand that community members offer a great deal of help, but we still have to do our own research to find the information and news we are looking for.
We want to help the community address a typical Web2 problem, and the nature of our internet environment: information is "all over the place", scattered across blogs, video sites, discussion forums, and sometimes social media.
Because this information is scattered, we want to index it and provide a much better playbook for the community. We are presenting polkadotspace.io, a search indexer focused on Polkadot and Kusama. The solution can be expanded to other topics or parachains, but this is where we are starting. In the meantime, while we work toward the search engine, we are also offering RPC and archive nodes for public use, with the flexibility to scale and add geo-redundancy.
Our current full proposal can be found below:
https://docs.google.com/document/d/19LXxklsulLc87ffkwHPzfgfQwB86efytHl2SoatRXnU
We welcome feedback and suggestions for improving the proposal. At the end of the day, the community will be part of this scalable solution and of enriching its data and recommendations. We also want to keep the proposal transparent; after all, it is supported and funded by the community.
Comments (4)
The content indexer would be a welcome addition to this ecosystem. 👏🏿
However, I am wondering how content is going to be screened in your context.
TL;DR: I think there should be a mechanism for manually reviewing and removing unsuitable or inappropriate content from the indexer BEFORE visitors see it. Of course, this would mean hiring people to do these manual checks on a daily basis.
What do you say? 👀
Short answer: the devil is in the details; I will add these details to the proposal. Thanks for pointing this out. :)
Long answer:
Part #1 - Top articles and relevancy.
In my opinion, there is no definitive "top X articles" list for any ecosystem. I am not the authority on the subject; who am I to say something is the #1 article on a given topic? We will not use that terminology in our indexer. What we will do instead is rank by relevancy, based on keywords and recommendations.
Relevancy will be based on several factors, such as keyword matches, view times, and recommendations. It will be represented as a percentage computed from those factors via a weighting mechanism. For the purpose of this proposal, we will open the floor for discussion on a ratio of 20:50:30 as a general concept.
We do understand that at the beginning of the project, recommendations will not yet exist and view times will depend on the data available. Our key focus is not the top 10 or top X of a specific topic, but categorizing articles the way a librarian would: matching articles to the searcher's goal. The searcher is not necessarily looking for the hottest article right now or of all time; they are looking for information to satisfy their desire to learn.
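To make the weighting mechanism concrete, here is a minimal sketch of how a percentage relevancy score could be computed from three factors under the 20:50:30 ratio floated above. The factor names (`keywords`, `views`, `recommendations`) and the assumption that each factor is pre-normalized to [0, 1] are illustrative, not the final design.

```python
# Hypothetical relevancy scoring sketch; weights follow the proposal's
# 20:50:30 discussion ratio, factor names are illustrative assumptions.
WEIGHTS = {"keywords": 0.20, "views": 0.50, "recommendations": 0.30}

def relevancy(scores: dict) -> float:
    """Combine per-factor scores in [0, 1] into a single percentage.

    Missing factors (e.g. no recommendations yet, early in the project)
    simply contribute zero, matching the cold-start case described above.
    """
    total = sum(WEIGHTS[f] * scores.get(f, 0.0) for f in WEIGHTS)
    return round(100 * total, 1)

article = {"keywords": 0.9, "views": 0.4, "recommendations": 0.6}
print(relevancy(article))  # 0.2*0.9 + 0.5*0.4 + 0.3*0.6 = 0.56 -> 56.0
```

Treating absent factors as zero also covers the early phase where recommendation data does not exist yet; the weights themselves would be retuned once real usage data is available.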
Moreover, there will be duplicate articles across the internet. Google cannot fully prevent this today; we are going to try our best to flag duplication within the initial budget.
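One common way to flag duplication, sketched below, is word-shingling plus Jaccard similarity. This is an illustrative technique choice, not necessarily what the indexer will ship, and the 0.8 threshold is an assumption for the sketch.

```python
# Minimal near-duplicate check: k-word shingles + Jaccard similarity.
# Technique and threshold are illustrative assumptions, not the
# proposal's committed design.

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (0.0 when both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def looks_duplicate(doc1: str, doc2: str, threshold: float = 0.8) -> bool:
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold
```

At index scale one would approximate this with MinHash rather than comparing full shingle sets pairwise, which is why the budget caveat above matters.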
Part #2 - Phishing
Great point; there is more than enough fake information and video on the internet. Our initial thought process is the following, and we welcome improvement or feedback.
We are not yet the "Google of Polkadot"; the algorithm behind that is enormous and costs zillions of brain cells. However, we will implement measures to prevent phishing. That said, I have not yet seen a reputable article from an author with an on-chain identity who was trying to phish people.
You are absolutely correct that a manual review process will be needed. No AI today is capable of operating without manual intervention. A monthly data-review operating cost is already planned as part of the scope, though we cannot yet estimate its size. Once the process is well defined, we can hand this function to a $10/hour help desk to lower the cost; until then, we will need to handle it on-shore at a higher cost.
I hope these answers satisfy you.
This would be great for people looking for information. An excellent idea for simplifying how to obtain knowledge.
Thank you for your support!