Treasury Proposal - Polkadot Search Indexer and RPC Endpoints
Good day,
We are writing this proposal based on our own experience as users and validators within the ecosystem. We understand that community members offer a great deal of help, but we still have to do our own research to find the information and news we are looking for.
We want to help the community address a typical Web2 problem, and the nature of our internet environment: information is "all over the place", scattered across blogs, video sites, discussion forums, and sometimes social media.
Because this information is scattered, we want to index it and provide a much better playbook for the community. We are presenting polkadotspace.io, a search indexer focused on Polkadot and Kusama. The solution can be expanded to other topics or parachains, but this is where we are starting. In the meantime, while we work toward the search engine, we are also offering RPC and archive nodes for public use, with the flexibility to scale and add geo-redundancy.
Our current full proposal can be found below:
https://docs.google.com/document/d/19LXxklsulLc87ffkwHPzfgfQwB86efytHl2SoatRXnU
We welcome feedback and suggestions for improving the proposal. At the end of the day, the community will be part of this scalable solution and of enriching its data and recommendations. We also want to keep the proposal transparent; after all, it is supported and funded by the community.
Comments (4)
The content indexer would be a welcome addition to this ecosystem. 👏🏿
However, I am wondering how content is going to be screened in your context.
TL;DR: I think there should be a mechanism for manually reviewing and removing unsuitable or inappropriate content from the indexer BEFORE visitors see it. Of course, this would mean hiring people to do these manual checks on a daily basis.
What do you say? 👀
Short answer: the devil is in the details; I will add these details to the proposal. Thanks for pointing this out. :)
Long answer:
Part #1 - Top articles and relevancy.
In my opinion, there is no definitive "top X articles" list for any ecosystem. I am not the authority on the subject; who am I to say something is the #1 article on a given topic? We will not use that terminology in our indexer. What we will do instead is rank by relevancy, based on keywords and recommendations.
Relevancy will be based on several factors, such as keyword matches, view times, and recommendations. It will be represented as a percentage computed from those factors via a weighting mechanism. For the purpose of this proposal, we will open the floor for discussion on a ratio of 20:50:30 as a general concept.
We do understand that at the beginning of the project, recommendations will not yet exist and view times will depend on the data available. Our key focus is not the top 10 or top X of a specific topic, but categorizing articles the way a librarian would: matching articles to the searcher's goal. The searcher is not necessarily looking for the hottest article right now or of all time; they are looking for information to satisfy their desire to learn.
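To make the weighting mechanism concrete, here is a minimal sketch of how a percentage relevancy score could be computed from three factors under the 20:50:30 ratio floated above. The factor names (`keywords`, `views`, `recommendations`) and the assumption that each factor is pre-normalized to [0, 1] are illustrative, not the final design.

```python
# Hypothetical relevancy scoring sketch; weights follow the proposal's
# 20:50:30 discussion ratio, factor names are illustrative assumptions.
WEIGHTS = {"keywords": 0.20, "views": 0.50, "recommendations": 0.30}

def relevancy(scores: dict) -> float:
    """Combine per-factor scores in [0, 1] into a single percentage.

    Missing factors (e.g. no recommendations yet, early in the project)
    simply contribute zero, matching the cold-start case described above.
    """
    total = sum(WEIGHTS[f] * scores.get(f, 0.0) for f in WEIGHTS)
    return round(100 * total, 1)

article = {"keywords": 0.9, "views": 0.4, "recommendations": 0.6}
print(relevancy(article))  # 0.2*0.9 + 0.5*0.4 + 0.3*0.6 = 0.56 -> 56.0
```

Treating absent factors as zero also covers the early phase where recommendation data does not exist yet; the weights themselves would be retuned once real usage data is available.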
Moreover, there will be duplicate articles across the internet. Google cannot fully prevent this today; we are going to try our best to flag duplication within the initial budget.
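One common way to flag duplication, sketched below, is word-shingling plus Jaccard similarity. This is an illustrative technique choice, not necessarily what the indexer will ship, and the 0.8 threshold is an assumption for the sketch.

```python
# Minimal near-duplicate check: k-word shingles + Jaccard similarity.
# Technique and threshold are illustrative assumptions, not the
# proposal's committed design.

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (0.0 when both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def looks_duplicate(doc1: str, doc2: str, threshold: float = 0.8) -> bool:
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold
```

At index scale one would approximate this with MinHash rather than comparing full shingle sets pairwise, which is why the budget caveat above matters.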
Part #2 - Phishing
Great point; there is more than enough fake information and video on the internet. Our initial thought process is the following, and we welcome improvement or feedback.
We are not yet the "Google of Polkadot"; the algorithm behind that is enormous and costs zillions of brain cells. However, we will implement measures to prevent phishing. That said, I have not yet seen a reputable article from an author with an on-chain identity who was trying to phish people.
You are absolutely correct that a manual review process will be needed. No AI today is capable of operating without manual intervention. A monthly data-review operating cost is already planned as part of the scope, though we cannot yet estimate its size. Once the process is well defined, we can hand this function to a $10/hour help desk to lower the cost; until then, we will need to handle it on-shore at a higher cost.
I hope these answers satisfy you.
This would be great for people looking for information. An excellent idea for simplifying how to obtain knowledge.
Thank you for your support!