AI Data Registry

AI Data Standard


  • Develop a standard for data sources (websites? APIs? whatever they may be) to expose their data for AI crawling. (There is a need for real-time retrieval of up-to-date data on any topic. It would be nice to have a standard way to obtain this data and a registry of resources that make it available.)

Possibly relevant: https://github.com/BerriAI/litellm
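Such a standard could work in the spirit of robots.txt and XML sitemaps: a data source publishes a small machine-readable manifest that tells crawlers which endpoints carry which data and how often it is refreshed. The sketch below is purely hypothetical. The well-known path, every field name (name, license, updated, endpoints, topic, url, format, refresh) and the validation rules are assumptions for illustration, not an existing specification.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Hypothetical manifest a data source might publish, e.g. at
# /.well-known/ai-data.json (the path and all field names are assumptions).
EXAMPLE_MANIFEST = """
{
  "name": "Example Research Feed",
  "license": "CC-BY-4.0",
  "updated": "2024-05-01T00:00:00+00:00",
  "endpoints": [
    {"topic": "llm-tools", "url": "https://example.org/api/llm-tools", "format": "json", "refresh": "daily"},
    {"topic": "papers", "url": "https://example.org/api/papers", "format": "json", "refresh": "hourly"}
  ]
}
"""

# Hypothetical set of top-level fields every manifest must carry.
REQUIRED_FIELDS = ("name", "license", "updated", "endpoints")


@dataclass(frozen=True)
class Endpoint:
    topic: str    # what the data is about
    url: str      # where a crawler can fetch it
    format: str   # serialization, defaults to "json"
    refresh: str  # advertised update cadence, defaults to "daily"


@dataclass(frozen=True)
class Manifest:
    name: str
    license: str
    updated: datetime
    endpoints: list[Endpoint]


def parse_manifest(text: str) -> Manifest:
    """Parse a manifest; raise ValueError on invalid JSON, missing
    top-level fields, or non-HTTPS endpoint URLs (assumed rules)."""
    data = json.loads(text)  # json.JSONDecodeError subclasses ValueError
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    endpoints = [
        Endpoint(e["topic"], e["url"], e.get("format", "json"), e.get("refresh", "daily"))
        for e in data["endpoints"]
    ]
    if not all(ep.url.startswith("https://") for ep in endpoints):
        raise ValueError("endpoints must use https")
    return Manifest(
        name=data["name"],
        license=data["license"],
        updated=datetime.fromisoformat(data["updated"]),
        endpoints=endpoints,
    )


if __name__ == "__main__":
    manifest = parse_manifest(EXAMPLE_MANIFEST)
    for ep in manifest.endpoints:
        print(ep.topic, ep.url, ep.refresh)
```

Keeping the manifest declarative (topics, URLs, refresh cadence) would let a registry index many sources without knowing anything about their internal APIs.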

AI Data Registry

  • A registry of current (real-time) data on important topics, e.g. the best tools for a given task (model, framework, etc.), the best approach for something, the newest research on a topic, etc.

  • Should I create such a registry? Does one already exist on the market?

  • Consider creating local registries (sets of links to sites relevant to a given domain, which are crawled and analyzed to draw conclusions, e.g. whether there are new tools for a task / domain, new research, etc.)

  • Ideally, all such sites / data sources should regularly (e.g. daily) run AI over their own data to identify the latest noteworthy items and expose them for consumption, both in human-readable form and via APIs.

  • https://analyticsindiamag.com/ai-insights-analysis/rip-rag-rig-is-here/

  • For example, it might be a good idea to develop a page in Enterprise-helper listing possible partnerships / counterparties that might be of interest in any way, e.g. endowments, sponsorships, grants, bug hunts, competitions, etc.

  • It would be nice to have open-source data available for training LMs without usage restrictions, classified by domain (e.g. business domain > subcategories), and easily searchable.
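A local registry, as described in the list above, can be sketched as a domain label, a list of source links, and a record of the items already seen from each source, so that each crawl reports only what is new. This is a minimal sketch under stated assumptions: the class LocalRegistry and its methods are hypothetical, fetching pages and the AI analysis step are out of scope (crawl results are passed in as lists of item identifiers), and the digest simply renders the same result both as human-readable text and as JSON for an API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LocalRegistry:
    """Hypothetical domain-scoped registry: source links plus the
    item identifiers already seen from each source."""
    domain: str
    sources: list[str]
    seen: dict[str, set[str]] = field(default_factory=dict)

    def update(self, source: str, items: list[str]) -> list[str]:
        """Record one crawl result for a registered source and return
        the items not seen before, deduplicated, in crawl order."""
        if source not in self.sources:
            raise KeyError(f"unregistered source: {source}")
        known = self.seen.setdefault(source, set())
        # dict.fromkeys drops duplicates within this crawl but keeps order.
        new_items = [item for item in dict.fromkeys(items) if item not in known]
        known.update(new_items)
        return new_items

    def digest(self, new_by_source: dict[str, list[str]]) -> tuple[str, str]:
        """Render one 'what's new' result twice: as text for humans
        and as a JSON string for an API response."""
        lines = [f"New in {self.domain}:"]
        for source, items in new_by_source.items():
            lines.extend(f"- {item} ({source})" for item in items)
        payload = json.dumps({"domain": self.domain, "new": new_by_source}, sort_keys=True)
        return "\n".join(lines), payload


if __name__ == "__main__":
    tools = "https://example.org/tools"  # hypothetical source URL
    reg = LocalRegistry("llm-tools", [tools])
    reg.update(tools, ["tool-a", "tool-b"])  # first crawl: everything is new
    new = {tools: reg.update(tools, ["tool-a", "tool-b", "tool-c"])}
    text, api_json = reg.digest(new)
    print(text)
    print(api_json)
```

Storing only identifiers per source keeps the registry cheap to run daily; a real deployment would also need to decide what counts as an "item" for each site (a tool name, a paper ID, a page URL).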