AI Data Registry
AI Data Standard
- a standard for data sources (websites? DBs? APIs? etc.) to expose their data for crawling by AI.
- https://github.com/BerriAI/litellm (LiteLLM): call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq, etc.]
- https://docs.litellm.ai/
- Develop a standard for data sources (websites? APIs? anyone else?) to expose their data for AI crawling. (There is a need for real-time retrieval of up-to-date data on any topic. It would be nice to have a standard way to obtain this data and a registry of resources that make it available.)
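As a rough illustration of what such a standard could look like, here is a minimal Python sketch of a machine-readable manifest that a data source might publish, e.g. under a `/.well-known/` path (RFC 8615), in the spirit of robots.txt. Everything in it is hypothetical: the field names, the allowed update frequencies, the `validate_manifest` helper and the example URLs are assumptions for illustration, not part of any existing specification.

```python
import json
from dataclasses import dataclass, field, fields

# Hypothetical manifest format: these field names are NOT from an existing standard.
REQUIRED_FIELDS = {"name", "topics", "endpoint", "update_frequency", "license"}
ALLOWED_FREQUENCIES = {"realtime", "hourly", "daily", "weekly"}


@dataclass
class DataSourceManifest:
    name: str              # human-readable name of the data source
    topics: list[str]      # topics covered, e.g. ["ai/tools"]
    endpoint: str          # URL where the machine-readable data can be fetched
    update_frequency: str  # how often the exposed data is refreshed
    license: str           # terms under which AI systems may use the data
    formats: list[str] = field(default_factory=lambda: ["application/json"])


def validate_manifest(raw: str) -> DataSourceManifest:
    """Parse a JSON manifest and check the (hypothetical) required fields."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["update_frequency"] not in ALLOWED_FREQUENCIES:
        raise ValueError(f"unknown update_frequency: {data['update_frequency']}")
    if not data["endpoint"].startswith("https://"):
        raise ValueError("endpoint must use HTTPS")
    # Ignore unknown keys so newer manifests still parse with older consumers.
    known = {f.name for f in fields(DataSourceManifest)}
    return DataSourceManifest(**{k: v for k, v in data.items() if k in known})


# Usage with an illustrative manifest (example.com is a placeholder domain).
example = json.dumps({
    "name": "Example AI Tools Feed",
    "topics": ["ai/tools", "ai/research"],
    "endpoint": "https://example.com/ai-data/latest.json",
    "update_frequency": "daily",
    "license": "CC-BY-4.0",
    "contact": "data@example.com",  # not in the dataclass, so it is ignored
})
manifest = validate_manifest(example)
print(manifest.topics)  # ['ai/tools', 'ai/research']
```

Unknown fields are ignored rather than rejected, so a source could add fields later without breaking older consumers; whether a real standard should behave that way is an open design question.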
AI Data Registry
- a registry of up-to-date (real-time) data on important topics, e.g. the best tools for a given task (model, framework, etc.), the best approach to something, the newest research on a topic, etc.
- Should I create such a registry? Does one already exist on the market?
- Consider creating local registries: sets of links to sites relevant to a given domain that are crawled and analyzed to draw conclusions, e.g. whether there are new tools for a task / domain, new research, etc.
- Ideally, all such sites / data sources would regularly (daily) run AI over their own data to identify the latest noteworthy items and expose them for consumption, both in a human-readable form and via APIs.
- https://analyticsindiamag.com/ai-insights-analysis/rip-rag-rig-is-here/
- For example, it might be worth developing a page in Enterprise-helper listing possible partnerships / counterparties that could be of interest in any way, e.g. endowments, sponsorships, grants, bug hunts, competitions, etc.
- It would be nice to have open-source data available for training LMs without restrictions, classified by domain (e.g. business domain > subcategories), and to be able to search such data easily.
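The registry ideas above (a registry of up-to-date resources, local per-domain registries, regular digests of what is new, and search over domain-classified data) can be sketched as a small in-memory data structure. This is a minimal illustration under stated assumptions: the `Entry` and `Registry` classes, the `domain > subcategory` path convention, the `render_text` helper and the example entries are all hypothetical. Crawling and the AI step that decides what is worth recording are left out and would sit in front of `add()`.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class Entry:
    title: str
    url: str
    domain: tuple[str, ...]  # hierarchical path, e.g. ("business", "finance")
    kind: str                # e.g. "tool", "research", "dataset"
    added: date              # when the entry was first recorded


class Registry:
    """Hypothetical in-memory registry of resources classified by domain."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def add(self, entry: Entry) -> None:
        # Skip URLs already present so repeated crawls do not create duplicates.
        if all(e.url != entry.url for e in self.entries):
            self.entries.append(entry)

    def search(self, domain_prefix: tuple[str, ...] = (), keyword: str = "") -> list[Entry]:
        """Entries under a domain path (prefix match) whose title contains keyword."""
        n = len(domain_prefix)
        return [
            e for e in self.entries
            if e.domain[:n] == domain_prefix and keyword.lower() in e.title.lower()
        ]

    def digest(self, since: date) -> dict[str, list[dict]]:
        """Entries added on or after `since`, grouped by kind (JSON-ready, for an API)."""
        groups: dict[str, list[dict]] = {}
        for e in self.entries:
            if e.added >= since:
                item = asdict(e)
                item["domain"] = " > ".join(e.domain)
                item["added"] = e.added.isoformat()
                groups.setdefault(e.kind, []).append(item)
        return groups


def render_text(groups: dict[str, list[dict]]) -> str:
    """Human-readable version of the same digest."""
    lines = []
    for kind, items in sorted(groups.items()):
        lines.append(f"{kind}:")
        lines.extend(f"  - {i['title']} ({i['domain']}) {i['url']}" for i in items)
    return "\n".join(lines)


# Usage with placeholder entries (URLs and dates are illustrative only).
reg = Registry()
reg.add(Entry("Example agent framework", "https://example.com/agents",
              ("ai", "frameworks"), "tool", date(2024, 9, 1)))
reg.add(Entry("Example paper on retrieval", "https://example.org/paper",
              ("ai", "research"), "research", date(2024, 9, 2)))
reg.add(Entry("Example finance dataset", "https://example.net/fin-data",
              ("business", "finance"), "dataset", date(2024, 8, 1)))

print([e.title for e in reg.search(("ai",))])  # ['Example agent framework', 'Example paper on retrieval']
recent = reg.digest(since=date(2024, 9, 1))
print(json.dumps(recent, indent=2))  # machine-readable form
print(render_text(recent))           # human-readable form
```

The same `digest` output feeds both the JSON and the text rendering, which mirrors the idea of exposing one daily result both to people and to APIs.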