LiteParse v2.0 is out now, and it is blazing fast + runs everywhere!
We rewrote everything from scratch in Rust, and now:
- up to 100x faster parsing
- install natively in Rust, JS/TS, and Python
- a custom WASM package enables browser and edge runtime usage
pip install liteparse
npm i @llamaindex/liteparse
npm i @llamaindex/liteparse-wasm
cargo install liteparse
The key use cases for building LLM apps over your data are question answering, conversational chat, workflow automation with agents, and structured data extraction.
Learn about these use cases at a high level before diving into the materials.
Section 2: Building an LLM Application
Learn *all* the steps for building an initial LLM app, from the LLM modules to data loading, indexing, and storage.
This also covers putting the pieces together and setting up observability and evals.
We’re excited to release full native support for THREE @huggingface embedding models (s/o @LoganMarkewich):
🧱 Base @huggingface embeddings wrapper
🧑‍🏫 Instructor embeddings
⚡️ Optimum embeddings (ONNX format)
Instructor embeddings are unified models that have undergone instruction tuning on a ton of tasks (classification, retrieval, etc.). That means they can be adapted to a new task with just a task instruction, no fine-tuning needed!
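To make "adapted via task instruction" concrete, here is a minimal sketch of how inputs for an instruction-tuned embedding model are typically prepared: each text is paired with an instruction describing the task. The instruction wording and the helper name `with_instruction` are assumptions for illustration, and the model call itself is left out.

```python
# Illustrative instructions; the exact wording is an assumption.
QUERY_INSTRUCTION = "Represent the question for retrieving supporting documents:"
DOC_INSTRUCTION = "Represent the document for retrieval:"


def with_instruction(instruction, texts):
    """Pair each text with a task instruction as [instruction, text].

    Hypothetical helper: an instruction-tuned embedding model would consume
    these pairs, so the same model yields task-specific vectors.
    """
    return [[instruction, text] for text in texts]


doc_inputs = with_instruction(DOC_INSTRUCTION, ["Chunk one.", "Chunk two."])
query_inputs = with_instruction(QUERY_INSTRUCTION, ["What does chunk one say?"])
```

Switching tasks only means switching the instruction string; the model weights stay the same.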
We now have the most comprehensive cookbook on building LLM apps with Knowledge Graphs (credits @wey_gu).
✅ Key query techniques: text2cypher, graph RAG
✅ Automated KG construction
✅ vector db RAG vs. KG RAG
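For a rough feel of what graph RAG means, here is a toy sketch in plain Python: find the entities a query mentions, pull the triples around them from the graph, and hand those to the LLM as context. The triples, the substring-based entity matching, and the function names are all assumptions for illustration; the cookbook itself works against a real graph store.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples; illustrative data only.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "based_in", "Berlin"),
    ("Bob", "works_at", "Globex"),
]


def build_adjacency(triples):
    """Index every triple under both its subject and its object."""
    adj = defaultdict(list)
    for s, r, o in triples:
        adj[s].append((s, r, o))
        adj[o].append((s, r, o))
    return adj


def graph_rag_context(query, triples, depth=1):
    """Collect triples within `depth` hops of entities named in the query."""
    adj = build_adjacency(triples)
    # Naive entity linking: case-insensitive substring match (an assumption).
    frontier = sorted(e for e in adj if e.lower() in query.lower())
    seen = set(frontier)
    collected = []
    for _ in range(depth):
        next_frontier = []
        for entity in frontier:
            for triple in adj[entity]:
                if triple not in collected:
                    collected.append(triple)
                for node in (triple[0], triple[2]):
                    if node not in seen:
                        seen.add(node)
                        next_frontier.append(node)
        frontier = next_frontier
    return [f"{s} {r} {o}" for s, r, o in collected]


context = graph_rag_context("Where does Alice work?", TRIPLES, depth=2)
```

With `depth=2` the context also includes where Acme is based, which is the kind of multi-hop fact that chunk-level vector search can miss.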
Check out the full 1.5 hour tutorial:
The full Colab notebook is here:
There was so much content beyond the live webinar that we recorded a part 2 🔥
With one line of code, you can now seamlessly integrate @llama_index with rich observability/eval tools offered by our partners (@weights_biases, @arizeai, @truera_ai).
Tip for better RAG systems💡: don’t just store raw text chunks, augment them with structured data.
✅Enables metadata filtering
✅Helps bias embeddings
Here’s a guide on how to use the @huggingface span-marker to extract entities for this exact purpose📕: https://t.co/Gwwoeu3i9H (gpt-index.readthedocs.io/en/latest/exam…)
In this example, we parse the 2023 IPCC Climate Report.
After text parsing to break the document into chunks, we use the span-marker extractor to extract relevant entities.
These entities can be used as metadata filters (in a vector db) or to help enhance the context embeddings.
In this guide, we do the latter. Adding and embedding the right metadata directly improves the generated answer (left) vs. the answer without it (right).
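Both uses of extracted entities can be sketched in a few lines of plain Python. This is a minimal sketch, not the guide's code: `extract_entities` is a hypothetical stand-in for a real entity extractor (a fixed vocabulary lookup here), and the chunk texts are made-up examples rather than quotes from the report.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical stand-in for an NER model: a fixed vocabulary lookup.
KNOWN_ENTITIES = ["IPCC", "Paris Agreement", "Arctic"]


def extract_entities(text):
    """Return known entities that appear in the text (case-insensitive)."""
    return [e for e in KNOWN_ENTITIES if e.lower() in text.lower()]


def annotate(chunks):
    """Attach extracted entities to each chunk as structured metadata."""
    for chunk in chunks:
        chunk.metadata["entities"] = extract_entities(chunk.text)
    return chunks


def filter_by_entity(chunks, entity):
    """Use 1: metadata filtering, as a vector db would do before similarity search."""
    return [c for c in chunks if entity in c.metadata.get("entities", [])]


def embedding_text(chunk):
    """Use 2: prepend metadata so it influences the chunk's embedding."""
    entities = chunk.metadata.get("entities", [])
    if not entities:
        return chunk.text
    return f"entities: {', '.join(entities)}\n\n{chunk.text}"


chunks = annotate([
    Chunk("The IPCC report discusses warming in the Arctic."),
    Chunk("A paragraph with no named entities."),
])
arctic_chunks = filter_by_entity(chunks, "Arctic")
text_to_embed = embedding_text(chunks[0])
```

The string returned by `embedding_text` is what would be sent to the embedding model in place of the raw chunk text.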