Updated the Windows PC encoding fix with thanks to CG and Jon R

Edward Donner
2024-10-29 21:36:29 -04:00
parent 86763f2fcb
commit 284336b3b9
6 changed files with 37 additions and 13 deletions

View File

@@ -80,8 +80,10 @@
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",

View File

@@ -86,8 +86,10 @@
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",
@@ -145,7 +147,11 @@
"This model is an example of an \"Auto-Encoding LLM\" which generates an output given a complete input.\n",
"It's different to all the other LLMs we've discussed today, which are known as \"Auto-Regressive LLMs\", and generate future tokens based only on past context.\n",
"\n",
"Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification."
"Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification.\n",
"\n",
"### Sidenote\n",
"\n",
"In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal."
]
},
{
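
As a taste of the week 8 point about keeping data local: a minimal sketch of an open-source encoder that runs entirely on your own machine. The sentence-transformers library and the all-MiniLM-L6-v2 model are my choices for illustration; this commit doesn't name a specific encoder.

```python
# A small open-source embedding model that runs locally, so text never leaves the machine
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # downloaded once, then usable offline
vectors = model.encode(["Our knowledge base stays on this machine."])
print(vectors.shape)  # (1, 384): one 384-dimensional embedding per input text
```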

View File

@@ -87,8 +87,10 @@
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",
@@ -148,7 +150,9 @@
"\n",
"Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification.\n",
"\n",
"More details in the resources."
"### Sidenote\n",
"\n",
"In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal."
]
},
{

View File

@@ -88,8 +88,10 @@
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",

View File

@@ -95,8 +95,10 @@
" doc.metadata[\"doc_type\"] = doc_type\n",
" return doc\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",