Updated the Windows PC encoding fix with thanks to CG and Jon R

Edward Donner
2024-10-29 21:36:29 -04:00
parent 86763f2fcb
commit 284336b3b9
6 changed files with 37 additions and 13 deletions

View File

@@ -80,8 +80,10 @@
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",

View File

@@ -86,8 +86,10 @@
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",
@@ -145,7 +147,11 @@
"This model is an example of an \"Auto-Encoding LLM\" which generates an output given a complete input.\n",
"It's different to all the other LLMs we've discussed today, which are known as \"Auto-Regressive LLMs\", and generate future tokens based only on past context.\n",
"\n",
"Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification."
"Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification.\n",
"\n",
"### Sidenote\n",
"\n",
"In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal."
]
},
{
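
As a taste of the week 8 point about keeping data local: a minimal sketch of an open-source encoder that runs entirely on your own machine. The sentence-transformers library and the all-MiniLM-L6-v2 model are my choices for illustration; this commit doesn't name a specific encoder.

```python
# A small open-source embedding model that runs locally, so text never leaves the machine
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # downloaded once, then usable offline
vectors = model.encode(["Our knowledge base stays on this machine."])
print(vectors.shape)  # (1, 384): one 384-dimensional embedding per input text
```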

View File

@@ -87,8 +87,10 @@
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",
@@ -148,7 +150,9 @@
"\n",
"Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification.\n",
"\n",
"More details in the resources."
"### Sidenote\n",
"\n",
"In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal."
]
},
{

View File

@@ -88,8 +88,10 @@
"\n",
"folders = glob.glob(\"knowledge-base/*\")\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",

View File

@@ -95,8 +95,10 @@
" doc.metadata[\"doc_type\"] = doc_type\n",
" return doc\n",
"\n",
"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
"text_loader_kwargs={'autodetect_encoding': True}\n",
"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
"text_loader_kwargs = {'encoding': 'utf-8'}\n",
"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
"# text_loader_kwargs={'autodetect_encoding': True}\n",
"\n",
"documents = []\n",
"for folder in folders:\n",