Updated the Windows PC encoding fix with thanks to CG and Jon R
@@ -80,8 +80,10 @@
 "\n",
 "folders = glob.glob(\"knowledge-base/*\")\n",
 "\n",
-"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
-"text_loader_kwargs={'autodetect_encoding': True}\n",
+"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
+"text_loader_kwargs = {'encoding': 'utf-8'}\n",
+"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
+"# text_loader_kwargs={'autodetect_encoding': True}\n",
 "\n",
 "documents = []\n",
 "for folder in folders:\n",
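The change above swaps encoding auto-detection for an explicit `utf-8` when reading the knowledge base, because on Windows, Python's default text encoding is often a legacy codepage rather than UTF-8. A minimal stdlib-only sketch of the failure mode the fix addresses (the temp file and sample text are illustrative, not from the course repo):

```python
# Sketch of the Windows encoding pitfall this commit works around:
# a UTF-8 file read back with a legacy Windows codepage (e.g. cp1252)
# decodes without error but produces mojibake.
from pathlib import Path
import tempfile

tmp_dir = Path(tempfile.mkdtemp())      # stand-in for knowledge-base/
doc = tmp_dir / "note.md"
doc.write_text("café naïve", encoding="utf-8")

# The committed fix: read with an explicit encoding.
assert doc.read_text(encoding="utf-8") == "café naïve"

# What a cp1252 default would see: UTF-8 bytes mis-decoded.
garbled = doc.read_text(encoding="cp1252")
assert garbled == "cafÃ© naÃ¯ve"
```

Because every byte of the UTF-8 text happens to be defined in cp1252, the mis-read raises no error, which is why the bug surfaces only as garbled documents later in the pipeline.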
@@ -86,8 +86,10 @@
 "\n",
 "folders = glob.glob(\"knowledge-base/*\")\n",
 "\n",
-"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
-"text_loader_kwargs={'autodetect_encoding': True}\n",
+"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
+"text_loader_kwargs = {'encoding': 'utf-8'}\n",
+"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
+"# text_loader_kwargs={'autodetect_encoding': True}\n",
 "\n",
 "documents = []\n",
 "for folder in folders:\n",
@@ -145,7 +147,11 @@
 "This model is an example of an \"Auto-Encoding LLM\" which generates an output given a complete input.\n",
 "It's different to all the other LLMs we've discussed today, which are known as \"Auto-Regressive LLMs\", and generate future tokens based only on past context.\n",
 "\n",
-"Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification."
+"Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification.\n",
+"\n",
+"### Sidenote\n",
+"\n",
+"In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal."
 ]
 },
 {
@@ -87,8 +87,10 @@
 "\n",
 "folders = glob.glob(\"knowledge-base/*\")\n",
 "\n",
-"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
-"text_loader_kwargs={'autodetect_encoding': True}\n",
+"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
+"text_loader_kwargs = {'encoding': 'utf-8'}\n",
+"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
+"# text_loader_kwargs={'autodetect_encoding': True}\n",
 "\n",
 "documents = []\n",
 "for folder in folders:\n",
@@ -148,7 +150,9 @@
 "\n",
 "Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification.\n",
 "\n",
-"More details in the resources."
+"### Sidenote\n",
+"\n",
+"In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal."
 ]
 },
 {
@@ -88,8 +88,10 @@
 "\n",
 "folders = glob.glob(\"knowledge-base/*\")\n",
 "\n",
-"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
-"text_loader_kwargs={'autodetect_encoding': True}\n",
+"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
+"text_loader_kwargs = {'encoding': 'utf-8'}\n",
+"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
+"# text_loader_kwargs={'autodetect_encoding': True}\n",
 "\n",
 "documents = []\n",
 "for folder in folders:\n",
@@ -95,8 +95,10 @@
 " doc.metadata[\"doc_type\"] = doc_type\n",
 " return doc\n",
 "\n",
-"# With thanks to Jon R, a student on the course, for this fix needed for some users \n",
-"text_loader_kwargs={'autodetect_encoding': True}\n",
+"# With thanks to CG and Jon R, students on the course, for this fix needed for some users \n",
+"text_loader_kwargs = {'encoding': 'utf-8'}\n",
+"# If that doesn't work, some Windows users might need to uncomment the next line instead\n",
+"# text_loader_kwargs={'autodetect_encoding': True}\n",
 "\n",
 "documents = []\n",
 "for folder in folders:\n",
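The commit keeps `autodetect_encoding` as a commented-out fallback for users whose files are not UTF-8. As a rough illustration of what such a fallback amounts to, here is a hedged stdlib-only sketch that tries a list of likely encodings in order and keeps the first that decodes; the real `autodetect_encoding` option in the loader performs genuine charset detection, so this stand-in is for intuition only, and the encoding list is an assumption:

```python
# Simplified stand-in for an encoding-autodetect fallback: try likely
# encodings in order, return the first successful decode. latin-1 never
# raises, so it acts as a last resort that always yields *some* text.
def read_text_any(path, encodings=("utf-8", "cp1252", "latin-1")):
    last_err = None
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError as err:
            last_err = err
    raise last_err
```

For example, a file saved as cp1252 fails the strict UTF-8 decode (its accented-character bytes are invalid UTF-8 sequences) and is then read correctly on the second attempt.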