Index of Explored Large Language Models

31 Jan 24 22:29 CST Technology Sp-Cy

This is just a reference note for the locally installed LLMs that I have been playing with in order to learn how to use generative AI for my use cases.

Hermes-Trismegistus-Mistral-7B-GGUF

Transcendence is All You Need! Mistral Trismegistus is a model made for people interested in the esoteric, occult, and spiritual. Trismegistus has evolved: trained over Hermes 2.5, the model performs far better in all tasks, including esoteric tasks!

The change between Mistral-Trismegistus and Hermes-Trismegistus is that this version was trained over Hermes 2.5 instead of the base Mistral model. This means it is full of task capabilities that Trismegistus can utilize for all esoteric and occult tasks, and it performs them far better than ever before.

Dataset: This model was trained on a 100% synthetic, GPT-4-generated dataset of about 10,000 examples, covering a wide and diverse set of both tasks and knowledge about the esoteric, occult, and spiritual.

CodeLlama-7B-Instruct-GGUF

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the repository for the 7B instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding. Links to other models can be found in the index at the bottom.

Model Details

Note: Use of this model is governed by the Meta license. Meta developed and publicly released the Code Llama family of large language models (LLMs).

Model Developers: Meta

Variations: Code Llama comes in three model sizes, and three variants:

Code Llama: base models designed for general code synthesis and understanding
Code Llama - Python: designed specifically for Python
Code Llama - Instruct: for instruction following and safer deployment

Intended Use

Intended Use Cases: Code Llama and its variants are intended for commercial and research use in English and relevant programming languages. The base model Code Llama can be adapted for a variety of code synthesis and understanding tasks, Code Llama - Python is designed specifically to handle the Python programming language, and Code Llama - Instruct is intended to be safer to use for code assistant and generation applications.

Out-of-Scope Uses: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Code Llama and its variants.

Training Data

All experiments reported here and the released models have been trained and fine-tuned using the same data as Llama 2 with different weights (see Section 2 and Table 1 in the research paper for details).

deepseek-coder-6.7B-instruct-GGUF

1. Introduction of Deepseek Coder

Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus by employing a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

Massive Training Data: Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.
Highly Flexible & Scalable: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
Superior Model Performance: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
Advanced Code Completion Capabilities: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.
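To make the fill-in-the-blank idea above concrete, here is a minimal sketch of how an infilling prompt is typically assembled: the code before the gap, a marker for the hole, then the code after the gap. The function name and the three sentinel strings are hypothetical placeholders, not DeepSeek Coder's actual special tokens; check the model card for the real ones before using this.

```python
def build_infill_prompt(prefix: str, suffix: str,
                        begin: str = "<FIM_BEGIN>",
                        hole: str = "<FIM_HOLE>",
                        end: str = "<FIM_END>") -> str:
    """Wrap the code around a gap so a model can be asked to fill the gap.

    The sentinel defaults are placeholders (assumptions), not real tokens.
    """
    return f"{begin}{prefix}{hole}{suffix}{end}"


# Code before and after the part we want the model to write.
prefix = "def add(a, b):\n    "
suffix = "\n\nprint(add(1, 2))\n"

prompt = build_infill_prompt(prefix, suffix)
print(prompt)
```

The model would then be expected to generate only the missing middle (here, something like a return statement), which is what makes this useful for completing code inside an existing file.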

2. Model Summary

deepseek-coder-6.7b-instruct is a 6.7B parameter model initialized from deepseek-coder-6.7b-base and fine-tuned on 2B tokens of instruction data.

Home Page: DeepSeek
Repository: deepseek-ai/deepseek-coder
Chat With DeepSeek Coder: DeepSeek-Coder

3. How to Use

Here are some examples of how to use our model.

Chat Model Inference

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True).cuda()
messages = [
    {'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# 32021 is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=32021)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
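The last line of the snippet above slices the output so that only the newly generated tokens are decoded, since the generated sequence starts with a copy of the prompt. The following is a small sketch of that idea using plain Python lists in place of tensors. The helper name and the toy token ids are made up for illustration; only the id 32021 comes from the snippet.

```python
EOT_ID = 32021  # id of the <|EOT|> token, taken from the snippet above


def new_tokens(output_ids, prompt_len, eos_id=EOT_ID):
    """Return only the generated ids: skip the prompt, stop before the first EOS."""
    generated = list(output_ids[prompt_len:])
    if eos_id in generated:
        generated = generated[:generated.index(eos_id)]
    return generated


# Toy ids standing in for a tokenized prompt and the model's full output.
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9, EOT_ID]

print(new_tokens(output_ids, len(prompt_ids)))
```

In the real snippet the end-of-turn token is dropped by skip_special_tokens=True at decode time rather than by an explicit search, so this is an illustration of the effect, not of the library's internals.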

4. License

This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.

See the LICENSE-MODEL for more details.

phi-2-GGUF

Model Summary

Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased a nearly state-of-the-art performance among models with less than 13 billion parameters.

Our model hasn’t been fine-tuned through reinforcement learning from human feedback. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more.

Intended Uses

Given the nature of the training data, the Phi-2 model is best suited for prompts using the QA format, the chat format, and the code format.

QA Format:

You can provide the prompt as a standalone question as follows:

Write a detailed analogy between mathematics and a lighthouse.

where the model generates the text after “.”. To encourage the model to write more concise answers, you can also try the following QA format using “Instruct: <prompt>\nOutput:”

Instruct: Write a detailed analogy between mathematics and a lighthouse.
Output: Mathematics is like a lighthouse. Just as a lighthouse guides ships safely to shore, mathematics provides a guiding light in the world of numbers and logic. It helps us navigate through complex problems and find solutions. Just as a lighthouse emits a steady beam of light, mathematics provides a consistent framework for reasoning and problem-solving. It illuminates the path to understanding and helps us make sense of the world around us.

where the model generates the text after “Output:”.
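The QA format described above can be produced with a one-line helper. This is only a sketch; the function name is my own, and the only thing taken from the model card is the "Instruct: ...\nOutput:" layout.

```python
def qa_prompt(instruction: str) -> str:
    """Build a Phi-2 style QA prompt; the model continues after 'Output:'."""
    return f"Instruct: {instruction.strip()}\nOutput:"


prompt = qa_prompt("Write a detailed analogy between mathematics and a lighthouse.")
print(prompt)
```

The resulting string is what would be passed to the tokenizer in place of a bare question.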

Chat Format

Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice: ...

where the model generates the text after the first “Bob:”.
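A chat-format prompt like the one above can be assembled from a list of turns, ending with the label of the speaker whose reply the model should write. This is a rough sketch; the helper name and the data layout (a list of speaker/text pairs) are assumptions of mine, not something the model card prescribes.

```python
def chat_prompt(turns, next_speaker: str) -> str:
    """Join (speaker, text) turns into lines and leave the next speaker's line open."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


turns = [
    ("Alice", "I don't know why, I'm struggling to maintain focus while studying. Any suggestions?"),
]
prompt = chat_prompt(turns, "Bob")
print(prompt)
```

Because the prompt ends with an open "Bob:" line, the model's continuation reads as Bob's reply.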

Code Format

import math

def print_prime(n):
    """
    Print all primes between 1 and n
    """
    primes = []
    for num in range(2, n+1):
        is_prime = True
        for i in range(2, int(math.sqrt(num))+1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
    print(primes)

where the model generates the text after the comments.

Notes:

Phi-2 is intended for QA, chat, and code purposes. The model-generated text/code should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing these models in their applications.
Direct adoption for production tasks without evaluation is out of scope of this project. As a result, the Phi-2 model has not been tested to ensure that it performs adequately for any production-level application. Please refer to the limitation sections of this document for more details.
If you are using transformers<4.37.0, always load the model with trust_remote_code=True to prevent side-effects.
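The version rule in the last note can be turned into a small check. The sketch below uses a deliberately simple version parser of my own (it reads only the leading digits of the first three dot-separated parts), so treat it as an approximation rather than a full version-comparison implementation.

```python
from itertools import takewhile


def parse_version(version: str) -> tuple:
    """Turn '4.36.2' into (4, 36, 2); suffixes such as 'rc1' or 'dev0' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))  # pad '4.37' to (4, 37, 0)
    return tuple(parts)


def needs_trust_remote_code(transformers_version: str) -> bool:
    """True when the installed transformers is older than 4.37.0."""
    return parse_version(transformers_version) < (4, 37, 0)


for version in ("4.36.2", "4.37.0", "4.40.1"):
    print(version, needs_trust_remote_code(version))
```

In practice the string would come from transformers.__version__, and the result would decide whether to pass trust_remote_code=True when loading the model.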

Sample Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

Limitations of Phi-2

Generate Inaccurate Code and Facts: The model may produce incorrect code snippets and statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions.
Limited Scope for code: The majority of Phi-2's training data is in Python and uses common packages such as “typing, math, random, collections, datetime, itertools”. If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses.
Unreliable Responses to Instruction: The model has not undergone instruction fine-tuning. As a result, it may struggle or fail to adhere to intricate or nuanced instructions provided by users.
Language Limitations: The model is primarily designed to understand standard English. Informal English, slang, or any other languages might pose challenges to its comprehension, leading to potential misinterpretations or errors in response.
Potential Societal Biases: Phi-2 is not entirely free from societal biases despite efforts in assuring training data safety. There’s a possibility it may generate content that mirrors these societal biases, particularly if prompted or instructed to do so. We urge users to be aware of this and to exercise caution and critical thinking when interpreting model outputs.
Toxicity: Despite being trained with carefully selected data, the model can still produce harmful content if explicitly prompted or instructed to do so. We chose to release the model to help the open-source community develop the most effective ways to reduce the toxicity of a model directly after pretraining.
Verbosity: Phi-2, being a base model, often produces irrelevant or extra text and responses following its first answer to user prompts within a single turn. This is due to its training dataset being primarily textbooks, which results in textbook-like responses.
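One common way to cope with the verbosity issue is to cut the generated text at the first marker that signals a new turn. The sketch below shows that post-processing step. The helper name and the particular stop strings are my own assumptions for the QA format, not something the Phi-2 card recommends.

```python
def trim_at_stop(text: str,
                 stops=("\nInstruct:", "\nOutput:", "<|endoftext|>")) -> str:
    """Keep only the text before the earliest stop string (assumed markers)."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


raw = ("Mathematics is like a lighthouse.\n"
       "Instruct: Write a poem.\n"
       "Output: Roses are red.")
answer = trim_at_stop(raw)
print(answer)
```

For the chat format, the stop strings would instead be the speaker labels, for example "\nAlice:".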

Model

Architecture: a Transformer-based model with next-word prediction objective
Context length: 2048 tokens
Dataset size: 250B tokens, combination of NLP synthetic data created by AOAI GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, which was assessed by AOAI GPT-4.
Training tokens: 1.4T tokens
GPUs: 96xA100-80G
Training time: 14 days
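The figures above allow a quick back-of-the-envelope calculation: how many passes over the dataset the training run corresponds to, and how many GPU-hours it used. This sketch assumes all 96 GPUs ran for the full 14 days, which the model card does not state explicitly.

```python
# Figures copied from the model details above.
dataset_tokens = 250e9      # 250B tokens
training_tokens = 1.4e12    # 1.4T tokens
gpus = 96
days = 14

# Number of passes over the dataset implied by the token counts.
passes = training_tokens / dataset_tokens

# Total GPU-hours, assuming every GPU was busy for the whole period.
gpu_hours = gpus * days * 24

# Average training throughput per GPU under the same assumption.
tokens_per_gpu_second = training_tokens / (gpu_hours * 3600)

print(f"passes over the dataset: {passes:.1f}")
print(f"GPU-hours: {gpu_hours}")
print(f"tokens per GPU per second: {tokens_per_gpu_second:.0f}")
```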

meditron-7B-chat-GGUF

meditron-7b-chat is a finetuned version of epfl-llm/meditron-7b, produced by SFT training on the Alpaca dataset. This model can answer questions about different explicit ideas in medicine.

Meditron is a suite of open-source medical Large Language Models (LLMs). Meditron-7B is a 7-billion-parameter model adapted to the medical domain from Llama-2-7B through continued pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, a new dataset of internationally-recognized medical guidelines, and general domain data from RedPajama-v1. Meditron-7B, finetuned on relevant training data, outperforms Llama-2-7B and PMC-Llama on multiple medical reasoning tasks.

Advisory Notice

While Meditron is designed to encode medical knowledge from sources of high-quality evidence, it is not yet adapted to deliver this knowledge appropriately, safely, or within professional actionable constraints. We recommend against deploying Meditron in medical applications without extensive use-case alignment, as well as additional testing, specifically including randomized controlled trials in real-world practice settings.

Uses

Meditron-7B is being made available for further testing and assessment as an AI assistant to enhance clinical decision-making and enhance access to an LLM for healthcare use. Potential use cases may include but are not limited to:

Medical exam question answering
Supporting differential diagnosis
Disease information (symptoms, cause, treatment) query
General health information query

Direct Use

It is possible to use this model to generate text, which is useful for experimentation and understanding its capabilities. It should not be used directly for production or work that may impact people.

Dr_Samantha-7B-GGUF

Overview

Dr. Samantha is a language model made by merging Severus27/BeingWell_llama2_7b and ParthasarathyShanmugam/llama-2-7b-samantha using mergekit.

It combines the capabilities of a medical knowledge-focused model (trained on USMLE databases and doctor-patient interactions) with the philosophical, psychological, and relational understanding of the Samantha-7b model.

As both a medical consultant and personal counselor, Dr. Samantha could effectively support both physical and mental wellbeing, which is important for whole-person care.

OpenLLM Leaderboard Performance

T	Model	Average	ARC	Hellaswag	MMLU	TruthfulQA	Winogrande	GSM8K
1	sethuiyer/Dr_Samantha-7b	52.95	53.84	77.95	47.94	45.58	73.56	18.8
2	togethercomputer/LLaMA-2-7B-32K-Instruct	50.02	51.11	78.51	46.11	44.86	73.88	5.69
3	togethercomputer/LLaMA-2-7B-32K	47.07	47.53	76.14	43.33	39.23	71.9	4.32

Subject-wise Accuracy

Subject	Accuracy (%)
Clinical Knowledge	52.83
Medical Genetics	49.00
Human Aging	58.29
Human Sexuality	55.73
College Medicine	38.73
Anatomy	41.48
College Biology	52.08
College Medicine	38.73
High School Biology	53.23
Professional Medicine	38.73
Nutrition	50.33
Professional Psychology	46.57
Virology	41.57
High School Psychology	66.60
Average	48.85%
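As a sanity check on the table above, the reported average can be recomputed from the listed rows. Note that it only matches 48.85 when all 14 rows are counted, including the College Medicine row that appears twice. The list below is copied from the table as-is.

```python
# (subject, accuracy %) rows exactly as listed, duplicate row included.
rows = [
    ("Clinical Knowledge", 52.83),
    ("Medical Genetics", 49.00),
    ("Human Aging", 58.29),
    ("Human Sexuality", 55.73),
    ("College Medicine", 38.73),
    ("Anatomy", 41.48),
    ("College Biology", 52.08),
    ("College Medicine", 38.73),
    ("High School Biology", 53.23),
    ("Professional Medicine", 38.73),
    ("Nutrition", 50.33),
    ("Professional Psychology", 46.57),
    ("Virology", 41.57),
    ("High School Psychology", 66.60),
]

# Plain arithmetic mean over every listed row.
mean_accuracy = sum(score for _, score in rows) / len(rows)
print(f"{len(rows)} rows, mean accuracy {mean_accuracy:.2f}%")
```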

Pros

Demonstrates extensive medical knowledge through accurate identification of potential causes for various symptoms.
Responses consistently emphasize the importance of seeking professional diagnoses and treatments.
Advice to consult specialists for certain concerns is well-reasoned.
Practical interim measures provided for symptom management in several cases.
Consistent display of empathy, support, and reassurance for patients’ well-being.
Clear and understandable explanations of conditions and treatment options.
Prompt responses addressing all aspects of medical inquiries.

Cons

Could occasionally place stronger emphasis on urgency when symptoms indicate potential emergencies.
Discussion of differential diagnoses could explore a broader range of less common causes.
Details around less common symptoms and their implications need more depth at times.
Opportunities exist to gather clarifying details on symptom histories through follow-up questions.
Consider exploring full medical histories to improve diagnostic context where relevant.
Caution levels and risk factors associated with certain conditions could be underscored more.

dolphin-2.7-mixtral-8x7b-GGUF

This model is based on Mixtral-8x7b.

The base model has 32k context; I finetuned it with 16k.

This Dolphin is really good at coding; it was trained with a lot of coding data. It is very obedient, but it is not DPO tuned, so you still might need to encourage it in the system prompt as I show in the examples below.