Welcome to the About AI section of the Teach@CUNY AI Toolkit.
Understanding the concepts and principles of artificial intelligence (AI) is integral to AI literacy. The materials compiled below take the form of an FAQ to support your working vocabulary of key terms in the field today, followed by visual resources meant to demystify generative AI models for visual learners.
Artificial intelligence
What is artificial intelligence?

AI is not a recent or looming development. Rather, from predictive policing to targeted advertising, it is a past and present reality that has imprinted itself on many aspects of our lives. The term “artificial intelligence” was coined in the 1950s to encapsulate theories and efforts to create systems capable of performing tasks that typically require human intelligence. The goal was to engineer a computer capable of undertaking activities previously limited to the cognitive abilities of humans, be it chess strategy or affect recognition (Wooldridge 2021). Today, we still commonly, and somewhat simplistically, refer to artificial intelligence as the science of building computer systems capable of executing tasks that previously required the intellectual processes and knowledge work of human beings (Dobrin 2023). Issues raised by AI systems in education concern not only classroom instruction but also broader aspects of the educational ecosystem, such as student privacy and data rights, curriculum design and development, administrative processes, labor politics, and more.

What is training data?

Training data refers to the information used to teach an AI system how to perform certain tasks. Think of it as the curricular content from which AI systems learn to make predictions or decisions. If that curriculum is flawed or biased, the AI will be as well. Consider the strikingly biased outcomes produced by Amazon’s AI-driven recruitment tool. The system independently developed a preference for male candidates, lowering the rankings of resumes featuring terms like “women’s” (as in “women’s chess club captain”) and resumes from graduates of women’s colleges. Now suppose a college or university employs an AI system to assist in the admissions process. That system would rely heavily on historical admissions data to predict which applicants are most likely to succeed at the institution.
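The recruitment and admissions scenarios can be reduced to a deliberately tiny sketch. Everything below (the resume keywords, the hiring decisions, the scoring rule) is invented for illustration; the point is only that a model trained on biased historical decisions reproduces that bias without ever being told to.

```python
# Toy illustration: a "model" that learns word weights from biased
# historical hiring decisions. All data here is invented for the example.
from collections import Counter

# Historical training data: (resume keywords, was the candidate hired?)
history = [
    (["chess", "captain", "engineering"], True),
    (["robotics", "engineering"], True),
    (["women's", "chess", "captain"], False),      # biased past decisions
    (["women's", "college", "engineering"], False),
    (["debate", "engineering"], True),
]

hired = Counter()
rejected = Counter()
for words, was_hired in history:
    (hired if was_hired else rejected).update(words)

def score(resume):
    """Score a resume by how often its words appeared in past hires
    versus past rejections. The model has no notion of fairness; it
    simply reproduces whatever pattern the training data contains."""
    return sum(hired[w] - rejected[w] for w in resume)

# Two candidates, identical except for one keyword: the second is
# penalized purely because "women's" appeared in past rejections.
print(score(["chess", "captain", "engineering"]))
print(score(["women's", "chess", "captain", "engineering"]))
```

No explicit rule about gender was ever written; the disparity emerges entirely from the historical data the scoring function was built on.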
Biased historical data used by AI to admit or reject applicants would further disadvantage historically marginalized groups and communities. Faculty and staff should consider these dangers in terms of CUNY’s historic mission to educate “the children of the whole people,” including those who have been disproportionately marginalized or underrepresented in higher education.

What is machine learning?

Machine learning is a subset of AI that gives computers the ability to learn and adapt from experience without being explicitly programmed. Think of it as learning to ride a bike: with practice and iteration, you get better at it. Machine learning enables a computer to refine its performance as it processes more and more data over time. Georgia State University is often cited as a success story in using machine learning to identify students at risk of dropping out and provide timely interventions. While the university’s efforts have led to gains in graduation rates, the jury is still out on whether such institutional initiatives lead to the disproportionate flagging of students from underrepresented minority groups, and on the privacy issues raised by extensive surveillance and data monitoring.

What is a neural network?

A neural network is a series of algorithms that attempts to recognize underlying relationships in a set of data through a process modeled on how the human brain operates. Neural networks are designed to cluster and classify information. Think of them as a web of interconnected nodes that work together to detect complex patterns in data. For game-based learning about neural networks, check out Quick, Draw! — an AI sketch game that trains on human input to infer figures sketched by users in real time.

Generative artificial intelligence

What is generative artificial intelligence?
Generative artificial intelligence refers to a subset of AI technologies capable of generating diverse forms of content — such as alphabetic texts, visual art, and audio — by recognizing and reproducing patterns in datasets. When a user inputs a request for something specific — be it a written piece, a musical composition, a graphic, or a solution to a mathematical problem — generative AI begins by sifting through its training data. It searches for recurring themes and pertinent information related to the request, then rearranges what it gathers into a new sequence that it deems most fitting in response to the user’s prompt (Dobrin 2023).

What is AI text generation?

AI text generation is the process by which a computer program uses artificial intelligence methods to produce written content autonomously. Using patterns and structures learned from vast collections of human language, today’s text generators can produce coherent, contextual material ranging from simple advertisements to lengthy articles. AI text generation underlies the chatbots you interact with online, along with the predictive text features on your smartphone or email service. As AI text generation has become more prevalent, critical perspectives have focused on its tendency to flatten language differences, dehumanize composing practices, and co-opt vast quantities of online writing and text communication. In particular, educators should take note of how these tools could narrow linguistic expression and thereby devalue the diverse perspectives and voices that CUNY students in particular bring to their written work.

What is AI image generation?

AI image generation involves the creation of visual media, from scratch or from existing images, via machine learning models. These systems learn from vast collections of visual media and can generate novel images that may be difficult to distinguish from those created by humans.
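The generative AI and text-generation answers above describe the same underlying move: learn statistical patterns from a body of examples, then sample new sequences from those patterns. A toy bigram ("which word tends to follow which") model, built on an invented miniature corpus with only the Python standard library, illustrates the principle at a vastly smaller scale than any real system:

```python
# Toy "generative AI": record which word follows which in a tiny corpus,
# then sample new text from those recorded patterns.
import random
from collections import defaultdict

corpus = ("the students read the essay and the students "
          "wrote the essay and the professor read the essay").split()

# 1. "Training": record every observed word-to-next-word transition.
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

# 2. "Generation": starting from a word, repeatedly sample an observed
#    continuation, producing a sequence the corpus never contained verbatim.
def generate(start, length=8, seed=0):
    random.seed(seed)  # fixed seed so the sketch is reproducible
    words = [start]
    for _ in range(length - 1):
        followers = transitions.get(words[-1])
        if not followers:          # dead end: no observed continuation
            break
        words.append(random.choice(followers))
    return " ".join(words)

print(generate("the"))
```

Real models learn billions of such statistics across trillions of words, and operate on sub-word tokens rather than whole words, but the train-then-sample structure is the same.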
AI image generation in particular invites scrutiny around the ethics of representation, the dangers of ready-made deepfakes, and infringements on intellectual property rights across a range of visual media.

What are large language models?

The term “large language models” comes from how these systems are developed: they are trained on extensive collections of textual data to predict and approximate human language patterns. Laquintano and colleagues explain that, to create text, these models are fed vast amounts of information, predominantly sourced from the internet, and are trained using advanced machine learning methods. Following their initial training, the models undergo additional refinement through processes like “reinforcement learning from human feedback” (RLHF) to improve the quality and relevance of their outputs. This approach has significantly advanced the generative AI sector over the past decade, with a notable surge since the end of 2017. Such advancements have led to the emergence of “foundation models” that are versatile at generating text, images, video, or audio for a variety of applications.

Training large language models (LLMs) through RLHF raises considerable ethical and political concerns about the material conditions of this labor. These include inadequate compensation and poor working conditions for data labelers, along with a lack of transparency and recognition for the essential role these workers play in the development and accuracy of LLMs. A TIME report indicated that OpenAI outsourced tasks to Kenyan workers who were paid less than $2 per hour to help make ChatGPT less toxic. These workers reported being mentally scarred by the disturbing content they had to process, and the problematic work environment led Sama, the firm that employed them, to terminate its contract early, in February 2022.
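The RLHF process described above can be caricatured in a few lines. Every detail below (the candidate replies, the pairwise preferences, the update rule) is invented for illustration and bears no resemblance in scale to any real implementation, which fine-tunes billions of parameters; it shows only how pairwise human judgments can become a reward signal that shifts a model toward preferred outputs.

```python
# A deliberately minimal caricature of RLHF: human comparisons become
# a reward signal that nudges a "policy" toward preferred outputs.
candidates = ["rude reply", "neutral reply", "helpful reply"]

# The "policy": a weight for each candidate output, initially uniform.
weights = {c: 1.0 for c in candidates}

# Human feedback collected as pairwise preferences: (winner, loser).
preferences = [
    ("helpful reply", "rude reply"),
    ("helpful reply", "neutral reply"),
    ("neutral reply", "rude reply"),
]

# "Reward model": score each candidate by how often humans preferred it.
reward = {c: 0 for c in candidates}
for winner, loser in preferences:
    reward[winner] += 1
    reward[loser] -= 1

# "Fine-tuning": repeatedly shift weight toward high-reward outputs.
for _ in range(20):
    for c in candidates:
        weights[c] *= 1.1 ** reward[c]

total = sum(weights.values())
policy = {c: w / total for c, w in weights.items()}
print(max(policy, key=policy.get))  # the policy now favors "helpful reply"
```

Note that the human labelers who supply those pairwise preferences are precisely the workers whose conditions the TIME report documents.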
The TIME report’s revelations highlight important ethical considerations in the AI development process, particularly regarding labor practices and the psychological wellbeing of those involved in content moderation and model training.

What does ChatGPT stand for and what does it mean?

ChatGPT stands for “Chat Generative Pre-trained Transformer” and denotes a specific type of AI technology designed to understand and predictively generate text in a conversational context. Think of it as building a sandcastle on the beach: you shape something new with each build (generative), you draw on everything you learned from past builds (pre-trained), and you use tools that let you work on the whole structure at once (transformer). Just as these elements work together to create a really cool sandcastle, so does ChatGPT’s combination of generative capabilities, pre-training practices, and transformer architecture allow it to produce contextual outputs in response to user input.

Visual Resources

The Importance of Human Feedback

In OpenAI’s work on aligning language models to follow instructions with human feedback, the company identifies reinforcement learning from human feedback (RLHF) as a core technique used to train its ChatGPT and InstructGPT models. In their research post on the topic, OpenAI authors Ryan Lowe and Jan Leike include a three-step visualization to explain this technique, which “uses human preferences as a reward signal to fine-tune [their] models” in ways that “aren’t fully captured by simple automatic metrics.” Shown below are those three steps:

Generative AI Explained by Generative AI

In an article produced exclusively with ChatGPT and Midjourney, Nick Routley and Mark Belan turn the questions of generative AI on the technology itself. Depicted below is the result of their experiment: an infographic that visualizes the nuts and bolts of generative AI. Check it out at your own pace, and don’t forget that AI-generated content is always subject to error.