Creating tables, joining them, and selecting only certain columns from them in SQL and Python

I'll walk you through the process of selecting specific columns from a table using SQL and Python pandas.

First, let's create a sample table in SQL with some data. We'll create two tables, "Customers" and "Orders," and then perform an inner join on them.

-- Creating Customers table
CREATE TABLE Customers (
  CustomerID INT,
  CustomerName VARCHAR(255),
  City VARCHAR(255)
);

-- Inserting data into Customers table
INSERT INTO Customers (CustomerID, CustomerName, City)
VALUES (1, 'John', 'New York'),
       (2, 'Alice', 'London'),
       (3, 'Bob', 'Paris');

-- Creating Orders table
CREATE TABLE Orders (
  OrderID INT,
  CustomerID INT,
  ProductName VARCHAR(255),
  Quantity INT
);

-- Inserting data into Orders table
INSERT INTO Orders (OrderID, CustomerID, ProductName, Quantity)
VALUES (1, 1, 'Product A', 5),
       (2, 2, 'Product B', 10),
       (3, 3, 'Product C', 3);

Now, let's perform the inner join and select specific columns using SQL:

SELECT Customers.CustomerName, Orders.ProductName, Orders.Quantity
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

This SQL query selects the "CustomerName" column from the "Customers" table and the "ProductName" and "Quantity" columns from the "Orders" table, joining them on the "CustomerID" column.

Now, let's achieve the same result using Python pandas. First, we'll create the dataframes with the same data:

import pandas as pd

# Creating Customers dataframe
customers_data = {
    'CustomerID': [1, 2, 3],
    'CustomerName': ['John', 'Alice', 'Bob'],
    'City': ['New York', 'London', 'Paris']
}
customers_df = pd.DataFrame(customers_data)

# Creating Orders dataframe
orders_data = {
    'OrderID': [1, 2, 3],
    'CustomerID': [1, 2, 3],
    'ProductName': ['Product A', 'Product B', 'Product C'],
    'Quantity': [5, 10, 3]
}
orders_df = pd.DataFrame(orders_data)

Now, let's perform the inner join and select specific columns using Python pandas:

result_df = pd.merge(customers_df, orders_df, on='CustomerID', how='inner')
selected_columns = ['CustomerName', 'ProductName', 'Quantity']
selected_df = result_df[selected_columns]

In the above code, we merge the "customers_df" and "orders_df" dataframes on the "CustomerID" column using the pd.merge() function. Then, we select the desired columns by indexing the resulting dataframe with the list of column names and store it in the "selected_df" variable.

Now, you have the result with the selected columns in the "selected_df" dataframe.

I hope this helps! Let me know if you have any further questions.