Summary Python Data Operations 1: Data frames

Notes on pandas data operations covered in the Principles of Programming course, part of the Computer Science and AI bachelor's degree. The notes were originally written in Jupyter Notebook. They contain practical examples of data operations in Python and images that explain the structures and processes...

  • No
  • Data wrangling
  • December 9, 2022
  • 117
  • 2022/2023
  • Summary
2022-05-15 22:28 S1 _solved


In [1]:
import pandas as pd
import numpy as np




Pandas
What is a DataFrame?
DataFrame is a data type provided by the pandas library.
In Python it is the most relevant data type for working with tables and data.
Lists are also important for working with data, as you already know, but we are
going to focus on DataFrames (df).
Think of a DataFrame as a table made up of rows and columns:
Each row and each column is an object of type pandas.Series.
A pandas.Series is a vector (similar to a list) in which each element has a label.
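A minimal sketch of a pandas.Series as a labeled vector. The values, labels and the name 'B' are made up for illustration and are not part of the course data.

```python
import pandas as pd

# A Series is a one-dimensional vector in which every element carries a label.
# The values and labels below are made up for illustration.
scores = pd.Series([4, 1, 5], index=['row1', 'row2', 'row3'], name='B')

labels = list(scores.index)    # the labels, one per element
values = scores.tolist()       # the plain values, like a Python list
by_label = scores['row2']      # look an element up by its label

print(labels)
print(values)
print(by_label)
```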
Create a DataFrame
The main ways to do it:
Entering the data manually
  Lists of lists
  Nested dictionaries
Reading the data from a .csv file
  Using the function pd.read_csv(); the path of the file is mandatory.
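The two options not shown in the cells of these notes (nested dictionaries and pd.read_csv()) might look roughly like this. All names and values are hypothetical, and an in-memory text buffer stands in for a real .csv file so the sketch runs without anything on disk.

```python
import io
import pandas as pd

# Nested dictionary: outer keys become columns, inner keys become row labels.
# All names and values here are made up for illustration.
data_dict = {
    'A': {'row1': 'A3', 'row2': 'B1'},
    'B': {'row1': 0, 'row2': 1},
}
df_from_dict = pd.DataFrame(data_dict)

# pd.read_csv() needs a path (or a file-like object). An in-memory buffer
# stands in for a real .csv file so the example runs without a file on disk.
csv_text = "A,B\nA3,0\nB1,1\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dict)
print(df_from_csv)
```

With a real file, the call would simply receive the path as a string instead of the buffer.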
In [1]:
data_lst = [
['A3', 0, -1, 0, 'si'],
['B1', 1, None, 0, 'no'],
['B3', 4, None, 0, 'no'],
['B3', 5, 1, 0, 'si'],
['A1', 4, 0, None, None],
['A3', 1, 2, 1, 'si'],
['C2', 4, 1, 1, 'no']
]

data_lst

Out[1]:
[['A3', 0, -1, 0, 'si'],
 ['B1', 1, None, 0, 'no'],
 ['B3', 4, None, 0, 'no'],
 ['B3', 5, 1, 0, 'si'],
 ['A1', 4, 0, None, None],
 ['A3', 1, 2, 1, 'si'],
 ['C2', 4, 1, 1, 'no']]

In [12]:
col0 = []
for row in data_lst:
    col0.append(row[0])

col0

Out[12]:
['A3', 'B1', 'B3', 'B3', 'A1', 'A3', 'C2']
file:///Users/bestricemossberg/Downloads/S1 _solved.html 1/14





Test
In [7]:
test_df = pd.DataFrame(
    data_lst
)
test_df


Out[7]:
    0  1    2    3     4
0  A3  0 -1.0  0.0    si
1  B1  1  NaN  0.0    no
2  B3  4  NaN  0.0    no
3  B3  5  1.0  0.0    si
4  A1  4  0.0  NaN  None
5  A3  1  2.0  1.0    si
6  C2  4  1.0  1.0    no

In [10]:
test_df = pd.DataFrame(
    data_lst,
    columns=['A', 'B', 'C', 'D', 'E'],
    index=[f'row{i}' for i in range(1, 8)]
)
test_df


Out[10]:
       A  B    C    D     E
row1  A3  0 -1.0  0.0    si
row2  B1  1  NaN  0.0    no
row3  B3  4  NaN  0.0    no
row4  B3  5  1.0  0.0    si
row5  A1  4  0.0  NaN  None
row6  A3  1  2.0  1.0    si
row7  C2  4  1.0  1.0    no

DataFrame structure
In [9]:
# both .index and .columns are iterable objects
print('ROWS:')
for index in test_df.index:
    print(index)

print()
print('COLUMNS:')
for col in test_df.columns:
    print(col)

ROWS:
row1
row2
row3
row4
row5
row6
row7

COLUMNS:
A
B
C
D
E
DataFrames can be understood as a matrix of values with an index for rows and an index for
columns.
Any two-dimensional subset is treated as a DataFrame, and any one-dimensional subset is
treated as a Series.
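A small sketch of this rule on a stand-in table (the values are made up; .loc, used here to pick one row, is introduced under "Select values"):

```python
import pandas as pd

# A small stand-in table; the values are made up for illustration.
df = pd.DataFrame(
    {'A': ['A3', 'B1', 'B3'], 'B': [0, 1, 4], 'C': [-1.0, None, None]},
    index=['row1', 'row2', 'row3'],
)

two_columns = df[['A', 'B']]   # list of columns -> two-dimensional subset
one_column = df['A']           # single column   -> one-dimensional subset
one_row = df.loc['row1']       # single row      -> one-dimensional subset

print(type(two_columns).__name__)
print(type(one_column).__name__)
print(type(one_row).__name__)
```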




Although the DataFrame has explicit indices (labels) for rows and columns, both DataFrame
and Series still have a positional ("hidden") index.
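As an illustration of the positional index, the sketch below pairs each label of a made-up Series with its position and reads the same element both ways. The data is hypothetical.

```python
import pandas as pd

# Made-up values; the row labels are explicit strings.
s = pd.Series([10, 20, 30], index=['row1', 'row2', 'row3'])

# Even though the labels are strings, every element still has a position
# (0, 1, 2, ...), just like the items of a list.
positions_and_labels = list(enumerate(s.index))
first_by_position = s.iloc[0]      # position 0
first_by_label = s['row1']         # label 'row1'

print(positions_and_labels)
print(first_by_position, first_by_label)
```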






In [29]:
# first 3 rows
test_df.head(3)


Out[29]:
    A  B    C    D   E
0  A3  0 -1.0  0.0  si
1  B1  1  NaN  0.0  no
2  B3  4  NaN  0.0  no

In [30]:
# last 2 rows
test_df.tail(2)


Out[30]:
    A  B    C    D   E
5  A3  1  2.0  1.0  si
6  C2  4  1.0  1.0  no

Select values
.iloc vs .loc
One of the advantages of the DataFrame is that it allows us to access its elements (rows,
columns, cells...) in two ways:
1. by position (numerical index), e.g. the first row, the eighth column, etc.
2. by label, e.g. the column named "name", the row with index "FHX129M", etc.
To make clear which type of index we want to use, there are two indexers: .iloc
and .loc .
1. .iloc is used to access elements via (numeric) positions.
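A hedged sketch of the difference, on a made-up table with string row labels. The .loc half reflects standard pandas behavior rather than anything stated in these notes, which are cut off at this point.

```python
import pandas as pd

# Stand-in table; the values are made up for illustration.
df = pd.DataFrame(
    [['A3', 0, 'si'], ['B1', 1, 'no'], ['B3', 4, 'no']],
    columns=['A', 'B', 'E'],
    index=['row1', 'row2', 'row3'],
)

# .iloc: positions only (0-based, end of a slice excluded, as with lists)
cell_by_position = df.iloc[1, 0]        # second row, first column
rows_by_position = df.iloc[0:2]         # rows at positions 0 and 1

# .loc: labels only (a label slice includes its end label)
cell_by_label = df.loc['row2', 'A']     # row 'row2', column 'A'
rows_by_label = df.loc['row1':'row2']   # rows 'row1' through 'row2'

print(cell_by_position, cell_by_label)
print(rows_by_position.equals(rows_by_label))
```

Note that the positional slice 0:2 and the label slice 'row1':'row2' select the same two rows, because label slices include their end point while positional slices do not.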

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller beatricemossberg. Stuvia facilitates payment to the seller.
