Python Sequences

This post covers basic operations on Python lists and tuples.

List
Tuples
- Unpacking
- Named Tuples
Slicing
dict comprehensions
Hack for JSON

There are two types of sequences.

Container: list, tuple and collections.deque. They hold references to the objects they contain.
Flat: str, bytes and array.array. They store the values themselves.

Flat sequences are compact but limited to holding primitives such as bytes, integers and floats.
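
The difference can be sketched as follows. This is a minimal illustration, not a full account of the memory model: a list stores references, so a change to a contained object is visible through the list, while array.array stores raw values and rejects anything that does not match its type code. The variable names are only illustrative.

```python
from array import array

# A container sequence stores references to its items.
inner = [1, 2]
container = [inner, "text", 3.5]   # mixed types are allowed
inner.append(3)                    # mutate the object, not the container
print(container[0])                # the change is visible through the list

# A flat sequence stores the values themselves, all of one primitive type.
floats = array("d", [1.0, 2.5])    # "d" is the type code for C doubles
try:
    floats.append("text")          # not a float, so it is rejected
except TypeError as exc:
    print("TypeError:", exc)
```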

The most fundamental mutable container is list. List comprehensions (listcomps) and generator expressions (genexps) are concise ways to build lists and other sequences.

The built-in sequence types are one-dimensional.

List

a = 'Ojitha'
[l for l in a]

['O', 'j', 'i', 't', 'h', 'a']

Some uses of the list API:

Concatenation is supported by both tuple and list:

[1,2] + [2,3]

[1, 2, 2, 3]

but concatenating a list and a tuple raises an error:

[1,2] + (2,3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 [1,2] + (2,3)

TypeError: can only concatenate list (not "tuple") to list

Append an element

b =[1,2,3]
b.append(4)
b

[1, 2, 3, 4]

Clear all elements

b.clear()
b

[]

Return True if an element is in the list

b =[1,2,3,2, 2]
b.__contains__(1)

True

Count how many times an element occurs in the list

b.count(2)

Delete the element at a given position

b.__delitem__(2)
b

[1, 2, 2, 2]

Append every item from an iterable

b.extend(range(5,10))
b

[1, 2, 2, 2, 5, 6, 7, 8, 9]

Get item at index

b.__getitem__(2)

Find the index of the first occurrence of an element.

b.index(2)

Find the length of the list.

len(b)
b.__len__()

Insert an element before the given position

b.insert(5,4)
b

[1, 2, 2, 2, 5, 4, 6, 7, 8, 9]

Remove the last element from the list

b.pop()

Remove the element from the specific position

b.pop(3)

Remove the first occurrence of the element.

b.remove(2)
b

[1, 2, 5, 4, 6, 7, 8]

Reverse the order of the list

b.reverse()
b

[8, 7, 6, 4, 5, 2, 1]

Sort the list in place (here in descending order)

b.sort(reverse=True)
b

[8, 7, 6, 5, 4, 2, 1]
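
The double-underscore methods called directly in this section are normally invoked through operators and built-in functions. The sketch below shows the everyday spellings next to the methods they call; the name nums is only illustrative.

```python
nums = [1, 2, 3, 2, 2]

print(2 in nums)        # same as nums.__contains__(2)
print(nums[2])          # same as nums.__getitem__(2)
print(len(nums))        # same as nums.__len__()

del nums[2]             # same as nums.__delitem__(2)
print(nums)

# sort() reorders the list in place and returns None,
# while sorted() leaves the list alone and returns a new one.
descending = sorted(nums, reverse=True)
print(descending, nums)
```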

list comprehensions

Listcomps can do everything that filter and map do:

[i for i in range(10) if i > 5 ]

[6, 7, 8, 9]

list(filter(lambda i: i > 5,range(10)))

[6, 7, 8, 9]

[i+10 for i in range(10)]

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

list(map(lambda i:i+10, range(10)))

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
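
When filtering and mapping are combined, the difference in readability becomes clearer. A small sketch, with squares_lc and squares_fm as illustrative names:

```python
# Square only the even numbers below 10.
squares_lc = [i * i for i in range(10) if i % 2 == 0]

# The same result with map and filter needs two lambdas.
squares_fm = list(map(lambda i: i * i, filter(lambda i: i % 2 == 0, range(10))))

print(squares_lc)
print(squares_lc == squares_fm)
```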

Listcomps `RxS`

A listcomp can produce the Cartesian product of two or more iterables:

[(n1, n2, n3) for n1 in (0,1) for n2 in (0,1) for n3 in (0,1)]

[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]
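
For comparison, the standard library offers itertools.product for the same job. The sketch below assumes the goal is simply to reproduce the triples above; it is an alternative spelling, not a replacement for the listcomp.

```python
from itertools import product

by_listcomp = [(n1, n2, n3) for n1 in (0, 1) for n2 in (0, 1) for n3 in (0, 1)]

# repeat=3 means "the product of (0, 1) with itself three times".
by_product = list(product((0, 1), repeat=3))

print(by_listcomp == by_product)
```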

Generator Expressions

Genexps save memory because they yield items one by one instead of building a complete list and feeding it all at once, as listcomps do.

tuple((n1, n2, n3) for n1 in (0,1) for n2 in (0,1) for n3 in (0,1))

((0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1))
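
The memory difference can be observed roughly with sys.getsizeof. This is only a sketch: getsizeof measures the container object itself, and the exact numbers depend on the Python implementation.

```python
import sys

as_list = [i * i for i in range(100_000)]   # all items exist at once
as_gen = (i * i for i in range(100_000))    # items are produced on demand

# The generator object stays small no matter how many items it will yield.
print(sys.getsizeof(as_list), sys.getsizeof(as_gen))

total = sum(as_gen)        # consumes the generator item by item
leftover = list(as_gen)    # a generator can be consumed only once
print(total == sum(as_list), leftover)
```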

Tuples

Tuples serve both as immutable lists and as records in which each position holds a field. A tuple uses less memory than a list of the same length. For the API operations, refer to Table 2-1 in the section "Comparing Tuple and List Methods" of Chapter 2, "An Array of Sequences", in Fluent Python, 2nd Edition.

state, city, postCode = ('NSW', 'Sydney', 2000)
postCode # third position (index 2)

For example, use indexing to get the element at a position:

t = ('a','b','c')
t[2]

'c'

Find the index of an element

t.index('b')

Repeated concatenation (lists support this as well)

t * 2

('a', 'b', 'c', 'a', 'b', 'c')

Concatenation works for both list and tuple, but only between operands of the same type:

t + (1,2)

('a', 'b', 'c', 1, 2)

t+[1,2]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 t+[1,2]

TypeError: can only concatenate tuple (not "list") to tuple

Extended iterable unpacking, defined in PEP 3132, is well known:

a, *b, c = range(5,10)
a, c

(5, 9)
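
The starred name always receives a list, and it may appear at any one position on the left-hand side. A short sketch with illustrative variable names:

```python
first, *middle, last = range(5, 10)
print(first, middle, last)      # middle collects everything in between

head, *tail = "abc"             # works on any iterable, including str
print(head, tail)

*init, final = (1, 2)           # the starred name can come first
print(init, final)
```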

Swapping the variables without temporary variable:

a, c = c, a
a, c

(9, 5)

If you use split, the result is a list

"ojitha hewa".split(" ")

['ojitha', 'hewa']

but you can convert to tuple

tuple("ojitha hewa".split(" "))

('ojitha', 'hewa')

This can be used in the string format as follows:

"my first '%s' and last '%s'" % tuple("ojitha hewa".split(" "))

"my first 'ojitha' and last 'hewa'"

Tuples hold object references, and a referenced object can itself be mutable, such as a list. A tuple is hashable only if all of its items are hashable, so a tuple that contains a list cannot be hashed.

immutable_t = (10,(1,3))
hash(immutable_t)

3408708537663480494

mutable_t = (10,[1,3])
hash(mutable_t)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
      1 mutable_t = (10,[1,3])
----> 2 hash(mutable_t)

TypeError: unhashable type: 'list'
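
Hashability matters in practice because only hashable objects can be dict keys or set members. The sketch below wraps the check in a small helper; is_hashable and grid are hypothetical names used only for illustration.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (illustrative helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((10, (1, 3))))   # tuple of immutable items
print(is_hashable((10, [1, 3])))   # tuple that holds a list

# A hashable tuple can be used as a dict key.
grid = {(0, 0): "origin", (1, 0): "east"}
print(grid[(0, 0)])
```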

Unpacking

Let us first see the use of the * when passing parameters.

def f(a,b,c):
    return a, b, c

A plain call passes the list as a single argument:

f([1,2], 3, 4)

([1, 2], 3, 4)

But if you prefix the list with *:

f(*[1,2], 3, 4)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 f(*[1,2], 3, 4)

TypeError: f() takes 3 positional arguments but 4 were given

This is because the first argument is unpacked into two positional arguments, so the function can be called as:

f(*[1,2], 3)

(1, 2, 3)

*range(3),10

(0, 1, 2, 10)

Nested unpacking

a, b, (c,d) = 'a', 'b', (1,2)
d
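
Nested unpacking is especially handy in loops over records that contain tuples. A minimal sketch with made-up data; records, label, x and y are illustrative names:

```python
records = [("a", (1, 2)), ("b", (3, 4))]

# The target mirrors the shape of each record: a label and a nested pair.
for label, (x, y) in records:
    print(label, x + y)

# The same pattern works inside a comprehension.
sums = {label: x + y for label, (x, y) in records}
print(sums)
```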

Named Tuples

namedtuple is a factory function that builds a subclass of tuple with the given class name and field names, for example:

from collections import namedtuple
State = namedtuple('State', 'city postcode')
syd = State('Sydney', ('Sydney',2000))
syd

State(city='Sydney', postcode=('Sydney', 2000))
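
A named tuple can be read by field name or by position, and it comes with a few helpers of its own. The sketch below uses a separate, illustrative City class with hypothetical field names and values so that it stands alone:

```python
from collections import namedtuple

City = namedtuple("City", "name postcode")   # illustrative class and fields
syd = City("Sydney", 2000)

print(syd.name, syd[1])        # access by field name or by position
print(City._fields)            # tuple of the field names
print(syd._asdict())           # the record as a dict

# Instances are immutable; _replace returns a modified copy.
moved = syd._replace(postcode=2001)
print(moved, syd)

name, postcode = syd           # unpacks like any other tuple
```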

Slicing

The slice operation [n1:n2] works on all sequence types (list, tuple or str).

r=[i for i in  range(10)] # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
r[3:8] # 8 - 3 = 5 elements: start index 3 is included, end index 8 is excluded

[3, 4, 5, 6, 7]

r[-8:-3] # negative indexes count from the end, so this is the same as r[2:7]

[2, 3, 4, 5, 6]

Because the end index is excluded, a sequence splits cleanly in two at a single index, for example:

'{left} and {right}'.format(left=r[:5], right=r[5:])

'[0, 1, 2, 3, 4] and [5, 6, 7, 8, 9]'

Compare the above with the following

'{left} and {right}'.format(left=r[:-5], right=r[-5:])

'[0, 1, 2, 3, 4] and [5, 6, 7, 8, 9]'

You can also give a stride (step), for example, 2:

r[:5:2]

[0, 2, 4]

r[5::2]

[5, 7, 9]

r[:5:-2]

[9, 7]
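
Two further uses of the same notation, shown as a short sketch: a slice can be stored in a slice object and reused, and a list accepts assignment to a slice. The names first_half and reversed_r are illustrative.

```python
r = list(range(10))

reversed_r = r[::-1]            # a step of -1 walks the sequence backwards
print(reversed_r)

first_half = slice(None, 5)     # a reusable slice, equal to writing [:5]
head = r[first_half]
print(head)

# Lists also accept assignment to a slice, which may change the length.
r[2:5] = ["x"]
print(r)
```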

dict comprehensions

First create the listcomps:

myList = [chr(chNum) for chNum in range(ord('a'), ord('z') + 1)]
codes =  [(i,x) for i, x in enumerate(myList) if i > 10]

From the above listcomp, create a dictcomp:

d = {letter: i+1 for i, letter in codes if i > 20}

Use this in a function:

def dump(**kwargs):
    return kwargs
dump(a=1,**d)   
# {'a': 1, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}

The above is similar to {'a':1, **{'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}}, but notice the first element: in the call it is a keyword argument, while in the dict display it is a key: value pair.
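
One practical difference between the two forms is how they treat a repeated key. The sketch below redefines dump so that it stands alone; extra and duplicate_rejected are illustrative names.

```python
def dump(**kwargs):
    return kwargs

extra = {"a": 2, "v": 22}

# In a dict display, a later duplicate key overrides the earlier one.
merged = {"a": 1, **extra}
print(merged)

# In a function call, a duplicate keyword argument is an error instead.
try:
    dump(a=1, **extra)
    duplicate_rejected = False
except TypeError as exc:
    duplicate_rejected = True
    print("TypeError:", exc)
```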

Hack for JSON

A hack to copy and paste JSON into Python:

true, false, null = True, False, None # hack
fruit = {
     "type": "banana",
     "avg_weight": 123.2,
     "edible_peel": false,
     "species": ["acuminata", "balbisiana", "paradisiaca"],
     "issues": null,
 }
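
When the JSON is available as text rather than pasted into source code, the standard library json module does the same conversion without the aliases. A sketch, assuming the text is held in a string called text; note that strict JSON does not allow a trailing comma after the last item, although a Python dict display does.

```python
import json

text = """
{
    "type": "banana",
    "avg_weight": 123.2,
    "edible_peel": false,
    "species": ["acuminata", "balbisiana", "paradisiaca"],
    "issues": null
}
"""

# json.loads maps false to False, true to True and null to None.
fruit = json.loads(text)
print(fruit["edible_peel"], fruit["issues"])
```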