66 :suppress:
77
88 import numpy as np
9- import random
10- np.random.seed(123456)
11- from pandas import *
12- options.display.max_rows=15
139 import pandas as pd
14- randn = np.random.randn
15- randint = np.random.randint
10+ np.random.seed(123456)
1611 np.set_printoptions(precision=4, suppress=True)
17- from pandas.compat import range, zip
12+ pd.options.display.max_rows = 15
1813
1914******************************
2015MultiIndex / Advanced Indexing
@@ -80,10 +75,10 @@ demo different ways to initialize MultiIndexes.
8075 tuples = list(zip(*arrays))
8176 tuples
8277
83- index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
78+ index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
8479 index
8580
86- s = Series(randn(8), index=index)
81+ s = pd.Series(np.random.randn(8), index=index)
8782 s
8883
8984 When you want every pairing of the elements in two iterables, it can be easier
@@ -92,7 +87,7 @@ to use the ``MultiIndex.from_product`` function:
9287.. ipython:: python
9388
9489 iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
95- MultiIndex.from_product(iterables, names=['first', 'second'])
90+ pd.MultiIndex.from_product(iterables, names=['first', 'second'])
9691
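As a reviewer's aside on the ``from_product`` example in this hunk: a minimal sketch, assuming a plain Python session outside the doc build, showing that ``from_product`` yields the same index as building every pairing by hand with ``itertools.product`` and passing the tuples to ``from_tuples``. The variable names ``by_product`` and ``by_tuples`` are illustrative only.

```python
import itertools

import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]

# Every pairing of the two iterables, built directly.
by_product = pd.MultiIndex.from_product(iterables, names=['first', 'second'])

# The same pairings, spelled out as explicit tuples.
pairs = list(itertools.product(*iterables))
by_tuples = pd.MultiIndex.from_tuples(pairs, names=['first', 'second'])

print(by_product.equals(by_tuples))
print(len(by_product))
```

The hand-built version is only there for comparison; ``from_product`` avoids materializing the tuple list yourself.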
9792 As a convenience, you can pass a list of arrays directly into Series or
9893DataFrame to construct a MultiIndex automatically:
@@ -101,9 +96,9 @@ DataFrame to construct a MultiIndex automatically:
10196
10297 arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
10398           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
104- s = Series(randn(8), index=arrays)
99+ s = pd.Series(np.random.randn(8), index=arrays)
105100 s
106- df = DataFrame(randn(8, 4), index=arrays)
101+ df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
107102 df
108103
109104 All of the ``MultiIndex`` constructors accept a ``names`` argument which stores
@@ -119,9 +114,9 @@ of the index is up to you:
119114
120115.. ipython:: python
121116
122- df = DataFrame(randn(3, 8), index=['A', 'B', 'C'], columns=index)
117+ df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
123118 df
124- DataFrame(randn(6, 6), index=index[:6], columns=index[:6])
119+ pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
125120
126121 We've "sparsified" the higher levels of the indexes to make the console output a
127122bit easier on the eyes.
@@ -131,7 +126,7 @@ tuples as atomic labels on an axis:
131126
132127.. ipython:: python
133128
134- Series(randn(8), index=tuples)
129+ pd.Series(np.random.randn(8), index=tuples)
135130
136131 The reason that the ``MultiIndex `` matters is that it can allow you to do
137132grouping, selection, and reshaping operations as we will describe below and in
@@ -282,16 +277,16 @@ As usual, **both sides** of the slicers are included as this is label indexing.
282277 def mklbl(prefix,n):
283278     return ["%s%s" % (prefix,i) for i in range(n)]
284279
285- miindex = MultiIndex.from_product([mklbl('A',4),
286-                                    mklbl('B',2),
287-                                    mklbl('C',4),
288-                                    mklbl('D',2)])
289- micolumns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
290-                                     ('b','foo'),('b','bah')],
291-                                    names=['lvl0', 'lvl1'])
292- dfmi = DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
293-                  index=miindex,
294-                  columns=micolumns).sortlevel().sortlevel(axis=1)
280+ miindex = pd.MultiIndex.from_product([mklbl('A',4),
281+                                       mklbl('B',2),
282+                                       mklbl('C',4),
283+                                       mklbl('D',2)])
284+ micolumns = pd.MultiIndex.from_tuples([('a','foo'),('a','bar'),
285+                                        ('b','foo'),('b','bah')],
286+                                       names=['lvl0', 'lvl1'])
287+ dfmi = pd.DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
288+                     index=miindex,
289+                     columns=micolumns).sortlevel().sortlevel(axis=1)
295290 dfmi
296291
297292 Basic multi-index slicing using slices, lists, and labels.
@@ -418,9 +413,9 @@ instance:
418413
419414.. ipython:: python
420415
421- midx = MultiIndex(levels=[['zero', 'one'], ['x','y']],
422-                   labels=[[1,1,0,0],[1,0,1,0]])
423- df = DataFrame(randn(4,2), index=midx)
416+ midx = pd.MultiIndex(levels=[['zero', 'one'], ['x','y']],
417+                      labels=[[1,1,0,0],[1,0,1,0]])
418+ df = pd.DataFrame(np.random.randn(4,2), index=midx)
424419 df
425420 df2 = df.mean(level = 0 )
426421 df2
@@ -471,7 +466,7 @@ labels will be sorted lexicographically!
471466.. ipython:: python
472467
473468 import random; random.shuffle(tuples)
474- s = Series(randn(8), index=MultiIndex.from_tuples(tuples))
469+ s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))
475470 s
476471 s.sortlevel(0 )
477472 s.sortlevel(1 )
@@ -509,13 +504,13 @@ an exception. Here is a concrete example to illustrate this:
509504.. ipython:: python
510505
511506 tuples = [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
512- idx = MultiIndex.from_tuples(tuples)
507+ idx = pd.MultiIndex.from_tuples(tuples)
513508 idx.lexsort_depth
514509
515510 reordered = idx[[1, 0, 3, 2]]
516511 reordered.lexsort_depth
517512
518- s = Series(randn(4), index=reordered)
513+ s = pd.Series(np.random.randn(4), index=reordered)
519514 s.ix['a':'a']
520515
521516 However:
@@ -540,15 +535,15 @@ index positions. ``take`` will also accept negative integers as relative positio
540535
541536.. ipython:: python
542537
543- index = Index(randint(0, 1000, 10))
538+ index = pd.Index(np.random.randint(0, 1000, 10))
544539 index
545540
546541 positions = [0, 9, 3]
547542
548543 index[positions]
549544 index.take(positions)
550545
551- ser = Series(randn(10))
546+ ser = pd.Series(np.random.randn(10))
552547
553548 ser.iloc[positions]
554549 ser.take(positions)
@@ -558,7 +553,7 @@ row or column positions.
558553
559554.. ipython:: python
560555
561- frm = DataFrame(randn(5, 3))
556+ frm = pd.DataFrame(np.random.randn(5, 3))
562557
563558 frm.take([1, 4, 3])
564559
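As a reviewer's aside on the ``take`` examples in the two hunks above: a minimal sketch, assuming a plain Python session outside the doc build, of positional selection with ``take``, including a negative position counted from the end and the ``axis`` argument for column positions. The small deterministic inputs replace the random data used in the docs so the results are easy to check by eye.

```python
import numpy as np
import pandas as pd

# Negative integers are positions relative to the end of the index.
index = pd.Index([10, 20, 30, 40])
first_and_last = index.take([0, -1])

# On a Series, take selects by position, like iloc with a list.
ser = pd.Series(np.arange(10) * 1.5)
positions = [0, 9, 3]
same_as_iloc = ser.take(positions).equals(ser.iloc[positions])

# On a DataFrame, take selects rows by default and columns with axis=1.
frm = pd.DataFrame(np.arange(15).reshape(5, 3))
rows = frm.take([1, 4, 3])
cols = frm.take([0, 2], axis=1)

print(list(first_and_last), same_as_iloc)
print(list(rows.index), list(cols.columns))
```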
@@ -569,11 +564,11 @@ intended to work on boolean indices and may return unexpected results.
569564
570565.. ipython:: python
571566
572- arr = randn(10)
567+ arr = np.random.randn(10)
573568 arr.take([False, False, True, True])
574569 arr[[0, 1]]
575570
576- ser = Series(randn(10))
571+ ser = pd.Series(np.random.randn(10))
577572 ser.take([False, False, True, True])
578573 ser.ix[[0, 1]]
579574
@@ -583,14 +578,14 @@ faster than fancy indexing.
583578
584579.. ipython::
585580
586- arr = randn(10000, 5)
581+ arr = np.random.randn(10000, 5)
587582 indexer = np.arange(10000)
588583 random.shuffle(indexer)
589584
590585 timeit arr[indexer]
591586 timeit arr.take(indexer, axis=0)
592587
593- ser = Series(arr[:, 0])
588+ ser = pd.Series(arr[:, 0])
594589 timeit ser.ix[indexer]
595590 timeit ser.take(indexer)
596591
@@ -608,10 +603,9 @@ setting the index of a ``DataFrame/Series`` with a ``category`` dtype would conv
608603
609604.. ipython:: python
610605
611- df = DataFrame({'A': np.arange(6),
612-                 'B': Series(list('aabbca')).astype('category',
613-                                                     categories=list('cab'))
614-                })
606+ df = pd.DataFrame({'A': np.arange(6),
607+                    'B': list('aabbca')})
608+ df['B'] = df['B'].astype('category', categories=list('cab'))
615609 df
616610 df.dtypes
617611 df.B.cat.categories
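As a reviewer's aside on the category example in this hunk: the ``astype('category', categories=...)`` call reflects the pandas version being documented. A minimal sketch, assuming a more recent pandas release where that keyword is no longer accepted, expresses the same intent by passing a ``CategoricalDtype`` that carries the category order. The name ``cab_dtype`` is illustrative only.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'A': np.arange(6),
                   'B': list('aabbca')})

# Assumption: newer pandas; the dtype object holds the category order.
cab_dtype = CategoricalDtype(categories=list('cab'))
df['B'] = df['B'].astype(cab_dtype)

categories = list(df['B'].cat.categories)
print(categories)
```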
@@ -669,18 +663,18 @@ values NOT in the categories, similarly to how you can reindex ANY pandas index.
669663
670664 .. code-block:: python
671665
672- In [10]: df3 = DataFrame({'A': np.arange(6),
673-                           'B': Series(list('aabbca')).astype('category',
674-                                                               categories=list('abc'))
675-                          }).set_index('B')
666+ In [9]: df3 = pd.DataFrame({'A': np.arange(6),
667+                             'B': pd.Series(list('aabbca')).astype('category')})
668+
669+ In [11]: df3 = df3.set_index('B')
676670
677671 In [11]: df3.index
678672 Out[11]:
679673 CategoricalIndex([u'a', u'a', u'b', u'b', u'c', u'a'],
680674                  categories=[u'a', u'b', u'c'],
681675                  ordered=False)
682676
683- In [12]: pd.concat([df2,df3]
677+ In [12]: pd.concat([df2, df3]
684678 TypeError: categories must match existing categories when appending
685679
686680.. _indexing.float64index:
@@ -705,9 +699,9 @@ same.
705699
706700.. ipython:: python
707701
708- indexf = Index([1.5, 2, 3, 4.5, 5])
702+ indexf = pd.Index([1.5, 2, 3, 4.5, 5])
709703 indexf
710- sf = Series(range(5),index=indexf)
704+ sf = pd.Series(range(5), index=indexf)
711705 sf
712706
713707Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).
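As a reviewer's aside on the sentence above: a minimal sketch, assuming a plain Python session outside the doc build, of label-based selection on a float index. It uses only ``.loc`` because ``.ix`` is specific to the pandas version being documented. An integer label and its equal float label pick the same row, and a label slice includes both endpoints.

```python
import pandas as pd

sf = pd.Series(range(5), index=pd.Index([1.5, 2, 3, 4.5, 5]))

# The integer 3 matches the float label 3.0.
by_int = sf.loc[3]
by_float = sf.loc[3.0]

# Label slicing is by value and includes both endpoints that are present.
between = sf.loc[2:4]

print(by_int, by_float)
print(list(between))
```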
@@ -749,17 +743,17 @@ In non-float indexes, slicing using floats will raise a ``TypeError``
749743
750744.. code-block:: python
751745
752- In [1]: Series(range(5))[3.5]
746+ In [1]: pd.Series(range(5))[3.5]
753747 TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
754748
755- In [1]: Series(range(5))[3.5:4.5]
749+ In [1]: pd.Series(range(5))[3.5:4.5]
756750 TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
757751
758752Using a scalar float indexer will be deprecated in a future version, but is allowed for now.
759753
760754.. code-block:: python
761755
762- In [3]: Series(range(5))[3.0]
756+ In [3]: pd.Series(range(5))[3.0]
763757 Out[3]: 3
764758
765759Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
@@ -768,12 +762,12 @@ example be millisecond offsets.
768762
769763.. ipython:: python
770764
771- dfir = concat([DataFrame(randn(5,2),
772-                          index=np.arange(5) * 250.0,
773-                          columns=list('AB')),
774-                DataFrame(randn(6,2),
775-                          index=np.arange(4,10) * 250.1,
776-                          columns=list('AB'))])
765+ dfir = pd.concat([pd.DataFrame(np.random.randn(5,2),
766+                                index=np.arange(5) * 250.0,
767+                                columns=list('AB')),
768+                   pd.DataFrame(np.random.randn(6,2),
769+                                index=np.arange(4,10) * 250.1,
770+                                columns=list('AB'))])
777771 dfir
778772
779773Selection operations then will always work on a value basis, for all selection operators.