directory
Based on occupation and employer statistics sponsorship information
According to state statistics sponsorship information
Data source:
https://github.com/wesm/pydata-book
>>> fec = pd.read_csv('D:\python\DataAnalysis\data\P00000001-ALL.csv',low_memory = False)
>>> fec
cmte_id cand_id ... form_tp file_num
0 C00410118 P20002978 ... SA17A 736166
1 C00410118 P20002978 ... SA17A 736166
2 C00410118 P20002978 ... SA17A 749073
3 C00410118 P20002978 ... SA17A 749073
4 C00410118 P20002978 ... SA17A 736166
5 C00410118 P20002978 ... SA17A 736166
6 C00410118 P20002978 ... SA17A 736166
7 C00410118 P20002978 ... SA17A 749073
8 C00410118 P20002978 ... SA17A 736166
9 C00410118 P20002978 ... SA17A 736166
10 C00410118 P20002978 ... SA17A 736166
11 C00410118 P20002978 ... SA17A 736166
12 C00410118 P20002978 ... SA17A 736166
13 C00410118 P20002978 ... SA17A 736166
14 C00410118 P20002978 ... SA17A 749073
15 C00410118 P20002978 ... SA17A 749073
16 C00410118 P20002978 ... SA17A 736166
17 C00410118 P20002978 ... SA17A 749073
18 C00410118 P20002978 ... SA17A 736166
19 C00410118 P20002978 ... SA17A 736166
20 C00410118 P20002978 ... SA17A 736166
21 C00410118 P20002978 ... SA17A 736166
22 C00410118 P20002978 ... SA17A 736166
23 C00410118 P20002978 ... SA17A 736166
24 C00410118 P20002978 ... SA17A 749073
25 C00410118 P20002978 ... SA17A 749073
26 C00410118 P20002978 ... SA17A 749073
27 C00410118 P20002978 ... SA17A 749073
28 C00410118 P20002978 ... SA17A 749073
29 C00410118 P20002978 ... SA17A 749073
... ... ... ... ...
1001701 C00500587 P20003281 ... SA17A 772060
1001702 C00500587 P20003281 ... SA17A 772060
1001703 C00500587 P20003281 ... SA17A 772060
1001704 C00500587 P20003281 ... SA17A 772060
1001705 C00500587 P20003281 ... SA17A 772060
1001706 C00500587 P20003281 ... SA17A 772060
1001707 C00500587 P20003281 ... SA17A 772060
1001708 C00500587 P20003281 ... SA17A 772060
1001709 C00500587 P20003281 ... SA17A 772060
1001710 C00500587 P20003281 ... SB28A 779144
1001711 C00500587 P20003281 ... SA17A 751678
1001712 C00500587 P20003281 ... SA17A 751678
1001713 C00500587 P20003281 ... SA17A 751678
1001714 C00500587 P20003281 ... SA17A 751678
1001715 C00500587 P20003281 ... SA17A 751678
1001716 C00500587 P20003281 ... SA17A 751678
1001717 C00500587 P20003281 ... SA17A 772060
1001718 C00500587 P20003281 ... SA17A 772060
1001719 C00500587 P20003281 ... SA17A 772060
1001720 C00500587 P20003281 ... SA17A 772060
1001721 C00500587 P20003281 ... SA17A 772060
1001722 C00500587 P20003281 ... SA17A 772060
1001723 C00500587 P20003281 ... SA17A 772060
1001724 C00500587 P20003281 ... SA17A 751678
1001725 C00500587 P20003281 ... SA17A 751678
1001726 C00500587 P20003281 ... SA17A 751678
1001727 C00500587 P20003281 ... SA17A 751678
1001728 C00500587 P20003281 ... SA17A 751678
1001729 C00500587 P20003281 ... SA17A 751678
1001730 C00500587 P20003281 ... SA17A 751678
[1001731 rows x 16 columns]
>>> fec.ix[123456]
cmte_id C00431445
cand_id P80003338
cand_nm Obama, Barack
contbr_nm ELLMAN, IRA
contbr_city TEMPE
contbr_st AZ
contbr_zip 852816719
contbr_employer ARIZONA STATE UNIVERSITY
contbr_occupation PROFESSOR
contb_receipt_amt 50
contb_receipt_dt 01-DEC-11
receipt_desc NaN
memo_cd NaN
memo_text NaN
form_tp SA17A
file_num 772372
Name: 123456, dtype: object
All candidates list
>>> unique_cands = fec.cand_nm.unique()
>>> unique_cands
array(['Bachmann, Michelle', 'Romney, Mitt', 'Obama, Barack',
"Roemer, Charles E. 'Buddy' III", 'Pawlenty, Timothy',
'Johnson, Gary Earl', 'Paul, Ron', 'Santorum, Rick',
'Cain, Herman', 'Gingrich, Newt', 'McCotter, Thaddeus G',
'Huntsman, Jon', 'Perry, Rick'], dtype=object)
Use the dictionary to explain the relationship between the party
>>> parties = {'Bachmann, Michelle':'Republican','Cain, Herman':'Republican','Gingrich, Newt':'Republican','Huntsman, Jon':'Republican','Johnson, Gary Earl':'Republican','McCotter, Thaddeus G':'Republican','Obama, Barack':'Democrat','Paul, Ron':'Republican','Pawlenty, Timothy':'Republican','Perry, Rick':'Republican',"Roemer, Charles E. 'Buddy' III":'Republican','Romney, Mitt':'Republican','Santorum, Rick':'Republican'}
>>> fec.cand_nm[123456:123461].map(parties)
123456 Democrat
123457 Democrat
123458 Democrat
123459 Democrat
123460 Democrat
Name: cand_nm, dtype: object
Simple statistics
>>> fec['party'] = fec.cand_nm.map(parties)
>>> fec['party'].value_counts()
Democrat 593746
Republican 407985
Name: party, dtype: int64
View sponsor information
>>> (fec.contb_receipt_amt > 0).value_counts()
True 991475
False 10256
Name: contb_receipt_amt, dtype: int64
But the data includes both refund and sponsorship, and the data is simplified below.
>>> fec = fec[fec.contb_receipt_amt > 0]
The following is a data for the two main candidates
>>> fec_mrbo = fec[fec.cand_nm.isin(['Obama, Barack','Romney, Mitt'])]
Soning information according to occupation and employer statistics
Professional sponsorship information statistics is another statistical task that is often studied. For example, lawyers like Democrats, business owners like Republican Party, watch data
>>> fec.contbr_occupation.value_counts()[:10]
RETIRED 233990
INFORMATION REQUESTED 35107
ATTORNEY 34286
HOMEMAKER 29931
PHYSICIAN 23432
INFORMATION REQUESTED PER BEST EFFORTS 21138
ENGINEER 14334
TEACHER 13990
CONSULTANT 13273
PROFESSOR 12555
Name: contbr_occupation, dtype: int64
Integrate the same occupation and allow occupations that have no mapping relationship to pass
>>> occ_mapping = {'INFORMATION REQUESTED PER BEST EFFORTS':'NOT PROVIDED','INFORMATION REQUESTED':'NOT PROVIDED','INFORMATION REQUESTED (BEST EFFORTS)':'NOT PROVIDED','C.E.O':'CEO'}
>>> f = lambda x : occ_mapping.get(x,x)
>>> fec.contbr_occupation = fec.contbr_occupation.map(f)
The same processing of employer information
>>> emp_mapping = {'INFORMATION REQUESTED PER BEST EFFORTS':'NOT PROVIDED','INFORMATION REQUESTED':'NOT PROVIDED','SELF':'SELF-EMPLOYED','SELF EMPLOYED':'SELF-EMPLOYED',}
>>> f = lambda x: emp_mapping.get(x,x)
>>> fec.contbr_employer = fec.contbr_employer.map(f)
According to the party and occupation aggregation data, then filter out the data with a total amount of less than $ 2 million:
>>> by_occupation = fec.pivot_table('contb_receipt_amt',index='contbr_occupation',columns='party',aggfunc='sum')
>>> over_2mm = by_occupation[by_occupation.sum(1) > 2000000]
>>> over_2mm
party Democrat Republican
contbr_occupation
ATTORNEY 11141982.97 7.477194e+06
C.E.O. 1690.00 2.592983e+06
CEO 2074284.79 1.640758e+06
CONSULTANT 2459912.71 2.544725e+06
ENGINEER 951525.55 1.818374e+06
EXECUTIVE 1355161.05 4.138850e+06
HOMEMAKER 4248875.80 1.363428e+07
INVESTOR 884133.00 2.431769e+06
LAWYER 3160478.87 3.912243e+05
MANAGER 762883.22 1.444532e+06
NOT PROVIDED 4866973.96 2.056547e+07
OWNER 1001567.36 2.408287e+06
PHYSICIAN 3735124.94 3.594320e+06
PRESIDENT 1878509.95 4.720924e+06
PROFESSOR 2165071.08 2.967027e+05
REAL ESTATE 528902.09 1.625902e+06
RETIRED 25305116.38 2.356124e+07
SELF-EMPLOYED 672393.40 1.640253e+06
Make a horizontal column diagram
>>> over_2mm.plot(kind='barh')
The following lists with the highest funds for two candidates are listed.
and then aggregate according to occupation and employers
def get_top_amounts(group,key,n=5):
totals = group.groupby(key)['contb_receipt_amt'].sum()
return totals.sort_values(ascending=False)[n:]
grouped = fec_mrbo.groupby('cand_nm')
grouped.apply(get_top_amounts,'contbr_occupation',n=7)
cand_nm contbr_occupation
Obama, Barack PROFESSOR 2165071.08
CEO 2074284.79
PRESIDENT 1878509.95
NOT EMPLOYED 1709188.20
EXECUTIVE 1355161.05
TEACHER 1250969.15
WRITER 1084188.88
OWNER 1001567.36
ENGINEER 951525.55
INVESTOR 884133.00
ARTIST 763125.00
MANAGER 762883.22
SELF-EMPLOYED 672393.40
STUDENT 628099.75
REAL ESTATE 528902.09
CHAIRMAN 496547.00
ARCHITECT 483859.89
DIRECTOR 471741.73
BUSINESS OWNER 449979.30
EDUCATOR 436600.89
PSYCHOLOGIST 427299.92
SOFTWARE ENGINEER 396985.65
PARTNER 395759.50
SALES 392886.91
EXECUTIVE DIRECTOR 348180.94
MANAGING DIRECTOR 329688.25
SOCIAL WORKER 326844.43
VICE PRESIDENT 325647.15
ADMINISTRATOR 323079.26
SCIENTIST 319227.88
...
Romney, Mitt NON-PROFIT VETERANS ORG. CHAIR/ANNUITA 10.00
CHRMN/CEO 10.00
LEAN SIX SIGMA PRACTIONER 10.00
PRESIDENT EMERITUS 10.00
SIGN CONTRACTOR 10.00
JUDGE ADVOCATE ATTORNEY 10.00
SALES SUPPORT 10.00
CONTRACTS SPECIALIST 9.00
TEACHER & FREE-LANCE JOURNALIST 9.00
FOUNDATION CONSULTANT 6.00
ELAYNE WELLS HARMER 6.00
MAIL HANDLER 6.00
SECRETARY/BOOKKEPPER 6.00
TREASURER & DIRECTOR OF FINANCE 6.00
DISTRICT REPRESENTATIVE 5.00
HORTICULTURIST 5.00
FINANCIAL INSTITUTION - CEO 5.00
PLANNING AND OPERATIONS ANALYST 5.00
MD - UROLOGIST 5.00
VILLA NOVA 5.00
SCOTT GREENBAUM 5.00
ENGINEER/RISK EXPERT 5.00
EDUCATION ADMIN 5.00
DIRECTOR REISCHAUER CENTER FOR EAST A 5.00
CHICKEN GRADER 5.00
IFC CONTRACTING SOLUTIONS 3.00
INDEPENDENT PROFESSIONAL 3.00
AFFORDABLE REAL ESTATE DEVELOPER 3.00
REMODELER & SEMI RETIRED 3.00
3RD GENERATION FAMILY BUSINESS OWNER 3.00
Name: contb_receipt_amt, Length: 35973, dtype: float64
>>> grouped.apply(get_top_amounts,'contbr_employer',n=10)
cand_nm contbr_employer
Obama, Barack DLA PIPER 148235.00
HARVARD UNIVERSITY 131368.94
IBM 128490.93
GOOGLE 125302.88
MICROSOFT CORPORATION 108849.00
KAISER PERMANENTE 104949.95
JONES DAY 103712.50
STANFORD UNIVERSITY 101630.75
COLUMBIA UNIVERSITY 96325.12
UNIVERSITY OF CHICAGO 88575.00
AT&T 88132.12
US GOVERNMENT 87689.00
MORGAN & MORGAN 87250.00
VERIZON 85318.30
UNIVERSITY OF MICHIGAN 84856.33
DISABLED 78417.87
UCLA 78092.50
ARNOLD & PORTER LLP 76330.00
OBAMA FOR AMERICA 72028.89
UNIVERSITY OF WASHINGTON 70445.86
NORTHWESTERN UNIVERSITY 69489.05
DEPARTMENT OF DEFENSE 67253.40
COMCAST 65158.00
US ARMY 64768.91
FEDERAL GOVERNMENT 64590.26
WELLS FARGO 62749.60
UNIVERSITY OF CALIFORNIA 62432.00
SKADDEN ARPS 61904.00
DEBEVOISE & PLIMPTON LLP 61268.00
MAYER BROWN LLP 60315.40
...
Romney, Mitt U.S. POST OFFICE 6.00
ENERGY ALLOYS 6.00
VILLA NOVA FINANCING GROUP LLC 5.00
LEAVITT INSURANCE AGENCY 5.00
SAIS/JOHNS HOPKINS UNIVERSITY 5.00
EASTHAM CAPITAL 5.00
PAULA HAWKINS & ASSOCIATES 5.00
APPLIANCE INSTALLATIONS INC. 5.00
DIRECT LENDERS LLC 5.00
PEACE FROGS INC. 5.00
INTERSTELLAR HOLDINGS LLC 5.00
GOLDIE'S SALON 5.00
PICKET FENCES INC. 5.00
R. A. RAUCH & ASSOCIATES 5.00
PLUM HEALTHCARE 5.00
GREGORY GALLIVAN 5.00
LEGACY SCHOOL 5.00
AA FLIPPEN ASSOCIATES 5.00
PACIFIC BIOSCIENCES 5.00
CA STATE SENATE 5.00
SCOTT GREENBAUM 5.00
RST GLOBAL LICENSING LLC 5.00
MORGAN STANLEY SMITH BARNEY LLC 5.00
LOUGH INVESTMENT ADVISORY LLC 4.00
HONOLD COMMUNICTAIONS 3.00
WILL MERRIFIELD 3.00
UN 3.00
UPTOWN CHEAPSKATE 3.00
INDEPENDENT PROFESSIONAL 3.00
WATERWORKS INDUSRTIES 3.00
Name: contb_receipt_amt, Length: 95888, dtype: float64
Tour of capital contribution
>>> bins = np.array([0,1,10,100,1000,10000,100000,1000000,10000000])
>>> labels = pd.cut(fec_mrbo.contb_receipt_amt,bins)
>>> labels
411 (10, 100]
412 (100, 1000]
413 (100, 1000]
414 (10, 100]
415 (10, 100]
416 (10, 100]
417 (100, 1000]
418 (10, 100]
419 (100, 1000]
420 (10, 100]
421 (10, 100]
422 (100, 1000]
423 (100, 1000]
424 (100, 1000]
425 (100, 1000]
426 (100, 1000]
427 (1000, 10000]
428 (100, 1000]
429 (100, 1000]
430 (10, 100]
431 (1000, 10000]
432 (100, 1000]
433 (100, 1000]
434 (100, 1000]
435 (100, 1000]
436 (100, 1000]
437 (10, 100]
438 (100, 1000]
439 (100, 1000]
440 (10, 100]
...
701356 (10, 100]
701357 (1, 10]
701358 (10, 100]
701359 (10, 100]
701360 (10, 100]
701361 (10, 100]
701362 (100, 1000]
701363 (10, 100]
701364 (10, 100]
701365 (10, 100]
701366 (10, 100]
701367 (10, 100]
701368 (100, 1000]
701369 (10, 100]
701370 (10, 100]
701371 (10, 100]
701372 (10, 100]
701373 (10, 100]
701374 (10, 100]
701375 (10, 100]
701376 (1000, 10000]
701377 (10, 100]
701378 (10, 100]
701379 (100, 1000]
701380 (1000, 10000]
701381 (10, 100]
701382 (100, 1000]
701383 (1, 10]
701384 (10, 100]
701385 (100, 1000]
Name: contb_receipt_amt, Length: 694282, dtype: category
Categories (8, interval[int64]): [(0, 1] < (1, 10] < (10, 100] < (100, 1000] < (1000, 10000] <
(10000, 100000] < (100000, 1000000] < (1000000, 10000000]]
and then group the data based on the name of the candidate and the facade label:
In small sponsorship, there are more Obama
>>> grouped = fec_mrbo.groupby(['cand_nm',labels])
>>> grouped.size().unstack(0)
cand_nm Obama, Barack Romney, Mitt
contb_receipt_amt
(0, 1] 493.0 77.0
(1, 10] 40070.0 3681.0
(10, 100] 372280.0 31853.0
(100, 1000] 153991.0 43357.0
(1000, 10000] 22284.0 26186.0
(10000, 100000] 2.0 1.0
(100000, 1000000] 3.0 NaN
(1000000, 10000000] 4.0 NaN
The sum of the capital contribution and the specifications within the surface of the face.
>>> bucket_sums = grouped.contb_receipt_amt.sum().unstack(0)
>>> bucket_sums
cand_nm Obama, Barack Romney, Mitt
contb_receipt_amt
(0, 1] 318.24 77.00
(1, 10] 337267.62 29819.66
(10, 100] 20288981.41 1987783.76
(100, 1000] 54798531.46 22363381.69
(1000, 10000] 51753705.67 63942145.42
(10000, 100000] 59100.00 12700.00
(100000, 1000000] 1490683.08 NaN
(1000000, 10000000] 7148839.76 NaN
sponsorship ratio
>>> normed_sums = bucket_sums.div(bucket_sums.sum(axis=1),axis=0)
>>> normed_sums
cand_nm Obama, Barack Romney, Mitt
contb_receipt_amt
(0, 1] 0.805182 0.194818
(1, 10] 0.918767 0.081233
(10, 100] 0.910769 0.089231
(100, 1000] 0.710176 0.289824
(1000, 10000] 0.447326 0.552674
(10000, 100000] 0.823120 0.176880
(100000, 1000000] 1.000000 NaN
(1000000, 10000000] 1.000000 NaN
normed_sums[:-2].plot(kind='barh',stacked=True)
According to state statistics sponsorship information
>>> grouped = fec_mrbo.groupby(['cand_nm','contbr_st'])
>>> totals = grouped.contb_receipt_amt.sum().unstack(0).fillna(0)
>>> totals = totals[totals.sum(1)>100000]
>>> totals[:10]
cand_nm Obama, Barack Romney, Mitt
contbr_st
AK 281840.15 86204.24
AL 543123.48 527303.51
AR 359247.28 105556.00
AZ 1506476.98 1888436.23
CA 23824984.24 11237636.60
CO 2132429.49 1506714.12
CT 2068291.26 3499475.45
DC 4373538.80 1025137.50
DE 336669.14 82712.00
FL 7318178.58 8338458.81
Except for the total sponsorship amount to each bank, the total sponsorship ratio of various candidates in each state will be obtained:
>>> percent = totals.div(totals.sum(1),axis=0)
>>> percent[:10]
cand_nm Obama, Barack Romney, Mitt
contbr_st
AK 0.765778 0.234222
AL 0.507390 0.492610
AR 0.772902 0.227098
AZ 0.443745 0.556255
CA 0.679498 0.320502
CO 0.585970 0.414030
CT 0.371476 0.628524
DC 0.810113 0.189887
DE 0.802776 0.197224
FL 0.467417 0.532583