SpringBoot integrated Druid

2023-01-23   ES  

directory

Data source:

Based on occupation and employer statistics sponsorship information

Group of capital contribution

According to state statistics sponsorship information


Data source:

https://github.com/wesm/pydata-book

>>> fec = pd.read_csv('D:\python\DataAnalysis\data\P00000001-ALL.csv',low_memory = False)
>>> fec
           cmte_id    cand_id   ...    form_tp file_num
0        C00410118  P20002978   ...      SA17A   736166
1        C00410118  P20002978   ...      SA17A   736166
2        C00410118  P20002978   ...      SA17A   749073
3        C00410118  P20002978   ...      SA17A   749073
4        C00410118  P20002978   ...      SA17A   736166
5        C00410118  P20002978   ...      SA17A   736166
6        C00410118  P20002978   ...      SA17A   736166
7        C00410118  P20002978   ...      SA17A   749073
8        C00410118  P20002978   ...      SA17A   736166
9        C00410118  P20002978   ...      SA17A   736166
10       C00410118  P20002978   ...      SA17A   736166
11       C00410118  P20002978   ...      SA17A   736166
12       C00410118  P20002978   ...      SA17A   736166
13       C00410118  P20002978   ...      SA17A   736166
14       C00410118  P20002978   ...      SA17A   749073
15       C00410118  P20002978   ...      SA17A   749073
16       C00410118  P20002978   ...      SA17A   736166
17       C00410118  P20002978   ...      SA17A   749073
18       C00410118  P20002978   ...      SA17A   736166
19       C00410118  P20002978   ...      SA17A   736166
20       C00410118  P20002978   ...      SA17A   736166
21       C00410118  P20002978   ...      SA17A   736166
22       C00410118  P20002978   ...      SA17A   736166
23       C00410118  P20002978   ...      SA17A   736166
24       C00410118  P20002978   ...      SA17A   749073
25       C00410118  P20002978   ...      SA17A   749073
26       C00410118  P20002978   ...      SA17A   749073
27       C00410118  P20002978   ...      SA17A   749073
28       C00410118  P20002978   ...      SA17A   749073
29       C00410118  P20002978   ...      SA17A   749073
            ...        ...   ...        ...      ...
1001701  C00500587  P20003281   ...      SA17A   772060
1001702  C00500587  P20003281   ...      SA17A   772060
1001703  C00500587  P20003281   ...      SA17A   772060
1001704  C00500587  P20003281   ...      SA17A   772060
1001705  C00500587  P20003281   ...      SA17A   772060
1001706  C00500587  P20003281   ...      SA17A   772060
1001707  C00500587  P20003281   ...      SA17A   772060
1001708  C00500587  P20003281   ...      SA17A   772060
1001709  C00500587  P20003281   ...      SA17A   772060
1001710  C00500587  P20003281   ...      SB28A   779144
1001711  C00500587  P20003281   ...      SA17A   751678
1001712  C00500587  P20003281   ...      SA17A   751678
1001713  C00500587  P20003281   ...      SA17A   751678
1001714  C00500587  P20003281   ...      SA17A   751678
1001715  C00500587  P20003281   ...      SA17A   751678
1001716  C00500587  P20003281   ...      SA17A   751678
1001717  C00500587  P20003281   ...      SA17A   772060
1001718  C00500587  P20003281   ...      SA17A   772060
1001719  C00500587  P20003281   ...      SA17A   772060
1001720  C00500587  P20003281   ...      SA17A   772060
1001721  C00500587  P20003281   ...      SA17A   772060
1001722  C00500587  P20003281   ...      SA17A   772060
1001723  C00500587  P20003281   ...      SA17A   772060
1001724  C00500587  P20003281   ...      SA17A   751678
1001725  C00500587  P20003281   ...      SA17A   751678
1001726  C00500587  P20003281   ...      SA17A   751678
1001727  C00500587  P20003281   ...      SA17A   751678
1001728  C00500587  P20003281   ...      SA17A   751678
1001729  C00500587  P20003281   ...      SA17A   751678
1001730  C00500587  P20003281   ...      SA17A   751678

[1001731 rows x 16 columns]

>>> fec.ix[123456]
cmte_id                             C00431445
cand_id                             P80003338
cand_nm                         Obama, Barack
contbr_nm                         ELLMAN, IRA
contbr_city                             TEMPE
contbr_st                                  AZ
contbr_zip                          852816719
contbr_employer      ARIZONA STATE UNIVERSITY
contbr_occupation                   PROFESSOR
contb_receipt_amt                          50
contb_receipt_dt                    01-DEC-11
receipt_desc                              NaN
memo_cd                                   NaN
memo_text                                 NaN
form_tp                                 SA17A
file_num                               772372
Name: 123456, dtype: object

All candidates list

>>> unique_cands = fec.cand_nm.unique()
>>> unique_cands
array(['Bachmann, Michelle', 'Romney, Mitt', 'Obama, Barack',
       "Roemer, Charles E. 'Buddy' III", 'Pawlenty, Timothy',
       'Johnson, Gary Earl', 'Paul, Ron', 'Santorum, Rick',
       'Cain, Herman', 'Gingrich, Newt', 'McCotter, Thaddeus G',
       'Huntsman, Jon', 'Perry, Rick'], dtype=object)

Use the dictionary to explain the relationship between the party

>>> parties = {'Bachmann, Michelle':'Republican','Cain, Herman':'Republican','Gingrich, Newt':'Republican','Huntsman, Jon':'Republican','Johnson, Gary Earl':'Republican','McCotter, Thaddeus G':'Republican','Obama, Barack':'Democrat','Paul, Ron':'Republican','Pawlenty, Timothy':'Republican','Perry, Rick':'Republican',"Roemer, Charles E. 'Buddy' III":'Republican','Romney, Mitt':'Republican','Santorum, Rick':'Republican'}
>>> fec.cand_nm[123456:123461].map(parties)
123456    Democrat
123457    Democrat
123458    Democrat
123459    Democrat
123460    Democrat
Name: cand_nm, dtype: object

Simple statistics

>>> fec['party'] = fec.cand_nm.map(parties)
>>> fec['party'].value_counts()
Democrat      593746
Republican    407985
Name: party, dtype: int64

View sponsor information

>>> (fec.contb_receipt_amt > 0).value_counts()
True     991475
False     10256
Name: contb_receipt_amt, dtype: int64

But the data includes both refund and sponsorship, and the data is simplified below.

>>> fec = fec[fec.contb_receipt_amt > 0]

The following is a data for the two main candidates

>>> fec_mrbo = fec[fec.cand_nm.isin(['Obama, Barack','Romney, Mitt'])]

Soning information according to occupation and employer statistics

Professional sponsorship information statistics is another statistical task that is often studied. For example, lawyers like Democrats, business owners like Republican Party, watch data

>>> fec.contbr_occupation.value_counts()[:10]
RETIRED                                   233990
INFORMATION REQUESTED                      35107
ATTORNEY                                   34286
HOMEMAKER                                  29931
PHYSICIAN                                  23432
INFORMATION REQUESTED PER BEST EFFORTS     21138
ENGINEER                                   14334
TEACHER                                    13990
CONSULTANT                                 13273
PROFESSOR                                  12555
Name: contbr_occupation, dtype: int64

Integrate the same occupation and allow occupations that have no mapping relationship to pass

>>> occ_mapping = {'INFORMATION REQUESTED PER BEST EFFORTS':'NOT PROVIDED','INFORMATION REQUESTED':'NOT PROVIDED','INFORMATION REQUESTED (BEST EFFORTS)':'NOT PROVIDED','C.E.O':'CEO'}
>>> f = lambda x : occ_mapping.get(x,x)
>>> fec.contbr_occupation = fec.contbr_occupation.map(f)

The same processing of employer information

>>> emp_mapping = {'INFORMATION REQUESTED PER BEST EFFORTS':'NOT PROVIDED','INFORMATION REQUESTED':'NOT PROVIDED','SELF':'SELF-EMPLOYED','SELF EMPLOYED':'SELF-EMPLOYED',}
>>> f = lambda x: emp_mapping.get(x,x)
>>> fec.contbr_employer = fec.contbr_employer.map(f)

According to the party and occupation aggregation data, then filter out the data with a total amount of less than $ 2 million:

>>> by_occupation = fec.pivot_table('contb_receipt_amt',index='contbr_occupation',columns='party',aggfunc='sum')
>>> over_2mm = by_occupation[by_occupation.sum(1) > 2000000]
>>> over_2mm
party                 Democrat    Republican
contbr_occupation                           
ATTORNEY           11141982.97  7.477194e+06
C.E.O.                 1690.00  2.592983e+06
CEO                 2074284.79  1.640758e+06
CONSULTANT          2459912.71  2.544725e+06
ENGINEER             951525.55  1.818374e+06
EXECUTIVE           1355161.05  4.138850e+06
HOMEMAKER           4248875.80  1.363428e+07
INVESTOR             884133.00  2.431769e+06
LAWYER              3160478.87  3.912243e+05
MANAGER              762883.22  1.444532e+06
NOT PROVIDED        4866973.96  2.056547e+07
OWNER               1001567.36  2.408287e+06
PHYSICIAN           3735124.94  3.594320e+06
PRESIDENT           1878509.95  4.720924e+06
PROFESSOR           2165071.08  2.967027e+05
REAL ESTATE          528902.09  1.625902e+06
RETIRED            25305116.38  2.356124e+07
SELF-EMPLOYED        672393.40  1.640253e+06

Make a horizontal column diagram

>>> over_2mm.plot(kind='barh')

The following lists with the highest funds for two candidates are listed.

and then aggregate according to occupation and employers

def get_top_amounts(group,key,n=5):
     totals = group.groupby(key)['contb_receipt_amt'].sum()
     return totals.sort_values(ascending=False)[n:]
 grouped = fec_mrbo.groupby('cand_nm')
grouped.apply(get_top_amounts,'contbr_occupation',n=7)
cand_nm        contbr_occupation                     
Obama, Barack  PROFESSOR                                 2165071.08
               CEO                                       2074284.79
               PRESIDENT                                 1878509.95
               NOT EMPLOYED                              1709188.20
               EXECUTIVE                                 1355161.05
               TEACHER                                   1250969.15
               WRITER                                    1084188.88
               OWNER                                     1001567.36
               ENGINEER                                   951525.55
               INVESTOR                                   884133.00
               ARTIST                                     763125.00
               MANAGER                                    762883.22
               SELF-EMPLOYED                              672393.40
               STUDENT                                    628099.75
               REAL ESTATE                                528902.09
               CHAIRMAN                                   496547.00
               ARCHITECT                                  483859.89
               DIRECTOR                                   471741.73
               BUSINESS OWNER                             449979.30
               EDUCATOR                                   436600.89
               PSYCHOLOGIST                               427299.92
               SOFTWARE ENGINEER                          396985.65
               PARTNER                                    395759.50
               SALES                                      392886.91
               EXECUTIVE DIRECTOR                         348180.94
               MANAGING DIRECTOR                          329688.25
               SOCIAL WORKER                              326844.43
               VICE PRESIDENT                             325647.15
               ADMINISTRATOR                              323079.26
               SCIENTIST                                  319227.88
                                                            ...    
Romney, Mitt   NON-PROFIT VETERANS ORG. CHAIR/ANNUITA         10.00
               CHRMN/CEO                                      10.00
               LEAN SIX SIGMA PRACTIONER                      10.00
               PRESIDENT EMERITUS                             10.00
               SIGN CONTRACTOR                                10.00
               JUDGE ADVOCATE ATTORNEY                        10.00
               SALES SUPPORT                                  10.00
               CONTRACTS SPECIALIST                            9.00
               TEACHER & FREE-LANCE JOURNALIST                 9.00
               FOUNDATION CONSULTANT                           6.00
               ELAYNE WELLS HARMER                             6.00
               MAIL HANDLER                                    6.00
               SECRETARY/BOOKKEPPER                            6.00
               TREASURER & DIRECTOR OF FINANCE                 6.00
               DISTRICT REPRESENTATIVE                         5.00
               HORTICULTURIST                                  5.00
               FINANCIAL INSTITUTION - CEO                     5.00
               PLANNING AND OPERATIONS ANALYST                 5.00
               MD - UROLOGIST                                  5.00
               VILLA NOVA                                      5.00
               SCOTT GREENBAUM                                 5.00
               ENGINEER/RISK EXPERT                            5.00
               EDUCATION ADMIN                                 5.00
               DIRECTOR REISCHAUER CENTER FOR EAST A           5.00
               CHICKEN GRADER                                  5.00
               IFC CONTRACTING SOLUTIONS                       3.00
               INDEPENDENT PROFESSIONAL                        3.00
               AFFORDABLE REAL ESTATE DEVELOPER                3.00
               REMODELER & SEMI RETIRED                        3.00
               3RD GENERATION FAMILY BUSINESS OWNER            3.00
Name: contb_receipt_amt, Length: 35973, dtype: float64


>>> grouped.apply(get_top_amounts,'contbr_employer',n=10)
cand_nm        contbr_employer                
Obama, Barack  DLA PIPER                          148235.00
               HARVARD UNIVERSITY                 131368.94
               IBM                                128490.93
               GOOGLE                             125302.88
               MICROSOFT CORPORATION              108849.00
               KAISER PERMANENTE                  104949.95
               JONES DAY                          103712.50
               STANFORD UNIVERSITY                101630.75
               COLUMBIA UNIVERSITY                 96325.12
               UNIVERSITY OF CHICAGO               88575.00
               AT&T                                88132.12
               US GOVERNMENT                       87689.00
               MORGAN & MORGAN                     87250.00
               VERIZON                             85318.30
               UNIVERSITY OF MICHIGAN              84856.33
               DISABLED                            78417.87
               UCLA                                78092.50
               ARNOLD & PORTER LLP                 76330.00
               OBAMA FOR AMERICA                   72028.89
               UNIVERSITY OF WASHINGTON            70445.86
               NORTHWESTERN UNIVERSITY             69489.05
               DEPARTMENT OF DEFENSE               67253.40
               COMCAST                             65158.00
               US ARMY                             64768.91
               FEDERAL GOVERNMENT                  64590.26
               WELLS FARGO                         62749.60
               UNIVERSITY OF CALIFORNIA            62432.00
               SKADDEN ARPS                        61904.00
               DEBEVOISE & PLIMPTON LLP            61268.00
               MAYER BROWN LLP                     60315.40
                                                    ...    
Romney, Mitt   U.S. POST OFFICE                        6.00
               ENERGY ALLOYS                           6.00
               VILLA NOVA FINANCING GROUP LLC          5.00
               LEAVITT INSURANCE AGENCY                5.00
               SAIS/JOHNS HOPKINS UNIVERSITY           5.00
               EASTHAM CAPITAL                         5.00
               PAULA HAWKINS & ASSOCIATES              5.00
               APPLIANCE INSTALLATIONS INC.            5.00
               DIRECT LENDERS LLC                      5.00
               PEACE FROGS INC.                        5.00
               INTERSTELLAR HOLDINGS LLC               5.00
               GOLDIE'S SALON                          5.00
               PICKET FENCES INC.                      5.00
               R. A. RAUCH & ASSOCIATES                5.00
               PLUM HEALTHCARE                         5.00
               GREGORY GALLIVAN                        5.00
               LEGACY SCHOOL                           5.00
               AA FLIPPEN ASSOCIATES                   5.00
               PACIFIC BIOSCIENCES                     5.00
               CA STATE SENATE                         5.00
               SCOTT GREENBAUM                         5.00
               RST GLOBAL LICENSING LLC                5.00
               MORGAN STANLEY SMITH BARNEY LLC         5.00
               LOUGH INVESTMENT ADVISORY LLC           4.00
               HONOLD COMMUNICTAIONS                   3.00
               WILL MERRIFIELD                         3.00
               UN                                      3.00
               UPTOWN CHEAPSKATE                       3.00
               INDEPENDENT PROFESSIONAL                3.00
               WATERWORKS INDUSRTIES                   3.00
Name: contb_receipt_amt, Length: 95888, dtype: float64

Tour of capital contribution

>>> bins = np.array([0,1,10,100,1000,10000,100000,1000000,10000000])
>>> labels = pd.cut(fec_mrbo.contb_receipt_amt,bins)
>>> labels
411           (10, 100]
412         (100, 1000]
413         (100, 1000]
414           (10, 100]
415           (10, 100]
416           (10, 100]
417         (100, 1000]
418           (10, 100]
419         (100, 1000]
420           (10, 100]
421           (10, 100]
422         (100, 1000]
423         (100, 1000]
424         (100, 1000]
425         (100, 1000]
426         (100, 1000]
427       (1000, 10000]
428         (100, 1000]
429         (100, 1000]
430           (10, 100]
431       (1000, 10000]
432         (100, 1000]
433         (100, 1000]
434         (100, 1000]
435         (100, 1000]
436         (100, 1000]
437           (10, 100]
438         (100, 1000]
439         (100, 1000]
440           (10, 100]
              ...      
701356        (10, 100]
701357          (1, 10]
701358        (10, 100]
701359        (10, 100]
701360        (10, 100]
701361        (10, 100]
701362      (100, 1000]
701363        (10, 100]
701364        (10, 100]
701365        (10, 100]
701366        (10, 100]
701367        (10, 100]
701368      (100, 1000]
701369        (10, 100]
701370        (10, 100]
701371        (10, 100]
701372        (10, 100]
701373        (10, 100]
701374        (10, 100]
701375        (10, 100]
701376    (1000, 10000]
701377        (10, 100]
701378        (10, 100]
701379      (100, 1000]
701380    (1000, 10000]
701381        (10, 100]
701382      (100, 1000]
701383          (1, 10]
701384        (10, 100]
701385      (100, 1000]
Name: contb_receipt_amt, Length: 694282, dtype: category
Categories (8, interval[int64]): [(0, 1] < (1, 10] < (10, 100] < (100, 1000] < (1000, 10000] <
                                  (10000, 100000] < (100000, 1000000] < (1000000, 10000000]]

and then group the data based on the name of the candidate and the facade label:

In small sponsorship, there are more Obama

>>> grouped = fec_mrbo.groupby(['cand_nm',labels])
>>> grouped.size().unstack(0)
cand_nm              Obama, Barack  Romney, Mitt
contb_receipt_amt                               
(0, 1]                       493.0          77.0
(1, 10]                    40070.0        3681.0
(10, 100]                 372280.0       31853.0
(100, 1000]               153991.0       43357.0
(1000, 10000]              22284.0       26186.0
(10000, 100000]                2.0           1.0
(100000, 1000000]              3.0           NaN
(1000000, 10000000]            4.0           NaN

The sum of the capital contribution and the specifications within the surface of the face.

>>> bucket_sums = grouped.contb_receipt_amt.sum().unstack(0)
>>> bucket_sums
cand_nm              Obama, Barack  Romney, Mitt
contb_receipt_amt                               
(0, 1]                      318.24         77.00
(1, 10]                  337267.62      29819.66
(10, 100]              20288981.41    1987783.76
(100, 1000]            54798531.46   22363381.69
(1000, 10000]          51753705.67   63942145.42
(10000, 100000]           59100.00      12700.00
(100000, 1000000]       1490683.08           NaN
(1000000, 10000000]     7148839.76           NaN

sponsorship ratio

>>> normed_sums = bucket_sums.div(bucket_sums.sum(axis=1),axis=0)
>>> normed_sums
cand_nm              Obama, Barack  Romney, Mitt
contb_receipt_amt                               
(0, 1]                    0.805182      0.194818
(1, 10]                   0.918767      0.081233
(10, 100]                 0.910769      0.089231
(100, 1000]               0.710176      0.289824
(1000, 10000]             0.447326      0.552674
(10000, 100000]           0.823120      0.176880
(100000, 1000000]         1.000000           NaN
(1000000, 10000000]       1.000000           NaN
normed_sums[:-2].plot(kind='barh',stacked=True)

According to state statistics sponsorship information

>>> grouped = fec_mrbo.groupby(['cand_nm','contbr_st'])
>>> totals = grouped.contb_receipt_amt.sum().unstack(0).fillna(0)
>>> totals = totals[totals.sum(1)>100000]
>>> totals[:10]
cand_nm    Obama, Barack  Romney, Mitt
contbr_st                             
AK             281840.15      86204.24
AL             543123.48     527303.51
AR             359247.28     105556.00
AZ            1506476.98    1888436.23
CA           23824984.24   11237636.60
CO            2132429.49    1506714.12
CT            2068291.26    3499475.45
DC            4373538.80    1025137.50
DE             336669.14      82712.00
FL            7318178.58    8338458.81

Except for the total sponsorship amount to each bank, the total sponsorship ratio of various candidates in each state will be obtained:

>>> percent = totals.div(totals.sum(1),axis=0)
>>> percent[:10]
cand_nm    Obama, Barack  Romney, Mitt
contbr_st                             
AK              0.765778      0.234222
AL              0.507390      0.492610
AR              0.772902      0.227098
AZ              0.443745      0.556255
CA              0.679498      0.320502
CO              0.585970      0.414030
CT              0.371476      0.628524
DC              0.810113      0.189887
DE              0.802776      0.197224
FL              0.467417      0.532583

 

source

Related Posts

Android copy text, intens, URI to shear

【吴恩达】机器学习ex3编程练习

More than 5G and Hongmeng, the latest big move in Huawei, throw AI to calculate the nuclear bomb

lightoj 1245 Harmonic Number (II) (simple theory)

SpringBoot integrated Druid

Random Posts

Android — The effect of the barrage achievement

Install webshell open source fortress machine Gateone

java zip/gzip file compress and decompress Ricky

Solution on the vscode error “The terminal will be reused by the task, and the any key is closed”

ubuntu network card ENP0S3 is changed to ETH0